Application of Probabilistic Methods to Predict Individual Match Results in the Indian Premier League (IPL)
Vijayakumar Ramamoorthy1, Saravana kumar Selvaraj2
1,2
Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St.John’s,
NL, Canada, A1B 3X5
Abstract
The Indian Premier League (IPL) is a professional Twenty20 cricket league contested annually in India by franchise teams representing Indian cities. The IPL is the most-watched Twenty20 league worldwide. In 2010, the IPL became the first sporting event to be broadcast live on YouTube. The brand value of the 2014 Indian Premier League was estimated to be US$7.2 billion. Cricket attracts a great deal of betting and prediction, and for this reason many researchers have proposed models to predict match results. In this report we attempt to predict the outcomes of some of the most important matches to be played in the Indian Premier League in the forthcoming season, from April 8 to May 29. We use the Poisson distribution to build a model with which we can predict (1) whether the home team has any home-ground advantage, (2) the probability of the runs per over scored in the match by each team, and (3) which team will win the match. We use data from the previous seasons, 2014 and 2015, to predict these matches. The model is an introduction to predicting individual matches and can be developed further into an advanced model for predicting season results and the individual scores of a single batsman.
Introduction to Cricket and Betting:
Cricket is the second most popular game in the world, next only to football. The latest form of cricket is Twenty20, or T20, cricket. It involves two teams, each batting for a maximum of 120 balls (twenty overs), and is completed in about two and a half hours, a much shorter duration than older forms of the game. According to the inaugural annual review of Global Sports Salaries, published by sportingintelligence.com in April 2010, the IPL is the second highest-paid league in the world after the National Basketball Association (NBA), based on first-team salaries on a pro rata basis. “It has become second highest paid sports league in the world within three years of its inauguration”. The total economic output associated with IPL matches in India for 2015 is estimated at INR 26.5 billion (USD 418 million). This is the aggregate value of all transactions that took place as a direct, indirect or induced effect of the economic activity of the 2015 matches.
Hosting an IPL match also adds value and revenue to the economy of the state.
Predicting the results of cricket matches is of interest to a wide variety of parties: the players, their coaches and managers, fans, sponsors and the gambling community, both gamblers and bookmakers or betting exchanges. According to Sportradar director Darren Small, the international sports match-betting industry is worth an estimated $700 billion to $1 trillion annually, and an estimated 20% of that annual trade comes from cricket betting. With betting on cricket increasing every year, we set out to understand how bookmakers predict matches in order to set the odds. On further study we found that betting websites and bookmakers use various probabilistic models, each predicting certain aspects of a match, since all aspects cannot be fitted into a single model. This makes it easier for them to set the odds accordingly. How much profit the bookmakers and websites make therefore depends on how well the probabilistic model predicts the results.
Literature Review:
The betting market and the interest in applying probability to sport have led researchers to introduce a variety of models for predicting results in cricket matches, based on varied forecasting methodologies. While most of these models focus on predicting tournament outcomes or the league position of each team, our interest is in forecasting the outcomes of individual matches.
(Tim B. Swartz, Paramjit S. Gill and Saman Muthukumarana, 2009) considered team quality, performance in recent matches and the significance of the match, whereas (Kuypers, 2000) considered team performance as well as published bookmakers’ odds. In one study, (Jack Davis, Harsha Perera and Tim B. Swartz, Simon Fraser University) obtained probabilities of batting outcomes for the first innings and used the target score to modify the batting probabilities in the second innings. (Dilaksha Attanayake and Gordon Hunter) used data on real players and performance measures based on both batting averages and strike rates, and (Aaron Corris, Anthony Bedford and Ian Grundy) used factors such as home-ground advantage, winning the toss and which team batted first.
A less common approach to cricket prediction is the Poisson distribution, in which the match results are generated from the batting and bowling strengths of the two competing teams.
Indian Premier League, a summary:
To understand the IPL clearly we need a brief description of the league and its rules. A total of eight teams compete each year in the Indian Premier League. Each team plays every other team in the league twice, once at its home ground and once away (i.e. at the opposition’s home ground). A team is awarded 2 points for each win, 1 point goes to each team if the match is drawn or abandoned, and no points are awarded for a loss. At the end of the league stage, the four teams with the most points move on to the qualifiers. The top two teams play the first qualifier, and its winner advances directly to the final.
The loser faces the winner of the second qualifier match to decide the second finalist.
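The points rule described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample records are hypothetical and are not taken from the league data.

```python
# Points rule as described in the text: 2 for a win,
# 1 for a drawn or abandoned match, 0 for a loss.
POINTS_WIN = 2
POINTS_SHARED = 1


def league_points(wins: int, shared: int) -> int:
    """Return league points from wins and drawn/abandoned matches."""
    if wins < 0 or shared < 0:
        raise ValueError("match counts must be non-negative")
    return POINTS_WIN * wins + POINTS_SHARED * shared


# Hypothetical records, for illustration only.
print(league_points(wins=4, shared=0))
print(league_points(wins=2, shared=1))
```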
We will predict the league matches played before the qualifiers, and we will explain the importance of each match for a team’s chances of advancing to the qualifiers. There are no established rivalries between the teams that play in this league. The teams that reach the qualifiers also become eligible to play in the Champions League, which is contested by the top league teams from each country. In this project we have tried to answer three questions: 1) How can we calculate the number of runs scored per ball by the home and away teams? 2) Which team is most likely to top the table? 3) Did home advantage play a major role in the home team’s victory?
Table 1 lists all the matches to be played in the forthcoming year before the qualifiers.
Table 1: List of matches to be played in 2016, in random order
Serial No. | Home Team | Away Team
1 | Chennai Super Kings | Mumbai Indians
2 | Chennai Super Kings | Sun Risers Hyderabad
3 | Chennai Super Kings | Kings XI Punjab
4 | Chennai Super Kings | Rajasthan Royals
5 | Chennai Super Kings | Royal Challengers Bangalore
6 | Chennai Super Kings | Delhi Daredevils
7 | Chennai Super Kings | Kolkata Knight Riders
8 | Mumbai Indians | Sun Risers Hyderabad
9 | Mumbai Indians | Kings XI Punjab
10 | Mumbai Indians | Rajasthan Royals
11 | Mumbai Indians | Royal Challengers Bangalore
12 | Mumbai Indians | Delhi Daredevils
13 | Mumbai Indians | Kolkata Knight Riders
14 | Mumbai Indians | Chennai Super Kings
15 | Sun Risers Hyderabad | Kings XI Punjab
16 | Sun Risers Hyderabad | Rajasthan Royals
17 | Sun Risers Hyderabad | Royal Challengers Bangalore
18 | Sun Risers Hyderabad | Delhi Daredevils
19 | Sun Risers Hyderabad | Kolkata Knight Riders
20 | Sun Risers Hyderabad | Chennai Super Kings
21 | Sun Risers Hyderabad | Mumbai Indians
22 | Kings XI Punjab | Rajasthan Royals
23 | Kings XI Punjab | Royal Challengers Bangalore
24 | Kings XI Punjab | Delhi Daredevils
25 | Kings XI Punjab | Kolkata Knight Riders
26 | Kings XI Punjab | Chennai Super Kings
27 | Kings XI Punjab | Mumbai Indians
28 | Kings XI Punjab | Sun Risers Hyderabad
29 | Rajasthan Royals | Royal Challengers Bangalore
30 | Rajasthan Royals | Delhi Daredevils
31 | Rajasthan Royals | Kolkata Knight Riders
32 | Rajasthan Royals | Chennai Super Kings
33 | Rajasthan Royals | Mumbai Indians
34 | Rajasthan Royals | Sun Risers Hyderabad
35 | Rajasthan Royals | Kings XI Punjab
36 | Royal Challengers Bangalore | Delhi Daredevils
37 | Royal Challengers Bangalore | Kolkata Knight Riders
38 | Royal Challengers Bangalore | Chennai Super Kings
39 | Royal Challengers Bangalore | Mumbai Indians
40 | Royal Challengers Bangalore | Sun Risers Hyderabad
41 | Royal Challengers Bangalore | Kings XI Punjab
42 | Royal Challengers Bangalore | Rajasthan Royals
43 | Delhi Daredevils | Kolkata Knight Riders
44 | Delhi Daredevils | Chennai Super Kings
45 | Delhi Daredevils | Mumbai Indians
46 | Delhi Daredevils | Sun Risers Hyderabad
47 | Delhi Daredevils | Kings XI Punjab
48 | Delhi Daredevils | Rajasthan Royals
49 | Delhi Daredevils | Royal Challengers Bangalore
50 | Kolkata Knight Riders | Chennai Super Kings
51 | Kolkata Knight Riders | Mumbai Indians
52 | Kolkata Knight Riders | Sun Risers Hyderabad
53 | Kolkata Knight Riders | Kings XI Punjab
54 | Kolkata Knight Riders | Rajasthan Royals
55 | Kolkata Knight Riders | Royal Challengers Bangalore
56 | Kolkata Knight Riders | Delhi Daredevils
Data Acquired for calculation:
The data used for this study are the results of all Indian Premier League (IPL) matches from the 2014 and 2015 seasons, including all 56 league matches played before the qualifiers. All of this data was obtained from the official Indian Premier League website. Since we wanted our model to be accurate, we took the data from the most recent seasons, 2014 and 2015. This is because the players are re-auctioned every three years and teams are allowed to retain only four players. This movement of players between teams seriously affects how well a team functions. Teams may also face other changes, such as a new manager or coach, and a new coach introduces new tactics that change the way the whole team plays. To avoid such complications we decided to use only the most recent data.
In preparing our model we started from an existing model based on the Poisson distribution for predicting the win or loss of a particular match. That model used each team’s batting strength and the home-advantage concept, but it did not include the bowling strength of the opposition, which must be considered in order to predict the batting team’s score. We tried to account for the following factors in constructing our model.
Home game advantage
Batting and bowling strength of home team
Batting and bowling strength of away team
In considering the above factors we have ignored many others that affect a match, so the model’s results are only approximate. The factors that were ignored are:
Rain: Downpours in India are hard to predict and may seriously affect the result of a match.
Injuries: Injuries to key players in each team, before or during a match, are not taken into consideration.
Duckworth Lewis: This system can change the course of an entire match, since it adjusts the scores when play is reduced.
Fatigue: Players tire quickly in the humid conditions in India, which may affect their run scoring and bowling.
Net run rate: Net run rate plays an important role when two teams are tied on points.
Table 2: Team Statistics Home
Team
Chennai Super Kings (CSK)
Kings XI Punjab (KXIP)
Kolkata Knight Riders (KKR)
Mumbai Indians (MI)
Rajasthan Royals (RR)
Sun Risers Hyderabad (SRH)
Royal Challengers Bangalore (RCB)
Delhi Daredevils (DD)
Sum
Average
Table 3: Team Statistics Away (rows for SRH, RCB, DD, Sum and Average)
Team | Played | Won | Points | Runs scored | Average runs scored | Runs conceded | Average runs conceded
Sun Risers Hyderabad (SRH) | 7 | 4 | 8 | 1119 | 159.8571429 | 1180 | 168.5714286
Royal Challengers Bangalore (RCB) | 7 | 5 | 10 | 938 | 134 | 987 | 141
Delhi Daredevils (DD) | 7 | 2 | 5 | 970 | 161.6666667 | 991 | 165.1666667
Sum | 56 | 25 | 53 | 8121 | 1229.12381 | 8542 | 1289.080952
Average | 7 | 3.125 | 6.625 | 1015.125 | 153.6404762 | 1067.75 | 161.135119
Table 2 shows the points table and the runs scored and conceded by each team in its home matches. Table 3 shows the same for away matches. Each team played a total of 7 games at home and 7 games away from home. The basic idea is for a team to win more of its home games and gain points from them. This is reflected in the average runs scored at home, RS = 161.67, compared with the average runs scored in away matches, RS = 153.64. The opposite holds for runs conceded: teams concede fewer runs at home than away.
Building our Probabilistic Model:
We take historical data from the official site to calculate the number of runs scored and conceded by each team. From the averages of these we derive the batting and bowling strength of each team, which are then converted into expected scores. These figures are fed into the Poisson distribution formula, which gives the probability of every result when two teams face each other. From these probabilities we find the most probable score, the team with the highest chance of winning, and whether home advantage helped the home team.
The appeal of a technique like this is that at several points in the method you can try a different input value and obtain different results.
The Poisson distribution is a probability distribution that can be used to convert a mean into probabilities for a range of outcomes. For example, the Indian cricket team might average 8.6 runs per over. Feeding this information into the Poisson distribution formula gives the
Indian team scoring 50 runs or fewer 10.6% of the time, 100 runs or fewer 34% of the time, 150 runs and less in
To use the Poisson distribution to calculate the probability of a team winning a match, we need the average runs expected from both teams in that match. This can be calculated by comparing their batting and bowling strengths. Selecting a proper data range is very important when calculating these strengths: we cannot use outdated data, since the team will have undergone many changes in the meantime. For this analysis we therefore take the data from the 2015 season, covering all 14 matches played by each team before the qualifiers.
Determining the batting and bowling strength:
1. Determine the average runs scored by a team home and away:
The initial step in calculating the batting and bowling strengths of a team, based on the results of the past season, is to determine the average number of runs scored per home game and per away game.
This is calculated by dividing the total number of runs scored in the season, at home and away respectively, by the number of games played at home and away:
Average runs scored at home = Total runs scored at home in a season / Number of games at home
Average runs scored away = Total runs scored away in a season / Number of games away
During the 2015 season a total of 8540 runs were scored in home games and 8121 runs were scored in away games. Note that two of the home matches were abandoned due to rain.
The average runs scored in a home match comes to 161.675, compared with 153.6 in away matches.
The average number of runs scored at home: 161.675
The average number of runs scored away from home: 153.6
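As a rough check on these averages, the short Python sketch below recomputes a per-team average and the league away average from figures in the away statistics (938 runs in 7 away games for RCB, and a sum of per-team away averages of 1229.12381 over eight teams). It assumes that the league average is the mean of the per-team averages, which is what the tabulated sum and average suggest; the function and variable names are illustrative only.

```python
def per_game_average(total_runs: float, games: int) -> float:
    """Average runs per game for one team."""
    if games <= 0:
        raise ValueError("games must be positive")
    return total_runs / games


# RCB away from home in 2015: 938 runs in 7 games.
rcb_away_average = per_game_average(938, 7)

# Assumption: the league average is the mean of the eight per-team averages.
SUM_OF_AWAY_TEAM_AVERAGES = 1229.12381
NUMBER_OF_TEAMS = 8
league_away_average = SUM_OF_AWAY_TEAM_AVERAGES / NUMBER_OF_TEAMS

print(f"RCB away average: {rcb_away_average:.2f}")
print(f"League away average: {league_away_average:.2f}")
```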
A team’s batting strength is measured relative to these league averages.
We also need the average number of runs conceded by a team in home and away matches to determine its bowling strength.
Average number of runs conceded at home matches: 153.35
Average number of runs conceded away: 161.13
These league averages can be used to calculate the batting and bowling strength of each team in every match it is going to play in the league. As a sample calculation, let us take a match between Chennai Super Kings and Royal Challengers Bangalore.
2. Predicting Chennai’s NRR at Home:
Chennai’s batting strength can be calculated by taking the average of Chennai’s total runs scored at home divided by the total home matches they have played.
= Total runs scored at home/total games played
= 1146/7 = 163.71
Divide this value by the season’s average per game (163.71/161.675) to get the batting strength of the team = 1.0125. This shows that CSK has scored about 1.2% more at home than the league average we calculated for the last season.
To calculate the bowling strength of Royal Challengers Bangalore, we divide the total number of runs they conceded away by the number of away matches they played
= Total runs conceded away/total games played
= 987/7 = 141
Divide this value by the season’s average per game (141/161.13) to get the bowling strength of the team = 0.8750. This shows that RCB concedes 12.5% fewer runs than the league average for runs conceded in away games.
Now we can use the formula below to calculate the number of runs that the home team can be expected to score per over.
CSK’s score = CSK’s batting strength x RCB’s bowling strength x average number of runs scored by CSK at home / 20
= 1.0125*0.8750*163.71/20 = 7.2542
According to the above, Chennai is expected to score 7.2542 runs per over.
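The calculation above can be sketched in Python as follows. The function names are illustrative, and because the sketch works with unrounded intermediate values its result differs slightly in the third decimal place from the 7.2542 quoted in the text.

```python
OVERS_PER_INNINGS = 20


def strength(team_average: float, league_average: float) -> float:
    """Ratio of a team's per-game average to the league average."""
    return team_average / league_average


def expected_runs_per_over(batting_strength: float,
                           bowling_strength: float,
                           batting_team_average: float) -> float:
    """Expected runs per over, following the formula given in the text."""
    return (batting_strength * bowling_strength * batting_team_average
            / OVERS_PER_INNINGS)


# Figures from the text: CSK scored 1146 runs in 7 home games,
# RCB conceded 987 runs in 7 away games.
csk_home_average = 1146 / 7
csk_batting = strength(csk_home_average, 161.675)  # league home average
rcb_bowling = strength(987 / 7, 161.13)            # league away conceded average
csk_rate = expected_runs_per_over(csk_batting, rcb_bowling, csk_home_average)

print(f"CSK batting strength: {csk_batting:.4f}")
print(f"RCB bowling strength: {rcb_bowling:.4f}")
print(f"CSK expected runs per over: {csk_rate:.2f}")
```

The same two helper functions apply to the away side by swapping in the away batting figures and the home bowling figures.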
3. Predicting NRR of RCB Away:
RCB’s batting strength can be calculated by taking the average of RCB’s total runs scored away from home divided by the total away matches they have played.
= Total runs scored away from home /total games played
= 938/7 = 134
Divide this value by the season’s average per game (134/153.64) to get the batting strength of the team = 0.8736. This shows that RCB has scored 12.63% less than the league average we calculated for the last season.
To calculate the bowling strength of Chennai Super Kings, we divide the total number of runs they conceded at home by the number of home matches they played
= Total runs conceded at home/total games played
= 968/7 = 138.28
Divide this value by the season’s average per game (138.28/153.38) to get the bowling strength of the team = 0.9015. This shows that CSK concedes about 10% fewer runs than the league average for runs conceded in home games.
Now we can use the formula below to calculate the number of runs that the away team can be expected to score per over
RCB’s score = RCB’s batting strength x CSK’s bowling strength x average number of runs scored by RCB away from home / 20
= 0.8736*0.9015*134/20 = 5.2765
According to the above, RCB is expected to score 5.27 runs per over, which suggests that CSK is the more likely winner of the match.
4. Application of Poisson to forecast multiple outcomes:
Clearly no game can actually end with 145.084 and 105.53 runs on the board. These are simply averages. The Poisson distribution lets us use these figures to distribute 100% of the probability across the range of run outcomes possible for each side. The formula used in our model is

$$P(i, j) = \frac{\lambda^{i} e^{-\lambda}}{i!} \cdot \frac{\mu^{j} e^{-\mu}}{j!} \cdot 100$$

where $\lambda$ and $\mu$ are the predicted runs per over for CSK (home team) and RCB (away team). In this case $\lambda = 7.2542$ and $\mu = 5.2765$. Suppose we are supporting the home team and bet that CSK wins the match against RCB, with Chennai scoring 7 runs per over and RCB scoring 5 runs per over. We can find the probability of this outcome using our model. In this case $i = 7$ and $j = 5$, and the result is expressed as a percentage:

$$P(7, 5) = \frac{7.2542^{7} e^{-7.2542}}{7!} \cdot \frac{5.2765^{5} e^{-5.2765}}{5!} \cdot 100 = 0.1483 \cdot 0.1741 \cdot 100 = 2.58\%$$
The probability of CSK scoring 7 runs per over and RCB scoring 5 runs per over is 2.58%.
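The same single-outcome probability can be reproduced with a few lines of Python using only the standard library. This is a minimal sketch; the function names are illustrative and not part of the original workbook.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of exactly k events given the mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def joint_probability_percent(i: int, j: int,
                              home_mean: float, away_mean: float) -> float:
    """Probability (in percent) that the home side scores i and the away
    side scores j runs per over, treating the two scores as independent."""
    return poisson_pmf(i, home_mean) * poisson_pmf(j, away_mean) * 100


p = joint_probability_percent(7, 5, home_mean=7.2542, away_mean=5.2765)
print(f"{p:.2f}%")  # 2.58%
```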
This percentage is low, which is reasonable for one exact pair of per-over scores. We therefore have to calculate the probability of every pair of scores that is possible in an over and then sum the probabilities to find the likely winner of the match. To do so we use our model’s Poisson formula in Excel to calculate the probability of all possible scores per over. The best way we have found is to build a matrix of all possible scores per over. The maximum score last season was 240, so we set a limit of 12 runs per over. The Excel formula used to calculate the probability is given below.
=POWER(7.2542,7)*EXP(-7.2542)*POWER(5.2765,5)*EXP(-5.2765)/FACT(5)/FACT(7)
=2.58%
The table below shows all the possible outcomes in a single over of a match, for both teams. From the table we can see that the home team has the advantage in this match.
Table 4: Outcomes per over and probability of winning (axis label: Home team)
From the above table we found that the probability of the home team winning the match is nearly 63%, the probability of the home team losing is nearly 24%, and the probability of the match ending in a draw is nearly 10%. The total probability comes to around 97% rather than 100%. This is because we have capped the score at 240 runs per innings, the maximum from last season, which corresponds to 12 runs per over.
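The outcome matrix can also be built outside Excel. The Python sketch below sums the joint Poisson probabilities over every pair of per-over scores up to the cap of 12. It assumes the matrix runs from 0 to 12 runs per over for each side and treats the two scores as independent; the function names are illustrative.

```python
import math

MAX_RUNS_PER_OVER = 12  # cap derived from the 240-run maximum innings score


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of exactly k events given the mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def match_probabilities(home_mean: float, away_mean: float,
                        max_runs: int = MAX_RUNS_PER_OVER):
    """Return (home win, draw, away win) probabilities from the
    truncated matrix of per-over scores."""
    home_win = draw = away_win = 0.0
    for i in range(max_runs + 1):        # home runs per over
        for j in range(max_runs + 1):    # away runs per over
            p = poisson_pmf(i, home_mean) * poisson_pmf(j, away_mean)
            if i > j:
                home_win += p
            elif i == j:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


home_win, draw, away_win = match_probabilities(7.2542, 5.2765)
total = home_win + draw + away_win
print(f"home win {home_win:.1%}, draw {draw:.1%}, away win {away_win:.1%}")
print(f"total {total:.1%}")
```

Because the matrix is truncated at 12 runs per over, the three probabilities sum to slightly less than 100%. Raising `max_runs` brings the total closer to 100%.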
5. Results and Discussion
Just as we calculated the match result between CSK and RCB using batting and bowling strengths, we can calculate the outcomes of other matches. For that we have to calculate the batting and bowling strength of every team, which is easily done in an Excel workbook. Once we have these strengths we can use them to predict the outcome of every match in the same way as the prediction above. The table below shows the batting and bowling strength of each team at its home ground and away. With this we can calculate the predicted runs per over each team will score against each other team in home and away matches. Once we have the predicted runs per over, we can calculate the probability of each team winning the match and the probability that the match will be a draw.
From Table 6 we obtain each team’s NRR against the other teams in home and away matches. With this we can calculate the probability of each team winning a match. Below are the calculations for a few matches selected at random from the above list. Our model shows every outcome that is possible for a match and whether the home ground favours the home team.
Table 8: Prediction of winning team and whether the home ground was an advantage
Home Team | Away Team | Winning Team | Winning Percentage | Home Advantage
CSK | RCB | CSK | 62.722 | Yes
MI | DD | DD | 41.804 | No
SRH | RR | RR | 41.172 | No
KXIP | CSK | CSK | 60.53 | No
RR | KKR | RR | 47.17 | Yes
RCB | KKR | RCB | 54.988 | Yes
DD | SRH | SRH | 49.23 | No
KKR | MI | KKR | 32.67 | Yes
RCB | KXIP | RCB | 52.1 | Yes
Table 8 shows the predicted winner of each match that we selected randomly from the 56 matches to be played next season. From the table we can observe that not every match played at home favours the home team. The result depends on the team’s composition and its batting and bowling strength. Another reason is that all the teams have equally strong players, which makes it tougher for the home team to win at its home ground. From this we conclude that home advantage does not play a very important role in deciding the result of a match.
From the above calculations, CSK win both of their matches, RCB win 2 matches and lose 1, MI lose both of their matches, SRH win 1 match and lose 1, KXIP lose both of their matches, RR win both of their matches, DD win 1 match and lose 1, and KKR win 1 match and lose 2.
These matches were selected randomly from the 56 matches to be played in the season for the purpose of demonstrating the model.
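The win and loss counts can be tallied directly from the rows of Table 8. The Python sketch below transcribes the home team, away team and predicted winner of each row and counts wins, losses and home wins; the variable names are illustrative.

```python
from collections import Counter

# (home team, away team, predicted winner), transcribed from Table 8.
PREDICTIONS = [
    ("CSK", "RCB", "CSK"),
    ("MI", "DD", "DD"),
    ("SRH", "RR", "RR"),
    ("KXIP", "CSK", "CSK"),
    ("RR", "KKR", "RR"),
    ("RCB", "KKR", "RCB"),
    ("DD", "SRH", "SRH"),
    ("KKR", "MI", "KKR"),
    ("RCB", "KXIP", "RCB"),
]

wins = Counter()
losses = Counter()
home_wins = 0
for home, away, winner in PREDICTIONS:
    loser = away if winner == home else home
    wins[winner] += 1
    losses[loser] += 1
    if winner == home:
        home_wins += 1

teams = sorted({team for home, away, _ in PREDICTIONS for team in (home, away)})
for team in teams:
    print(f"{team}: {wins[team]} won, {losses[team]} lost")
print(f"Home wins: {home_wins} of {len(PREDICTIONS)}")
```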
Limitations:
The Poisson distribution is the simplest method of calculating match probabilities, and it does not allow most factors to be included. A few factors that were ignored in this model are:
Rain: We have assumed that every match runs its full 20 overs, but in India a game may be interrupted by rain at any time, which can even cause the match to be abandoned.
The model uses statistical data to predict future results, so its accuracy is open to question. Do all the players play the same way as they did in the past year, and do the conditions remain the same?
Pitch conditions were completely ignored in building the model. Pitches vary greatly, and on some pitches batsmen struggle to score at all.
The only factor considered in this model is the run rate. We have seen many matches in which the team behind on run rate scores faster than expected in the last 5 overs. The NRR gives the final run rate but does not tell us what actually happened during the match.
The Duckworth-Lewis method has been completely ignored in our model because it complicates the entire calculation. It adjusts the target score according to the number of overs available for play in case of unforeseen events.
Most players in the IPL are international players who also have international matches to play. We do not know which players will be available to a team, which affects its strengths and weaknesses.
Other Prediction Models Available:
SWARTZ – MUTHUKUMARANA’s Model:
In this model the authors use the outcome of each ball, which takes a finite number of values, to build a discrete generator in which the outcome probabilities are estimated from historical data on one-day international cricket matches. “The probabilities depend on the batsman, the bowler, the number of wickets lost, the number of balls bowled and the innings.”
Davis – Perera Model:
In this model the authors determine the batting probabilities “based on an amalgam of standard classical estimation techniques and a hierarchical empirical Bayes approach where the probabilities of batting outcomes borrow information from related scenarios”. They first obtain the score of the first innings of the match and then use the target score to modify the batting probabilities of the second innings.
Conclusion:
The main objective of this project was to build a probabilistic model that can be used to predict the result of a cricket match. Our basic model is a simple one that uses batting and bowling strengths to obtain the run rate each team is expected to score per over, and then applies the Poisson distribution to it. The data used for this calculation were taken from the previous year’s league matches. The computations are manageable, and the model gives plausible predictions. We calculate the probability of every number of runs that can be scored in an over, limiting the maximum to 12 runs per over since the maximum score per innings was 240. With this model we have also examined whether the home team has a home advantage and whether it acts as a game changer.
Our model is simple and the results obtained from it are satisfactory, but it has limitations, and a few crucial factors were not considered in building it. The factors left out include Duckworth-Lewis, rain, injuries and pitch conditions. All of these should be considered in an updated model. Despite its limitations, the model is a starting point for understanding the basics of prediction and matching predictions with odds. It can be used at the initial stages of betting to understand the probability of a team winning a match, before moving on to more advanced models.
References:
[1] Predicting the Winner in One Day International Cricket by Ananda Bandulasiri, Ph.D.
[2] Predictive Modeling in Sports Leagues: An Application in Indian Premier League by Pankush
Kalgotra, Ramesh Sharda, Goutam Chakraborty, Oklahoma State University, OK, US
[3] A SIMULATOR FOR TWENTY20 CRICKET by JACK DAVIS, HARSHA PERERA AND
TIM B. SWARTZ Simon Fraser University
[4] The Sports Process: A Comparative and Fundamental Approach. Champaign: Human
Kinetics, P. 129 by Dunning, E.G. & Joseph A. Maguire, R.E. (1993)
[5] http://www.iplt20.com/stats
[6] http://www.iplt20.com/archive
[7] Sport Matters: Sociological Studies of sport, Violence and Civilization London: Routledge by
Dunning,E. (1999)
[8] Modelling and simulation for one-day cricket by Tim B. SWARTZ, Paramjit S. GILL, Saman
MUTHUKUMARANA The Canadian Journal of Statistics Vol. 37, No. 2, 2009, Pages 143–160
[9] Auto-play: A Data Mining Approach to ODI Cricket Simulation and Prediction by Vignesh
Veppur Sankaranarayanan, Junaed Sattar and Laks V. S. Lakshmanan, University of British
Columbia
[10] Rating teams and analysing outcomes in one-day and test cricket. Journal of the Royal
Statistical Society. Series A (Statistics in Society), 167(4):pp. 657–667, 2004 by P. E. Allsopp and
S. R. Clarke.
[11] On the forecast accuracy of sports prediction markets. In Negotiation, Auctions, and Market
Engineering, International Seminar, Dagstuhl Castle, volume 2, pages 227–234, 2008, by S.
Luckner, J. Schröder, and C. Slamka.
[12] A statistical analysis of batting in cricket, Journal of the Royal Statistical Society Series A,
156, 443-455 by Kimber, A.C. and Hansford A.R. (1993)
[13] Cricket scores and some skew correlation distributions. Journal of the Royal Statistical
Society, Series A, 108, 1–11.
[14] Probabilistic Modelling of Twenty-Twenty (T20) Cricket: An Investigation into Various
Metrics of Player Performance and their Effects on the Resulting Match and Player Scores,
International Conference on Mathematics in Sport Loughborough University, U.K. 29 June – 1
July 2015, pages 1-10 by D. Attanayake and G. Hunter.