Guest contributor Robert Sawyer does some intense statistical analysis on whether or not the spread is actually dead.
The “Spread is Dead” theory proposed by Christian Pina rests on two primary statistics:
1) When a favorite wins, they cover the spread ~85% of the time
2) When an underdog covers, they win straight up ~82% of the time
There are two corollaries of advice that would follow from trying to apply this theory to future bets:
1) If you plan to bet a favorite, bet against the spread (ATS). Since a favorite that wins covers the spread the majority of the time (~85%), the higher payout for betting ATS would benefit the bettor more than the reduced payout from betting the favorite straight up. Essentially, since the favorite is going to cover anyway when they win, you may as well bet ATS for the relatively higher payout.
2) If you plan to bet an underdog, bet the moneyline (ML). Since an underdog that covers wins straight up the majority of the time (~82%), the higher payout for betting the moneyline would benefit the bettor more than the reduced payout from betting the spread. Essentially, the underdog generally does not “need” the spread to cover, so the increased payout from the moneyline is worth taking.
This report will provide three contributions in discussion of this theory. First, the two primary statistics will be backtested over the past nine NFL seasons. Second, a theoretical discussion of why these statistics are not surprising will be conducted, including a demonstration of how the mathematical properties of conditional probability explain this observed phenomenon. And third, a profit comparison of betting using the pieces of advice will be conducted, using spreads and moneylines from the past nine NFL seasons.
Section 1. Extended Backtest
Spreads and moneylines for all NFL regular season games and most playoff games from the past nine seasons were scraped from Sportsbook Review. Lines from Bovada were used primarily, but where a Bovada spread or moneyline was not available, 5Dimes or Bookmaker lines were used instead. This yields about 265 games per season on average, as some games were not available across the books, and a total of 2381 games over the nine seasons.
Table 1 gives some basic summary statistics of this data. Note that the total number of games in the cover rows is reduced, since pushes are removed to calculate percentages that better reflect a gambler’s true winning percentage (pushes are not really losses) and to allow the inverse of each condition to be calculated. For example, the Home Cover percent is 0.498 after pushes are removed, so 1 – 0.498 = 0.502 is the Away Cover percent, the inverse of the Home Cover percent.
Table 1. Summary statistics of spread data obtained from past nine NFL seasons from Sportsbook Review
| | Number of Games | Wins/Covers | Percent |
|---|---|---|---|
| Home Favorite Cover | 1479 | 718 | 48.5% |
| Home Underdog Cover | 814 | 423 | 52.0% |

| | Mean | Std. Dev. |
|---|---|---|
| Home Margin of Victory | 2.56 | 14.9 |
| Favorite Margin of Victory | 5.64 | 14.0 |
This table reinforces some of the age-old wisdoms of sports betting, such as avoiding favorites (favorites covering less often than an expected 50%), and finding value on road teams (home teams covering less often than an expected 50%). Many quantities of interest can be derived from those reported in the table. For example, underdog covers are the inverse of favorite covers, so the number of underdog covers in this dataset is 2293 – 1109 = 1184 (51.6%).
However, these results are not novel and not necessarily related to the theory at hand. They serve as useful baselines to frame the results observed later on, and demonstrate that the spreads obtained from Sportsbook Review are reasonable. The following lists present the results of testing the theory. The “Favorite Win” row from Table 1 is an important number here, since it identifies the games we need to look at for testing the first part of the theory, and its inverse, “Favorite Loss”, identifies the games we need to look at for testing the second part of the theory.
Below are the results of backtesting the theory over the current dataset of the past nine NFL seasons. It is important to note that pushes are included in these results, because when a spread pushes, the straight-up result of the game still matters (if the spread was anything other than zero).
Part 1 of Theory
- Favorites won straight up 1597 times (same number from Table 1, 67.1% of the games)
- When favorites won, they covered 1109 times
- When favorites won, they covered or pushed 1197 times (88 pushes in dataset)
- This translates to when favorites won, they covered 69.4% of the time
- When favorites won, they covered or pushed 75.0% of the time
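The percentages in the Part 1 list are simple ratios of the reported counts. A minimal sketch that recomputes them is shown here; the variable names and the `rate` helper are illustrative and not part of the original analysis.

```python
# Counts reported in the Part 1 list above.
total_games = 2381
favorite_wins = 1597            # games the favorite won straight up
wins_and_covers = 1109          # favorite won and covered
wins_and_cover_or_push = 1197   # favorite won and covered or pushed


def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


win_rate = rate(favorite_wins, total_games)
cover_given_win = rate(wins_and_covers, favorite_wins)
cover_or_push_given_win = rate(wins_and_cover_or_push, favorite_wins)

print(f"Favorite win rate:             {win_rate}%")
print(f"Cover rate given a win:        {cover_given_win}%")
print(f"Cover-or-push rate given win:  {cover_or_push_given_win}%")
```

Note that the denominator for the last two rates is the number of favorite wins, not the total number of games, which is what makes them conditional rates.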
Part 2 of Theory
- Underdogs covered or pushed 1272 times
- When underdogs covered or pushed, they won 778 times
- When underdogs covered (1184 times), they won 778 times
- This translates to when underdogs covered or pushed, they won 61.6% of the time
- When underdogs covered, they won 65.7% of the time
So while the results observed in the Sportsbook Review data over the past nine seasons are not as extreme as those in the proposed theory, they are still higher than the average person would expect. In the next section, this phenomenon is explained through the lens of conditional probability.
Section 2.1. Mathematical Theory: Favorites
Let us consider the favorite team’s margin of victory as a random variable, X. This will be negative when the favorite loses and positive when the favorite wins, e.g. if a favorite wins by 3, then X = 3. As seen in Table 1, the expectation (estimated through the average) of this random variable is E[X] = 5.64. We are interested in the probability of the favorite team covering some spread S. Here we treat spreads as the negative of traditional spreads, so that they align with the margin of victory notation. For example, if a game is Patriots -7 against the Bills, we will say the favorite’s spread is 7, so the Patriots’ margin of victory, X, needs to be larger than S = 7 for them to cover. This probability is written P(X > S), and the quantities in Table 1 give the estimate P(X > S) = 0.484, i.e. empirically the favorite covered the spread 48.4% of the time.
For this discussion, consider an approximation for the distribution of X to be a normal distribution with the estimated parameters from Table 1 (mean=5.64, std=14.0). Figure 1 provides a visual representation of this data, with the left image of Figure 1 plotting the empirical data (light blue) and a theoretical distribution as an estimate of this data (orange curve). The image on the right is a QQ-plot, which helps confirm that a normal distribution generally fits the data, i.e. the margin of victory for the favorite team can roughly be estimated as a normal distribution with mean 5.64 and standard deviation 14.0. The fit here is not entirely important, but allows us to estimate theoretical probabilities of us observing these seemingly irregular results with the approximation. Section 3 will use the empirical data again to estimate profit margins from using the proposed “Spread is Dead” theory.
Figure 1. (Left) The empirical distribution of the margin of victory for teams designated as the favorite by their spread. (Right) A quantile-quantile plot demonstrating a reasonable fit of the normal distribution to the favorite margin of victory variable.
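One quick way to sanity-check the normal approximation is to compare the favorite win probability it implies with the empirical favorite win rate from Section 1. The sketch below uses Python's standard-library `statistics.NormalDist`; treating the margin as continuous (and so ignoring ties at X = 0) is a simplifying assumption of this sketch.

```python
from statistics import NormalDist

# Normal approximation to the favorite's margin of victory (Table 1 estimates).
favorite_margin = NormalDist(mu=5.64, sigma=14.0)

# P(X > 0): probability the favorite wins under the approximation.
p_win_normal = 1 - favorite_margin.cdf(0)

# Empirical favorite win rate: 1597 wins in 2381 games.
p_win_empirical = 1597 / 2381

print(f"Normal approximation: {p_win_normal:.3f}")
print(f"Empirical:            {p_win_empirical:.3f}")
```

The two values differ by only about a percentage point and a half, which is in line with the rough fit suggested by the QQ-plot.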
In the preceding paragraphs, we used quantities from Table 1 to estimate P(X > S) = 0.484. However, the proposed theory is primarily interested in the conditional probability P(X > S | X > 0), in other words, the probability that the favorite covered the spread given that the favorite won the game. The “given” here indicates that we are interested in a conditional probability, where the condition is the assumption to the right of the vertical bar. Using the definition of conditional probability, this gives us the following equations in this context:
P(X > S | X > 0) = P(X > S, X > 0) / P(X > 0) = P(X > S) / P(X > 0)
The probability statement in the numerator is a joint probability, which is the same as asking the probability that both events occur, or in this case, that the favorite covers and the favorite wins. In other words, this conditional probability is the same as asking: What is the probability the favorite covers the spread and wins divided by the probability that the favorite wins? However, given the definition of a favorite, we know that the joint probability in the numerator is actually the same as the probability of the favorite covering, or P(X > S, X > 0) = P(X > S), since if the favorite has covered, then they also have won the game. This is the same as saying: for both X > S and X > 0 to be true, X > S has to be true, since the spread cannot be less than 0 (in the ‘margin’ perspective we have been using for simplification purposes here, relative to a favorite, or they would not be the favorite).
This will make more sense with a visual representation. As we have seen before, the distribution of X can reasonably be approximated by a normal distribution centered about 5.64, the orange curve in Figure 1 (left). The condition removes the portion of the distribution below X = 0, effectively truncating the probability density at the negative values (since we have given the condition that X is larger than 0, i.e. the favorite won the game) and preserving the positive values. This can be seen in Figure 2 by observing what happens to the blue portion (density representing the probability that the favorite does not cover) and the orange portion (density representing the probability that the favorite does cover). After introducing the condition that X > 0, the blue portion is greatly reduced while the orange density is preserved. Now the equations above should be easier to understand: P(X > S) is the orange region and P(X > 0) is the entire non-green region in the right-hand image (the orange region plus the reduced blue region), so our new P(X > S | X > 0) is the orange density divided by the whole non-green density of Figure 2 (right).
Figure 2. Illustration of moving from the marginal probability of the favorite covering (Left) to the conditional probability of the favorite covering given the favorite won (Right), where the green region represents outcomes ruled out by the condition.
Intuitively, it may make sense to think about it like this: when we know that the favorite has won, we eliminate all of the probability that the favorite’s margin of victory can be less than 0. Then the possibilities that the favorite’s margin of victory is less than the spread have decreased, while the possibilities that the favorite’s margin of victory is more than the spread have remained the same. Since the total probability of outcomes of any mutually exclusive event (Favorite cover, Push, Favorite does not cover) must sum to one, we now have much more probability that the favorite covers, relative to the reduced amount of probability that the favorite does not cover, since we know the favorite has not lost. The problem here is that it is not realistic for us to know if the favorite has won or not until the game occurs, which is not useful from a betting point of view.
More concretely, using the normal approximation, we can calculate the theoretical value of P(X > S | X > 0) from the conditional probability described above. The green region is P(X < 0 | mean=5.64, sd=14.0) = 0.34, meaning P(X > 0) = 0.66. Thus, the formula follows:
P(X > S | X > 0) = P(X > S) / P(X > 0) = P(X > S) / 0.66
Which means that under the theoretical normal assumptions, we would expect that when the favorite wins, the favorite will cover the spread 75% of the time. This number lies between the empirical estimate here (69.4%) and the estimate mentioned in the original theory (~85%). When pushes are counted together with covers, the empirical estimate (75.0%) is equal to this theoretical estimate. This expected cover percent increases as the spread becomes closer to zero, since the share of the blue region removed by the condition grows as the spread becomes closer to zero. This in turn lowers the denominator in the conditional probability (while the numerator remains constant), increasing the expected percent of the time the favorite covers given that they won the game.
For example, there were 808 games when the spread was between 0 and 3 (inclusive). In these games, the favorite won the game 457 times. Of these 457 games, the favorite covered the spread 369 times, meaning that when the favorite won, the favorite covered the spread 80.7% of the time. Compare this to large spreads, where there were 862 games where the favorite was favored by at least 7 points. The favorite won the game 684 times in these games. Of these 684 games, the favorite also covered the spread 405 times, meaning when the favorite won, the favorite covered the spread 59.2% of the time. This discrepancy is indicative of the fact that by using this condition (that the favorite won), we drastically lower the probability density for not covering when the spread is low relative to when the spread is high. In terms of Figure 2, when the spread is low, the blue region becomes much smaller relative to when the spread is high, as a proportion of the (constant) orange region.
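The calculation in this section can be sketched numerically. The first part applies P(X > S) / P(X > 0) to the pooled normal approximation, trying two numerators: the empirical cover rate of 0.484 and an idealized 0.5 (an assumption of this sketch, reflecting a perfectly balanced spread). The second part illustrates the dependence on the spread under a further assumption that is not made in the analysis above: for a single game with spread S, the favorite's margin is taken to be Normal(S, 14.0), so the spread is the expected margin and P(X > S) = 0.5. The function name `cover_given_win` is hypothetical.

```python
from statistics import NormalDist

STD_DEV = 14.0  # standard deviation of the favorite's margin (Table 1)

# Pooled approximation: X ~ Normal(5.64, 14.0).
p_win = 1 - NormalDist(mu=5.64, sigma=STD_DEV).cdf(0)
pooled_empirical = 0.484 / p_win  # numerator: empirical cover rate
pooled_fair = 0.5 / p_win         # numerator: idealized 50% cover rate (assumption)


def cover_given_win(spread: float, sigma: float = STD_DEV) -> float:
    """P(X > S | X > 0), assuming X ~ Normal(S, sigma) so that P(X > S) = 0.5."""
    p_favorite_wins = 1 - NormalDist(mu=spread, sigma=sigma).cdf(0)
    return 0.5 / p_favorite_wins


print(f"Pooled, empirical numerator: {pooled_empirical:.3f}")
print(f"Pooled, 50% numerator:       {pooled_fair:.3f}")
for s in (1, 3, 7, 10, 14):
    print(f"Spread {s:>2}: {cover_given_win(s):.3f}")
```

Both pooled values land in the mid-70% range, and the per-spread values fall as the spread grows, matching the direction of the empirical split between small and large spreads. The sketch ignores pushes and the discreteness of football scores, so the individual values should be read as rough guides only.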
Section 2.2. Mathematical Theory: Underdogs
Let us consider a random variable, Y, that represents the margin of victory for an underdog, where an underdog receives T points from the spread. Then the probability the underdog covers is P(Y + T > 0) and the probability the underdog wins straight up is P(Y > 0). So the conditional probability of interest is now P(Y > 0 | Y + T > 0), i.e. the probability the underdog wins given that the underdog covered the spread. Following the definition of conditional probability, we can derive the following equation:
P(Y > 0 | Y + T > 0) = P(Y > 0, Y + T > 0) / P(Y + T > 0)
Similarly to Section 2.1, the joint probability in the numerator is the probability that the underdog wins and that the underdog covers. However, we can simplify this because we know that if the underdog has won, they have covered by definition of being an underdog, meaning this numerator is equal to P(Y > 0). The denominator is the probability that the underdog covers, which in theory should be 50% (though empirically it appears to be slightly higher). This reduces the conditional probability expression to the following equation:
P(Y > 0 | Y + T > 0) = P(Y > 0) / P(Y + T > 0) = P(Y > 0) / 0.5
Since we have already approximated the distribution of the favorite’s margin of victory (X) as following a normal distribution, we can similarly approximate the underdog’s margin of victory (Y) as following a normal distribution, under the property that X = -Y (e.g. a favorite winning by 5 is the same as an underdog losing by 5). In other words, Y ~ Normal(mean=-5.64, sd=14.0), which lets us estimate P(Y > 0) theoretically using the cumulative distribution function of the normal distribution, which comes out to 0.344. When we substitute this value into the previous equation, the conditional probability comes to 0.687. This means that under the assumption of a theoretical normal distribution for margin of victory, we would expect that when an underdog covers, the underdog will win straight up 68.7% of the time.
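The underdog calculation can be reproduced in a few lines. This is a minimal sketch using the standard-library `statistics.NormalDist`, with the 50% cover probability in the denominator taken as the theoretical value discussed above; the variable names are illustrative.

```python
from statistics import NormalDist

# Underdog margin of victory: Y = -X, so Y ~ Normal(-5.64, 14.0).
underdog_margin = NormalDist(mu=-5.64, sigma=14.0)

# P(Y > 0): probability the underdog wins straight up.
p_underdog_wins = 1 - underdog_margin.cdf(0)

# Theoretical probability that the underdog covers.
p_underdog_covers = 0.5

# P(Y > 0 | Y + T > 0) = P(Y > 0) / P(Y + T > 0)
win_given_cover = p_underdog_wins / p_underdog_covers

print(f"P(underdog wins):         {p_underdog_wins:.3f}")
print(f"P(wins | covers):         {win_given_cover:.3f}")
```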
This number also lies between the empirical estimate here (61.6%) and the estimate mentioned in the original theory (~82%). When pushes are not included, the empirical estimate here (65.7%) is almost equal to the theoretical estimate. Similarly to Section 2.1, the smaller the spread, the larger we would expect this conditional probability to be, since P(Y > 0) becomes larger when the spread is smaller (i.e. the underdog has a better chance to win when the spread is smaller, which raises the numerator of the previous equation and thus the conditional probability considered in this section).
For example, there were 808 games when the spread was between 0 and 3. The underdog covered 389 times and there were 50 pushes. Of the 389 covers, the underdog won straight up 348 times, meaning that when the underdog covered, the underdog won straight up 89.4% of the time, or 79.3% of the time when the underdog covers or pushes. Compare this to when the spread is at least 7, where there were 862 games. The underdog covered 428 times and pushed 29 times. Of the 428 covers, the underdog won straight up 175 times, meaning that when the underdog covered, the underdog also won straight up 40.9% of the time, or 38.3% of the time when the underdog covers or pushes. This exemplifies the property discussed in the previous paragraph, where the underdog’s winning probability P(Y > 0) becomes larger in games with lower spreads (indicating more evenly matched teams) compared to games with higher spreads.
Section 3. Empirical Backtest of Advice
While conditional probability helps explain why these statistics are observed, there still remains the question of how practical these observations are from a betting perspective. For example, can we profit more by betting underdogs straight up (on the moneyline) rather than the spread, since we know that if they cover they are more likely to win straight up? Similarly, do we miss out on profits if we bet the favorite as a moneyline rather than betting them against the spread, since they are more likely to cover if they win?
Using the spreads and moneylines scraped previously, the number of games bet, the profit, and the return on investment (ROI) can be calculated for each system, assuming a one-unit bet on each game. These numbers are reported in Table 2 below.
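A minimal sketch of how profit and ROI for a one-unit-per-game system might be computed from American odds is given here. The helper names `unit_profit` and `roi` and the sample bets are hypothetical; the sketch also assumes that a unit is the amount staked and that pushes count toward the number of games bet, neither of which is confirmed by the analysis above. The last step only recomputes ROI from the profit and game counts reported in Table 2.

```python
def unit_profit(american_odds: int, result: str) -> float:
    """Profit on a one-unit stake at American odds.

    result is "win", "loss", or "push" (stake returned, zero profit).
    """
    if result == "push":
        return 0.0
    if result == "loss":
        return -1.0
    if american_odds > 0:
        return american_odds / 100     # e.g. +150 pays 1.5 units
    return 100 / abs(american_odds)    # e.g. -110 pays about 0.909 units


def roi(total_profit: float, games_bet: int) -> float:
    """Return on investment when one unit is staked on every game."""
    return total_profit / games_bet


# Hypothetical bets for illustration only (not taken from the dataset).
sample_bets = [(-110, "win"), (-110, "loss"), (150, "win"),
               (-200, "loss"), (-110, "push")]
sample_profit = sum(unit_profit(odds, result) for odds, result in sample_bets)
sample_roi = roi(sample_profit, len(sample_bets))

# ROI implied by the (profit, games) pairs reported in Table 2.
table2 = {"S <= 3": (-52.3, 808), "3 < S < 7": (-62.8, 711), "7 <= S": (-57.7, 862)}
table2_roi = {label: round(100 * roi(profit, games), 1)
              for label, (profit, games) in table2.items()}

print(f"Sample profit: {sample_profit:.3f} units, ROI: {sample_roi:.1%}")
print(table2_roi)
```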
Table 2. Comparison of results of betting on outcomes
| Favorite Spread | Outcome Bet | Games Fitting Criteria | Profit | Return on Investment |
|---|---|---|---|---|
| S <= 3 | Favorites ATS | 808 | -52.3 | -6.5% |
| 3 < S < 7 | Favorites ATS | 711 | -62.8 | -8.8% |
| 7 <= S | Favorites ATS | 862 | -57.7 | -6.7% |
Each of the systems above is negative, since the systems are naively betting one unit on every game (betting the outcome specified in the “Outcome Bet” column). However, the advice proposed as part of the theory does not hold according to this empirical data, as the favorites ATS is worse than the favorites ML. The second part of the theory also does not hold, as the Underdog MLs are consistently worse than the Underdog ATSs. This second observation comes from the fact that the spread “comes into play” in some games, meaning the Underdog covers but does not win straight up, which is what causes the difference in results between these systems. The spread “comes into play” 16.8% of the time across all spreads, which are games where the Underdog ML loses but the Underdog ATS wins.
We can also see the natural progression of this percentage, which we would expect to increase as the spreads become larger, because the underdog is less likely to win straight up in large-spread games while the probability of covering should always be 50%. This is also partially what causes the difference in the system results: the ML payouts are larger when the spread is larger, but these are precisely the games in which the underdog “needs” the spread more often, so these offsetting forces almost balance out. This also means that if a small sample of games is analyzed, the percentages will reflect the magnitude of the spreads in that sample, which may be why Christian’s estimated percentages are higher. Using the same bins from the table above:
- When S <= 3, in 38 of the 808 games (4.7%) the underdog “needs” the spread
- When 3 < S < 7, in 111 of the 711 games (15.6%) the underdog “needs” the spread
- When 7 <= S, in 250 of the 862 games (29.0%) the underdog “needs” the spread
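The per-bin shares above, and the overall 16.8% figure quoted earlier, follow directly from the listed counts. A short sketch that recomputes them (variable names are illustrative):

```python
# (games where the underdog "needs" the spread, games in the bin), from the list above.
needs_spread = {
    "S <= 3": (38, 808),
    "3 < S < 7": (111, 711),
    "7 <= S": (250, 862),
}

# Share of games in each bin where the underdog covers without winning.
shares = {label: round(100 * cases / games, 1)
          for label, (cases, games) in needs_spread.items()}

# Overall share across all spreads.
total_cases = sum(cases for cases, _ in needs_spread.values())
total_games = sum(games for _, games in needs_spread.values())
overall_share = round(100 * total_cases / total_games, 1)

print(shares)
print(f"Overall: {total_cases} of {total_games} games ({overall_share}%)")
```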
The primary takeaway here should be Section 2, which shows that while the percentages proposed in the theory seem large, they are to be expected when considering that this theory is based on conditional probability. The theory’s results were backtested using Sportsbook Review’s lines for the past nine NFL seasons, which produced similar empirical results, though less pronounced than in the proposed theory. The advice that would stem from the theory was also tested, revealing worse results in terms of profit and ROI. More specifically, betting favorites ATS proved to be worse in profit and ROI across several different spread ranges than betting favorites ML, as did betting underdogs ML compared with betting underdogs ATS.
The Monty Hall problem has a similar overall message, but in the betting case we are not “shown the door” until after the game has completed, which means we cannot take advantage of the increase in conditional probability.