Best Boxes in a Super Bowl Pool

Posted: February 9th, 2011 | Filed under: Data Analysis | 17 Comments

While I am fairly confident I am the only member of the dataists who is a sports fan (Hilary has developed a sports filter for her Twitter feed!), I am certain I am the only American football fan. For those of you like me, this past weekend marked the conclusion of another NFL season and the beginning of the worst stretch of the year for sports (31 days until Selection Sunday for March Madness).

To celebrate the conclusion of the season I, like many others, went to a Super Bowl party. As is the case with many Super Bowl parties, the one I attended had a pool for betting on the score after each quarter. For those unfamiliar with the Super Bowl pool, the basic idea is that anyone can participate by purchasing boxes on a 10×10 grid. It works as follows: when a box is purchased, the buyer writes his or her name in any available box. Once all boxes have been filled in, the rows and columns of the grid are labeled 0 through 9 in random order. These numbers correspond to the trailing digits of the scores for the home and away teams in the game. Below is an example of such a pool grid from last year’s Super Bowl.

[Figure: Super Bowl pool example]

After the end of each quarter of the game, whichever box corresponds to the trailing digit of each team's score wins a portion of the pot. For example, at the end of the first quarter of Super Bowl XLIV the Colts led the Saints 10 to 0. From the above example, Charles Newkirk would have won the first quarter portion of the pot for having the box on zeroes for both axes. Likewise, the final score of last year’s Super Bowl was New Orleans, 31; Indianapolis, 17. In this case, Mike Taylor would have won the pool for the final score in the above grid with the box for 7 on the Colts and 1 for the Saints.
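
The rule for finding the winning box is just the last digit of each team's score, that is, the score modulo 10. A minimal sketch in R; the function name `winning_box` is made up for illustration and is not part of the analysis code:

```r
# Hypothetical helper: map a pair of scores to the digits of the winning box.
winning_box <- function(colts, saints) {
  c(colts = colts %% 10, saints = saints %% 10)
}

winning_box(colts = 10, saints = 0)   # end of the first quarter, Super Bowl XLIV
winning_box(colts = 17, saints = 31)  # final score, Super Bowl XLIV
```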

This weekend, as I watched the pool grid get filled out and the row and column numbers drawn from a hat, I wondered: what combinations give me the highest probability of winning at the end of each quarter? Thankfully, there’s data to help answer this question, and I set out to analyze it. To calculate these probabilities I scraped the box scores from all 45 Super Bowls on Wikipedia, and then created heat maps for the probabilities of winning in each box based on this historical data. Below are the results of this analysis.
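
To make the idea concrete, here is a rough sketch of how such probabilities can be tallied in base R. The scores below are invented toy data, not the scraped Super Bowl results, and the variable names are my own; the real analysis would feed in the end-of-quarter scores from all 45 games.

```r
# Toy data, invented for illustration only: end-of-quarter scores for a
# handful of hypothetical games (NOT real Super Bowl results).
scores <- data.frame(
  home = c(7, 0, 10, 3, 7, 14),
  away = c(0, 0, 7, 0, 3, 10)
)

# Trailing digit of each score, stored as factors so that digits which
# never occur still get a row or column in the table.
digits <- 0:9
home_digit <- factor(scores$home %% 10, levels = digits)
away_digit <- factor(scores$away %% 10, levels = digits)

# 10 x 10 table of counts, converted to empirical probabilities.
counts <- table(home = home_digit, away = away_digit)
probs <- counts / sum(counts)

# Share of toy games that ended the quarter with home digit 7, away digit 0.
probs["7", "0"]
```

Each cell of `probs` is the fraction of games in which that box would have won, which is what a heat map of the grid would display. With so few games, most cells are zero, so the estimates are noisy.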


[Figures: heat maps of win probabilities for the First Quarter, Half Time, Third Quarter, and Final score]

The results are an interesting study in Super Bowl scoring as the game progresses. You have the highest chance of winning the first quarter portion of the pool if you have a zero box for either team, and the highest overall chance of winning anything if you have both zeroes. This makes sense, as it is common for one team to still be scoreless after the first quarter. After the first quarter, however, your winning chances become significantly diluted.

Going into half time, having a zero box is good, but having a seven box gives you nearly the same chance of winning. Interestingly, at the end of the third quarter it is best to have a seven box for the home team, while everything else is basically a wash. For the final score, everything is basically a wash: teams have had more opportunities to score, which adds variance to the counts for each trailing digit. That said, by a very slight margin, having a seven box for either the home or away team provides a better chance of winning the final pool.

So, next year, when you are watching the numbers get assigned to the grid, cross your fingers for either a zero or a seven box. If you happen to draw a two, five, or six, consider your wager all but lost. Incidentally, one could argue that a better analysis would have used all historic NFL scoring. Perhaps, though I think most sports analysts would agree that the Super Bowl is a unique game situation. Better, therefore, to focus only on those games, despite the small N.

Finally, most of the heavy lifting in this analysis was data wrangling: scraping the data from Wikipedia, then slicing and counting the trailing digits of the box scores to produce the probabilities for each cell. For those interested, the code is available on my GitHub repository.
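
As a rough illustration of the slicing step: a box score lists the points scored in each quarter, while the pool pays out on the running score, so the per-quarter points need to be accumulated before taking the trailing digit. The sketch below uses Super Bowl XLIV typed in by hand. Only the 10-0 first quarter and the 31-17 final are stated in this post, so treat the other per-quarter figures as an assumption to check against the actual box score.

```r
# Per-quarter points for Super Bowl XLIV, entered by hand for illustration.
# Assumption: the second- and third-quarter figures are not given in the
# post and should be verified against the box score.
box <- rbind(
  Saints = c(0, 6, 10, 15),
  Colts  = c(10, 0, 7, 0)
)
colnames(box) <- c("Q1", "Half", "Q3", "Final")

# Running score at each checkpoint, one row per team.
running <- t(apply(box, 1, cumsum))
dimnames(running) <- dimnames(box)

# Trailing digit that decides the pool at each checkpoint.
trailing <- running %% 10
trailing
```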

There are two R functions used in this analysis, however, that may be of general interest. First, a function that converts an integer into its Roman Numeral equivalent, which is useful when scraping Wikipedia for Super Bowl data (graciously inspired by the Dive Into Python example).
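
The function itself is not reproduced here, so the following is only a sketch of one common way to do the conversion: a greedy walk down a table of values and symbols. The name `to_roman` is hypothetical, and this is not necessarily how the original was written.

```r
# Hypothetical re-implementation; not the function from the original analysis.
to_roman <- function(n) {
  stopifnot(n >= 1, n <= 3999, n == round(n))
  values  <- c(1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1)
  symbols <- c("M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I")
  out <- ""
  for (i in seq_along(values)) {
    # Append each symbol as many times as its value fits into what is left.
    while (n >= values[i]) {
      out <- paste0(out, symbols[i])
      n <- n - values[i]
    }
  }
  out
}

to_roman(45)

# Base R ships an equivalent conversion in utils::as.roman().
as.character(utils::as.roman(45))

# Typical use when scraping: build the Wikipedia page title for game number 44.
paste0("Super_Bowl_", to_roman(44))
```

In practice `utils::as.roman()` is probably the simpler choice; the hand-rolled version is shown only because it makes the logic visible.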

Second, the function used to perform the web scraping. Note that R’s XML package, together with the good fortune that the Wikipedia editors use a uniform format for Super Bowl box scores, makes this process very easy.

R Packages used:


  • stephen

    Hi
    I was interested by the nice heatmap you get from ggplot2. You describe the code thus:

    boxes.heatmap<-ggplot(boxes[1:10,], aes(xmin=x,xmax=x+1,ymin=y,ymax=x+1))+geom_rect(color="white")

    but that doesn't look right to me as there are no 'heat' colors

  • http://twitter.com/james_a_hart James Hart

    The interesting question for me is, in the situation where you are allowed to buy more than one square, can you push the odds in your favour by buying multiple squares in the same column or row, or is the better strategy to make your squares independent?

  • http://blogs.sas.com/iml Rick Wicklin

    It doesn’t matter because both the home and away digits are randomized (see the image of last year’s boxes).

  • Anonymous

    I think you’re better off not buying single rows or columns, because there are more low-probability numbers than there are high ones.

  • mike

    Do you think that, if you had more data, the “Final” plot would be more uniform? Also, and apologies for not knowing anything much about American Football, but what’s important about a 7 score? Is there a good reason for this being a good bet? For example is it a combo of likely events or something (like a converted try + drop goal in rugby)?

  • Anonymous

    The primary means of scoring in football is the “touchdown,” which is worth 6 points, plus the opportunity for an extra point. As such, if a team only scores a single touchdown then they get 7 points. The secondary means of scoring is a field goal, which is worth 3 points.

    As such, scores that are some combination of these are quite common, which happen to often end in 7; particularly 7 (1TD), 17 (1TD +2FG), and 27 (3TD 2FG).

  • http://www.consultingstatistics.org Basil

    I just stumbled across your blog! Awesome post!

  • http://sphaerula.com/wordpress/statistics/super-bowl-squares-pool-probabilities/ Super Bowl Squares Pool Probabilities | Sphaerula

    [...] Conway at dataists performed a similar analysis using the scores of forty-five Super Bowl games. Conway provided the source code for his analysis [...]

  • Mattdemazza

    A guy at work picked 5 Super Bowl squares in the same row. So while he’ll have five different numbers for the Giants, he only will have one number for the Pats.

    I said that’s a bad strategy, because you’re only giving yourself one chance at nailing the Pats’ score.

    He says that’s nonsense because it’s all random and only one square wins anyway. (I say that’s true, but not all squares have the same probability, as they would if the sport were, say, basketball. Obviously, numbers like 4, 7 and 0 are better than 9 and 2.)

    I liked this example: “Pick two people to guess a number between 1 and 10, but give one of them five guesses and the other guy only one guess.” That didn’t sway him.

    Thoughts?

  • Justaguest

    Your friend is correct. However, he is setting himself up for more extreme odds after the numbers are drawn. For example, if his single row is for a 2, that will give him some of the worst odds on the chart. If his single row number is 7, however, that will give him some of the best odds on the chart.

  • http://twitter.com/superbowlbox SuperBowl Box

    use superbowlbox.net to create your office pool.

  • Anonymous

    Hey just quick correction here 17 = 2 TD’s (14) and 1 FG(3)

  • Anonymous

    Drew, I created a quick mobile app today to tell me my odds during the game in case I forgot. This is based on 7 years of NFL scores gathered by http://caseyshead.com/2013-super-bowl-squares-odds/ . What I was specifically interested in was their “per quarter” data, which I think is very important: as you mention above, the variance in possible outcomes increases with time, which you could see as an abstraction of the second law of thermodynamics. The app I put together is pretty rough (I did it in between lunch and the Super Bowl): http://footballsquares.azurewebsites.net/ . It takes your team’s single-digit score number as input.

    I would like to enhance this next year in my spare time, so I would love to gather some ideas. Some things I am thinking about doing are adding support for multiple squares, calculating total risk vs. potential reward, and incorporating real-time scores into the calculations.