## Bayes Theorem and the New York Giants

The New York Giants are now 0-6. There are many reasons for the team’s struggles: questionable drafting, injuries, Eli Manning interceptions, injuries, coaching mistakes by Tom Coughlin, and injuries. But let’s say you have a buddy who is convinced that the Giants are not that bad: in fact, he thinks New York is just a .500 team that has been really unlucky.

Your first inclination might be to stop being friends with this person, but after that, you might wonder: “Hey, how likely is it for a .500 team to start off 0-6?” This is the same (ignoring strength of schedule, the fact that games are not independent, and several other variables) as asking the question “how likely is a coin to land on heads six times in a row?” The answer to both questions is pretty simple: 0.500^6, or 1.56%. Using the binomial distribution (in Excel, this would involve typing =BINOM.DIST(0,6,0.5,TRUE) into a cell) would give you the same result of 1.56%. The binomial calculation treats each game as an independent trial with a fixed win probability; separately, I will assume throughout this post that the talent level of NFL teams is normally distributed.
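As a quick check of that arithmetic, here is a minimal Python sketch of the same binomial calculation. The function name `record_probability` is my own label rather than anything from the post, and the sketch makes the same simplifying assumption of independent games with a fixed win probability.

```python
from math import comb

def record_probability(p, wins, losses):
    """Binomial probability that a team with true win probability p posts exactly this record."""
    games = wins + losses
    return comb(games, wins) * p ** wins * (1 - p) ** losses

# A true .500 team starting 0-6 is the same as six straight coin-flip losses.
prob_0_6 = record_probability(0.5, wins=0, losses=6)
print(f"{prob_0_6:.4%}")  # prints 1.5625%
```

Changing `p` to 0.3 gives the chance that a .300 team starts 0-6, which is useful for the weighted-average step later on.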

That answer is simple, but it actually answers a different question. What you want to know is the likelihood that the Giants are actually a .500 or better team. It’s a minor but crucial distinction: what we just determined was the likelihood that the Giants would start 0-6, given the assumption that they are a .500 team. To address the question of how likely it is that the 2013 Giants are actually a .500 (or better) team despite the 0-6 start, we need to use Bayes Theorem.

Much of the math involved in this process is frankly over my head, but fortunately, Kincaid over at 3-D baseball already did much of the work (and thanks to Neil for giving me that link). I will be blatantly copying his article (with the only changes being stylistic and making this for, you know, football), so make sure to give him all the credit he deserves. It’s a fantastic piece that has many useful applications.

What we want to find out is the probability of A (the Giants being a .500 or better team) given B (the Giants going 0-6).  This can be stated as:

P(A|B) = P(B|A)*P(A)/P(B)

To determine the probability of A given B (which is what P(A|B) means in English), we need to find the probability of B given A, multiply that by the probability of A, and divide by the probability of B.

Let’s start with the probability of A, which is simply the odds that a random NFL team (in this case, the Giants) is a .500 or better team. That’s easy: it’s 0.500. In this step, we’re not looking at the results: we just want to know how likely it is that the team we pulled from our prior distribution is actually a .500 (or better) team. In truth, I’m taking the easy way out here by picking .500; if we want to answer the question of how likely a 9-7 team (the Giants’ record last year and their wins total according to Vegas in the pre-season) is to start 0-6, that’s a slightly more complicated question. This post is complicated enough, so we’ll skip that part for now. In conclusion, P(A)=.5.

The more interesting question is determining the probability of B, i.e., the probability that a random team drawn from our prior distribution loses its first six games. This is simple to calculate for any one team if we know its true winning percentage (after all, we did that at the beginning of this post), but we need to know the weighted average probability across all possible teams we could draw from our prior distribution. We need a weighted average because if a team goes 0-6, it is much more likely to be a bad team than a good team. Doing this will require a bit of calculus, but fortunately there are online calculators that can do all the heavy lifting.

For any one team with a known true win% (p), the probability of going 0-6 is:

( G! / (W! * L!) ) * p^W * (1-p)^L

where G represents the number of total games, W the number of wins, and L the number of losses. This isn’t as scary as it looks: if we make G = 6, W = 0, and L = 6, and want to see the likelihood of a team going 0-6 as p (the known true win%) ranges from 0.25 to 0.75, we get the following graph:

The red dot is at p = 0.500, which shows the Giants at a 1.56% chance. But the key isn’t finding the odds of each p, it’s finding the average value of that formula across the entire prior distribution. To do this, we utilize the same principle as a weighted average. Here, the weight given to each possible value of p is represented by the probability density function of the prior distribution. As you can see above, the odds of a .300 team losing six straight games are much higher (about 11.8%), and that number gets weighted by the quantity f(p), where f(p) is the probability density function of our prior normal distribution. We repeat this for each possible value of p, add up the weighted terms, and then divide by the sum of the weights. This means taking the definite integral (from p=0 to p=1) of the probabilities weighted by f(p), which we can do using an online definite integral calculator. You can copy and paste this into the function field if you want to try it for yourself:

((6!/(0!*6!))*exp((-((x-.5)^2))/(2*.0225))*(x^0*(1-x)^6)/(sqrt(2*3.14159*.0225)))

or, more generally, if you want to play around with different records or different means and variances for the prior normal distribution:

(((W+L)!/(W!*L!))*exp((-((x-u)^2))/(2*VAR))*(x^W*(1-x)^L)/(sqrt(2*Pi*VAR)))

Note: this formula has one variable in there I haven’t discussed, which is the variance (represented by VAR in the equation). I used 0.0225 as the variance, and you can read why in footnote 1. Anyway, integrating the above equation from 0 to 1, we get a total value of .042. We still have to divide by the total sum of the weights to find the weighted average, but that is equal to one by definition (it is just the total area under the prior distribution’s probability density function). Therefore, P(B) = .042.
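If you would rather not rely on an online calculator, the same weighted average can be approximated in a few lines of Python. This is a rough sketch, not the method Kincaid or the calculator uses: the helper names and the midpoint-rule integrator are my own choices, and the prior is the same normal curve (mean 0.5, variance 0.0225) assumed in the post.

```python
from math import comb, exp, pi, sqrt

PRIOR_MEAN = 0.5    # league-average true win%
PRIOR_VAR = 0.0225  # variance used in the post (standard deviation 0.15)

def prior_density(p):
    """Normal probability density of true win% p under the prior."""
    return exp(-(p - PRIOR_MEAN) ** 2 / (2 * PRIOR_VAR)) / sqrt(2 * pi * PRIOR_VAR)

def record_probability(p, wins, losses):
    """Binomial probability of exactly this record for a team with true win% p."""
    return comb(wins + losses, wins) * p ** wins * (1 - p) ** losses

def integrate(func, lower, upper, steps=20_000):
    """Approximate a definite integral with the composite midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

# P(B): chance of an 0-6 start, averaged over every possible true win%.
p_b = integrate(lambda p: prior_density(p) * record_probability(p, 0, 6), 0.0, 1.0)
print(f"P(B) is about {p_b:.3f}")
```

The result should land near the .042 quoted above; swapping in a different record or variance is a matter of changing the arguments and constants.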

Finally, we have to calculate P(B|A). This is the probability of observing an 0-6 record, given that we are drawing a random team from the prior distribution that fulfills condition A, which is that the team has at least a .500 true winning percentage. This is done very similarly to finding P(B) above, except we are only considering values of p > 0.500.

Start by calculating the same definite integral as before, but from .5 to 1 instead of from 0 to 1 (this is done by simply changing the text box next to “Lower limit” below the formula). This gives a value of .002613. That is the weighted sum of all the probabilities; to turn the weighted sum into an average, we still have to divide by the sum of all the weights, which in this case is .5. Dividing .002613 by .5 gives us .005226. This is P(B|A).

Hey, we’re almost done! Now we have:

P(A)= .5 (this is the probability of randomly selecting a team that is average or better)
P(B) = .042 (this is the probability that we observe a random team drawn from our prior distribution lose its first six games)

P(B|A) = 0.005226 (the probability of B, given A)

P(A|B) = .005226 * .5 /.0420 = 6.2%
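The whole chain can also be reproduced end to end in code. Because P(B|A) is the upper integral divided by P(A), the P(A) terms cancel and P(A|B) reduces to a ratio of two integrals. The sketch below relies on that shortcut; the function names and the midpoint-rule integrator are illustrative assumptions on my part, and the prior is the normal curve used throughout the post.

```python
from math import exp, pi, sqrt

def weighted_likelihood(p, losses, mean, sd):
    """Prior density at p times the chance that a team with true win% p loses every game."""
    density = exp(-(p - mean) ** 2 / (2 * sd ** 2)) / (sd * sqrt(2 * pi))
    return density * (1 - p) ** losses

def integrate(func, lower, upper, steps=20_000):
    """Approximate a definite integral with the composite midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

def prob_at_least(cutoff, losses, mean=0.5, sd=0.15):
    """P(true win% >= cutoff | team starts 0-`losses`), as a ratio of two integrals."""
    above = integrate(lambda p: weighted_likelihood(p, losses, mean, sd), cutoff, 1.0)
    total = integrate(lambda p: weighted_likelihood(p, losses, mean, sd), 0.0, 1.0)
    return above / total

posterior = prob_at_least(0.5, losses=6)
posterior_tight = prob_at_least(0.5, losses=6, sd=0.05)  # much tighter prior
print(f"sd 0.15: {posterior:.1%}")
print(f"sd 0.05: {posterior_tight:.1%}")
```

With the default standard deviation of 0.15 this should come out at roughly 6.2%. With a much tighter prior of 0.05 (the baseball figure discussed in the footnotes), it rises to somewhere around 28%, which shows how sensitive the answer is to the assumed spread of talent.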

So the probability of a team that goes 0-6 being at least a .500 team in terms of true ability is about six percent (assuming our prior distribution is fairly accurate), although it may even be as high as 27.8 percent (see footnote 2). Once again, kudos to the outstanding Kincaid for doing all the heavy and medium lifting, so even I could figure it out. And as he pointed out, one of the interesting results is that this figure of 6.2% is quite a bit larger than the 1.56% result you would get from simple math, which was essentially just a hypothesis test. In his words:

This is why it is important not to misconstrue the meaning of the hypothesis test. The hypothesis test only tells you a very specific thing, which is how likely or unlikely the observed result is if you assume the null hypothesis to be true. Rejecting the null hypothesis on this basis does not necessarily mean the null hypothesis is unlikely; that depends on the prior distribution of possible hypotheses that exist. Considering potential prior distributions allows us to make more relevant estimates and conclusions about the likelihood of the null hypothesis.

But we’re not done, so allow me to copy some more of his work. Another advantage of the Bayesian approach is that it gives us a full posterior distribution of possible results. For example, when we observe an 0-6 team, we can not only estimate the odds that it is a true .500 team, but also the odds that it is a true .550 team, or .600 team, or whatever. And since we have a full distribution of likelihoods, we can also figure out the expected value.

The posterior distribution of possible true talent W% for a team that is observed to go 0-6 is represented by the product we integrated earlier:

((6!/(0!*6!))*exp((-((x-.5)^2))/(2*.0225))*(x^0*(1-x)^6)/(sqrt(2*3.14159*.0225)))

We find the expected value of that function the same way we found the average value of the above probabilities. This time, we want to find the average value of x (or p, as we were calling it before), so we weight each value of x by the above function, and then divide by the sum of the weights. For the numerator, this means integrating the above function multiplied by x. To do that, just copy and paste this into the calculator:

x * ((6!/(0!*6!))*exp((-((x-.5)^2))/(2*.0225))*(x^0*(1-x)^6)/(sqrt(2*3.14159*.0225)))

The denominator is just the sum of the function not multiplied by x, which we already did above (i.e., 0.042).

Plugging this into the definite integral calculator, we get:

0.01279096/0.04203334 = 0.304

So a team that goes 0-6 will be, on average, about a .304 team (again, assuming our prior distribution is accurate). That’s not far from the answer you would get if you added 5.5 wins and 5.5 losses to the Giants’ record, which is Neil’s preferred method of regressing teams to the mean. A team that was 5.5-11.5 would have a 0.324 winning percentage.
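For readers who want to see the posterior distribution itself, not just its integral, one option is a grid approximation: slice the range of possible true win% into many small steps, weight each step by prior times likelihood, and normalize. The sketch below is my own illustration of that idea (it is not how the online calculator works), using the same assumed prior of mean 0.5 and standard deviation 0.15.

```python
from math import exp

GRID_SIZE = 10_000
PRIOR_MEAN, PRIOR_SD = 0.5, 0.15
LOSSES = 6  # observed record is 0-6

# Midpoints of GRID_SIZE equal-width slices of true win% between 0 and 1.
grid = [(i + 0.5) / GRID_SIZE for i in range(GRID_SIZE)]

# Unnormalized posterior weight: prior curve times P(0-6 | p).
# Constant factors cancel when normalizing, so they are left out.
weights = [
    exp(-(p - PRIOR_MEAN) ** 2 / (2 * PRIOR_SD ** 2)) * (1 - p) ** LOSSES
    for p in grid
]
total = sum(weights)

# Expected true win%: each grid value weighted by its posterior weight.
posterior_mean = sum(p * w for p, w in zip(grid, weights)) / total

# The regression shortcut: add 5.5 wins and 5.5 losses to the 0-6 record.
shortcut = (0 + 5.5) / (6 + 11)

print(f"posterior mean win%: {posterior_mean:.3f}")
print(f"shortcut win%:       {shortcut:.3f}")
```

The normalized `weights` list is the posterior distribution on the grid, so the same list can be summed over any range (say, p of .550 and above) to get the probability of other talent levels.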

Some additional notes, which, like everything else in this post, come courtesy of Kincaid. This analysis assumes a prior that presumes nothing about the team in question other than the fact that it comes from a distribution of teams roughly like that we observe in the NFL. What if, in addition to knowing the 0-6 team plays in the NFL, we also know that the team has Manning, Victor Cruz, Hakeem Nicks, Jason Pierre-Paul, and is expected to be one of the better teams in the league? We can adjust the prior for that information as well. Our prior distribution, after accounting for the amount of talent we believe to be on the team, might have a mean record of 9-7.

This would give us:

p(A) = 0.338 (in Excel, this is simply “=1-NORM.DIST(9/16,0.5,0.15,TRUE)”; this means only 33.8% of teams have true ability levels of 9-7 or better)

p(B) = 0.023797

p(B|A) = 0.0024

Now we get an expected winning percentage for the rest of the season of .349; in other words, we would put a new over/under on number of wins for the Giants this year at 3.5 wins. Of course, this all depends on what prior you assume (and we’ve ignored injuries and strength of schedule), but as long as you make reasonable assumptions, this process should provide a reasonable estimate.
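To experiment with different priors, the expected-value calculation can be wrapped in a small function that takes the prior mean as an argument. This is a hedged sketch with names of my own choosing: it assumes a normal prior restricted to win percentages between 0 and 1, a standard deviation of 0.15, and a 16-game season with 10 games left after the 0-6 start.

```python
from math import exp

def posterior_mean_after_losses(losses, prior_mean, prior_sd=0.15, grid_size=10_000):
    """Expected true win% after an 0-`losses` start, via a grid over p in (0, 1)."""
    total = 0.0
    weighted = 0.0
    for i in range(grid_size):
        p = (i + 0.5) / grid_size
        # Prior curve times the chance of losing every game at this p.
        weight = exp(-(p - prior_mean) ** 2 / (2 * prior_sd ** 2)) * (1 - p) ** losses
        total += weight
        weighted += p * weight
    return weighted / total

SEASON_GAMES = 16
GAMES_PLAYED = 6

expected_win_pct = posterior_mean_after_losses(GAMES_PLAYED, prior_mean=9 / 16)
projected_wins = expected_win_pct * (SEASON_GAMES - GAMES_PLAYED)

print(f"expected win% the rest of the way: {expected_win_pct:.3f}")
print(f"projected wins over the final 10 games: {projected_wins:.2f}")
```

With a prior mean of 9/16 this should come out close to the .349 and roughly 3.5 wins quoted above; passing `prior_mean=0.5` gives the league-average version for comparison.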

1. From 2002 to 2012, the standard deviation of team winning percentages was 0.192. However, there’s a lot of noise in team winning percentages. The standard deviation in Pythagorean winning percentages was 0.173 over that same time period. There’s still some noise in Pythagorean winning percentage, too. What we want to use here is the spread of true talent (not observed production) in the NFL, which is certainly smaller than 0.192 and I would presume to be a bit smaller than 0.173, too. I’ll use 0.15, which is a somewhat round number, to estimate what the true standard deviation is for NFL teams. The variance is simply the square of the standard deviation, which is why I used 0.0225. Note that Kincaid stated that for baseball, the true standard deviation was 0.05, a much lower number. That may be appropriate for football, too, but a full discussion is outside the scope of this post.
2. If you used a standard deviation for NFL team winning percentage of 0.05 instead of 0.15, that would be the result. In other words, the more tightly packed the teams are, the more likely it is that a good team is the one actually producing the 0-6 record.
• Richie

“Bayes Theorem”? You are giving me flashbacks to flunking out of college. (I eventually went back and graduated.)

• Chase Stuart

Sorry, Richie. Unfortunately, some problems can only be/are best solved by using Bayes Theorem. I’ve looked around, tho, and don’t see an easy way to do it in Excel.

• Richie

Chase, how easy would it be for you to see how many teams that finished the season .500 (or maybe include 9-7 and 7-9 teams), started the season 0-6. I’m curious if it might be close to that 1.56%.

The ’01 Redskins only started 0-5. That’s the only team that jumps to mind as starting off terrible and finishing close to .500.

• Chase Stuart

Since 1950, 1 out of the 68 teams to start 0-6 has finished 8-8 or better –> the ’09 Titans. But remember, that’s a very different answer than the question you’re asking.

Seven of the 68 teams went .500 or better over the rest of the year. That’s 10.3%, which is squarely within the reasonable estimates from this post.

• Richie

Yeah, your 1 of 68 is different from what I was asking. Also, what I was asking is not necessarily part of your opening calculation. Just because a team finishes at (or within one game of) .500 doesn’t necessarily mean they are truly a .500-quality team.

But I find it interesting (mostly coincidental) that 1 out of 68 is 1.47%; pretty close to 1.56%!

• nottom

The problem here is that even if they are truly a .500 team, their expected result after starting 0-6 is 5-11 not 8-8. Going 8-8 would require them to win 80% of their future games.

• Great stuff, Chase! Note that the distribution of true wpct talent in the NFL can be derived via the process in this Tangotiger post:

If you plug that exact stdev into Bayes, you would get *precisely* the same expected true wpct (assuming a .500 prior) as the full version of the “add 11 games of .500 ball” method — it’s just another way of going thru the process you/Kincaid did above. Math is pretty neat. 🙂

• Chase Stuart

Well, if Neil approves of the post, I know I’m doing something right. I think the true spread is something I want to discuss, but not this morning!

• Chase Stuart

Actually, you also wrote about that here: http://www.footballperspective.com/estimating-nfl-win-probabilities-for-matchups-between-teams-of-various-records/

So you agree that 0.15 is the standard deviation for true talent in the NFL? If so, do you disagree with the Kincaid presumption (which he did not explain the origin of) that in baseball, it’s 0.05?

• It’s different depending on the sport, based on the spread of records. So I agree with both: in the NFL, it’s roughly 0.15; in MLB, it’s around 0.05. This makes sense, because think about the different records you see in each sport… NFL teams routinely go 12-4 or 13-3, which in baseball would be something like 120-130 wins by WPct (something you never ever see). So it’s intuitive that the spread of true MLB WPct talent would be more narrow than the spread of NFL WPct talent, not accounting for the length of each sport’s schedule.

• Chase Stuart

I suppose. I guess I just assumed that it was all due to length of schedule, and it’s never odd for a great baseball team to win 13 of 16 games. I suppose the QB plays a part in this. Any other thoughts why the spread is so much narrower in MLB?

• James

I think starting pitching has a lot to do with it. In football you’re more or less starting the same team all 16 games, while baseball teams start ~6 different “QBs” over the course of the season, and nobody’s 6th starting pitcher is any good, so the true talent of the team fluctuates wildly between games and are all dragged down towards average. This is further exacerbated because the playoff format requires only 4 starting pitchers, so teams are discouraged from having deep starting pitching.

• I agree with James on the impact of SPs. Also, I think baseball is just a lot more random. Football has more moving parts, but most of the player-opponent interactions are purely a matter of physical strength and/or mental execution, neither of which really invites a lot of randomness relative to baseball, where the central interaction involves hitting a round ball with a round bat and hoping it falls somewhere in between the fielders. It just seems like the entire sport of baseball is much more luck-driven than football.

• James

You saying randomness reminded me that having nine 3-out innings makes baseball more random than if it had one 27-out inning. Normally, other than turning over the line-up, any baserunner that doesn’t score or drive in a run is essentially wasted when the inning ends, but if innings continued for 27 outs teams with higher on-base percentages would dominate the competition because base runners would score more often and those teams would get more runners on base.

In that sense, football is like a two inning baseball game with lots of outs because every yard a team gains has a lasting impact on field position even if the offense doesn’t score that particular drive. It would make football much more random if every drive started on your own 20 yardline, which is close to what happens at the start of every inning in baseball.

• That’s an awesome, awesome point. Very well said.

• Chase Stuart

That is a very good point.

• George

Interesting one – I’m just trying to get my head around Bayes Theorem a bit as I’m starting to read chunks of the Nate Silver book and it makes lots of reference to it in there. Just for fun, I’ve put together a simulation in Excel so that I can play the season out at any point just by plugging my latest set of ratings in, and someone’s schedule and results. Taking the ratings through the Thursday night game (which I would say are in the 80% accurate mark range as the standard deviation of the error is low – some teams are good, some teams I’d say are wrong with just one more big adjustment needed) this is the numbers I have for how the rest of the Giants season plays out (based on 10,000 sims with the percentage numbers being +/- 1% over 10,000 sims):

WINS, %
0, 10.6
1, 8.45
2, 1.62
3, 19.76
4, 7.45
5, 1.83
6, 0.24
7, 0.04
8, 0.01
AVG WINS 1.9172

There doesn’t seem to be any consistency of who the wins come against, but the major flaw with this one is the assumption that the Giants are a 12 points worse than average team for the rest of the year (or about a field goal per game better than Jacksonville were for the whole year which I am assuming that we don’t believe to be the case?).

• Chase Stuart

Thanks, George. Can you make that spreadsheet publicly available and/or e-mail it to me?

• George

Not a problem. I’ll add some comments into it so it will be obvious where I’m coming from (or what needs to be pasted where etc.). It may take a couple of days but I’ll send it over to you during this week.

• George

Sorry just noted 1 win should have been 28.45%, and 2 wins should have been 31.62% – didn’t copy the numbers over from Excel very well.

• Danish

Great stuff chase/Kincaid.

Re: stdev discussion. I wonder what the variance is for DVOA. That measure at least tries to get rid of some of the noise. That of course touches on the “is DVOA descriptive or predictive” debate… I wonder what Schatz actually uses for his predictions…

• James

My understanding is DVOA is over-fit and therefore despite what FO says it’s mostly descriptive, but obviously we’ll never know since it’s a black box.

Brian Burke over at advancednflstats.com has predictive rankings, and according to those the Giants are close to a 0.500 team. The predictive rankings took the measures that have a big impact on wins AND are consistent throughout the season, so things like red zone performance and turnovers are heavily regressed as those have a big impact but are not consistent going forward.

• Danish

“Hey, we’re almost done! Now we have:

P(A)= .5 (this is the probability of randomly selecting a team that is average or better)
P(B) = .042 (this is the probability that we observe a random team drawn from our prior distribution lose its first six games)

P(B|A) = 0.005 (the probability of B, given A)

P(A|B) = .009*.5/.02 = 6.2%”

There’s gotta be something wrong here, right? Most likely the last line: where .009 and .02 come from I don’t know, but it doesn’t equal .062. If you use the other values you get 0.0595, which I suppose is 6.2% plus some rounding error.

• Chase Stuart

Hey Danish, it was part rounding error and part confusing writing on my part. I’ve added some extra digits and I think explained the formula better now. It should get you the 6.2% (as should using the numbers from the calculator link).

• Mike

Great post and work.

One small thought: Should P(A) be greater than 0.50?

In other words, the odds of finishing with record in this set {8-8, 9-7, … 16-0} is higher than 50%, since we’re including the midpoint 8-8. I suppose this is really an issue between discrete vs. continuous , and whether the time period is one season or infinity. (Large-number example: The odds of finishing with record in this set (10,000-10,000, 10,001-9,999, …, 20,000-0) is 0.50)

But in the long term, in a parity-driven league, aren’t most franchises at a 0.50 winning percentage, with a few above and below 0.50? If this is true, P(A) would be higher than 0.50, as the mode at 0.500 is included plus the exceptional teams.

• Chase Stuart

Yes, I think the issue is discrete vs. continuous. I’m happy to stipulate that half the teams in every professional league are .500 or better teams, and half are .500 or worse.

• You’re ignoring the fact that some of those losses are to bad teams (eagles, bears) and they’ve been obliterated in the others. Which lowers their posterior winning percent a lot.

• Chase Stuart

Yes, this analysis ignores SOS and margin of victory, although I’m not sure that the Giants are outliers in that regard. They’ve also lost to two undefeated teams, and lost some close games. I don’t think the Giants have played much better or worse than your typical 0-6 team.

• Ummm…

OK, there’s a serious logical flaw here.

“What is the probability of an 0-6 team being better than that” is the same as saying “What is the probability of X being X’ “, which is obviously 0 – you cannot be X and X’

Then you say “So the probability of a team that goes 0-6 being at least a .500 team in terms of true ability is about six percent” – what is “true ability”? Well, true ability is surely measured ability, and an 0-6 team is an 0.000 team.

I think what you are trying to say is something quite different, which is “what is the probability of the Giants finishing .500 or better”, and if that’s the case – wordsmithing required.

• Salur

I think you mean “X and not X”.

And “true ability is surely measured ability” is just not true. That assumes that there’s no such thing as luck, or even variation in how well a team plays compared to its true talent level.

Records would be accurate if every team always played at exactly their true talent level and the better team always won the game. And if no one got injured, I suppose. None of these things are the case, and that should be pretty obvious.

• Chase Stuart

Thanks for stopping by, Ummm. I’m not following what you’re saying, though; can you try again?

• Independent George

I love your articles, Chase, but I think your entire thesis has been better summarized here.

• Andrew

Thanks for the article Chase. For anyone who wants to play with the idea a little bit more, I threw together an interactive applet. It allows you to enter the observed record (such as 0-6), and toggle the two options mentioned in the article– your prior belief of how many games the team would win on average, and the cutoff you want to test (ie are they a true .500/.550/.600 team?). It then spits out the posterior distribution, along with the probability of your team being at least as good as your cutoff and the expected winning percentage for the rest of the season.

The coding was rather hasty, so if there’s an error or it’s missing a helpful feature, please let me know.

• Chase Stuart

Wow!