Andrew Healy, frequent contributor here and at Football Outsiders, is back for another guest post. You can also view all of Andrew’s guest posts at Football Perspective at this link, and follow him on Twitter @AndHealy.
For a stats guy, the Wells Report is gripping reading, particularly the appendices provided by the consulting firm Exponent. The conclusion there is pretty simple. Compared to referee Walt Anderson’s pregame measurements, the Patriots’ footballs dropped significantly further in pressure than the Colts’ footballs did. Therefore, even if Tom Brady’s involvement is unclear, a Patriots employee probably deflated the balls.
At first glance, that evidence seems pretty convincing, maybe even strong enough to conclude more definitively that tampering occurred. And it is kind of awesome that the officials even created a control group. But there is a problem with making firm conclusions: timing. As Exponent acknowledges, the measured pressure of the balls depends on when the gauging took place. The more time that each football had to adjust to the warmer temperature of the officials’ locker room at halftime, the higher the ball pressure would rise.
And, not surprisingly given the Colts’ accusations, the officials measured the Patriots’ footballs first. This means that the New England footballs must have had less time to warm up than the Indianapolis footballs. Is that time significant? We will get to that, but it does make for a good argument that the Indianapolis footballs are not an adequate control group for the New England footballs. Given the order of events, we would expect the drop in pressure from Anderson’s initial measurements to be smaller for the Colts’ balls, which had more time indoors at halftime. As the Wells Report notes, the likely field temperature was in the 48-50 degree range, compared to the 71-74 degree range for the room where the footballs were measured.
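To get a feel for how much the temperature gap alone can matter, here is a minimal sketch based on the ideal gas law at fixed volume, where absolute pressure scales with absolute temperature. It is not Exponent’s model. It assumes standard atmospheric pressure of 14.7 psi, a ball that neither stretches nor leaks, and a pregame gauging done at about the same 71-74 degree room temperature. The function names and the 13.0 psi starting point are illustrative choices.

```python
ATMOSPHERIC_PSI = 14.7  # assumed standard sea-level pressure


def fahrenheit_to_rankine(temp_f):
    """Convert Fahrenheit to the absolute Rankine scale."""
    return temp_f + 459.67


def gauge_pressure_after_temp_change(start_psig, start_temp_f, end_temp_f):
    """Gauge pressure once the air inside has settled at a new temperature.

    Assumes constant volume and no air loss, so absolute pressure is
    proportional to absolute temperature.
    """
    start_abs = start_psig + ATMOSPHERIC_PSI
    ratio = fahrenheit_to_rankine(end_temp_f) / fahrenheit_to_rankine(start_temp_f)
    return start_abs * ratio - ATMOSPHERIC_PSI


if __name__ == "__main__":
    # Smallest and largest temperature gaps implied by the quoted ranges.
    for room_f, field_f in [(71, 50), (74, 48)]:
        cold = gauge_pressure_after_temp_change(13.0, room_f, field_f)
        print(f"room {room_f}F -> field {field_f}F: {cold:.2f} psi")
```

Under those assumptions, a ball gauged at 13.0 psi indoors would read roughly 1.1 to 1.35 psi lower once fully cooled to field temperature. A ball that has spent a few minutes back in the locker room sits somewhere between the cold reading and its original one, which is why the order of measurement matters.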
So, how much lower? Here it gets a little fuzzy. The report is clear that the Patriots’ footballs were gauged first during halftime, but it is unclear about whether the second step was to reinflate the Patriots’ balls or to measure the four Colts’ balls. In Appendix 1 (see p. 2 of the appendix), Exponent notes “although there remains some uncertainty about the exact order and timing of the other two events, it appears likely the reinflation and regauging occurred last.” If events unfolded this way, the Indianapolis footballs would at least be a better control group.
If, on the other hand, the Colts’ footballs were measured last, there would have been a significant gap between the measurement of the last Patriots’ football and the measurement of the first Colts’ ball: the time it took to reinflate the New England footballs. With a few extra minutes to warm up closer to room temperature, the pressure in the Colts’ footballs could rise back towards the levels Anderson measured before the game.
I am not breaking any ground here: Exponent acknowledges this possibility in the scenarios it thoughtfully presents about how natural causes could explain the differences between the Patriots’ and Colts’ footballs (see item 10 on p. XIII of Appendix 1, or the 159th page of the 243-page PDF). It concludes that the necessary timing conditions are unlikely given the information that was provided to them by Wells’s law firm, but it allows for the possibility. So this seemed to me the most important question after reading the report: Could the observed differences between the Patriots’ and Colts’ footballs come from timing if the Colts’ footballs had actually been measured just before the end of halftime?
To try to answer this question, we actually have a little more information to go on than just the average differences in ball pressure. Specifically, we have the variation in the pressures in the balls. Consider the data from the report on the halftime pressures for the 11 Patriots’ footballs and the four Colts’ footballs that the officials had time to measure. (If you haven’t read the report: the two officials, Clete Blakeman and Dyrol Prioleau, were using different gauges, one consistently reading a little higher than the other. There is one case, Ball 3 for the Colts, where Blakeman’s gauge read lower in the report. The report plausibly argues that the readings may have been transcribed incorrectly, and I switch the readings here accordingly.)
First, notice that the effects on the Indianapolis footballs when moved from a colder to warmer environment were not enough to bring the Colts’ footballs back up to their pregame level of nearly 13.0 psi. Therefore, absent tampering, I think we should expect roughly similar levels of variation across the balls for each team.1
But it won’t take you long to see that it doesn’t appear that way in the table above. The Patriots’ footballs vary, on Clete Blakeman’s gauge, from 10.5 to 11.85 psi, while the Colts’ balls vary in a much narrower range, from 12.5 to 12.95 psi. On the other hand, there are more Patriots’ balls in the sample, so we’d expect their range to be wider. How about the standard deviation for the Patriots’ footballs compared to the Colts’ footballs?
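The point about sample size is easy to check with a quick simulation. The sketch below is an illustration, not anything from the report: it draws samples of 4 and 11 values from the same normal distribution (standard deviation set to 1 for convenience, with normality itself an assumption) and averages the gap between the largest and smallest value in each sample. The function name is made up for the example.

```python
import random


def average_range(sample_size, trials=20_000, seed=1):
    """Average (max - min) across many normal samples with standard deviation 1."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        draws = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        total += max(draws) - min(draws)
    return total / trials


range_4 = average_range(4)
range_11 = average_range(11)
print(f"average range, 4 balls:  {range_4:.2f} standard deviations")
print(f"average range, 11 balls: {range_11:.2f} standard deviations")
print(f"ratio: {range_11 / range_4:.2f}")
```

With identical underlying spread, the 11-ball sample should show a range about one and a half times as wide as the 4-ball sample. The Blakeman ranges quoted above (1.35 psi versus 0.45 psi) differ by a factor of three, so sample size explains part of the gap but not all of it, which is why the standard deviation is the better comparison.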
Under Blakeman’s numbers, the Patriots’ footballs had a standard deviation of 0.402 psi compared to 0.165 psi for the Colts’ balls. With Prioleau’s pressures, the standard deviations for the two teams are 0.410 psi and 0.144 psi respectively. So the Patriots’ balls had a standard deviation of pressure more than double that of the Colts’ balls. Given the small sample sizes, these differences are not quite statistically significant at conventional levels. Using Blakeman’s measurements, the p-value is 0.17. With Prioleau’s, it is 0.11.
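The post does not say which test produced those p-values. One common choice for comparing two spreads is a two-sided test on the ratio of the sample variances, and the sketch below approximates that by simulation, using only the standard deviations and sample sizes quoted above. It assumes the pressures are normally distributed and that both teams’ balls share the same true variance under the null hypothesis. The function names are illustrative.

```python
import random


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def variance_ratio_p_value(sd_a, n_a, sd_b, n_b, trials=50_000, seed=0):
    """Monte Carlo two-sided p-value for the observed variance ratio.

    Simulates both groups from the same normal distribution and counts how
    often the simulated ratio is at least as large as the observed one,
    then doubles that one-sided tail.
    """
    rng = random.Random(seed)
    observed = (sd_a / sd_b) ** 2
    hits = 0
    for _ in range(trials):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_a)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_b)]
        if sample_variance(group_a) / sample_variance(group_b) >= observed:
            hits += 1
    return min(1.0, 2 * hits / trials)


# Standard deviations and sample sizes as quoted in the text.
p_blakeman = variance_ratio_p_value(0.402, 11, 0.165, 4)
p_prioleau = variance_ratio_p_value(0.410, 11, 0.144, 4)
print(f"Blakeman gauge: p ~ {p_blakeman:.2f}")
print(f"Prioleau gauge: p ~ {p_prioleau:.2f}")
```

If a variance-ratio test is indeed what lies behind the quoted figures, the simulated values should land close to 0.17 and 0.11. With only four Colts’ balls, the denominator of the ratio is very noisy, which is why even a doubling of the standard deviation does not clear the usual 0.05 bar.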
Still, the difference in variation between the Patriots’ and Colts’ balls is awfully suspicious. That piece of evidence has helped tip me, initially a Deflategate skeptic, into the convinced column. There was a wealth of interesting evidence in the Wells Report. But absent video of the Deflator (somehow neither an action hero nor an incompetent central banker) working his magic, the hard proof had to come from the pressure in those 15 balls tested at halftime. Combined with the difference in average pressure levels, the difference in dispersion is hard to dismiss.2
Moreover, that latter difference fits with the theory of how the Deflator committed the crime. With a short amount of time in that bathroom, he could just quickly release a little pressure from each ball. That process would take balls that were roughly equal in pressure at the time of Anderson’s initial measurements and make them more dispersed.
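A toy simulation shows why a hurried, uneven release would widen the spread. Every number in it is hypothetical and chosen only for illustration: balls that start near 12.5 psi with a small standard deviation of 0.1 psi, and a release per ball drawn uniformly between 0.5 and 1.5 psi. Nothing here is taken from the report’s measurements.

```python
import random
import statistics

# All values below are hypothetical, for illustration only.
START_MEAN_PSI = 12.5
START_SD_PSI = 0.1
RELEASE_MIN_PSI = 0.5
RELEASE_MAX_PSI = 1.5

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulate many balls so the estimated spreads are stable.
before = [rng.gauss(START_MEAN_PSI, START_SD_PSI) for _ in range(10_000)]
after = [p - rng.uniform(RELEASE_MIN_PSI, RELEASE_MAX_PSI) for p in before]

before_sd = statistics.stdev(before)
after_sd = statistics.stdev(after)
print(f"spread before release: {before_sd:.2f} psi")
print(f"spread after release:  {after_sd:.2f} psi")
```

Because the amount released varies independently from ball to ball, its variance adds to the starting variance, so the spread after the release is larger than before no matter which hypothetical numbers are plugged in.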