
Projecting Team Wins Using DVOA

For a decade, Football Outsiders has been using advanced analytics to measure and predict team performance. And since the Football Outsiders database now goes back to 1989, I thought it would be worthwhile to test the predictive power of Football Outsiders’ ratings.

If you’re not familiar, FO uses DVOA as its base measure of team strength. The goal here is to use DVOA ratings in Year N to predict win totals in Year N+1. Now, what expectations should we have for DVOA? The fact that the team with the best DVOA in history (Washington in 1991) won only 9 games the following season is not a knock on DVOA. That was an outstanding Super Bowl team that declined significantly the following year. Ditto the 16-0 Patriots looking less impressive without Tom Brady in 2008. But at a minimum, DVOA must do better at predicting future wins than, say, just wins. And it should also do better than Pythagenpat ratings, which only incorporate points scored and points allowed. So does it?

Let’s start with the basics. The best-fit formula[1] to project wins in Year N+1 using *only* wins in Year N is:

5.343 + 0.332 * Year N Wins (Correlation Coefficient: 0.32)

And, as shown last week, by using Pythagenpat wins, we get a correlation coefficient of 0.36. So what happens if we instead use Year N DVOA as our input? We get the following best-fit formula:

8.01 + 6.378 * DVOA (Correlation Coefficient: 0.39)
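To make the two single-input formulas concrete, here is a minimal Python sketch that simply evaluates them. The function names are hypothetical, and it assumes DVOA is entered as a decimal (0.20 for +20%), which is what the size of the coefficient suggests but which the formulas themselves don't state.

```python
def project_from_wins(wins_year_n: float) -> float:
    """Project Year N+1 wins from Year N wins using the fitted line quoted above."""
    return 5.343 + 0.332 * wins_year_n


def project_from_dvoa(dvoa_year_n: float) -> float:
    """Project Year N+1 wins from Year N team DVOA.

    Assumption: DVOA is a decimal fraction, so 0.20 means +20%.
    """
    return 8.01 + 6.378 * dvoa_year_n


if __name__ == "__main__":
    # Illustrative inputs only; these are not real team seasons.
    for wins in (4, 8, 12):
        print(f"{wins:>2} wins -> {project_from_wins(wins):.2f} projected wins")
    for dvoa in (-0.20, 0.0, 0.20):
        print(f"{dvoa:+.0%} DVOA -> {project_from_dvoa(dvoa):.2f} projected wins")
```

Both lines pull teams strongly toward the middle: under the wins-only formula, a 12-win team projects to roughly 9.3 wins the next year.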

So DVOA does beat both regular wins and the Pythagenpat ratings. Now, what if we use both DVOA ratings and number of wins to predict future wins? As it turns out, the wins variable was nowhere near significant (p = 0.61), which means that once we know the DVOA rating, knowing the number of wins adds no meaningful predictive power. In other words, the evidence doesn’t show that a team with a lot of wins but an average DVOA rating is better than a team with an average number of wins and an average DVOA rating.

But can we improve on DVOA? What if instead of using Team DVOA as our input, we use Offensive DVOA, Defensive DVOA, and Special Teams DVOA? Team DVOA obviously incorporates all three of these elements, but perhaps analyzing team strength on a more granular level will tell us more about the appropriate weights. Keeping in mind that for defenses, a negative DVOA grade means an above-average defense, here is the best-fit formula to predict future wins with those three inputs:

8.01 + 6.779 * OFFDVOA – 5.642 * DEFDVOA + 6.518 * STDVOA

All four terms (the constant and the three DVOA inputs) are statistically significant, although as you might suspect the special teams one is the least statistically significant (at only p = 0.024; the other three have vanishingly small p-values). As it turns out, this gives us the same correlation coefficient of 0.39, but I prefer looking at teams using this formula. Also, there’s something else to keep in mind when looking at the weights on the coefficients. In general, the range of DVOA grades is wider for offenses (removing outliers, about -35% to 35%) than it is for defenses (about -30% to 25%) or special teams (-8% to 10%). So even though the weight on the special teams variable is larger than the weight on the defense variable, this doesn’t mean special teams is more important than defense.

Continuing the momentary diversion, the standard deviation of DVOA grades from 1989 to 2012[2] was 14.5% for offense, 10.4% for defense, and 3.9% for special teams. From those numbers, one could put forth the argument that offense is roughly 50% of the game, defense is about 36% of the game, and special teams is around 14%.
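The arithmetic behind those shares, and behind the warning about reading the raw weights, can be sketched in a few lines of Python. The sketch takes each unit's share as its standard deviation divided by the sum of the three, and multiplies each regression weight by its standard deviation to get the win value of a one-standard-deviation edge. The names are hypothetical, and DVOA is assumed to be a decimal (0.145 for 14.5%).

```python
# Weights from the three-input formula and standard deviations quoted in the text.
# Assumption: DVOA is expressed as a decimal, so 14.5% becomes 0.145.
COEFFICIENTS = {"offense": 6.779, "defense": -5.642, "special teams": 6.518}
STD_DEVS = {"offense": 0.145, "defense": 0.104, "special teams": 0.039}


def shares_of_spread(std_devs):
    """Each unit's standard deviation as a fraction of the summed standard deviations."""
    total = sum(std_devs.values())
    return {unit: sd / total for unit, sd in std_devs.items()}


def wins_per_std_dev(coefficients, std_devs):
    """Absolute change in projected wins for a one-standard-deviation change in each unit."""
    return {unit: abs(coefficients[unit]) * std_devs[unit] for unit in coefficients}


if __name__ == "__main__":
    for unit, share in shares_of_spread(STD_DEVS).items():
        print(f"{unit}: {share:.0%} of the combined spread")
    for unit, wins in wins_per_std_dev(COEFFICIENTS, STD_DEVS).items():
        print(f"{unit}: {wins:.2f} wins per standard deviation")
```

On these inputs, a one-standard-deviation edge is worth about 0.98 wins on offense, 0.59 on defense, and 0.25 on special teams, which shows why the larger raw weight on special teams does not make it the more important unit.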

What if we instead break down DVOA into five parts: Pass DVOA, Rush DVOA, Pass Defense DVOA, Rush Defense DVOA, and Special Teams DVOA?[3] The correlation coefficient does not change, but all variables wind up being statistically significant. The best-fit formula is:

7.64 + 3.069 * PASS OFF + 4.297 * RUSH OFF – 2.231 * PASS DEF – 3.990 * RUSH DEF + 6.368 * ST DVOA

The intercept drops to 7.64 because, on average, passing offenses have above-average ratings compared to the baseline FO is using. This is not the case with offenses as a whole, as team offensive DVOA had an average of zero throughout the period. That difference is due to the fact that false starts and delays of game are counted against the offense as a whole, but against neither the pass offense nor the rush offense.[4] The standard deviations for pass offense, rush offense, pass defense, and rush defense are 22.0%, 11.1%, 15.1%, and 9.4%, respectively. Add in the 3.9% for special teams, and here’s another potential conclusion: pass offense is 33% of the game, rush offense is 17%, pass defense is 22%, rush defense is 14%, and special teams remains at 14%. Those numbers sound and feel appropriate to me.
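The post doesn't spell out how the five-way split was computed. One reading that reproduces the quoted percentages, offered here as an assumption rather than the author's confirmed method, is to keep the earlier 50% / 36% / 14% unit shares and divide the offensive and defensive shares between pass and rush in proportion to their standard deviations. A small Python sketch with hypothetical names:

```python
# Unit-level shares from the earlier three-way split (offense, defense, special teams).
UNIT_SHARES = {"offense": 0.50, "defense": 0.36, "special teams": 0.14}

# Standard deviations (in percentage points) of the pass/rush components quoted above.
COMPONENT_STD_DEVS = {
    "offense": {"pass offense": 22.0, "rush offense": 11.1},
    "defense": {"pass defense": 15.1, "rush defense": 9.4},
}


def split_shares(unit_shares, component_std_devs):
    """Divide each unit's share among its components in proportion to their std devs.

    Units with no listed components (special teams here) keep their share unchanged.
    This allocation rule is an assumed reconstruction, not a documented method.
    """
    shares = {}
    for unit, share in unit_shares.items():
        components = component_std_devs.get(unit)
        if components is None:
            shares[unit] = share
            continue
        total = sum(components.values())
        for name, sd in components.items():
            shares[name] = share * sd / total
    return shares


if __name__ == "__main__":
    for name, share in split_shares(UNIT_SHARES, COMPONENT_STD_DEVS).items():
        print(f"{name}: {share:.0%}")
```

A flat normalization of all five standard deviations would give special teams a much smaller share, so the fact that special teams "remains at 14%" is what points toward this nested reading.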

So what does this mean for 2013? We can use the formula[5] and the 2013 DVOA grades to project the number of wins for each team in 2014. Note that all the numbers for the five team grade columns should really be represented with a % sign, but including non-numeric data prevents the user from having the ability to sort the table.
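As a rough sketch of how one row of such a projection could be computed, the Python below applies the five-input formula with the adjusted constant of 7.697 from footnote 5. The class and function names are hypothetical, the grades are assumed to be entered in percentage points (as they would appear in a table without % signs) and converted to decimals inside the function, and the sample team is invented for illustration, not taken from the 2013 data.

```python
from dataclasses import dataclass

# Adjusted constant from footnote 5 (the raw fit gave 7.642).
INTERCEPT = 7.697


@dataclass
class TeamGrades:
    """Year N DVOA grades in percentage points (e.g. 20.0 means +20%)."""
    pass_off: float
    rush_off: float
    pass_def: float  # negative is good for a defense
    rush_def: float  # negative is good for a defense
    special_teams: float


def project_wins(grades: TeamGrades) -> float:
    """Project Year N+1 wins from the five DVOA components."""
    # Assumption: the regression was fit on decimal DVOA, so divide by 100 first.
    return (
        INTERCEPT
        + 3.069 * grades.pass_off / 100
        + 4.297 * grades.rush_off / 100
        - 2.231 * grades.pass_def / 100
        - 3.990 * grades.rush_def / 100
        + 6.368 * grades.special_teams / 100
    )


if __name__ == "__main__":
    # Hypothetical team: good passing offense, solid defense, slightly positive special teams.
    sample = TeamGrades(pass_off=20.0, rush_off=5.0, pass_def=-10.0,
                        rush_def=-5.0, special_teams=2.0)
    print(f"Projected wins: {project_wins(sample):.1f}")
```

A team with all five grades at zero projects to the constant itself, 7.697 wins.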

Rk | Team | Pass O | Rush O | Pass D | Rush D | ST | Proj 2014 Wins

It’s important not to infer too much from tables like these. The only inputs are 2013 DVOA grades, and the formula doesn’t know that Robert Griffin III should be a lot better this year or about Houston’s draft picks, Oakland’s cap room, or Green Bay getting twice as much Aaron Rodgers in 2014. But what’s interesting to me is the teams that stand out as different from their Pythagenpat ratings. Here are some thoughts:

The Eagles are projected for nearly one full win more using the DVOA projection (9.5) than Pythagenpat (8.6). That makes some sense, I think, because Philadelphia had excellent offensive pass and rush DVOA grades, and the below-average special teams grade doesn’t mean much. Philadelphia did rank 4th in points, but I think their DVOA grades are farther from the mean than their points scored number indicates.

The Jets are a less intuitive example. I think part of the reason for optimism here is that New York forced 18 fumbles but recovered only two of them! That’s an absurd result, and one that would make the Jets look much better in DVOA grades than points differential. Also, while pass defense is more important than run defense generally, perhaps it’s not as predictive: the regression has a significant weight on rush defense, where New York excels. That’s why Football Outsiders has the Jets as nearly an 8-win team, compared to the 7-win team from Pythagenpat.

The Buccaneers are another team that is projected for nearly one more full win (7.7 vs. 6.8) in 2014. Tampa Bay was 30th in points scored and 32nd in NY/A, but 22nd in Football Outsiders’ pass DVOA. I’m not sure of the reason for the discrepancy (perhaps a higher weight on completion percentage?), but it does mean FO is more optimistic on the Bucs passing game, and by extension, the team. But perhaps the biggest reason is strength of schedule. Tampa’s schedule was the hardest in the league at 3.6 points tougher than average according to the Simple Rating System, and Football Outsiders also graded the Bucs as having the hardest schedule in the league.

Those three teams stand out as the biggest beneficiaries when using Football Outsiders’ analysis as opposed to just straight points differential. There’s no team significantly harmed by the analysis, although the Bengals, Colts, and Jaguars come closest. The Bengals lose about two-thirds of a win, and I suspect it’s because of the Andy Dalton effect. Cincinnati ranked 6th in points scored but just 12th in pass offense DVOA and 20th in rush offense DVOA. In other words, the Football Outsiders analysis is not nearly as high on the Bengals offense, which would reduce their expected wins total in 2014. The Bengals also had the 8th easiest schedule in 2013 by DVOA.

Anyway, I think these are a pretty useful starting point for your 2014 team projections rather than, say, last year’s standings. Even Football Outsiders won’t use these for more than a starting point; their preseason projections will have the customary tweaks for things like teams getting new quarterbacks, injuries (or the lack thereof) in 2013, rookies, offensive line continuity, etc. Everyone will handle those questions differently, but I do think the table above presents a nice base for everyone’s team projections.

  1. Over the period 1989 to 2012, excluding the 1994, 1998, and 2001 seasons.
  2. Excluding ’94, ’98, and ’01.
  3. I’ll leave it to someone with more time or inclination to break down the relationship between kicking, kickoffs, punting, kickoff returns, and punt returns re: special teams data.
  4. As explained to me via e-mail by Aaron Schatz, the average of passing offense and run offense will always be higher than the average of pass defense and run defense because the offensive ratings account for things like false start and delay of game penalties, which are all negative. They’re also not included in defensive rating. So when running my regression, this means I’ve basically given teams a free pass when it comes to false start and delay of game penalties. I’m okay with that, but wanted to make the reader aware of this issue.
  5. With one adjustment. When I ran the numbers, the average team was only winning 7.944 games, so I increased the constant from 7.642 to 7.697.
  • Nick Bradley

    Cool. Did you test anything else, like net expected points added?

    I have some issues with DVOA as well because it contains no leverage index, so a team that goes conservative when playing with the lead, or playing prevent defense, is penalized. Denver basically never went into conservative offense mode, and their odvoa was inflated as a result.

  • Duff Soviet Union

    “I have some issues with DVOA as well because it contains no leverage index, so a team that goes conservative when playing with the lead, or playing prevent defense, is penalized. ”

    I’m pretty sure that they, along with numerous others, have tried to taper down “garbage time” stats for no improvement in accuracy.

    Maybe that’s a sign that playing aggressively, even with a lead, is a good strategy on both offense and defense and that running the ball three times into the line then punting on offense and playing passive, prevent defense is just a losing strategy long term.

    Seattle have stood out in the last couple of years as an example of a team that just throttles an opponent when they get them down and I don’t think I’d be calling them overrated.

    • Anders

      It makes sense, there is a reason why prevent defense is often called “prevent from winning defense”

    • They have indeed. It’s also really untrue that there is no adjustment for garbage time, since “garbage time” stats are adjusted to the average of other “garbage time” stats. A good team playing very conservatively in “garbage time” should be better than a bad team playing very conservatively in “garbage time.” It’s just a special pleading (“I neglected to mention that my dragon is invisible” again) people make to defend their own preconceived notions against DVOA.

      • sn0mm1s

        1) FO hasn’t updated their DVOA/DYAR definition in years. So not only is it a black box, what was released at one point isn’t valid.
        2) DYAR/DVOA doesn’t pass the eyeball test in many situations because it devalues long plays. It works well for team predictions but when used to rank individual players some results are laughable (unless you really think that an average backup would lead the league in rushing if he got the same carries that Peterson did in 2012).

        • Duff Soviet Union

          “DYAR/DVOA doesn’t pass the eyeball test in many situations because it devalues long plays. It works well for team predictions but when used to rank individual players some results are laughable (unless you really think that an average backup would lead the league in rushing if he got the same carries that Peterson did in 2012).”

          Where the heck are you getting that from? 2012 Adrian Peterson finished 2nd in the league in rushing DVOA (their per play metric) behind only CJ Spiller and led the league easily in DYAR (counting stat). They are absolutely not saying that an “average backup” would lead the league in rushing if he got the same carries.

          I think their individual quarterback and running back rankings are fine, albeit with the same obvious problems that any individual measure of performance has. Every week there’s teeth gnashing when some backup with 5 carries and 2 receptions has more DYAR than a guy with 100 yards, but that’s because most people don’t get the simple reality that the majority of rushing plays have negative value. They do a very good job of adjusting for opposition so e.g Matt Ryan looks much better this season than he does by conventional stats due to playing a brutal schedule.

          Their receiving rankings are far more problematic basically because they penalise guys for incompletions on targets, when Chase and others have pointed out that targets are generally an indicator of quality.

          And their team rankings are generally really good. They spit out some strange rankings frequently but more often than not they do a much better job of predicting future wins than just looking at the standings and extrapolating.

          • sn0mm1s

            Actually they are. And before you say I don’t know what the stat represents I have email from Aaron himself (the guy that developed the stat in the first place) confirming.

            DYAR – is a counting stat and it represents yards above replacement. Actual Rushing yards – DYAR = expected actual yards of replacement. So they would expect AD’s replacement to rush for over 1600 yards.

        • “Doesn’t pass the eyeball test”=My dragon is invisible.

          • sn0mm1s

            You really think a replacement level RB would lead the league in rushing for the Vikes in 2012? If you have ever read any of FO’s early articles they mention reworking methodologies because their results don’t seem quite right.

            • You just said that if DVOA’s results don’t square with what you think that means that they are wrong. That’s what I’m saying is wrong, because it is horrendously flawed logic. I talked about garbage time in team DVOA and that’s all I’m talking about. I have not said a single word about Adrian Peterson until this sentence.

              • sn0mm1s

                And I said it was great for team predictions but was poor judging individuals. That isn’t flawed logic – it is how these systems are incrementally improved. Outliers and weird results are examined and addressed. Here are the facts: DYAR/DVOA devalues long plays. This doesn’t appear to affect team rankings all that much (of course we can’t try to improve on the system because FO keeps their special sauce secret) but leads to poor results when applied to individuals – especially individuals that are gifted enough to break long plays. How else, other than an eyeball test, can you judge their system when applied to individuals if they don’t publish how they arrive at the numbers in the first place? It isn’t like their data and process is subject to peer review.

                • I know what you said. You also said it replying to me talking about ONLY team numbers. When you say “DYAR/DVOA doesn’t pass the eyeball test in many situations” in response to a comment that ONLY talks about team predictions, you are either (a) suggesting that your complaint applies to team predictions or (b) fundamentally failing to understand the purpose of the “reply” button.

                  And again, I said nothing about individual DVOA.

                  • sn0mm1s

                    When I follow up my critique with an explanation – and you still don’t get that I am focusing on the shortcomings of DYAR/DVOA applied to individual – then it is you that lacks reading comprehension.

  • James

    ” The best-fit formula1 to project wins in Year N+1 using *only* wins in Year N+1 is”

    Typo – that should be Year N, not Year N+1 at the end.

    • Chase Stuart

      Thanks; fixed.

  • Kibbles

    The most interesting takeaway for me is that special teams is as important as rush defense in terms of winning football games. I would suspect that teams spend far more resources stopping the run than on their kicking and returning units, which opens up the possibility of an arbitrage opportunity. Assuming Chip Kelly isn’t already on it, of course.

    • James

      I was surprised by that too, although I don’t know what teams could do about it. Their rush defense has to be able to double as a pass defense so they’re already committing a lot of resources there, and special teams is split up over punt kick, punt return, kickoff, kick return, field goal kick and field goal block units.

      If we split up special teams importance evenly over those 6 units, then rush defense is 6 times as important as the kickoff team, and that seems to match up with where the resources go.

      • Kibbles

        I would think of something like the Freeney/Mathis Colts, where the defensive line was indifferent to the run and they relied on linebackers and safeties (mostly Bob Sanders) to clean up the mess.

        More concretely, it would probably suggest that instead of spending a couple million on a 2-down linebacker (say, Joe Mays and Keith Brooking for Denver), the team would be better off grabbing whatever minimum-salary vet they could get and using the savings to provide some continuity on the return units (which are typically just made up of whatever cheap guys at the end of the roster are deemed expendable and therefore experience a lot of year-to-year turnover). Or instead of paying money for a decent 2-way safety, just grab a 1-way safety and use the savings to get yourself a better kickoff specialist.

  • Mike

    Are there any issues with serial correlation in this regression? I didn’t notice any mention of it.

  • Trepur

    “In generally, range of DVOA grades is wider for offenses (removing outliers, about -35% to 35%) than it is for defenses (about -30% to 25%) or special teams (-8% to 10%). So even though the weight on the special teams variable is larger than the weight on the defense variable, this doesn’t mean special teams is more important than defense.”
    This is why I’m an advocate of converting data into standard deviations before running a regression.

    That way the formula gives me the correlation coefficient, so you can instantly see how much each variable impacts the model.