My premise begins with every regular-season game played in the NFL since 1978. Why 1978? I’d love to tell you it was because that was the year the modern game truly emerged thanks to the liberalization of passing rules (which, incidentally, is true), but really it was because that was the most convenient dataset I had on hand with which to run this kind of study. Anyway, I took all of those games, and specifically focused on the number of points scored by each team in each game. I also came armed with offensive and defensive team SRS ratings for every season, which give me a good sense of the quality of both the team’s offense and their opponent’s defense in any given matchup.
If you know anything about me, you probably guessed that I want to run a regression here. My dependent variable is going to be the number of points scored by a team in a game, but I can’t just use raw SRS ratings as the independent variables. I need to add them to the league’s average number of points per game during the season in question to account for changing league PPG conditions, lest I falsely attribute some of the variation in scoring to the wrong side of the ball simply due to a change in scoring environment. This means that for a given game, I now have three numbers: the actual number of points scored by a team, the number of points they’d be expected to score against an average team according to SRS, and the number of points their opponents would be expected to allow vs. an average team according to SRS.
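To make that adjustment concrete, here is a minimal sketch in R. Everything in it is illustrative: the three games, the league averages, and the column names tm_osrs and op_dsrs are made up for the example. It also assumes the defensive rating is expressed as points allowed relative to average (positive means more points allowed); if your ratings are signed the other way, subtract instead of add.

```r
# Toy inputs: all values and column names below are hypothetical.
games <- data.frame(
  season    = c(1978, 1978, 2010),
  tm_points = c(17, 24, 31),       # actual points scored in the game
  tm_osrs   = c(2.5, -1.0, 6.0),   # team offensive rating, points vs. average
  op_dsrs   = c(1.5, -3.0, 0.0)    # opponent defensive rating, assumed
                                   # positive = allows more than average
)

# Hypothetical league scoring averages, keyed by season
league_ppg <- c("1978" = 18, "2010" = 22)

# Look up each game's league average, then shift both ratings onto a
# points-per-game scale for that season
lg <- unname(league_ppg[as.character(games$season)])
games$tm_offquality <- lg + games$tm_osrs
games$op_defquality <- lg + games$op_dsrs

games
```

The point of the shift is that a +2.5 offense in an 18-point league and a +2.5 offense in a 22-point league end up with different expected point totals, which is what the regression needs to see.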
Time for regression:
tm_points ~ -18.17584 + 0.94601*tm_offquality + 0.92719*op_defquality
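For anyone following along in R, a fit like this comes from lm(). The sketch below does not use the real game data. It simulates games from the coefficients above plus random noise (the sample size, means, and standard deviations are all assumptions), fits the same model form, and then plugs one hypothetical matchup into the published equation.

```r
set.seed(1)

# Simulated stand-in for the game-level data; every number here is assumed
n <- 5000
games <- data.frame(
  tm_offquality = rnorm(n, mean = 21, sd = 4),
  op_defquality = rnorm(n, mean = 21, sd = 4)
)
games$tm_points <- -18.17584 +
  0.94601 * games$tm_offquality +
  0.92719 * games$op_defquality +
  rnorm(n, sd = 9)

# Same model form as the regression above
fit <- lm(tm_points ~ tm_offquality + op_defquality, data = games)
coef(fit)                # estimates from the simulated data, not the real fit
summary(fit)$r.squared   # multiple R-squared of the simulated fit

# Published equation applied to one hypothetical matchup
predict_points <- function(off, def) {
  -18.17584 + 0.94601 * off + 0.92719 * def
}
predict_points(24, 20)   # about 23.07
```

So an offense expected to score 24 against an average defense, facing a defense expected to allow 20 to an average offense, projects to roughly 23 points.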
The multiple R-squared for the regression on the real game data was 0.2677, so (unsurprisingly) there’s a lot of variance in game-to-game scoring that isn’t explained by the season-long qualities of the team’s offense and the opponent’s defense. But within the portion of variance that is explained, we can use R’s relaimpo package to determine how much each regressor contributes to explaining scoring, the dependent variable.
Before running relaimpo, you first need to install the packages it relies on (MASS, boot, survey, mitools, and corpcor) along with relaimpo itself. You can do this directly from the R command line or go to cran.r-project.org and find the packages manually.
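From the R command line, the install step might look something like the sketch below. The mirror URL is just one choice among many, and installing relaimpo on its own should normally pull in the packages it needs; listing them explicitly simply mirrors the list above.

```r
# Packages named above, plus relaimpo itself
needed <- c("MASS", "boot", "survey", "mitools", "corpcor", "relaimpo")

# Install only what is not already present on this machine
to_install <- setdiff(needed, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```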
Once that’s done, you just run the calc.relimp() function on your regression:
             lmg   last  first betasq  pratt genizi    car
offquality 57.8%  58.2%  57.5%  58.2%  57.8%  57.8%  57.8%
defquality 42.2%  41.8%  42.5%  41.8%  42.2%  42.2%  42.2%
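A call that produces this kind of breakdown could look like the sketch below. It runs on simulated games rather than the real data (the sample size, means, and standard deviations are assumptions), so the shares it prints will not match the table. The type names are the metric abbreviations from the table, and rela = TRUE asks relaimpo to rescale each metric so the shares sum to 100 percent.

```r
library(relaimpo)

set.seed(1)

# Simulated stand-in for the game-level data; every number here is assumed
n <- 5000
games <- data.frame(
  tm_offquality = rnorm(n, mean = 21, sd = 4),
  op_defquality = rnorm(n, mean = 21, sd = 4)
)
games$tm_points <- -18.17584 +
  0.94601 * games$tm_offquality +
  0.92719 * games$op_defquality +
  rnorm(n, sd = 9)

fit <- lm(tm_points ~ tm_offquality + op_defquality, data = games)

# Relative importance of each regressor under several metrics,
# normalized so each metric's shares add up to 1
imp <- calc.relimp(
  fit,
  type = c("lmg", "last", "first", "betasq", "pratt", "genizi", "car"),
  rela = TRUE
)
imp
```

Because the shares are normalized, they describe how the explained variance splits between the two regressors, not how much of the total variance each one accounts for.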
Don’t worry about the abbreviations there; they just represent different statistical techniques of determining the relative importance of each predictor in a linear regression. The salient point is that they all agree that roughly 58 percent of the variation in game-to-game point totals is due to variations in team offensive quality, and 42 percent is explained by the defensive quality of the opponent (among the proportion that’s actually explained by the regression, that is).
This is pretty much exactly the same as what Chase found, which shouldn’t be surprising because both studies cover the same range of games and seasons. But it’s nice to look at things from a somewhat different angle and come up with the same result. When two teams face off, the respective qualities of the offenses determine about 40 percent more of the outcome than the qualities of the defenses do (58/42 is roughly 1.4).