One of the very first posts at Football Perspective measured how various passing stats were correlated with wins. One of the main conclusions from that post was that passer rating, because of its heavy emphasis on completion percentage and interception rate, was not the ideal way to measure quarterback play. But what about ESPN’s Total QBR, a statistic invented specifically to improve on — and supersede — traditional passer rating?

As a reminder, we can’t simply correlate a statistic with wins to determine the utility of that metric. The simplest way to remember this is that 4th quarter kneeldowns are highly correlated with wins. Just because you notice it’s raining when the ground is wet doesn’t mean a wet ground causes rain; i.e., just because two variables are correlated doesn’t mean variable A leads to variable B (alternatively, variable B could lead to variable A, variable C could lead to both variable A and B, or the sample size could be too small to determine any legitimate causal relationship). That said, it at least makes sense to *begin* with a look at how various statistics correlate with wins.

**The Sample Set**

Throughout this post, I will be looking at a set of quarterback data consisting of the 152 quarterback seasons from 2006 to 2013 where the player had at least 14 games with 20+ action plays. Games where the quarterback had fewer than 20 plays were excluded, but the quarterback was still included if he otherwise had 14 such games.

The next step was to sum the weekly quarterback data on various metrics, including wins, and create season data.^{1} This allowed me to measure the correlation between a quarterback’s statistics over those 14+ games with that player’s winning percentage in those games.
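As a rough illustration of that aggregation step, here is a minimal sketch in Python. The field names (`qb`, `year`, `plays`, `win`, `qbr`) are hypothetical, and weighting each game's QBR by its action plays is an assumption made for illustration, not a confirmed detail of the method.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def season_rows(games, min_plays=20, min_games=14):
    """Collapse weekly rows into one row per (qb, year).

    Games below min_plays are dropped; a season is kept only if at
    least min_games qualifying games remain.
    """
    by_season = defaultdict(list)
    for g in games:
        if g["plays"] >= min_plays:
            by_season[(g["qb"], g["year"])].append(g)

    rows = {}
    for key, kept in by_season.items():
        if len(kept) < min_games:
            continue
        total_plays = sum(g["plays"] for g in kept)
        rows[key] = {
            "win_pct": sum(g["win"] for g in kept) / len(kept),
            # Assumption: play-weighted average of the weekly QBR values.
            "qbr": sum(g["qbr"] * g["plays"] for g in kept) / total_plays,
        }
    return rows
```

With the season rows in hand, the correlation for a stat would be something like `pearson([r["qbr"] for r in rows.values()], [r["win_pct"] for r in rows.values()])`.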

As it turns out, ESPN’s Total QBR is very highly correlated with wins, with a 0.68 correlation coefficient.^{2} This is to be expected; after all, Total QBR is based off Expected Points Added on the team level, which generally tracks wins and losses. The second most correlated statistic with wins was Adjusted Net Yards per Attempt, my favorite non-proprietary quarterback metric. After ANY/A, both traditional passer rating and touchdowns per attempt were the next most correlated statistics with wins (after all, this is only a step or two away from saying scoring points is correlated with wins). In another unsurprising result, passing yards had almost no correlation with wins, while pass attempts had a slight negative correlation (as any Game Scripts observer would know). Take a look:

Stat | CC |
---|---|
ESPN QBR | 0.68 |
ANY/A | 0.57 |
Passer Rating | 0.56 |
TD/Att | 0.54 |
NY/A | 0.46 |
Yd/Att | 0.45 |
INT/Att | -0.43 |
Cmp% | 0.33 |
Sack Rate | -0.21 |
Pass Yds | 0.16 |
Attempts | -0.10 |

When ESPN first introduced QBR, I wrote that I was intrigued by the possibility of this metric, but frustrated that the specific details of the formula remained confidential. At the time, a clutch weight feature was included in the calculations, which made the metric more of a retrodictive statistic than a predictive one. Since then, ESPN has tweaked the formula several times, and the clutch weight has been capped.^{3} ESPN is not engaged in academia, so I understand why they have not published all the fine print; as a researcher, I’m still frustrated by that decision. Still, with 8 years of QBR data now publicly available, we can answer two questions: does Total QBR *predict* wins and how sticky is Total QBR?

We know that a high Total QBR is correlated with winning games, but we also know that there’s limited value to such a statement. **If having a high Total QBR were one of the driving factors behind winning games, then such a variable would manifest itself in all games, not just the current one.** So with my sample of 152 quarterbacks, I used a random number generator to divide each quarterback season into two half-seasons. Then I calculated each quarterback’s average in several different categories and measured the correlation between a quarterback’s average in each category in each half-season **with his winning percentage in the other half-season**.^{4}

The results:

Stat | CC |
---|---|
ESPN QBR | 0.31 |
Wins | 0.28 |
ANY/A | 0.25 |
Passer Rating | 0.25 |
TD/Att | 0.24 |
NY/A | 0.22 |
Yd/Att | 0.20 |
Cmp% | 0.17 |
Pass Yds | 0.16 |
INT/Att | 0.15 |
Sack Rate | 0.14 |
Attempts | 0.06 |

As you would expect, all of our correlations are now smaller. But ESPN’s quarterback rating metric remains the best measure to predict wins. Perhaps even more impressively, Total QBR in one half-season is more correlated with wins in the other half-season (0.31) than wins themselves are (0.28). That’s pretty interesting. Another interesting result is that passer rating fares pretty well here, although many of the same issues as before remain with using correlation to derive causal direction.^{5}
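The random half-season procedure can be sketched roughly as follows. This is only an illustration under stated assumptions: the game-level field names are hypothetical, and pairing both directions (first half predicting second, and second predicting first) is one reading of "each half-season" rather than a confirmed detail.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def avg(values):
    return sum(values) / len(values)


def cross_half_correlation(seasons, predictor, target, rng):
    """Correlate a stat in one random half-season with a target in the other.

    seasons is a list of quarterback seasons, each a list of game dicts.
    """
    xs, ys = [], []
    for games in seasons:
        shuffled = list(games)
        rng.shuffle(shuffled)          # random assignment of games to halves
        mid = len(shuffled) // 2
        first, second = shuffled[:mid], shuffled[mid:]
        # Assumption: use both directions as separate observations.
        for src, other in ((first, second), (second, first)):
            xs.append(avg([g[predictor] for g in src]))
            ys.append(avg([g[target] for g in other]))
    return pearson(xs, ys)


def averaged_correlation(seasons, predictor, target="win", draws=2, seed=0):
    """Repeat the random split several times and average the coefficients."""
    rng = random.Random(seed)
    results = [cross_half_correlation(seasons, predictor, target, rng)
               for _ in range(draws)]
    return avg(results)
```

Calling `averaged_correlation(seasons, "qbr")` would give a QBR-to-other-half-wins figure, while passing the same name as both `predictor` and `target` measures how well a stat predicts itself across halves.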

One other concept to remember is that our sample of quarterbacks consists of players who were heavily involved in at least 14 games. That makes sure Peyton Manning, Tom Brady, and Drew Brees are involved, while filtering out some Christian Ponder, Blaine Gabbert, and Brandon Weeden seasons. In other words, the data set contains more above-average quarterbacks than a random sample would, so we may not be able to justify certain conclusions from this study.

The other important question is whether Total QBR is predictive of itself; i.e., how “sticky” is this metric over different time periods. We know that interceptions are very random, and knowing a quarterback’s prior interception rate is not all that helpful in predicting his future interception rate. Where does Total QBR fall along those lines?

Stat | CC |
---|---|
Pass Yds | 0.69 |
Attempts | 0.66 |
Sack Rate | 0.56 |
Cmp% | 0.49 |
Passer Rating | 0.49 |
ESPN QBR | 0.47 |
ANY/A | 0.46 |
NY/A | 0.45 |
TD/Att | 0.43 |
Yd/Att | 0.42 |
Wins | 0.28 |
INT/Att | 0.20 |

The most “sticky” stats were passing yards and pass attempts, which in retrospect isn’t too surprising. These reflect the style of the offense, the talent of the quarterback, and the quality of the defense, so they should be easier to predict. The second-least sticky metric was wins, which also makes sense. After that, ESPN’s Total QBR fits in a narrow tier with most of our other metrics as being somewhat predictable.

**Conclusion**

The numbers here indicate that Total QBR is worth examining. It may be a proprietary measure of quarterback play, but it’s not a subjective one with no basis in reality. It does seem to be the “best” measure of quarterback play, although whether the tradeoff in accuracy for transparency is worth it remains up to each individual reader. One of the drawbacks I see in Total QBR is the failure to incorporate strength of schedule. And while no other traditional passer metric does, either, it’s also easy enough to make those adjustments. Hopefully, an SOS-adjusted Total QBR measure will be released soon (I’ll note that the college football version does include a strength-of-schedule adjustment). My sense is that Total QBR is underutilized because (1) ESPN haters hate it because it’s an ESPN statistic, (2) it’s proprietary, and (3) analytics types disliked it because of the (now-eliminated) clutch rating. While I would not suggest making it the only tool at your disposal, it does appear to deserve a prominent place in your toolbox.

- For ESPN’s QBR, I took a weighted average of the weekly QBR data. I should note that this is *not* the way ESPN calculates QBR. As explained to me via email, the scaling function that gives the “final” QBR on a 0-100 scale is nonlinear; as a result, you can’t just calculate a weighted average of the individual game QBR values to get season QBR. Instead, you need to have the “points per play”-like value that’s behind QBR and calculate the weighted average of that (and weight based on the capped clutch weights, not even the action plays), then re-apply the scaling function to get it back on the 0-100 scale. So while I’m recreating QBR, I’m not recreating it the way ESPN would. That disclaimer aside, I don’t think my method will bias these results.
- As a reminder, the correlation coefficient is a measure of the linear relationship between two variables on a scale from -1 to 1. If two variables move in the same direction, their correlation coefficient will be close to 1. If two variables move with each other but in opposite directions (say, the number of hours you spend watching football and your significant other’s happiness level), then the CC will be closer to -1. If the two variables have no relationship at all, the CC will be close to zero.
- When Dean Oliver was on the Advanced NFL Stats podcast, he noted that the formula was tweaked in 2013 so that the “clutch index” part of the formula was essentially capped. He added (beginning at 13:45): “The most clutch plays are ending up counting essentially the same as all other plays. [What] we ended up deciding is that for games that are out of reach, when quarterbacks are putting up meaningless statistics because they are playing against a defense that is not trying as hard because they know that the game is essentially over – so that you can get your yards but we’re just trying to run out the clock – so we still keep in a clutch weight reduction effectively, associated with garbage time. But there isn’t the increase in clutch weight associated with clutch plays.”
- Then I did the entire process again, using a new set of random numbers, and averaged the results.
- For example, because passer rating is biased towards high completion percentage and low interception rates, quarterbacks who play with the lead tend to produce strong passer ratings; well, playing with the lead is pretty highly correlated with winning, and winning is also correlated with future wins.
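To see why the first footnote's caveat matters, here is a small sketch of how averaging already-scaled values differs from scaling an averaged raw value. ESPN's actual scaling function is not public, so the logistic curve below is purely a hypothetical stand-in, and the raw per-play values and weights are made up for illustration.

```python
from math import exp


def scale(raw):
    """Hypothetical nonlinear map onto a 0-100 scale (NOT ESPN's function)."""
    return 100.0 / (1.0 + exp(-raw))


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up raw "points per play"-like values for three games, with weights.
raw_games = [-1.0, 0.5, 3.0]
weights = [30, 35, 40]

# Shortcut: average the already-scaled game ratings.
avg_of_scaled = weighted_mean([scale(r) for r in raw_games], weights)

# Described approach: average the raw values, then apply the scaling once.
scaled_of_avg = scale(weighted_mean(raw_games, weights))

print(round(avg_of_scaled, 1), round(scaled_of_avg, 1))
```

The two numbers differ whenever the scaling is nonlinear, which is why a weighted average of weekly ratings is only an approximation of a season rating built from the underlying values.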

**Comments**

Would be interesting to see how Football Outsiders’ proprietary DVOA for QBs measured up, as it is SoS-adjusted.

I’m normally a fan of your work, Chase, but I think this is pretty flawed. I think you are assuming that winning percentage is independent of the number. This is true for something like ANY/A or quarterback rating, but isn’t true for ESPN’s Total QBR. If you’re assigning value weighted based on the score, you’re factoring in winning percentage. Some examples:

Player A: 14/23, 217, 3 TDs, 0 Int

Player B: 12/21, 121, 1, 1

Player A: 16/23, 161, 1, 0

Player B: 17/24, 314, 3, 2

Player B’s ESPN Total QBR number was 93.5, Player A’s was 81.9. I’m not sure many people would agree that Player B was 10% better than Player A. It’s pretty hard to put up a high number in a losing effort. You have to play lights-out football, like Alex Smith did in the wild card round. Conversely you can get a really good ESPN QBR number if you manage to pile up your numbers in the second half of a close game (and win).

The other problem I have is that your sample group is biased towards winning quarterbacks.

You’re not just excluding Case Keenum, you’re excluding 2013 Houston quarterbacks.

You’re not just excluding Kirk Cousins, you’re excluding 2013 Washington quarterbacks.

You’re not just excluding Blaine Gabbert, you’re excluding 2013 Jacksonville quarterbacks.

You’re not just excluding Brandon Weeden, you’re excluding 2013 Cleveland quarterbacks.

That’s a lot of losing.

Quarterback rating is normally distributed, ESPN’s Total QBR is not. It’s skewed towards both extremes, but particularly so for high values. An analogy: normal QB ratings are pea plants that do a little better with lots of water and a little worse in arid conditions. ESPN Total QBR plants are stunted peas in arid conditions and giant beanstalks in wet conditions. After a couple of generations you see which set of peas is taller… in an outdoor garden in Seattle.

Chris, I’m not quite following your QB A/B examples. Who are they? Was the rating for one game or both games?

Yes, the sample group is biased towards winning quarterbacks. That’s a drawback of the study, but I am not too bothered by it. There are other studies that can (and will) be run; this seemed like a good way to start the process.

Without agreeing that passer rating is normally distributed and Total QBR is not (I don’t doubt you, but I haven’t looked into it), why would we care about this?

Player A was Alex Smith, Player B was Andrew Luck. Those are the first and second half splits from their 2013 wild card game.

The NFL’s quarterback rating values are normally distributed. The NCAA’s passer rating values are normally distributed. Measurements of human performance tend to be normally distributed. I think that the fact that ESPN’s numbers are skewed is a sign that the underlying formula may be flawed.

So is your contention that Luck should not have been graded as better than Smith for that game? While I think Smith’s passing numbers were better (see http://www.footballperspective.com/where-does-lucksmith-rank-among-great-playoff-qb-battles/), it’s worth remembering that Smith lost a key fumble while Luck recovered a key fumble and took it in for a touchdown. And, IIRC, one of the Luck interceptions really wasn’t his fault. Anyway, reasonable minds can differ, but that game doesn’t seem like a good example to point out that QBR is not working properly.

Re: the other point, 1) I’d be surprised to find that the QBR distribution is materially different, if different at all, from the passer rating distribution or NY/A distribution or ANY/A distribution (have you found this to be true?); 2) even if it was materially different, I don’t think that would be a sign that the underlying formula would be flawed absent a concomitant explanation. What is the argument? I’m just not following this line of thinking.

“The NFL’s quarterback rating values are normally distributed.”

Is this a proven fact, or is it an assumption based on:

“Measurements of human performance tend to be normally distributed.”

And is passer rating really a measurement of human performance, since it has strange multipliers for the components? I guess all the multipliers would just make the distribution slide left or right on the x-axis, but still have a “normal” shape?

And for that matter, QBR is also a measurement of human performance, so it should be normal, shouldn’t it?

My understanding of the math of most correlational statistics (such as the conventional Pearson product-moment correlation) is that when range is restricted, correlations usually are lowered. So if certain poorer-performing QBs are excluded, then this restricts the range and thus affects the coefficient.

I think you addressed comment # 2 about the problem of orthogonality associated with the ESPN metric. Not sure how it is a problem with the ANY/A, but then I only spent five minutes pondering all this.
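The range-restriction point can be checked with a quick simulation. The sketch below uses entirely synthetic data (a made-up "skill" variable and a noisy "outcome"), so it illustrates the statistical effect only and says nothing about the actual quarterback sample.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)

# Synthetic population: outcome = skill + independent noise.
skill = [rng.gauss(0, 1) for _ in range(5000)]
outcome = [s + rng.gauss(0, 1) for s in skill]
full = pearson(skill, outcome)

# Keep only the better performers, mimicking a filter that drops weak seasons.
kept = [(s, o) for s, o in zip(skill, outcome) if s > 0.5]
restricted = pearson([s for s, _ in kept], [o for _, o in kept])

print(round(full, 2), round(restricted, 2))
```

In this setup the restricted-sample correlation comes out lower than the full-sample one, even though the underlying relationship between the two variables is unchanged.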

Chase, doesn’t Total QBR also take designed QB rushes and QB scrambles into account? Stats such as ANY/A do not.

Perhaps it’s a small part of the metric, but it could also explain some of its predictive power.

Correct.

I’ve never understood why people are so obsessed with using correlation as a metric in situations like this. Correlation measures the degree to which two variables are linearly related to one another. I don’t know why we would expect any of these relationships to be linear.

The correlation between X and X^2 is 0, but you wouldn’t say X and X^2 are unrelated to one another.

The correlation between X and X is higher than the correlation between X and log(X), but I wouldn’t want to conclude that X is more related to X than it is to log(X).

“The correlation between X and X^2 is 0”

That is completely untrue! The correlation between X and X^2 is nearly 1!

However, I agree that the relationships aren’t necessarily linear, but it does provide a great starting point. If you wanted to do a further analysis you would check the error of the predicted linear relationship against the actual, and if the difference follows a pattern then a non-linear relationship may be better. It’s also a useful time to graph the results, but with so many metrics it can be difficult to display coherently.

It’s only nearly 1 if you restrict the range of the data to positive values.

At any rate, the specific example doesn’t matter. The greater point is that there’s no a priori reason to assume linearity, so a simple ranking based on correlations is not the end of the story. I agree it’s a good place to start, though.
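Both sides of the X versus X^2 exchange can be verified with a few lines. This is a minimal sketch using small integer ranges chosen for illustration: a range symmetric around zero and a strictly positive one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Symmetric around zero: positive and negative x cancel, so r is ~0.
symmetric = list(range(-10, 11))
r_symmetric = pearson(symmetric, [x ** 2 for x in symmetric])

# Positive values only: x^2 rises steadily with x, so r is close to 1.
positive = list(range(1, 11))
r_positive = pearson(positive, [x ** 2 for x in positive])

print(round(r_symmetric, 3), round(r_positive, 3))
```

The relationship between X and X^2 is perfectly deterministic in both cases; only the linear summary changes with the range of the data.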

Taking the clutch factor out made QBR worse. Russell Wilson dominated QBR his rookie year, but last year, doing the same clutchy things game after game, he was bad. (I don’t believe the standard analytics argument that clutchness doesn’t exist in sports, but that’s an argument for another time.)

Incidentally, interceptions are another case where they might look random historically and in aggregate, but that doesn’t imply INTs are literally impossible to manage. Aaron Rodgers trades sacks for interceptions.

I don’t know why the swipes at passer rating here. In previous analysis on this site, didn’t passer rating come out as only very slightly less predictive than NY/A? Teams rated #1 in passer rating differential (like Seattle last year) have won 1/3 of NFL championships since 1940 and teams in the top 3 have won something like 3/5 of them. Obviously ANY/A or NY/A might do slightly better, but passer rating’s track record is not bad. Passer rating’s formula might not be pure, but its results are good.

The biggest problem with QBR as a quarterback metric is that it entangles the QB’s production with the team’s and since it’s proprietary there’s absolutely no way to untangle it. With passer rating, a QB can play to the stat and know that, historically, if he pushes up passer rating it gives his team a higher chance of winning. You can’t play to QBR. A better QBR stat would model what the QB can control and be less entangled, and not instead be more highly correlated with winning, whether predictive or not. Wilson had a bad and injured offensive line this year, his top 2 receivers were out most of the year, and his QBR apparently suffered for it, but he was still able to achieve >100 passer rating, a rating model he can play to. An even better rating model would adjust for receivers and offensive line play. Obviously, that would be hard, much harder than making a stat that correlates very highly with wins.

*trades interceptions for sacks, I should have said.

“With passer rating, a QB can play to the stat”

Are you suggesting that NFL QBs are consciously making on-field decisions to improve their passer rating? Passer rating is basically designed to encompass everything you would want a QB to do anyway, except avoid sacks. That would be a great way to improve passer rating – just take sacks any time you think a completion is unlikely.

Uh, ESPN untangled team play from QB play as much as possible. Read up its introductory text on QBR.

I may be wrong, but I believe the QBR adjustments were retroactively applied to past seasons.

“(3) analytics types disliked it because of the (now-eliminated) clutch rating.”

This was me. I didn’t know that they had reduced the use of a clutch rating.

But I also thought that there was some sort of subjective component to the rating, though maybe that was just the clutch part.

Why does the word “clutch” mean “high leverage situation”?

I find Total QBR to be a garbage stat and I feel that way based on numbers I have seen that make no sense. Basically if a QB takes any sacks he gets penalized more than a QB who may not get sacked but throw a pick or two. It doesn’t factor in WHY the QB got sacked (coverage, poor blocking, etc) so if a guy like Rodgers takes sacks because of his line being poor or guys not getting open then his QBR suffers.

It also penalizes guys in a way if they do well early on then coast in the second half while rewarding guys who may stink early and then play catchup.

It rewards QBs who may be scramblers yet be horrible passing the ball. A QB could break off a big run and be useless throwing the ball the entire game while his team wins yet his QBR could (and probably would) be higher than a QB who had a big passing game yet maybe took some sacks or the team lost.

And finally it penalizes QBs big time for interceptions that get returned for TDs whether it is their fault or not. The receiver tips a perfectly thrown ball that gets returned for a score? QB’s fault according to ESPN. The receiver breaks off his route or falls or just doesn’t go where he was supposed to and the ball is picked and returned for 6? QB’s fault according to them.

ESPN tries to act like their system takes things into account more than just the numbers (like passer rating does) yet in all the cases above they show they really don’t take the circumstance into account.

Passer rating is a much more valuable stat than this rating when it comes to judging QBs on their performance. Passer rating may not be perfect yet it has enough of a history to show that in general a guy with a higher rating is performing at a high level and if one really wishes to look at the team’s record (last I checked the QB isn’t the only guy responsible for wins and losses) you will generally see those QBs with high ratings were also on winning teams.

Aaron Rodgers really is the best example of how ESPN’s system is incredibly flawed. Year after year he is lower than guys he has clearly played better than but because he takes a lot of sacks his QBR is lower than theirs. Only in his record breaking 2011 season has Rodgers been in the top 5 of total QBR…is there really anyone out there who doesn’t think he has been a top 5 QB in each year over the last 4 seasons?

I did not like QBR when it first came out, but as time passed, I liked it more and more. If you look at QBR rankings over the years (since 2006) you find that MVP voting for QBs closely mirrors QBR rankings. The top QB vote receiver for MVP has had the highest QBR each year of its existence (among starters). Also, even though I would like to be able to see exactly how they figure QBR out, it seems to somewhat match ANY/A, to a degree. It is better to look at QBR on a season per season basis, and not game by game. It is a better long term statistic.
