## Is ESPN’s QBR the best measure of quarterback play?

One of the very first posts at Football Perspective measured how various passing stats were correlated with wins.  One of the main conclusions from that post was that passer rating, because of its heavy emphasis on completion percentage and interception rate, was not the ideal way to measure quarterback play. But what about ESPN’s Total QBR, a statistic invented specifically to improve on — and supersede — traditional passer rating?

As a reminder, we can’t simply correlate a statistic with wins to determine the utility of that metric. The simplest way to remember this is that 4th quarter kneeldowns are highly correlated with wins. Just because you notice it’s raining when the ground is wet doesn’t mean a wet ground causes rain; i.e., just because two variables are correlated doesn’t mean variable A leads to variable B (alternatively, variable B could lead to variable A, variable C could lead to both variable A and B, or the sample size could be too small to determine any legitimate causal relationship). That said, it at least makes sense to begin with a look at how various statistics correlate with wins.

### The Sample Set

Throughout this post, I will be looking at a set of quarterback data consisting of the 152 quarterback seasons from 2006 to 2013 where the player had at least 14 games with 20+ action plays. Games where the quarterback had fewer than 20 plays were excluded, but the quarterback was still included if he otherwise had 14 such games.

The next step was to sum the weekly quarterback data on various metrics, including wins, and create season data.1 This allowed me to measure the correlation between a quarterback’s statistics over those 14+ games with that player’s winning percentage in those games.

As it turns out, ESPN’s Total QBR is very highly correlated with wins, with a 0.68 correlation coefficient.2 This is to be expected; after all, Total QBR is based off Expected Points Added on the team level, which generally tracks wins and losses. The second most correlated statistic with wins was Adjusted Net Yards per Attempt, my favorite non-proprietary quarterback metric. After ANY/A, both traditional passer rating and touchdowns per attempt were the next most correlated statistics with wins (after all, this is only a step or two away from saying scoring points is correlated with wins). In another unsurprising result, passing yards had almost no correlation with wins, while pass attempts had a slight negative correlation (as any Game Scripts observer would know).  Take a look:

| Stat | CC |
|---|---|
| ESPN QBR | 0.68 |
| ANY/A | 0.57 |
| Passer Rating | 0.56 |
| TD/Att | 0.54 |
| NY/A | 0.46 |
| Yd/Att | 0.45 |
| INT/Att | -0.43 |
| Cmp% | 0.33 |
| Sack Rate | -0.21 |
| Pass Yds | 0.16 |
| Attempts | -0.10 |
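
The steps behind this table (sum each quarterback's qualifying games into a season line, compute a rate stat, then correlate it with winning percentage across quarterback seasons) can be sketched in Python. This is a minimal sketch rather than the code used for the post: the field names and the `season_line`, `any_a`, `win_pct`, and `stat_vs_wins` helpers are hypothetical, and ANY/A uses its standard definition of (yards + 20 × TD − 45 × INT − sack yards) / (attempts + sacks).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def season_line(games):
    """Sum weekly totals into one season line (field names are hypothetical)."""
    keys = ("att", "yds", "td", "int", "sacks", "sack_yds", "win")
    total = {k: sum(g[k] for g in games) for k in keys}
    total["games"] = len(games)
    return total

def any_a(line):
    """Adjusted Net Yards per Attempt for a season line."""
    net = line["yds"] + 20 * line["td"] - 45 * line["int"] - line["sack_yds"]
    return net / (line["att"] + line["sacks"])

def win_pct(line):
    """Winning percentage in the games that make up the season line."""
    return line["win"] / line["games"]

def stat_vs_wins(seasons, stat):
    """Correlate a season-level stat with winning percentage across QB seasons."""
    lines = [season_line(games) for games in seasons]
    return pearson([stat(l) for l in lines], [win_pct(l) for l in lines])
```

Given `seasons` as a list of per-quarterback game lists, `stat_vs_wins(seasons, any_a)` would return the kind of coefficient reported in the table.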

When ESPN first introduced QBR, I wrote that I was intrigued by the possibility of this metric, but frustrated that the specific details of the formula remained confidential. At the time, a clutch weight feature was included in the calculations, which made the metric more of a retrodictive statistic than a predictive one. Since then, ESPN has tweaked the formula several times, and the clutch weight has been capped.3 ESPN is not engaged in academia, so I understand why they have not published all the fine print; as a researcher, I’m still frustrated by that decision. Still, with 8 years of QBR data now publicly available, we can answer two questions: does Total QBR predict wins and how sticky is Total QBR?

We know that a high Total QBR is correlated with winning games, but we also know that there’s limited value to such a statement. If a high Total QBR were one of the driving factors behind winning games, then it would manifest itself in all games, not just the current one. So with my sample of 152 quarterbacks, I used a random number generator to divide each quarterback season into two half-seasons. Then I calculated each quarterback’s average in several different categories and measured the correlation between a quarterback’s average in each category in each half-season with his winning percentage in the other half-season.4 The results:

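| Stat | CC |
|---|---|
| ESPN QBR | 0.31 |
| Wins | 0.28 |
| ANY/A | 0.25 |
| Passer Rating | 0.25 |
| TD/Att | 0.24 |
| NY/A | 0.22 |
| Yd/Att | 0.20 |
| Cmp% | 0.17 |
| Pass Yds | 0.16 |
| INT/Att | 0.15 |
| Sack Rate | 0.14 |
| Attempts | 0.06 |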

As you would expect, all of our correlations are now smaller. But ESPN’s quarterback rating metric remains the best measure to predict wins. Perhaps even more impressively, Total QBR is more correlated with future wins than past wins are (0.31 versus 0.28). Another interesting result is that passer rating fares pretty well here, although many of the same issues as before remain with using correlation to derive causal direction.5

One other concept to remember is that our sample of quarterbacks consists of players who were heavily involved in at least 14 games. That makes sure Peyton Manning, Tom Brady, and Drew Brees are involved, while filtering out some Christian Ponder, Blaine Gabbert, and Brandon Weeden seasons. In other words, the data set contains more above-average quarterbacks than a random sample would, so we may not be able to justify certain conclusions from this study.

The other important question is whether Total QBR is predictive of itself; i.e., how “sticky” is this metric over different time periods. We know that interceptions are very random, and knowing a quarterback’s prior interception rate is not all that helpful in predicting his future interception rate. Where does Total QBR fall along those lines?

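| Stat | CC |
|---|---|
| Pass Yds | 0.69 |
| Attempts | 0.66 |
| Sack Rate | 0.56 |
| Cmp% | 0.49 |
| Passer Rating | 0.49 |
| ESPN QBR | 0.47 |
| ANY/A | 0.46 |
| NY/A | 0.45 |
| TD/Att | 0.43 |
| Yd/Att | 0.42 |
| Wins | 0.28 |
| INT/Att | 0.20 |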

The most “sticky” stats were passing yards and pass attempts, which in retrospect isn’t too surprising. These reflect the style of the offense, the talent of the quarterback, and the quality of the defense, so they should be easier to predict. The second-least sticky metric was wins, which also makes sense. After that, ESPN’s Total QBR fits in a narrow tier with most of our other metrics as being somewhat predictable.

### Conclusion

The numbers here indicate that Total QBR is worth examining.  It may be a proprietary measure of quarterback play, but it’s not a subjective one with no basis in reality.  It does seem to be the “best” measure of quarterback play, although whether the tradeoff in accuracy for transparency is worth it remains up to each individual reader. One of the drawbacks I see in Total QBR is the failure to incorporate strength of schedule. And while no other traditional passer metric does, either, it’s also easy enough to make those adjustments. Hopefully, an SOS-adjusted Total QBR measure will be released soon (I’ll note that the college football version does include a strength-of-schedule adjustment).  My sense is that Total QBR is underutilized because (1) ESPN haters hate it because it’s an ESPN statistic, (2) it’s proprietary, and (3) analytics types disliked it because of the (now-eliminated) clutch rating.  While I would not suggest making it the only tool at your disposal, it does appear to deserve a prominent place in your toolbox.

1. For ESPN’s QBR, I took a weighted average of the weekly QBR data. I should note that this is not the way ESPN calculates QBR. As explained to me via email, the scaling function that gives the “final” QBR on a 0-100 scale is nonlinear; as a result, you can’t just calculate a weighted average of the individual game QBR values to get season QBR. Instead, you need to have the “points per play”-like value that’s behind QBR and calculate the weighted average of that (and weight based on the capped clutch weights, not even the action plays), then re-apply the scaling function to get it back on the 0-100 scale. So while I’m recreating QBR, I’m not recreating it the way ESPN would. That disclaimer aside, I don’t think my method will bias these results.
2. As a reminder, the correlation coefficient is a measure of the linear relationship between two variables on a scale from -1 to 1. If two variables move in the same direction, their correlation coefficient will be close to 1. If two variables move with each other but in opposite directions (say, the number of hours you spend watching football and your significant other’s happiness level), then the CC will be closer to -1. If the two variables have no relationship at all, the CC will be close to zero.
3. When Dean Oliver was on the Advanced NFL Stats podcast, he noted that the formula was tweaked in 2013 so that the “clutch index” part of the formula was essentially capped. He added (beginning at 13:45): “The most clutch plays are ending up counting essentially the same as all other plays. [What] we ended up deciding is that for games that are out of reach, when quarterbacks are putting up meaningless statistics because they are playing against a defense that is not trying as hard because they know that the game is essentially over – so that you can get your yards but we’re just trying to run out the clock – so we still keep in a clutch weight reduction effectively, associated with garbage time. But there isn’t the increase in clutch weight associated with clutch plays.”
4. Then I did the entire process again, using a new set of random numbers, and averaged the results.
5. For example, because passer rating is biased towards high completion percentage and low interception rates, quarterbacks who play with the lead tend to produce strong passer ratings; well, playing with the lead is pretty highly correlated with winning, and winning is also correlated with future wins.
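
Footnote 1's point, that averaging scaled game ratings is not the same as scaling the averaged underlying values, holds for any nonlinear scaling function. Here is a small sketch, assuming a logistic curve purely for illustration: ESPN's actual scaling function is not public, so `scale`, the constant `k`, and the two made-up games are all hypothetical.

```python
from math import exp

def scale(value, k=5.0):
    """Hypothetical nonlinear map from a per-play value to a 0-100 rating."""
    return 100.0 / (1.0 + exp(-k * value))

def weighted_mean(values, weights):
    """Weighted average of values with the given weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Two made-up games: underlying per-play values and their weights.
values = [-0.2, 0.4]
weights = [30, 30]

# Shortcut: average the already-scaled game ratings.
average_of_ratings = weighted_mean([scale(v) for v in values], weights)

# Procedure described in the footnote: average first, then re-apply the scale.
rating_of_average = scale(weighted_mean(values, weights))

print(round(average_of_ratings, 1), round(rating_of_average, 1))
```

Under these assumptions the two numbers differ by several points, which is the gap the footnote is warning about; how large it is in practice depends on the real scaling function and on how much a quarterback's game values vary.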
• Jp

Would be interesting to see how Football Outsiders’ proprietary DVOA for QBs measured up, as it is SoS-adjusted.

• I’m normally a fan of your work Chase, but I think this is pretty flawed. I think you are assuming that winning percentage is independent of the number. This is true for something like ANY/A or quarterback rating, but isn’t true for ESPN’s Total QBR. If you’re assigning value weighted based on the score, you’re factoring in winning percentage. Some examples:

Player A: 14/23, 217, 3 TDs, 0 Int
Player B: 12/21, 121, 1, 1

Player A: 16/23, 161, 1, 0
Player B: 17/24, 314, 3, 2

Player B’s ESPN Total QBR number was 93.5, Player A’s was 81.9. I’m not sure many people would agree that Player B was 10% better than Player A. It’s pretty hard to put up a high number in a losing effort. You have to play lights-out football, like Alex Smith did in the wild card round. Conversely you can get a really good ESPN QBR number if you manage to pile up your numbers in the second half of a close game (and win).

The other problem I have is that your sample group is biased towards winning quarterbacks.

You’re not just excluding Case Keenum, you’re excluding 2013 Houston quarterbacks.
You’re not just excluding Kirk Cousins, you’re excluding 2013 Washington quarterbacks.
You’re not just excluding Blaine Gabbert, you’re excluding 2013 Jacksonville quarterbacks.
You’re not just excluding Brandon Weeden, you’re excluding 2013 Cleveland quarterbacks.

That’s a lot of losing.

Quarterback rating is normally distributed, ESPN’s Total QBR is not. It’s skewed towards both extremes, but particularly so for high values. An analogy: normal QB rating are pea plants that do a little better with lots of water and a little worse in arid conditions. ESPN Total QBR plants are stunted peas in arid conditions and giant beanstalks in wet conditions. After a couple of generations you see which set of peas are taller… in an outdoor garden in Seattle.

• Chase Stuart

Chris, I’m not quite following your QB A/B examples. Who are they? Was the rating for one game or both games?

Yes, the sample group is biased towards winning quarterbacks. That’s a drawback of the study, but I am not too bothered by it. There are other studies that can (and will) be run; this seemed like a good way to start the process.

Without agreeing that passer rating is normally distributed and Total QBR is not (I don’t doubt you, but I haven’t looked into it), why would we care about this?

• Player A was Alex Smith, Player B was Andrew Luck. Those are the first and second half splits from their 2013 wild card game.

The NFL’s quarterback rating values are normally distributed. The NCAA’s passer rating values are normally distributed. Measurements of human performance tends to be normally distributed. I think that the fact that ESPN’s numbers are skewed is a sign that the underlying formula may be flawed.

• Chase Stuart

So is your contention that Luck should not have been graded as better than Smith for that game? While I think Smith’s passing numbers were better –http://www.footballperspective.com/where-does-lucksmith-rank-among-great-playoff-qb-battles/ — it’s worth remembering that Smith lost a key fumble while Luck recovered a key fumble and took it in for a touchdown. And, IIRC, one of the Luck interceptions really wasn’t his fault. Anyway, reasonable minds can differ, but that game doesn’t seem like a good example to point out that QBR is not working properly.

Re: the other point, 1) I’d be surprised to find that the QBR distribution is materially different, if different at all, from the passer rating distribution or NY/A distribution or ANY/A distribution (have you found this to be true?); 2) even if it was materially different, I don’t think that would be a sign that the underlying formula would be flawed absent a concomitant explanation. What is the argument? I’m just not following this line of thinking.

• Paul

Chase using your Logic when it came to Luck’s interception not being his fault, then ELI MANNING SHOULD HAVE the BEST QBR cause most of his interceptions are not his fault.

• Richie

The NFL’s quarterback rating values are normally distributed.

Is this a proven fact, or is it an assumption based on:

Measurements of human performance tends to be normally distributed.

And is passer rating really a measurement of human performance, since it has strange multipliers for the components? I guess all the multipliers would just make the distribution slide left or right on the x-axis, but still have a “normal” shape?

• Richie

And for that matter, QBR is also a measurement of human performance, so it should be normal, shouldn’t it?

• Tim Truemper

My understanding of the math of most correlational statistics (such as the conventional Pearson product-moment correlation) is that when range is restricted, correlations usually are lowered. So if certain poorer performing QBs are excluded, then this restricts the range and thus affects the coefficient.

I think you addressed comment # 2 about the problem of orthogonality associated with the ESPN metric. Not sure how it is a problem with the ANY/A, but then I only spent five minutes pondering all this.

• Guru

Chase, doesn’t Total QBR also take designed QB rushes and QB scrambles into account? Stats such as ANY/A do not.

Perhaps it’s a small part of the metric, but it could also explain some of its predictive power.

• Chase Stuart

Correct.

• Brian Anderson

I’ve never understood why people are so obsessed with using correlation as a metric in situations like this. Correlation measures the degree to which two variables are linearly related to one another. I don’t know why we would expect any of these relationships to be linear.

The correlation between X and X^2 is 0, but you wouldn’t say X and X^2 are unrelated to one another.

The correlation between X and X is higher than the correlation between X and log(X), but I wouldn’t want to conclude that X is more related to X than it is to log(X).

• James

“The correlation between X and X^2 is 0”

That is completely untrue! The correlation between X and X^2 is nearly 1!

However, I agree that the relationships aren’t necessarily linear, but it does provide a great starting point. If you wanted to do a further analysis you would check the error of the predicted linear relationship against the actual, and if the difference follows a pattern then a non-linear relationship may be better. It’s also a useful time to graph the results, but with so many metrics it can be difficult to display coherently.

• Brian Anderson

It’s only nearly 1 if you restrict the range of the data to positive values.

At any rate, the specific example doesn’t matter. The greater point is that there’s no a priori reason to assume linearity, so a simple ranking based on correlations is not the end of the story. I agree it’s a good place to start, though.

• Michael Terry

Taking the clutch factor out made QBR worse. Russell Wilson dominated QBR his rookie year, but last year, doing the same clutchy things game after game, he was bad. (I don’t believe the standard analytics argument that clutchness doesn’t exist in sports, but that’s an argument for another time.)

Incidentally, interceptions are another case where they might look random historically and in aggregate, but that doesn’t imply INTs are literally impossible to manage. Aaron Rodgers trades sacks for interceptions.

I don’t know why the swipes at passer rating here. In previous analysis on this site, didn’t passer rating come out as only very slightly less predictive than NY/A? Teams rated #1 in passer rating differential (like Seattle last year) have won 1/3 of NFL championships since 1940 and teams in the top 3 have won something like 3/5 of them. Obviously ANY/A or NY/A might do slightly better, but passer rating’s track record is not bad. Passer rating’s formula might not be pure, but its results are good.

The biggest problem with QBR as a quarterback metric is that it entangles the QB’s production with the team’s and since it’s proprietary there’s absolutely no way to untangle it. With passer rating, a QB can play to the stat and know that, historically, if he pushes up passer rating it gives his team a higher chance of winning. You can’t play to QBR. A better QBR stat would model what the QB can control and be less entangled, and not instead be more highly correlated with winning, whether predictive or not. Wilson had a bad and injured offensive line this year, his top 2 receivers were out most of the year, and his QBR apparently suffered for it, but he was still able to achieve >100 passer rating, a rating model he can play to. An even better rating model would adjust for receivers and offensive line play. Obviously, that would be hard, much harder than making a stat that correlates very highly with wins.

• Michael Terry

*trades interceptions for sacks, I should have said.

• Richie

With passer rating, a QB can play to the stat

Are you suggesting that NFL QB’s are consciously making on-field decisions to improve their passer rating? Passer rating is basically designed to encompass everything you would want a QB to do anyway, except avoid sacks. That would be a great way to improve passer rating – just take sacks any time you think a completion is unlikely.

• Jacob Prescott

QBR is garbage. And traditional QB rating is also a poor depiction of how a QB performs. What none of these stats can do is get to what the QB is thinking pre-snap. How a QB diagnoses a defense BEFORE the snap and judges that a 5 yd drag route will go for a big play based on how the defense is playing in that particular situation. That’s why I look at Peyton Manning, Tom Brady and Aaron Rodgers on another level. They have coaches’ minds and their teams lean on them. And even with this added pressure, they continue to perform. That attribute cannot be quantified by math.

• Ben

Uh, ESPN untangled team play from QB play as much as possible. Read up its introductory text on QBR.

I may be wrong, but I believe the QBR adjustments were retroactively applied to past seasons.

• Richie

(3) analytics types disliked it because of the (now-eliminated) clutch rating.

This was me. I didn’t know that they had reduced the use of a clutch rating.

But I also thought that there was some sort of subjective component to the rating, though maybe that was just the clutch part.

Why does the word “clutch” mean “high leverage situation”?

• Mike

I find Total QBR to be a garbage stat and I feel that way based on numbers I have seen that make no sense. Basically if a QB takes any sacks he gets penalized more than a QB who may not get sacked but throw a pick or two. It doesn’t factor in WHY the QB got sacked (coverage, poor blocking, etc) so if a guy like Rodgers takes sacks because of his line being poor or guys not getting open then his QBR suffers.

It also penalizes guys in a way if they do well early on then coast in the second half while rewarding guys who may stink early and then play catchup.

It rewards QBs who may be scramblers yet be horrible passing the ball. A QB could break off a big run and be useless throwing the ball the entire game while his team wins yet his QBR could (and probably would) be higher than a QB who had a big passing game yet maybe took some sacks or the team lost.

And finally it penalizes QBs big time for interceptions that get returned for TDs whether it is their fault or not. The receiver tips a perfectly thrown ball that gets returned for a score? QB’s fault according to ESPN. The receiver breaks off his route or falls or just doesn’t go where he was supposed to and the ball is picked and returned for 6? QB’s fault according to them.

ESPN tries to act like their system takes things into account more than just the numbers (like passer rating does) yet in all the cases above they show they really don’t take the circumstance into account.

Passer rating is a much more valuable stat than this rating when it comes to judging QBs on their performance. Passer rating may not be perfect yet it has enough of a history to show that in general a guy with a higher rating is performing at a high level and if one really wishes to look at the team’s record (last I checked the QB isn’t the only guy responsible for wins and losses) you will generally see those QBs with high ratings were also on winning teams.

Aaron Rodgers really is the best example of how ESPN’s system is incredibly flawed. Year after year he is lower than guys he has clearly played better than but because he takes a lot of sacks his QBR is lower than theirs. Only in his record breaking 2011 season has Rodgers been in the top 5 of total QBR…is there really anyone out there who doesn’t think he has been a top 5 QB in each year over the last 4 seasons?

• TMS71

In total QBR Rodgers was 6th in 2013, 4th in 2012, 1st in 2011 and 3rd in 2010. Where are you getting your numbers? Rodgers is awesome and QBR thinks so too.

• Justin

I did not like QBR when it first came out, but as time passed, I liked it more and more. If you look at QBR rankings over the years (since 2006) you find that MVP voting for QBs closely mirrors QBR rankings. The top QB vote receiver for MVP has had the highest QBR each year of its existence (among starters). Also, even though I would like to be able to see exactly how they figure QBR out, it seems to somewhat match ANY/A, to a degree. It is better to look at QBR on a season-by-season basis, and not game by game. It is a better long-term statistic.

• Michael Terry

Here is more data that confounds QBR: Russell is 12-7 in games where the opposing QB has a higher QBR than he does. The next best QB is Tom Brady at 10-21! Since the opposing QB had worse QBR, the defense by inference did not carry him to these wins.

And this is why I hate QBR:

http://espn.go.com/blog/statsinfo/post/_/id/101699/no-one-gets-more-help-than-russell-wilson

ESPN just used it to prove that Wilson not only isn’t an elite QB, he’s not even really all that good. How can someone who gets more help than any other QB matter at all? But as the foregoing demonstrates, QBR is deeply flawed when it comes to Wilson, and thus perpetuates a simplistic model of QB evaluation that equates “elite” not with “has an elite effect on wins and losses” but “looks like a great traditional pocket passer and here’s a stat that matches our eye test the best”.

In the same way that Denver immediately became the best rushing team in football when Tebow took over, Vick’s Falcons had even higher rush DVOA than this year’s Seahawks, and Chris Johnson ran for 2000 yards with Vince Young, Wilson tilts the field. Difference is, Wilson is also a great passer. And that’s why he’s an elite QB.

Here’s Seattle’s offensive DVOA rank the last several seasons:

2011: #22