Within the analytics community, we seem to have reached a consensus that ANY/A is the best box score metric for measuring passing efficiency. Over at the Intentional Rounding blog, Danny Tuccitto tested the validity of ANY/A using a technique called Confirmatory Factor Analysis. You can read his three part analysis here, here, and here. Essentially, he discovers that Y/A and TD % are valid statistics for measuring QB quality, while sack % and INT % are not. At first I was skeptical, but after some pondering I came up with a half-baked theory of why this might be true:

As we evaluate the potential for an athlete to succeed in professional sports, there are two kinds of statistics: Qualifying and Disqualifying. In the case of quarterbacks, I define a qualifying statistic as a minimum threshold the player must meet to even be considered NFL worthy. If we deconstruct ANY/A into its four components, Y/A and TD % emerge as qualifying statistics. In today’s NFL, I estimate that a QB must possess a true talent level of at least 6.0 Y/A and 2.5 TD % to deserve a roster spot. There are very few people in the world who can reach those thresholds against NFL caliber defenses (my best guess is around 100). With these two simple statistics, we’ve already weeded out the vast majority of quarterbacks from ever playing in the NFL.

Next, we turn to sack % and INT %, which are disqualifying statistics. By itself, neither of these skills qualifies a QB to play in the NFL. Anybody can avoid sacks or interceptions if they’re not worried about gaining yards. However, the inability to avoid sacks or interceptions will disqualify a QB from the NFL, regardless of how high his Y/A and TD % might be. I estimate these limits as roughly a true talent 12% sack rate and 4.5% INT rate. The population of quarterbacks who can stay under these limits AND perform above the minimum Y/A and TD % is very small. In most years, there aren’t enough of these QBs to fill the 32 NFL starting spots. Among quarterbacks who receive significant NFL playing time, there is a strong survivorship bias for the disqualifying statistics of sack % and INT %, as the quarterbacks who make too many negative plays have already been weeded out of the sample. Given that Y/A and TD % are far rarer skills with no upper limits, these two statistics are the true measuring stick at the NFL level.

To test this theory, I created a very simple metric called Positive Yards Per Attempt (PY/A). It’s just passing yards plus a 20 yard bonus for touchdowns, divided by pass attempts (which do not include sacks). I then converted PY/A into a value metric by measuring it relative to league average (RPY/A)1 and VALUE above average by multiplying RPY/A by attempts. We already have these variations of ANY/A (that is, RANY/A and VALUE), so comparing the two metrics is very straightforward. Since the merger, there have been 1,423 QB seasons with at least 200 dropbacks. This table lists the top 100 seasons of PY/A VALUE, as well as the ANY/A VALUE and rankings for these players. The “Diff” column signifies the gap in ranking between the two metrics, with a positive number indicating a QB who is favored by PY/A and a negative number indicating one who is favored by ANY/A.

This list makes a strong case for the validity of PY/A. It’s populated by the greatest QB seasons of all time at the top, and filled out by a number of other notably great and very good seasons. There are a few head scratchers (most notably Lynn Dickey at #9), but for the most part it’s a very credible list that closely mirrors the ANY/A rankings. That’s the point, really. When we remove sacks and interceptions from ANY/A, it doesn’t lose much accuracy, if any. At first glance, I was concerned that PY/A systematically overrates certain quarterbacks and underrates others. That’s probably true to a certain degree. However, I would argue that ANY/A has the same issue, except it’s a different set of quarterbacks who are over- and underrated by it. The true balance almost certainly lies somewhere in between the two metrics. FWIW, the correlation between RPY/A and RANY/A is a robust 0.877, with an r-squared of 0.769.
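For readers who want to check the last two numbers, here is a minimal Python sketch. It confirms that the quoted r-squared is simply the square of the quoted correlation, and it includes a plain Pearson correlation function that could be run on the real RPY/A and RANY/A columns. The function name and the toy numbers are illustrative assumptions, not the data behind the table.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Figure quoted in the article.
reported_r = 0.877
r_squared = reported_r ** 2
print(round(r_squared, 3))  # 0.769, matching the article

# Hypothetical toy values, only to show how the function is called.
toy_rpy_a = [1.0, 0.5, -0.2, -1.1]
toy_rany_a = [1.2, 0.3, 0.1, -1.4]
print(pearson_r(toy_rpy_a, toy_rany_a))
```

An r-squared of about 0.77 means roughly three quarters of the variance in one metric is shared with the other, which leaves real room for the two to disagree on individual quarterbacks.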

Now let’s look at the other end of the spectrum – the 100 worst PY/A VALUE seasons since 1970.

I actually find the Worst list even more validating of PY/A than the Best list. When we think of bad quarterbacks, most of us reflexively focus on quarterbacks who make a lot of mistakes and sink their teams in obvious and memorable ways. And this list is filled with conventionally terrible quarterbacks. But remember, nearly all of their negative plays have been removed, so it’s not their mistakes putting them on the list. It’s their impotence. These guys couldn’t make plays or move the ball down the field, killing their teams slowly and agonizingly. At the very top (err, bottom), we find Derek Carr’s rookie year. A lot of fans and pundits classify Carr as a budding franchise QB who showed “flashes of potential”. Actually no, he showed the exact opposite. While the younger Carr avoided sacks and interceptions at a reasonable rate, his Y/A was absolutely pathetic. Even accounting for his lousy supporting cast, that is a major red flag. It’s much easier for a young QB to rein in his mistakes than it is for him to suddenly learn how to make positive plays down the field. Blake Bortles fits precisely the same troubling profile, so I don’t have much hope for the class of 2014.

Does this change your feelings about ANY/A? Do you think Danny and I are wasting our time? If anyone else has created their own passing metric using basic stats, I’d love to hear about it.

1. Note that in calculating league average, I excluded the player in question from the league average totals. So each player is compared to a slightly different definition of league average.
The question that Danny is asking is different from the ones that we usually ask.

Normally we ask questions like “How much did this QB’s performance contribute to his team’s success?” and “Which stats best reflect a QB’s underlying abilities? How strong are this QB’s underlying abilities, as evidenced by his performance?”

By choosing to use factor analysis, Danny is asking something like “If we assume that there is a single core ability underlying much of a QB’s performance, which stats best reflect that one ability? And how good is each QB at that one core ability, as evidenced by his performance?” A quarterback may also have other abilities, besides that one core ability, which are stable and which help his team succeed, but those are left out in this analysis.

For example, in the NBA “being good at rebounding” may be a stable ability which some point guards have, which helps their team win, but which is unrelated to most of the attributes that PGs are usually judged on. So rebounding would show up in a factor analysis as “not a valid measure of PG quality”, where “PG quality” refers to the one core ability which underlies the largest chunk of a PG’s performance. But, all else equal, I’d still rather have a good rebounding PG on my team because his rebounding is a stable trait which helps the team win (even if it is unrelated to his more purely pointguardy abilities). That is basically what Danny’s analysis is showing about sacks & INTs.

Excellent point. In fact I wouldn’t be surprised if PY/A has a much weaker correlation to winning since it eliminates turnovers (which, while usually random, correlate strongly to W/L).

Rightfully so as winning/losing is a TEAM measure not an individual stat.

Yet this is an attempt to get an understanding of the INDIVIDUAL performance

Thanks for bringing this up. As with any basic metric, this one breaks down at the margins. With peripheral stats like sacks, interceptions, fumbles, and QB rushing, the 80% of QBs clustered around the center of the bell curve are not really affected. Any differences mostly represent noise. However, at the extremes these measurements do matter. Think Marino’s sack rate, Gabriel’s INT rate, Warner’s fumbles, Vick’s rushing, Tarkenton in all four categories. So I agree that PY/A does not capture the skillset of all quarterbacks, but I think it does a good job in the majority of cases.

I like the NBA analogy. Using your example, an exceptional rebounding guard like Rajon Rondo will always be underrated by traditional PG measures. That’s why there is no such thing as an all encompassing metric that measures everything a player does. Thanks for commenting.

Still can’t wrap my head around the calculations for these stats, but I do like it. I like the idea of QBs being measured by their own incompetency, rather than just their mistakes.

As a Browns fan, I always felt like Colt McCoy was just about the worst QB I had seen. With a complete inability to move the ball down the field, he quickly gave up on plays and was far too cautious to win games. This stat totally validates me, as it has his ’11 season even lower than Brady Quinn’s ’09 season and THE BIG D Doug Pederson in ’00. No idea how Derek Anderson’s ’09 season didn’t make this list though. Approx. 3 TDs, 11 INTs, and I don’t think he completed half of his passes. He had the infamous 2/17 game that resulted in a win.

http://www.pro-football-reference.com/boxscores/200910110buf.htm

Yes, cases like McCoy are precisely whom this stat is designed to highlight (or lowlight). I’m glad it resonates with you. The only reason Derek Anderson’s ghastly ’09 season doesn’t make the list is because he didn’t have enough attempts. I don’t have my spreadsheets with me but later today I’ll be happy to look that season up for you.

Are you having difficulty wrapping your head around Danny’s numbers or mine? I’ll be the first to admit that I don’t entirely understand the complex modeling processes he uses, but I trust that he’s doing it right and therefore take his results seriously.

Great analysis of a fascinating premise. As a Lions fan, this resonates with me after watching Joey Harrington timidly check down over and over again. He didn’t throw a particularly high number of INTs, nor did he stand out as taking too many sacks. He just could not move the team an inch, and almost never took any risks. You got this sense of progressive, frustrating, hopelessness when watching him.

I would much rather have a quarterback who takes some risks to try to overcome his team’s weaknesses (Brett Favre comes immediately to mind), than someone who has no balls.

Wolverine – totally agree, see my post about Bradshaw below. The guy throws three picks in the SB, but also continues to launch the ball down field. Yes, we’d be singing a different song if they lost, and it’s true he’s got Stallworth and Swann, but the point remains – the guy was ballsy (I think he was calling those plays as well). Favre is that way, and Flacco too.

BTW, this is great: “You got this sense of progressive, frustrating, hopelessness when watching him.” You’re perfectly encapsulating the way a lot of fans probably feel about their own QBs.

Yes, a QB can definitely be too efficient, or robotic, and that can be a losing effort. In some cases, you’re playing right into the defense’s hands, and it’s not conducive to coming back from behind. Kenny Anderson was a terrific QB, but would often throw that 3rd down pass for eight yards, when the Bengals needed 11, for example. Late in games, Cincinnati was pretty much toast when they fell behind; Anderson was a poor comeback QB. In fact, he’s tied with Russell Wilson for fourth quarter comebacks.

Yes, Bradshaw was calling plays then, and was doing so when many other QBs in the 70s and early 80s weren’t doing so.

I never liked TNT’s NFL coverage, but I remember a game sometime in the mid-90’s where the Jeff Blake-led Bengals came into Pittsburgh and whupped the Neil O’Donnell-quarterbacked Steelers. Blake was taking deep shots over and over again, while O’Donnell was playing very conservatively, despite trailing by 2+ scores for much of the game. One of the TNT announcers made a comment (it’s stuck with me because the TNT announcers rarely said anything intelligent): “The fact that Neil O’Donnell has zero interceptions in this game is totally irrelevant. He’s not giving his team a chance to win.”

• That’s a great quote. Any idea who it was?

• Wolverine

It was the color commentator, and perusing Wikipedia, it looks like TNT’s color commentator was Pat Haden.

• It was October 10, 1995. Blake had a perfect passer rating. And you were right, the commentator was Pat Haden.

Thanks, Wolverine. Joey Harrington is THE poster boy for impotent quarterbacking. He was the master of going 18/30 for 120 yards. While turnovers are more dramatic, a parade of 3-and-outs is just as damaging and even more exasperating to watch. I also got this hopeless feeling from Kyle Boller, Jason Campbell, Alex Smith, and Sam Bradford as you mentioned.

The PYA calculation. I think I get thrown off by the RPY/A and Value. Just not familiar with it. That’s probably all it is.

PY/A is just yards per attempt with a 20 yard TD bonus.

RPY/A is PY/A compared to league average on a per attempt basis.

VALUE is RPY/A multiplied by the number of attempts. It’s meant to balance efficiency and volume.

League average PY/A generally hovers between 7.5 and 8.0, although last season it was 8.1. Hope that clarifies things.
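The three definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: RPY/A is taken to be the player's PY/A minus league PY/A (the same convention as RANY/A), league average leaves out the player in question as the article's footnote describes, and the `Season` class and sample numbers are hypothetical rather than real stat lines.

```python
from dataclasses import dataclass

TD_BONUS_YARDS = 20  # touchdown bonus stated in the article


@dataclass
class Season:
    name: str        # hypothetical label, not a real player
    attempts: int    # pass attempts (sacks are not included)
    yards: int       # gross passing yards
    touchdowns: int  # passing touchdowns


def py_a(yards, touchdowns, attempts):
    """Positive Yards Per Attempt: (yards + 20 * TD) / attempts."""
    return (yards + TD_BONUS_YARDS * touchdowns) / attempts


def league_py_a_excluding(seasons, player):
    """League PY/A from pooled totals, leaving out the player being rated."""
    others = [s for s in seasons if s is not player]
    yards = sum(s.yards for s in others)
    touchdowns = sum(s.touchdowns for s in others)
    attempts = sum(s.attempts for s in others)
    return py_a(yards, touchdowns, attempts)


def rpy_a(seasons, player):
    """Assumed definition: player PY/A minus league PY/A."""
    own = py_a(player.yards, player.touchdowns, player.attempts)
    return own - league_py_a_excluding(seasons, player)


def value(seasons, player):
    """VALUE: RPY/A multiplied by attempts (yards above average)."""
    return rpy_a(seasons, player) * player.attempts


# Made-up three-passer league, purely for demonstration.
league = [
    Season("Passer A", attempts=500, yards=4000, touchdowns=30),
    Season("Passer B", attempts=400, yards=2400, touchdowns=10),
    Season("Passer C", attempts=300, yards=2100, touchdowns=15),
]

for season in league:
    print(season.name, round(rpy_a(league, season), 2), round(value(league, season)))
```

Multiplying by attempts is what lets a long, merely good season outrank a short, brilliant one, and it is also why a bad passer who keeps his job all year sinks so far down the Worst list.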

Since the Browns rejoined the league, they’ve had 39 QB seasons with at least 10 dropbacks. Of those, 32 provided negative value. Ouch.

Their best season by far was Anderson’s ’07 at +318, followed by Kelly Holcomb’s ’04 at +187 and Holcomb’s ’02 at +154. Couch in ’00, Dilfer in ’05, and Hoyer in ’14 posted barely positive seasons, but that’s it.

Anderson’s dismal ’09 season rates at -479 on just 188 attempts. Yikes! His ’08 wasn’t much better at -398. Talk about a one season wonder.

In 2013, Brandon Weeden posted a -231 and Jason Campbell registered a -309…on the same team.
In 2000, Spergon Wynn posted a -240 on only 54 attempts. Jake Delhomme flubbed his way to -264 in 2010.

In 2008, Ken Dorsey was worth -335 on just 91 attempts, while Quinn and Gradkowski chipped in -130 and -136 for a team total of -602.

Of course 2011 takes the cake. Colt McCoy’s dizzying -736 is the headliner, but Seneca Wallace added -264 for an even -1000 as a team!

• Have you looked at correlation between PY/A in years n and n+1? How about following Chase’s lead and looking at its correlation to wins? I like this “potency” metric, and I’d like to see it examined inside and out like ANY/A has. You may be way ahead of me on this, I dunno.

Thanks, Bryan. I was thinking about running some correlations, but haven’t gotten around to it yet. I think both of your suggestions would be useful in determining the true value of PY/A. If I had to guess, I bet ANY/A correlates better with past wins because it includes interceptions, but PY/A might correlate better with future wins because more of the noise is filtered out.

A follow up post may be in order 🙂

As a Packers fan growing up in the 1980s, I have to say that if you take away interceptions and sacks, Lynn Dickey was about the best QB in the league. Big arm, accurate on downfield throws, and completely immobile.

• I think sack% tells us something if the sample size is big enough. Players like Marino and Manning obviously meet the Y/A and TD% threshold. But they’re also incredible at avoiding sacks. So it’s not like they’re just being overly cautious at the expense of gaining yards. Though I saw something on 538.com (I think) that suggested that Aaron Rodgers’ low INT% could be a result of him being overly cautious late in close games. So yeah, sack% and INT% on their own aren’t great indicators.

• Wolverine

If I were an Eagles fan, the frequency with which Sam Bradford shows up on the second list would give me major palpitations.

I know, right? Given his high salary, I can hardly think of a worse choice to be your starting QB. Maybe Chip Kelly is going to run the ball 60% of the time, who knows with him…

I like the idea of removing the negative stuff, keeping the positive stuff, and seeing how things shake out (I’m assuming that’s the general idea of what you’ve done; honestly, Danny’s blog post was waaaaay over my head).

Couldn’t help quickly running some Super Bowl numbers:

Top 5 ANY/A Super Bowl Performances (not adjusted for era or opponent)
1. Plunkett, 1980, 14.5
2. Montana, 1989, 12.6
3. Simms, 1986, 12.4
4. Aikman, 1992, 11.3
5. Williams, 1987, 11.2

Top 5 PY/A Super Bowl Performances (again, not adjusted for era or opponent), number in parentheses is ANY/A rank
2. Plunkett, 1980, 15.2 (#1)
3. Williams, 1987, 13.4 (#5)
4. Wilson, 2014, 13.0 (#16)
5. Montana, 1989, 13.0 (#2)

Two big shifts – Bradshaw’s performance against the Rams in 1979 gets a HUGE boost by us forgetting that he threw 3 picks in that game, and Russell Wilson gets a similar boost by ignoring his three sacks and one INT (of course, if we adjusted this for era, he’d probably be a bit lower).

Of course, this leads us down the road of maybe just tweaking the interception yards penalty for ANY/A, etc.

In any event, great post and definitely not a waste of time. Thanks Adam!

• Adam, I hope you’ll forgive me a very long response. This is an interesting and important idea. Danny’s work was tough for me to follow, but the summary is straightforward enough.

Broadly speaking, I think there’s something to the findings. I’ve always remembered a quote, and to my great frustration I think the source for this is gone to the winds of time (though I may have read it in a Peter King MMQB from ’02 or so) … around that time, Mike Holmgren told Matt Hasselbeck to “throw some interceptions.” Hasselbeck was too worried about making mistakes, and Holmgren needed him to open things up and take some chances. He did, and the Seahawks became a perennial playoff team. Along the same lines, Frank Gifford was quoted as saying, “All venturesome running backs fumble.” Creating positive plays is more important than avoiding negative ones. I entirely agree with that. You and I are very much on the same page about the young quarterbacks in the league today. The tone of that piece is very sympathetic to the ideas you and Danny were working with.

Reading this, the player who immediately came to my mind was Jason Campbell. I lived in Washington for part of his tenure there, and Campbell was so conservative, so afraid of making a mistake, that he rarely accomplished anything. Bill Walsh had that famous quote about Steve DeBerg, that he was just good enough to get you beat. DeBerg’s comp% was high, but his yds/att was low. His sack% was good and his INT% was average, but his TD% was low. DeBerg’s yards/comp was the lowest of that era — by a lot, half a yard. The link doesn’t include Y/C, but DeBerg checks in at 11.3. The mean was 12.7, the median was 12.8, and the standard deviation was 0.7 (n=25). DeBerg is two standard deviations below the middle of the group, and about 2/3 standard deviation beneath the next-lowest (Tommy Kramer, 11.8).
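The standard-deviation arithmetic in the paragraph above can be checked with a short Python sketch. All four inputs are the figures quoted in the comment; the function and variable names are just illustrative.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value sits from the mean."""
    return (value - mean) / sd


# Yards-per-completion figures quoted in the comment (n=25 passers).
DEBERG_YC = 11.3
KRAMER_YC = 11.8   # next-lowest in the group
GROUP_MEAN = 12.7
GROUP_SD = 0.7

deberg_z = z_score(DEBERG_YC, GROUP_MEAN, GROUP_SD)
gap_in_sd = (KRAMER_YC - DEBERG_YC) / GROUP_SD

print(round(deberg_z, 2))   # -2.0: two standard deviations below the mean
print(round(gap_in_sd, 2))  # 0.71: in line with the "about 2/3" estimate
```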

One of my low-priority projects is convincing the world that yards per completion is a valuable stat. Show me a quarterback who completes a five-yard pass on 3rd-and-7, and I’ll show you an overrated player. Yards per completion is a degree-of-difficulty metric. Any QB on an NFL roster can complete a bunch of four- and five-yard passes if he doesn’t care about generating first downs. But a player who generates big plays is creating something for his team. I would add Y/C to your list of qualifying criteria. Obviously yds/att and yds/comp are highly correlated, but a guy who completes a high percentage of very short passes can fool the stats. QBs with good ANY/A and low Y/C are usually “system players” with middling true talent levels.

With good people around him, a mediocre QB can throw short passes, limit his risk, and put up decent efficiency numbers just by completing a high % of passes and avoiding INTs. Players who don’t stretch the field vertically can’t lead big comebacks, and they can’t beat good defenses. They pad their stats in low-leverage situations and leave us wondering why a guy with such good numbers has so little success. I don’t believe the reverse is true: a QB with above-average stats and high Y/C is almost invariably a pretty good player.

So I think you and Danny have a good basic idea, that there’s not enough emphasis on positive plays/qualifying criteria. But I don’t agree with lumping sacks and interceptions together. Sacks and INTs often balance out — a player who eats sacks under pressure usually doesn’t rush passes into coverage and get picked, while a player who unloads the ball under pressure seldom takes a lot of sacks — but I think that hides a fundamental flaw in this approach.

Danny’s idea that AY/A might be a better stat than ANY/A seems totally crazy to me. Maybe his research revealed something that I just don’t understand — my football knowledge is much greater than my math skill — but that strikes me as ridiculous. I would love a layman’s explanation of why sack data isn’t terribly important; convincing evidence would force me to radically re-evaluate my approach to quarterback analysis.

I think a lot of analytic fans put too much faith in ANY/A. Some people treat the stat as perfect, the end of the story. That’s foolish, but even beyond that, ANY/A can be fooled, and in particular it can be fooled in the way you discuss: by rewarding mistake-avoidance without enough emphasis on positive production.

That said … PY/A is an interesting idea, but I don’t understand how omitting data will improve ANY/A. Sacks and interceptions are important, and a quarterback does have some control over them. Any research that finds otherwise is missing something. Ideally, we’d have more context for these events. A sack on third down is seldom a big deal. A long interception on third or fourth down probably won’t cost your team the game. But a sack or a turnover on first down matters a great deal. Getting intercepted on a Hail Mary at the end of the first half shouldn’t meaningfully affect our evaluation of a passer, and so on. If we could break down the data with appropriate context, it would be extremely valuable, a necessary component of informed analysis.

Absent an easy way to put all our data into that context, it’s tempting to filter out the noise by excluding it altogether. I think that’s a mistake. Granting that I don’t understand Danny’s process, I do understand that as a general rule, good quarterbacks make fewer negative plays than bad quarterbacks. They take fewer sacks, and they throw fewer interceptions. Including that data, and working around the imperfections, seems like a better idea to me than excluding it. I think you have a good idea, and moreover, a really interesting good idea. But I think ANY/A is a superior statistic to PY/A.

“But a player who generates big plays is creating something for his team.” Btw, Brian Hoyer was 1st in the league in Y/C

I’ve been too distracted reading comments on your articles so I somehow missed this one until now…

Let me start by saying Danny’s conclusions don’t entirely make sense to me, either. Taking sacks and throwing interceptions are obviously bad in a general sense, so it doesn’t compute that ignoring them completely would make ANY/A more accurate. But at the same time I believe that ANY/A systematically overrates pedestrian passers who don’t really contribute to winning. Fitting you mention Jason Campbell, because he was my inspiration for exploring this idea in the first place.

I agree that negative plays are begging for context, and in recent years I think ESPN’s QBR has done the best job of incorporating said context. Unfortunately for people interested in historical comparisons like you and me, precise data simply doesn’t exist for most of NFL history.

In that vein, my goal is to create a better version of ANY/A, and this article was essentially a brainstorming session in that direction. Perhaps ANY/A would be improved by lowering the INT penalty without discarding it entirely, or making the formula nonlinear in the way the components are weighted. I wholeheartedly agree with you on the merits of Y/C, so I would give an exponential penalty for Y/C below a certain threshold. This would bring down the ratings for the checkdown artists like Campbell no matter how well they did in other areas. Similarly I would give an exponential bonus for QBs above a threshold for Y/C provided they maintain at least an average comp %. Another idea regarding INTs…penalize them only when they fall more than one standard deviation above or below league average. Everyone within one SD would fall into the randomness zone and rate the same.

I may tinker with some of these ideas and write a follow up article. Thanks for your input and willingness to explore out-of-the-box ideas, it’s more fun that way!
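As a rough illustration of the last idea floated above (treating INT rates within one standard deviation of league average as noise), here is a hedged Python sketch. Nothing here is a finished formula: the function name and sample rates are hypothetical, the 45-yard figure is borrowed from the standard ANY/A interception penalty, and charging only the portion of the rate that falls outside the band is my assumption about how the "randomness zone" might work.

```python
INT_PENALTY_YARDS = 45  # per-interception penalty used in standard ANY/A


def int_adjustment(int_rate, league_mean, league_sd):
    """Yards-per-attempt adjustment for interceptions with a dead zone.

    Rates are fractions (0.03 means 3 INT per 100 attempts).
    Inside mean +/- one SD the adjustment is zero; outside it, only the
    part of the rate beyond the band edge is charged or credited
    (an assumption made for this sketch).
    """
    upper = league_mean + league_sd
    lower = league_mean - league_sd
    if int_rate > upper:
        return -INT_PENALTY_YARDS * (int_rate - upper)
    if int_rate < lower:
        return INT_PENALTY_YARDS * (lower - int_rate)
    return 0.0


# Hypothetical league: 3.0% average INT rate with a 1.0% standard deviation.
print(int_adjustment(0.030, 0.03, 0.01))  # inside the band: no adjustment
print(int_adjustment(0.050, 0.03, 0.01))  # turnover-prone: negative adjustment
print(int_adjustment(0.015, 0.03, 0.01))  # unusually careful: small bonus
```

Charging only the excess keeps the adjustment continuous at the band edges, so a passer sitting just outside one SD is not hit with a sudden cliff. Penalizing the full distance from league average would be an equally defensible reading of the idea.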

Great piece – It does tie in with the belief many people have, including myself, that the chances an over aggressive QB will learn how to cut down on his mistakes are higher than the ones that an overly cautious, captain checkdown type of QB will learn to take the necessary risks/air-it-out.

Now as an Eagles fan, I go back to my daily