But you rarely see Luck’s completion split into (a) 10 yards through the air, and (b) 15 yards after the catch by his receiver. Brian Burke calls those 10 yards “Air Yards” and I think that is a pretty useful moniker. The question is, what do you do with Air Yards? Luck led the NFL in Air Yards per completed pass last year (8.0), but that doesn’t make the statistic an indicator of quality. Tim Tebow’s 2011 performance produced the highest single-season Air Yards per completion average since 2006 (8.9), while Jake Delhomme (2008) and Derek Anderson (2010) each have led the league in that metric, too. Air Yards per completed pass is a very useful way to describe a player’s style, but you can’t use it alone to determine a player’s quality.

One question I have: Are Air Yards more repeatable for a quarterback than the yards he gains via his receivers’ YAC? It’s important to keep that question separate from this one: Is a quarterback who has a high number of air yards and a low YAC better than a quarterback in the opposite situation? Today, I plan to focus on the first question, but let’s take a second to address the second one.

According to ESPN’s research, yards after the catch is more about what the receiver does than the quarterback. As a result, a completion that is in the air for 40 yards is better for a quarterback’s ESPN QBR than a pass that is in the air for 5 yards on which the receiver runs for 35 yards after the catch. That makes sense, I suppose, and I suspect that’s probably true more often than not. The easiest counterargument is to point to Joe Montana, and say that what made Montana great was his pinpoint accuracy that enabled players like Jerry Rice to rack up big YAC numbers.

I’m going to put off any further analysis of how much of YAC should be attributed to the quarterback and how much to the receiver, because it’s pretty complicated. One thing that is a bit easier to analyze is how “sticky” Air Yards are from year to year.

Examining data from the NFL’s Game Statistics and Information System (NFLGSIS), there have been 100 quarterbacks from 2006 to 2012 who threw at least 290 passes for the same team in consecutive years. The correlation coefficient between **Air Yards per completed pass** in Year N and in Year N+1 was 0.26. You might get a sense that such a relationship is really weak, but if you’re unfamiliar with correlation coefficients, let’s look at the same set of 100 quarterbacks and how “sticky” their other metrics are:

- The correlation coefficient between **Completion Percentage** in Year N and in Year N+1 was 0.51.
- The correlation coefficient between **Yards/Attempt** in Year N and in Year N+1 was 0.51.
- The correlation coefficient between **Touchdown Rate** in Year N and in Year N+1 was 0.34.
- The correlation coefficient between **Interception Rate** in Year N and in Year N+1 was 0.08.
- The correlation coefficient between **Air Yards on all passes** (i.e., including the distance of the throw on incomplete passes) in Year N and in Year N+1 was 0.34.
- The correlation coefficient between **Yards after the Catch** in Year N and in Year N+1 was 0.34.
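For readers who want to see what goes into a year-to-year correlation coefficient, here is a minimal sketch in Python. The five pairs of AY/CP values are made up for illustration and are not the NFLGSIS data, and `pearson_r` is just a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two series around their own means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(var_x * var_y)


# Made-up AY/CP values for five hypothetical quarterbacks (NOT real data).
year_n = [8.0, 5.6, 6.4, 7.1, 6.0]
year_n_plus_1 = [6.9, 6.3, 6.1, 7.0, 6.5]

r = pearson_r(year_n, year_n_plus_1)
print(f"year-to-year correlation: {r:.2f}")
```

A value near 1 would mean quarterbacks keep roughly the same ordering and spacing from one season to the next, while a value near 0 would mean Year N tells you very little about Year N+1.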

So what does that mean? A high correlation coefficient doesn’t mean the stat is more important, just that it is more predictable from one year to the next. We shouldn’t be surprised to see that completion percentage is consistent from year to year. Completion percentage is not sensitive to outlier plays and is calculated over a very large sample size. Of course, that doesn’t make completion percentage very important, just easy to predict.

On the other hand, it is a little surprising that Air Yards per completed pass isn’t stickier from year to year. Here’s another way to show the lack of year-to-year correlation. I looked at the top 20 and bottom 20 quarterbacks in AY/CP in my 100-sample set. The top 20 quarterbacks averaged 7.9 Air Yards per Completed Pass in Year N and 6.9 AY/CP in Year N+1; the bottom 20 quarterbacks averaged 5.7 AY/CP in Year N and then 6.4 AY/CP in Year N+1. In other words, the gap between the best and worst quarterbacks drops from about 2.2 to 0.5 from year to year. That’s a little counterintuitive to me, as I would think coaching philosophies don’t change much from year to year.
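The gap arithmetic can be checked in a few lines. This is only a sketch using the four group averages quoted above; the variable names are my own.

```python
# Group averages quoted in the post (AY/CP).
top_year_n, top_year_n1 = 7.9, 6.9
bottom_year_n, bottom_year_n1 = 5.7, 6.4

gap_year_n = top_year_n - bottom_year_n      # about 2.2 yards
gap_year_n1 = top_year_n1 - bottom_year_n1   # about 0.5 yards

# Share of the Year N gap that is still there in Year N+1.
retained = gap_year_n1 / gap_year_n

print(f"Year N gap:   {gap_year_n:.1f}")
print(f"Year N+1 gap: {gap_year_n1:.1f}")
print(f"retained:     {retained:.0%}")
```

Under these numbers, a bit less than a quarter of the original gap survives into the following season, and the rest regresses toward the middle of the pack.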

Carson Palmer, 2011-2012, had the biggest dropoff in Air Yards per Completed Pass, dropping from 8.55 in 2011 to 5.97 in 2012. Part of that was due to injuries, but in 2012, TE Brandon Myers and FB Marcel Reece led the team in receptions while deep threat Darrius Heyward-Bey led the team in receptions in 2011. Denarius Moore also saw his yards per reception drop from 18.7 to 14.5, although it’s hard to know whether Palmer was the cause or the effect there. Under Bruce Arians, Andrew Luck led the league in AY/CP in 2012; it will be very interesting to see how Palmer does under Arians’ tutelage in 2013, and if Palmer can regain his 2011 form (which was only over 9 games).

On the other hand, Josh Freeman (2011-2012) saw the biggest increase in AY/CP, jumping from 5.61 in 2011 to 7.92 in 2012. Adding Vincent Jackson (19.2 YPR in 2012) played a big part, and I discussed the impact of offensive coordinator Mike Sullivan on Freeman back in November. Freeman ranked third to last in AY/CP in 2011 (ahead of only Ryan Fitzpatrick and Blaine Gabbert), which was a gross misuse of his talents. Of course, he was also playing for a Tampa Bay team that mailed it in over the final two months of that season.

Interestingly, average YAC has a higher year-to-year correlation than average AY/CP. For that matter, so does Air Yards per Attempt, which includes incomplete passes; that statistic is what Mike Clay calls “average depth of target,” or aDOT. I’m not surprised about the latter, as we simply have a larger sample, making the results more reliable in any one year. As for the former, it’s hard to say. The difference isn’t very big and our sample isn’t large, so it could just be random. Initially, I would have assumed that AY/CP would be stickier than average YAC, but if you were the type of person to argue that good quarterbacks consistently throw accurate passes that enable their receivers to pick up yards after the catch, you’d be happy to see this result.

The best-fit formula to predict future AY/CP in Year N+1 is 5.1 + 0.23*Yr N AY/CP. The number to focus on there is the coefficient of 0.23: this means if one quarterback averages one more AY/CP than another quarterback in Year N, the difference is projected to shrink to just 0.23 AY/CP the next season. For Year N+1 YAC, it’s 3.5 + 0.31*Yr N YAC. So for every one additional yard of YAC per completion in one year, we expect 0.31 additional yards of YAC per completion the next season.
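As a quick worked example, the two best-fit formulas can be written as small functions. The intercepts and slopes come straight from the post; the function names are hypothetical, and the 8.0 input is Luck’s 2012 AY/CP mentioned at the top of the article.

```python
def project_ay_cp(year_n_ay_cp):
    """Projected Year N+1 Air Yards per completed pass."""
    return 5.1 + 0.23 * year_n_ay_cp


def project_yac(year_n_yac):
    """Projected Year N+1 YAC per completion."""
    return 3.5 + 0.31 * year_n_yac


# A quarterback at 8.0 AY/CP in Year N projects to about 6.94 the next year.
print(f"{project_ay_cp(8.0):.2f}")

# Two quarterbacks one full yard apart in Year N project to be
# only 0.23 yards apart in Year N+1.
print(f"{project_ay_cp(7.0) - project_ay_cp(6.0):.2f}")

# The same one-yard difference in YAC projects to 0.31 yards.
print(f"{project_yac(6.0) - project_yac(5.0):.2f}")
```

The slope is what matters for stickiness: the closer it is to 1, the more of this year’s difference between two quarterbacks is expected to carry over.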

Again, we can’t gain too much insight from a sample of only 100 quarterbacks, but these two formulas do indicate that YAC may be easier to predict for quarterbacks than Air Yards. At least to me, that’s a surprising result.

{ 25 comments }

Tangential, but any reason you chose 290 as your cutoff for attempts? That something you always do in QB analyses? Haven’t really paid attention to it before, but the sample size issue here made me think, “Why not just use the NFL’s 224 qualifying total?” Probably won’t change much, so just curious.

That’s the number that got me to an even 100 QBs. I think I started with 224, but there were some guys in there that didn’t feel like true starters. I played around with cutoffs, got to about 103, and then just said I’ll look at the 100 QBs with the most passes.

I suspect you’d get somewhat different and maybe more elucidative results grabbing random splits from across the sample or at least splitting attempts into even/odd instead of cutting it at years.

It would help to control for opponent and allow a better sense of how much of air yards is noise versus team/QB skill.

Of course it wouldn’t say as much about how well the past year’s data can be projected forward as one would like but then neither do yearly splits.

Also, I totally should have phrased it “any reason you didn’t…” because I’m sure it’s something you considered and I didn’t mean to come across as implying you didn’t

Unfortunately, I currently only have the yearly results, but I might be able to get game logs. That’s a good idea.
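If play-level data did become available, the even/odd split suggested above might look something like this sketch. The list of air yards is invented for one hypothetical quarterback, and `split_half_means` is an illustrative name, not anything from the original analysis.

```python
def split_half_means(air_yards_by_completion):
    """Return (mean of even-indexed, mean of odd-indexed) completions."""
    evens = air_yards_by_completion[0::2]
    odds = air_yards_by_completion[1::2]
    if not evens or not odds:
        raise ValueError("need at least two completions")
    return sum(evens) / len(evens), sum(odds) / len(odds)


# Hypothetical air yards on one quarterback's completions, in the order thrown.
completions = [4, 12, 7, 3, 25, 6, 9, 1, 15, 8]

even_mean, odd_mean = split_half_means(completions)
print(f"even-numbered completions: {even_mean:.1f} AY/CP")
print(f"odd-numbered completions:  {odd_mean:.1f} AY/CP")
```

Repeating this for every quarterback and correlating the even-half averages with the odd-half averages would give a within-season measure of stickiness, since both halves share the same teammates, coaches, and schedule.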

Have you sorted this by offensive coordinator (or coach, in the cases where they call plays)? This statistic seems like it is very offense specific (i.e. Arians is known for a deep passing game, Tebow was almost always deep throwing against single coverage because of the strong running game, etc.).

However, within a single system (as best as we can approximate that without lengthy tape analysis), I would think not only would air yards be stickier they would be a better indicator of “quality.” I put quality in quotes because I can imagine explanations for why a quarterback would make shorter throws (this could be sorted for by seeing if a drop off in air yards roughly corresponded to an increase in y.a.c.), but if the system is the same the quarterback presumably is going to be throwing for as many yards as he can.

For instance, I would think air yards would be an interesting way to examine claims of a quarterback’s arm strength declining. Peyton Manning was throwing accurate balls all year, but towards the end of the year, the cold playoff game in particular, it seemed he was throwing deep less often and air yards could easily verify this.

As usual with football stats, the usefulness of the stat is going to come down to an ability to eliminate the noise caused by the other moving parts, but in this case I think the other main component, play calling, is something we can at least begin to isolate.

P.S. Rather than trying to identify playcallers, what about using aDOT as a system definer? This might allow getting around the problems of changing quality in quarterbacks. Presumably, a poor quarterback is going to pull the trigger on just as many deep throws as he is told to as a good quarterback; the good quarterback is just going to complete more of those throws.

I didn’t break it down by OC or coach — that should be pretty easy for me to do, although it will only serve to shrink the sample size.

I agree that air yards works well for things like arm strength. Manning was at 7.0 in the regular season, 6.1 in the one playoff game. In 2010, he was at 6.3, but he was struggling with neck issues then. Going back more, he was at 6.5 in ’09, 6.6 in ’08, 7.6 in ’07, and 8.3 in ’06.

I’m not sure I’m following you on your last point.

Great blog by the way.

I really would like to see the comments’ section on the blog become more active as well. I have read some great comments on here, but there just aren’t very many of them.

I would like to see Air Yards and YAC correlations for QBs who change teams and teams who change QBs midseason. That might tell you how well these stats can be attributed to the QBs and not their receivers or coaches. Jason Lisk did this for conventional QB stats a few years ago on the PFR blog.

It might be interesting to take a look on a case-by-case basis, but the sample size is going to be tiny.

I think the other main component, play calling, is something we can at least begin to isolate.

Any impact on the correlation if you do Air Yards per attempt if you make it 0 yards for all incompletes? So the total air yards will be the same, just the divider will be all attempts, rather than all completions.

I dunno which way it will go: the raw numbers will be way down, and there won’t be as much separation from one guy to the next, which in turn makes one outlying deep pass even more likely to skew the stats. But then it also intrinsically factors in completion percent as well. Back of a fag packet maths tells me that’d probably cut in half the air yards stat for someone like Tebow or Anderson, but not do it so much for guys who can throw downfield effectively, but also throw actual other NFL passes with some semblance of accuracy.

I guess there’s a decent chance that it’ll be not noticeably more correlative than the others, but it seems like the only metric that you’ve not included here.

Also, just a thought on the lack of correlation for Air Yards year on year. Could it just be down to deep passes?

Based off the table in the link at the top (labelled “Air Yards”), it seems fair to assume that league wide there’s a 55:45 split between air yards and YAC. Let’s say the average QB passes for 4000 yards (which seems a little high, but close enough to be in the ballpark). 55% of that is 2200 air yards. Let’s say a deep pass is 40 yards. By my crappy maths, if we say 2200 yards is an average amount of air yards for a QB, then each 40-yard completion would be 1.8% of the total air yards. At what number does hitting on deep passes become statistically significant in terms of correlation? I mean, if you hit on 3 more than last year you’ve just boosted your air yards by over 5%. 5 more gets you to nearly 10%.

I dunno, I’m not a maths guy, that’s just the thought that immediately jumped out at me as a possible reason when I was trying to figure why air yards per completion wouldn’t correlate from year to year.
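The back-of-the-envelope numbers in this comment hold up. Here is a small sketch that reproduces them, using the commenter’s own assumptions (4000 passing yards, a 55% air-yards share, 40 yards per deep completion); the function name is made up.

```python
season_passing_yards = 4000   # commenter's ballpark figure
air_share = 0.55              # assumed 55:45 air yards to YAC split
deep_pass_air_yards = 40      # assumed length of one deep completion

season_air_yards = season_passing_yards * air_share  # about 2200


def boost_from_deep_passes(extra_completions):
    """Fractional increase in season air yards from extra deep completions."""
    return extra_completions * deep_pass_air_yards / season_air_yards


for extra in (1, 3, 5):
    print(f"{extra} extra deep completions: +{boost_from_deep_passes(extra):.1%}")
```

So a handful of deep completions really can move a season total by several percent, which is one plausible reason the per-completion average bounces around from year to year.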

I think there’s something to that. Air Yards is sensitive to outliers, although not necessarily more so than yards per attempt is generally. But I agree that the regression comes from the fact that you can’t get a “true” rating from only a few hundred completed passes.

Without knowing the exact steps you took when running your regression, I have to say that whenever you try to run regressions with the same units across time, there are all sorts of robustness issues you have to overcome. Given that, it may not be very informative to look at R². Not sure this is feasible, but it would be interesting to see how strong the fit statistics are if you were to look at air yards across all 16 games.

I ran a very simple regression. To predict Year N+1 Air Yards per Completed Pass, I used only one input: Year N Air Yards per Completed Pass.
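For anyone who wants to reproduce a one-input regression like this, here is a minimal sketch of ordinary least squares written out by hand. The data are made-up AY/CP pairs, not the 100-quarterback sample, and `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one input; returns (intercept, slope)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the input must vary to fit a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up AY/CP values for five hypothetical quarterbacks (NOT real data).
year_n = [8.0, 5.6, 6.4, 7.1, 6.0]
year_n_plus_1 = [6.9, 6.3, 6.1, 7.0, 6.5]

intercept, slope = fit_line(year_n, year_n_plus_1)

# Residual = actual Year N+1 value minus the fitted value.
residuals = [y - (intercept + slope * x) for x, y in zip(year_n, year_n_plus_1)]

print(f"Year N+1 = {intercept:.2f} + {slope:.2f} * Year N")
print("residuals:", [round(e, 2) for e in residuals])
```

Keeping the residuals around is useful because they are what you would inspect to see how well a single straight line describes the whole group.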

You’re going to get biased results. For one, a simple model will ignore the fact that the errors are likely correlated across time.

I’m not following you, Ajit. Care to unpack that a little bit?

Sure. A basic linear regression model equation takes the form Y = B0 + B1*X + e, where Y is the dependent variable, B0 is the intercept, B1 is the regression parameter (matrix) for the X covariate(s), and e is the error term. Now, for this model to be the best linear unbiased estimator, the error term must be uncorrelated with X, and the error for one unit must not be related to the error in another unit. Now, the regression you are trying to run looks like this:

Y_{i,t} = B0 + B1*Y_{i,t-1} + e_{i,t}

In this instance, we are regressing, say, Josh Freeman’s value at T2 on Josh Freeman’s value at T1. However, in this situation, it is likely that the error is correlated across the two values, because both values are plausibly affected by the same error.

This is essentially called autocorrelation. http://carecon.org.uk/UWEcourse/epa/Basicproblems.pdf

Thanks.

When you say that both values are affected by the same error, what do you mean?

That’s a good link. Any thoughts on a solution?

Say you have two data series, X and Y. X has some underlying randomness about it. Y has some underlying randomness to it as well. We assume the randomness in Y is not related to the randomness in X. If they are, then there’s a problem.

There are a few different things you can do, but the problem becomes more complicated.

In what way do you think the randomness in Josh Freeman’s 2012 season is related to the randomness in Josh Freeman’s 2011 season? And to the extent you think autocorrelation is a problem here, wouldn’t that have more of an effect on the magnitude of the results than the order (i.e., I don’t see why it would be more of a problem with Air Yards than completion percentage or YAC).

It’s not specifically related to Freeman. Autocorrelation tends to happen when you are looking at data over a period of time. How and to what degree autocorrelation biases results is hard to know. It could be large or small, and it could affect YAC more than Air Yards or vice versa.

In thinking about this, there are some other issues too, namely uneven variances within the composition. So for instance, you took a composite of 100 QBs and looked at their year-to-year changes in Air Yards. A simple linear regression will assume that the relationship between the two years is more or less the same across all data points. For instance, if we were to look at how much income is correlated with experience, we would expect that this relationship is the same across all individuals. However, that assumption probably doesn’t hold, since men typically make more than women, whites make more than blacks, etc. Similarly, some QBs in some systems may have more consistent Air Yards than others for a variety of reasons, but OLS will treat this composition the same.

I just want to add, I like this article a lot and the things I note aren’t meant to invalidate the results at all. I think I was just being a bit nitpicky.

Autocorrelation is a pretty standard problem in regression; the key is not just running the regression and going home. One needs to assess the errors.