## Yards per Carry, Net Yards per Attempt, and Regression to the Mean

Last week, I wrote about why I was not concerned with Trent Richardson’s yards per carry average last season. I like using rushing yards because rush attempts themselves are indicators of quality, although it’s not like I think yards per carry is useless — just overrated. One problem with YPC is that it’s not very stable from year to year. In an article on regression to the mean, I highlighted how yards per carry was particularly vulnerable to this concept. Here’s that chart again — the blue line represents yards per carry in Year N, and the red line shows YPC in Year N+1. As you can see, there’s a significant pull towards the mean for all YPC averages.

I decided to take another stab at examining YPC averages today.  I looked at all running backs since 1970 who recorded at least 50 carries for the same team in consecutive years. Using yards per carry in Year N as my input, I ran a regression to determine the best-fit estimate of yards per carry in Year N+1. The R^2 was just 0.11, and the best fit equation was:

2.61 + 0.34 * Year_N_YPC

So a player who averages 4.00 yards per carry in Year N should be expected to average 3.96 YPC in Year N+1, while a 5.00 YPC runner is only projected at 4.30 the following year.
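To make the arithmetic concrete, here is a minimal sketch that plugs a few Year N averages into the best-fit line quoted above. The helper name `project_ypc` is made up for illustration, and the coefficients are the rounded ones from the text, so the 4.00 and 5.00 runners come out at 3.97 and 4.31, a hundredth off the figures above (presumably because those were computed from unrounded coefficients, which is an assumption on my part).

```python
# Best-fit line quoted above for the 50-carry minimum (rounded coefficients).
INTERCEPT = 2.61
SLOPE = 0.34


def project_ypc(year_n_ypc, intercept=INTERCEPT, slope=SLOPE):
    """Projected Year N+1 YPC from Year N YPC (hypothetical helper name)."""
    return intercept + slope * year_n_ypc


for ypc in (3.00, 4.00, 5.00, 6.00):
    projected = project_ypc(ypc)
    # The change column shows the pull toward the middle: low averages are
    # projected up, high averages are projected down.
    print(f"Year N: {ypc:.2f} -> Year N+1: {projected:.2f} (change {projected - ypc:+.2f})")
```

The further a Year N average sits from roughly 4.0, the bigger the projected correction, which is the regression to the mean described above.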

What if we increase the minimums to 100 carries in both years? Nothing really changes: the R^2 remains at 0.11, and the best-fit formula becomes:

2.63 + 0.34 * Year_N_YPC

150 carries? The R^2 is 0.13, and the best-fit formula becomes:

2.54 + 0.37 * Year_N_YPC

200 carries? The R^2 stays at 0.13, and the best-fit formula becomes:

2.61 + 0.36 * Year_N_YPC

Even at a minimum of 250 carries in both years, little changes. The R^2 is still stuck on 0.13, and the best-fit formula is:

2.68 + 0.37 * Year_N_YPC
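One way to compare the five fits is to ask where each line projects a runner to stay exactly where he is, i.e. the YPC that solves x = intercept + slope * x. The sketch below computes that break-even point from the rounded coefficients quoted above; the names `YPC_FITS` and `break_even` are my own, and treating the break-even point as the "mean" being regressed toward is an interpretation, not something the regressions report directly.

```python
# (intercept, slope) for each minimum-carry threshold, as quoted above.
YPC_FITS = {
    50: (2.61, 0.34),
    100: (2.63, 0.34),
    150: (2.54, 0.37),
    200: (2.61, 0.36),
    250: (2.68, 0.37),
}


def break_even(intercept, slope):
    """YPC at which the projection equals the input: x = intercept / (1 - slope)."""
    return intercept / (1 - slope)


for min_carries, (intercept, slope) in YPC_FITS.items():
    print(f"{min_carries:>3}+ carries: slope {slope:.2f}, "
          f"break-even YPC {break_even(intercept, slope):.2f}")
```

The slope barely moves across thresholds, while the break-even point drifts upward as the minimum rises.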

O.J. Simpson typifies some of the issues. It’s easy to think of him as a great running back, but starting in 1972, his YPC went from 4.3 to 6.0 to 4.2 to 5.5 to 5.2 to 4.4. Barry Sanders had a similar stretch from ’93 to ’98, bouncing around from 4.6 to 5.7 to 4.8 to 5.1 to 6.1 and then finally 4.3. Kevan Barlow averaged 5.1 YPC in 2003 and then 3.4 YPC in 2004, while Christian Okoye jumped from 3.3 to 4.6 from 1990 to 1991.

Those are isolated examples, but that’s the point of running the regression. In general, yards per carry is not a very sticky metric. At least, it’s not nearly as sticky as you might think.

That was going to be the full post, but then I wondered how sticky other metrics are.  What about our favorite basic measure of passing efficiency, Net Yards per Attempt? For purposes of this post, an Attempt is defined as either a pass attempt or a sack.

I looked at all quarterbacks since 1970 who recorded at least 100 Attempts for the same team in consecutive years. Using NY/A in Year N as my input, I ran a regression to determine the best-fit estimate of NY/A in Year N+1. The R^2 was 0.24, and the best fit equation was:

3.03 + 0.49 * Year_N_NY/A

This means that a quarterback who averages 6.00 Net Yards per Attempt in Year N should be expected to average 5.97 NY/A in Year N+1, while a 7.00 NY/A QB is projected at 6.45 in Year N+1.

What if we increase the minimums to 200 attempts in both years? It has a minor effect, bringing the R^2 up to 0.27, and producing the following equation:

2.94 + 0.51 * Year_N_NY/A

300 Attempts? The R^2 becomes 0.28, and the best-fit formula is now:

2.94 + 0.53 * Year_N_NY/A

400 Attempts? An R^2 of 0.26 and a best-fit formula of:

3.18 + 0.50 * Year_N_NY/A

After that, the sample size becomes too small, but the takeaway is pretty clear: for every additional yard of NY/A a quarterback produces in Year N, he should be expected to hold on to about half a yard of it the following year.
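As a rough check on that takeaway, the sketch below projects a 6.00 and a 7.00 NY/A quarterback under each of the four fits quoted above and prints how much of the one-yard gap survives into Year N+1. The names `NYA_FITS` and `project_nya` are hypothetical, and the coefficients are the rounded ones from the text.

```python
# (intercept, slope) for each minimum-Attempts threshold, as quoted above.
NYA_FITS = {
    100: (3.03, 0.49),
    200: (2.94, 0.51),
    300: (2.94, 0.53),
    400: (3.18, 0.50),
}


def project_nya(year_n_nya, min_attempts=100):
    """Projected Year N+1 NY/A from Year N NY/A (hypothetical helper name)."""
    intercept, slope = NYA_FITS[min_attempts]
    return intercept + slope * year_n_nya


for min_attempts in NYA_FITS:
    low = project_nya(6.00, min_attempts)
    high = project_nya(7.00, min_attempts)
    # The surviving gap is just the slope of the line for that threshold.
    print(f"{min_attempts}+ Attempts: 6.00 -> {low:.2f}, 7.00 -> {high:.2f}, "
          f"gap kept {high - low:.2f}")
```

For comparison, the same one-yard gap in YPC shrinks to roughly a third of a yard under the rushing fits.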

So does this mean NY/A is sticky and YPC is not? I’m not so sure what to make of the results here. I have some more thoughts, but first, please leave your ideas and takeaways in the comments.

• Off the top of my head, a few thoughts:

1) Not that it’s going to change anything, but knowing you like ANY/A for prediction, I’m curious as to why you chose NY/A as the passing stat to measure “stickiness.” Doesn’t seem like it was because of wanting to use a stat parallel to RB YPC (i.e., that would be Y/A, not NY/A).

2) The different R^2 values across the various attempt thresholds are a nice example of how “discretizing” a continuous variable (i.e., turning a stat on a continuum like “attempts” into a dichotomous “above X or below X” grouping) is generally folly.

3) There’s no need to run a regression when you only have 1 predictor. You’ll notice that the R^2 is equal to the square of the “Year N-1” coefficient in the model. Essentially, that coefficient is the value of the correlation (i.e., “r”) between Year N-1 and Year N, so R^2 = r^2.

4) Yeah, based on what you’ve presented here, NY/A seems very likely to be stickier than YPC.
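The relationship between the slope, r, and R^2 in a one-predictor fit can be checked with a short sketch. The (Year N, Year N+1) pairs below are invented purely for illustration and are not real player seasons. With a single predictor, R^2 equals r^2 exactly, while the slope equals r multiplied by the ratio of the two standard deviations, so the slope only matches r when both years have about the same spread.

```python
from math import sqrt

# Hypothetical (Year N, Year N+1) pairs, made up for illustration only.
pairs = [(3.2, 3.9), (3.8, 4.1), (4.0, 3.7), (4.3, 4.4),
         (4.6, 4.0), (5.1, 4.5), (5.5, 4.2)]

xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
n = len(pairs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

# Ordinary least squares with one predictor.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Pearson correlation and R^2 from the residuals.
r = sxy / sqrt(sxx * syy)
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
r_squared = 1 - ss_res / syy

sd_ratio = sqrt(syy / sxx)  # spread of Year N+1 relative to Year N

print(f"slope = {slope:.3f}, r = {r:.3f}, r * sd_ratio = {r * sd_ratio:.3f}")
print(f"R^2 = {r_squared:.3f}, r^2 = {r * r:.3f}")
```

In the post’s data the two seasons presumably have similar spreads, which would explain why squaring the slopes (0.34 and 0.49) lands so close to the reported R^2 values (0.11 and 0.24).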

• Unless he has changed his mind lately, he prefers NY/A for prediction. I cannot remember the entire article, but I distinctly remember a point when he said, essentially, “My favorite explanatory stat is ANY/A. My favorite predictive stat is NY/A.”

(Hopefully, that doesn’t come across as trying to speak for Chase, which I guess it rather is.)

Because I don’t want to make three comments inside of an hour, I’m adding my minor thoughts on the article here.
My takeaway is that it’s another example of how running back production is more situation-dependent than quarterback production.

• Found it:

“That’s why when looking at which quarterback will perform the best in the future, NY/A is my favorite statistic. When analyzing past quarterbacks, I prefer Adjusted Net Yards per Attempt, which gives a 45-yard penalty for interceptions and a 20-yard bonus for touchdowns. That’s more useful as an explanatory statistic than NY/A, but is not as helpful in predicting the future.”
From: http://www.footballperspective.com/correlating-passing-stats-with-wins/

I knew about when it was because I remembered reading it when I was waiting to start a practice bar exam. I have a bizarre memory.

• Heh. I blame my advancing age (or alphabet soup) for getting Chase’s view on them backwards. Thanks for the clarification, sir.

• James

Since NY/A has been improving over time, while YPC has been fairly constant, what would the results be if you used NY/A+ for the regression and then converted back to NY/A using 2012’s mean and stdev? It might do nothing since we are comparing N and N+1 so the improvement over time will be mitigated, but it might make the higher NY/A’s of today seem less extreme and therefore stickier in the regression.

Also, I think it makes perfect sense that NY/A is more consistent than YPC for two reasons: 1. Sample size each season for QBs is higher so there will be less noise, and 2. QBs can choose how far downfield they throw the ball, while RBs are always starting behind the line of scrimmage. That creates bigger differences in style among QBs than among RBs, and those differences reflect talent more.

• YPC is improving over time as well: http://www.pro-football-reference.com/blog/?p=6325

To update that post because it’s old,
2010: 4.21, 5-year avg: 4.22
2011: 4.29, 5-year avg: 4.24
2012: 4.26, 5-year avg: 4.26

• James

Huh, I didn’t know that. Interesting. In that case I say run a regression on YPC+ too!

• Red

NY/A is stickier because it has a higher mean and a much higher variance (per play), so it’s less affected by outlier plays than YPC. If you look at a hypothetical sample of 20 run plays vs 20 pass plays, it becomes more evident:

Sample run plays: -3, -1, -1, 0, 0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6, 8, 13, 19, 80

Sample pass plays: -10, -7, 0, 0, 0, 0, 0, 0, 4, 6, 7, 9, 11, 15, 18, 22, 27, 28, 41, 80

Among the rushing plays, that single 80-yard run stands out like a sore thumb, and totally distorts the average. However, among the pass plays, that 80-yarder doesn’t really look that out of place, and it won’t affect the average much over a full season of 500+ attempts.
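A quick way to put numbers on this is to compute each sample’s average with and without the 80-yard play. This is only a sketch on the two 20-play samples listed above, and the helper name `with_and_without_longest` is made up.

```python
run_plays = [-3, -1, -1, 0, 0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6, 8, 13, 19, 80]
pass_plays = [-10, -7, 0, 0, 0, 0, 0, 0, 4, 6, 7, 9, 11, 15, 18, 22, 27, 28, 41, 80]


def with_and_without_longest(plays):
    """Return (average of all plays, average after dropping the single longest play)."""
    trimmed = sorted(plays)[:-1]  # drop the longest gain
    return sum(plays) / len(plays), sum(trimmed) / len(trimmed)


run_all, run_trimmed = with_and_without_longest(run_plays)
pass_all, pass_trimmed = with_and_without_longest(pass_plays)

for label, full, trimmed in (("run", run_all, run_trimmed),
                             ("pass", pass_all, pass_trimmed)):
    # The ratio shows how much the one long play inflates the average.
    print(f"{label}: {full:.2f} with the 80-yarder, {trimmed:.2f} without "
          f"({full / trimmed:.2f}x)")
```

In these samples the long run doubles the rushing average (7.60 vs. 3.79), while the long completion lifts the passing average by about 40 percent (12.55 vs. 9.00).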