Regular readers know that I’m skeptical of using “yards per carry” to evaluate running backs. That’s because YPC is not very consistent from year to year. But it’s also not consistent even within the same year. For example, in 2013, Giovani Bernard rushed 92 times for 291 yards in even-numbered games, producing a weak 3.16 YPC average. But in odd-numbered games, Bernard averaged 5.18 YPC, rushing 78 times for 404 yards!
Jamaal Charles also showed a preference for odd-numbered games, averaging 5.80 YPC in games 1, 3, 5, etc., and only 3.96 YPC in even-numbered games. Buffalo’s C.J. Spiller had a reverse split, producing 5.57 YPC in even games and 3.61 YPC in odd games.
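The split averages are simple division. A short Python check of Bernard’s numbers (the helper name ypc is my own) also shows that his full-season average, 695 yards on 170 carries, is a carry-weighted blend of the two halves rather than their simple mean:

```python
def ypc(yards: int, carries: int) -> float:
    """Yards per carry: rushing yards divided by rush attempts."""
    return yards / carries


# Giovani Bernard, 2013, as quoted above: (carries, yards) by game parity.
even_carries, even_yards = 92, 291
odd_carries, odd_yards = 78, 404

even = ypc(even_yards, even_carries)
odd = ypc(odd_yards, odd_carries)
# The season average weights each half by its number of carries.
season = ypc(even_yards + odd_yards, even_carries + odd_carries)

print(f"even: {even:.2f}  odd: {odd:.2f}  season: {season:.2f}")
```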
Okay, this stuff is meaningless, you say. Who cares about these random splits? Well, there are a couple of reasons to care. For starters, these splits serve as a great reminder that splits happen. If Spiller averaged 3.61 YPC in the first half of the year and 5.57 in the second half, the narrative would be that Spiller was finally healthy by the end of the year, and was set up for a monster 2014 campaign. Meanwhile, if Charles had seen his YPC fall from 5.8 YPC in the first eight games to 3.96 in the back eight, the narrative would be that he couldn’t handle a heavy workload, was breaking down, and could be a huge bust this year. Narratives are easy to invent, and remembering that “splits happen” is an important part of any analysis.
But these weird splits are useful for another reason: while the examples above are extreme, they reflect the general rule that YPC is not very sticky. From 1990 to 2013, there were 1,512 running backs who recorded at least 75 rushes in either even-numbered games (2, 4, 6, 8, 10, 12, 14, and 16) or odd-numbered games.
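Mechanically, an even/odd split just sums a season’s game log into two eight-game halves. Here is a minimal Python sketch, assuming a hypothetical game-log format (a list of (carries, yards) tuples in game order, game 1 first). The names even_odd_split and qualifies are my own, and the data set described above may have been assembled differently.

```python
from typing import List, Tuple

# Hypothetical format: (rush attempts, rushing yards).
CarriesYards = Tuple[int, int]


def _totals(games: List[CarriesYards]) -> CarriesYards:
    """Sum carries and yards over a list of games."""
    return sum(c for c, _ in games), sum(y for _, y in games)


def even_odd_split(games: List[CarriesYards]) -> Tuple[CarriesYards, CarriesYards]:
    """Return (even-game totals, odd-game totals) for a season's game log.

    games[0] is game 1, so odd-numbered games sit at even list indices.
    """
    odd_games = games[0::2]   # games 1, 3, 5, ...
    even_games = games[1::2]  # games 2, 4, 6, ...
    return _totals(even_games), _totals(odd_games)


def qualifies(half: CarriesYards, min_rushes: int = 75) -> bool:
    """True if a half-season sample has at least min_rushes carries."""
    return half[0] >= min_rushes


# Toy log, not real data: 10-for-50 in odd games, 12-for-36 in even games.
even, odd = even_odd_split([(10, 50), (12, 36)] * 8)
for label, (carries, yards) in (("even", even), ("odd", odd)):
    print(label, carries, yards, round(yards / carries, 2), qualifies((carries, yards)))
```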
Suppose you know that a running back averaged 5.00 YPC in an eight-game subset. What would you project his YPC average to be for the other eight games? The answer isn’t anywhere near 5.00; in fact, it’s around 4.37. And a running back who averaged 3.5 YPC in one eight-game sample would be projected to average 3.92 YPC in the other eight games.
That’s because the best-fit linear formula to predict YPC using the above-mentioned data set is:
Future YPC = 2.875 + 0.300 * Prior_YPC (R^2 = 0.09)
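Applying the fit is one line of arithmetic. This minimal Python sketch (the function name project_ypc is my own; the coefficients are the ones quoted above) reproduces both examples: it returns about 4.375 for a 5.00 YPC back and about 3.925 for a 3.50 YPC back, which the text rounds to 4.37 and 3.92.

```python
def project_ypc(prior_ypc: float) -> float:
    """Projected YPC for the other eight-game sample, from the linear fit above."""
    return 2.875 + 0.300 * prior_ypc


# The fit has a fixed point where projection == prior: 2.875 / (1 - 0.300).
# A back's projected distance from it is 30% of his prior distance from it.
fixed_point = 2.875 / (1 - 0.300)

for prior in (5.00, 3.50):
    print(f"{prior:.2f} YPC -> {project_ypc(prior):.3f} YPC")
print(f"fixed point: {fixed_point:.2f} YPC")
```

Because a least-squares line passes through the sample means, that fixed point (about 4.11 YPC) should sit close to the average YPC of the backs in the data set, assuming the even and odd halves have similar averages.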
If yards per carry isn’t sticky within a season, it certainly shouldn’t be expected to be consistent in future seasons, when roster turnover, coaching changes, and age can play significant roles. The fact that YPC is so inconsistent even within a single season is quite revealing. So what is sticky? Well, using the exact same data set, rushing yards is quite a bit stickier:
Future RushYds = 125.6 + 0.727 * Prior_RushYds (R^2 = 0.46)
Remember, we’re looking at eight-game samples here. This tells us that we should regress the rushing yards data, but only a little: the formula carries forward roughly 73% of a back’s prior rushing yards. So if a back has rushed for 700 yards in one eight-game sample, we project him to rush for 635 yards in the other eight-game sample.
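The same kind of sketch works for rushing yards (again, the function name is my own). For a 700-yard half it returns 634.5, which the text rounds to 635; 508.9 of those yards (72.7% of 700) come from the prior sample, and the constant 125.6 supplies the rest.

```python
def project_rush_yds(prior_yds: float) -> float:
    """Projected rushing yards for the other eight-game sample (linear fit above)."""
    return 125.6 + 0.727 * prior_yds


projected = project_rush_yds(700)
# The part of the projection that comes directly from the prior sample.
carried_forward = 0.727 * 700
print(f"700 prior yards -> {projected:.1f} projected "
      f"({carried_forward:.1f} carried forward + 125.6)")
```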
But, as careful readers have no doubt already figured out, rush attempts is by far the stickiest stat.
Future Rush Att = 11.3 + 0.884 * Prior_Rush_Att (R^2 = 0.64)
That’s an incredibly high “stickiness” rate of 88.4%. This makes sense, since rush attempts is not at all sensitive to outliers, but that’s still a shockingly high retention rate. Now, let’s say we’re halfway through the season and want to project a player’s rushing yards for the rest of the year. This analysis suggests placing much more emphasis on past rush attempts than on anything else. How much emphasis?
Using Dan’s suggestion, I ran a regression using natural logs to predict future rushing yards from prior rush attempts and prior YPC. Since Rushing Yards = Rush Attempts * YPC by definition, ln(RushYds) = ln(RushAtt) + ln(YPC). So I took the logs of prior rush attempts and prior YPC to predict the log of future rushing yards. Here’s the best-fit formula:
ln(Future_Rsh_Yds) = 0.84 + 1.025 * ln(Prior_Rsh_Att) + 0.285 * ln(Prior_YPC) (R^2 = 0.49)
For each running back, running his prior numbers through that equation gives you the natural log of his future rushing yards. To convert that into future rushing yards, simply take e^(ln(X)) to get X, where X is future rushing yards (in Excel, type =EXP(A2), where A2 is the cell containing your projection of the natural log of future rushing yards).
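Exponentiating both sides of the fit turns it into a power law, which makes the relative weights easier to read. This is just algebra on the rounded coefficients quoted above (e^0.84 ≈ 2.32):

$$
\mathrm{Future\_Rsh\_Yds} = e^{0.84}\,\mathrm{Prior\_Rsh\_Att}^{1.025}\,\mathrm{Prior\_YPC}^{0.285} \approx 2.32\,\mathrm{Prior\_Rsh\_Att}^{1.025}\,\mathrm{Prior\_YPC}^{0.285}
$$

Holding carries fixed, doubling a back’s YPC multiplies his projection by only 2^0.285 ≈ 1.22, while doubling his carries at the same YPC multiplies it by 2^1.025 ≈ 2.03.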
Let’s use a couple of examples. Suppose through eight games a running back has rushed 125 times and has averaged 4.0 YPC, giving him 500 yards. This formula would project him to have 486 yards in the other eight games. Now, if we drop his YPC from 4.0 to 3.0, his projected future rushing yards only drops to 448; increase it from 4.0 to 5.0, and it only increases his projected future rushing yards to 518. Those are tiny differences for what are viewed as significant swings in yards per carry.
Or, let’s take two hypothetical running backs with 560 yards through eight games. Give RB A 100 carries and a 5.6 YPC average, and RB B 160 carries but a 3.5 YPC average. The formula projects RB A to drop from 560 to 425 yards in the second half of the year, while RB B should rise to 602 yards.[1]
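For readers who want to check these projections outside Excel, here is a minimal Python sketch of the log formula, with math.exp playing the role of Excel’s EXP. The function name is my own. Because the published coefficients are rounded, its outputs land within a yard or two of the figures in the text (for example, roughly 485 rather than 486 yards for the 125-carry, 4.0 YPC back).

```python
import math


def project_rush_yds_log(prior_att: float, prior_ypc: float) -> float:
    """Projected rushing yards for the other eight-game sample (log fit above)."""
    log_yds = 0.84 + 1.025 * math.log(prior_att) + 0.285 * math.log(prior_ypc)
    return math.exp(log_yds)  # undo the natural log, like =EXP(A2) in Excel


examples = {
    "125 carries, 3.0 YPC": (125, 3.0),
    "125 carries, 4.0 YPC": (125, 4.0),
    "125 carries, 5.0 YPC": (125, 5.0),
    "RB A: 100 carries, 5.6 YPC": (100, 5.6),
    "RB B: 160 carries, 3.5 YPC": (160, 3.5),
}
for label, (att, ypc) in examples.items():
    print(f"{label}: {att * ypc:.0f} yards so far -> "
          f"{project_rush_yds_log(att, ypc):.0f} projected")
```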
One way to look at this is to say that carries are king; like routes run, they represent opportunities given to a player, which is a strong indicator of talent. Another interpretation is that YPC is a fickle mistress. It is extremely sensitive to outliers, and two or three carries can completely skew a player’s production. I’m not telling you anything you didn’t already know, but the data here seems overwhelming to me.
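To see how few carries it takes, consider a hypothetical back (made-up numbers, not from the data set) with 350 yards on 98 carries, a 3.57 YPC average. Two 75-yard runs lift him to 5.00 YPC, while his carry count moves by only two:

```python
def ypc(yards: float, carries: int) -> float:
    """Yards per carry."""
    return yards / carries


# Hypothetical back: 98 carries for 350 yards before two long runs.
carries, yards = 98, 350
before = ypc(yards, carries)

# Add two 75-yard breakaway runs.
carries += 2
yards += 2 * 75
after = ypc(yards, carries)

print(f"before: {before:.2f} YPC on {carries - 2} carries")
print(f"after:  {after:.2f} YPC on {carries} carries")
```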
[1] In general, I like using even/odd splits instead of first half/second half splits to avoid biasing the results due to injuries and roster changes. If half the offensive line was injured in the second half of the season, for example, a first half/second half split becomes an apples-to-oranges comparison. Using even/odd splits gets around this issue, although there is a drawback. If, as in this example, RB A has a great YPC average in the first half of the year, his workload may increase in the second half; similarly, RB B might see a decrease in workload. I’m not overly concerned with this issue, though: since YPC is so inconsistent from period to period, and there’s not much evidence that coaches care much about this statistic, I don’t think switching to a chronological split would change things. But hey, this just gives me another topic for a future post! Let me know your thoughts.