Regular readers know that I’m skeptical of using “yards per carry” to evaluate running backs. That’s because YPC is not very consistent from year to year. But it’s also not consistent even within the same year. For example, in 2013, Giovani Bernard rushed 92 times for 291 yards in even-numbered games, producing a weak 3.16 YPC average. But in odd-numbered games, Bernard averaged 5.18 YPC, rushing 78 times for 404 yards!

Jamaal Charles also showed a preference for odd-numbered games, averaging 5.80 YPC in games 1, 3, 5, etc., and only 3.96 YPC in even-numbered games. Buffalo’s C.J. Spiller had a reverse split, producing 5.57 YPC in even games and 3.61 YPC in odd games.

Okay, this stuff is meaningless, you say. Who cares about these random splits? Well, there are a couple of reasons to care. For starters, these splits serve as a great reminder that splits happen. If Spiller averaged 3.61 YPC in the first half of the year and 5.57 in the second half, the narrative would be that Spiller was finally healthy by the end of the year, and was set up for a monster 2014 campaign. Meanwhile, if Charles had seen his YPC fall from 5.8 YPC in the first eight games to 3.96 in the back eight, the narrative would be that he couldn’t handle a heavy workload, was breaking down, and could be a huge bust this year. Narratives are easy to invent, and remembering that “splits happen” is an important part of any analysis.

But there are other reasons these weird splits are useful to know. Because while the examples above are extreme, they’re representative of the general rule: YPC is not very sticky. From 1990 to 2013, there were 1,512 running backs who recorded at least 75 rushes in either even-numbered games (2, 4, 6, 8, 10, 12, 14, and 16) or odd-numbered games.
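For readers who want to build this kind of split themselves, here is a minimal Python sketch. The function name, the `SplitLine` type, and the `(game_number, carries, yards)` row format are all my own assumptions for illustration, not the tooling actually used for this post.

```python
from typing import Iterable, NamedTuple


class SplitLine(NamedTuple):
    """Rushing totals for one half of an even/odd split (hypothetical helper type)."""
    carries: int
    yards: int

    @property
    def ypc(self) -> float:
        # Yards per carry; NaN when the back had no carries in this half.
        return self.yards / self.carries if self.carries else float("nan")


def even_odd_split(game_log: Iterable[tuple[int, int, int]]) -> tuple[SplitLine, SplitLine]:
    """Return (even_games, odd_games) totals from (game_number, carries, yards) rows."""
    totals = {0: [0, 0], 1: [0, 0]}  # game_number % 2 -> [carries, yards]
    for game_number, carries, yards in game_log:
        bucket = totals[game_number % 2]
        bucket[0] += carries
        bucket[1] += yards
    return SplitLine(*totals[0]), SplitLine(*totals[1])


# Made-up four-game log, purely to show the call shape.
even, odd = even_odd_split([(1, 20, 110), (2, 15, 45), (3, 18, 90), (4, 22, 66)])
print(f"even: {even.ypc:.2f} YPC on {even.carries} carries")
print(f"odd:  {odd.ypc:.2f} YPC on {odd.carries} carries")
```

Feeding in Bernard’s 2013 totals from above (92 for 291 in even games, 78 for 404 in odd games) reproduces the 3.16 and 5.18 averages.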

Suppose you know that a running back averaged 5.00 YPC in an eight-game subset. What would you project his YPC average for the other eight games? The answer isn’t anywhere near 5.00; in fact, it’s around 4.37. And a running back who averaged 3.5 YPC in one eight-game sample would be projected to average 3.92 YPC in the other eight games.

That’s because the best-fit linear formula to predict YPC using the above-mentioned data set is:

Future YPC = 2.875 + 0.300 * Prior_YPC (R^2 = 0.09)
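As a quick check, the formula can be applied directly. This is only a sketch in Python; the function name is my own, and the coefficients are the rounded ones printed above.

```python
def project_ypc(prior_ypc: float) -> float:
    """Projected YPC in the other eight games, using the rounded best-fit line."""
    intercept, slope = 2.875, 0.300
    return intercept + slope * prior_ypc


for prior in (5.00, 3.50):
    print(f"prior {prior:.2f} YPC -> projected {project_ypc(prior):.3f} YPC")
```

The two inputs are the examples from the paragraph above and come out at about 4.37 and 3.92, matching the figures quoted there.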

If yards per carry isn’t sticky within a season, it certainly shouldn’t be expected to be consistent in future seasons, when roster turnover, coaching changes, and age can play significant roles. The fact that YPC is so inconsistent even within a single season is quite revealing. So what is sticky? Well, using the exact same data set, rushing yards is quite a bit stickier:

Future RushYds = 125.6 + 0.727 * Prior_RushYds (R^2 = 0.46)

Remember, we’re looking at eight-game samples here; this tells us we regress the rushing yards data we have, but only a little bit: 73% of the prior rushing yards is used to predict the future. So if a back has rushed for 700 yards in one eight-game sample, we project him to rush for 635 yards in the other eight-game split.

But, as careful readers have no doubt already figured out, **rush attempts** is by far the stickiest stat.

Future Rush Att = 11.3 + 0.884 * Prior_Rush_Att (R^2 = 0.64)

That’s an incredibly high “stickiness” rate of 88.4%. This makes sense, since rush attempts is not at all sensitive to outliers, but that’s still a shockingly high retention rate. Now, let’s say we’re halfway through the season and you want to project a player’s number of future rushing yards. This analysis would suggest that you want to place much more emphasis on past rush attempts than anything else. How much emphasis?
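To put the three linear fits side by side, here is a small Python sketch. The `Fit` class and its method names are hypothetical; the coefficients are the rounded ones from the formulas above. The "fixed point" is simply the input at which each line projects a back to repeat his number exactly. It is arithmetic implied by the published coefficients, not a figure reported in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    """One best-fit line of the form future = intercept + slope * prior."""
    name: str
    intercept: float
    slope: float

    def project(self, prior: float) -> float:
        return self.intercept + self.slope * prior

    def fixed_point(self) -> float:
        # Solve x = intercept + slope * x for x.
        return self.intercept / (1.0 - self.slope)


FITS = [
    Fit("YPC", 2.875, 0.300),
    Fit("Rushing yards", 125.6, 0.727),
    Fit("Rush attempts", 11.3, 0.884),
]

for fit in FITS:
    print(f"{fit.name:14s} retains {fit.slope:.1%} of prior; "
          f"fixed point {fit.fixed_point():.2f}")

# The 700-yard example from the rushing yards paragraph.
print(f"700 prior yards -> {FITS[1].project(700):.1f} projected yards")
```

Read this way, each line says: start from the fixed point (roughly 4.11 YPC, 460 yards, or 97 carries) and keep 30%, 73%, or 88% of however far the back was from it.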

Using Dan’s suggestion, I ran a regression using natural logs to predict future rushing yards from prior rush attempts and prior YPC. Since Rushing Yards = Rush Attempts * YPC, by definition, ln(RushYds) = ln(RushAtt) + ln(YPC). So I tweaked that formula to take the logs of prior rush attempts and prior YPC to predict the log of future rushing yards. Here’s the best-fit formula:

ln(Future_Rsh_Yds) = 0.84 + 1.025 * ln(Prior_Rsh_Att) + 0.285 * ln(Prior_YPC) (R^2 = 0.49)

For each running back, after running their prior numbers through that equation, you get the natural log of future rushing yards. To convert that into future rushing yards, you simply take e^(ln(X)) to get X, where X is future rushing yards (in Excel, just type in Exp(A2), where A2 is the cell where you have your projection of the natural log of future rushing yards).
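Written out, exponentiating both sides of the best-fit formula turns the sum of logs into a product of powers. The block below is just algebra on the rounded coefficients above, using the same variable names:

```latex
\begin{align*}
\ln(\text{Future\_Rsh\_Yds}) &= 0.84 + 1.025\,\ln(\text{Prior\_Rsh\_Att}) + 0.285\,\ln(\text{Prior\_YPC}) \\
\text{Future\_Rsh\_Yds} &= e^{0.84}\cdot \text{Prior\_Rsh\_Att}^{\,1.025}\cdot \text{Prior\_YPC}^{\,0.285} \\
&\approx 2.32\cdot \text{Prior\_Rsh\_Att}^{\,1.025}\cdot \text{Prior\_YPC}^{\,0.285}
\end{align*}
```

Holding carries fixed, multiplying prior YPC by a factor k multiplies the projection by k^0.285, so a 25% jump in YPC moves the projection by only about 6.6%. An exponent of 1.025 on carries means the projection scales almost one-for-one with workload.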

Let’s use a couple of examples. Suppose through eight games a running back has rushed 125 times and has averaged 4.0 YPC, giving him 500 yards. This formula would project him to have 486 yards in the other eight games. Now, if we drop his YPC from 4.0 to 3.0, his projected future rushing yards only drops to 448; increase it from 4.0 to 5.0, and it only increases his projected future rushing yards to 518. Those are tiny differences for what are viewed as significant swings in yards per carry.

Or, let’s use two hypothetical running backs with 560 yards through eight games. Let’s give RB A 100 carries and a 5.6 YPC average, and RB B 160 carries but a 3.5 YPC average. This tells us that RB A will be projected to drop from 560 to 425 yards in the second half of the year, but RB B should rise to 602 yards.^{1}
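The examples above can be reproduced with a few lines of Python. This is a sketch with a function name of my own choosing. Because it uses the rounded coefficients printed in the formula, its results land within a yard or two of the figures quoted in the text rather than matching them exactly.

```python
import math


def project_rush_yards(prior_att: float, prior_ypc: float) -> float:
    """Projected rushing yards in the other eight games (rounded log-log fit)."""
    log_yards = 0.84 + 1.025 * math.log(prior_att) + 0.285 * math.log(prior_ypc)
    return math.exp(log_yards)  # same as Exp(...) in Excel


# 125 carries at three different YPC averages.
for ypc in (3.0, 4.0, 5.0):
    print(f"125 att at {ypc:.1f} YPC -> {project_rush_yards(125, ypc):.0f} yards")

# Two backs with the same 560 yards but very different workloads.
print(f"RB A (100 att, 5.6 YPC) -> {project_rush_yards(100, 5.6):.0f} yards")
print(f"RB B (160 att, 3.5 YPC) -> {project_rush_yards(160, 3.5):.0f} yards")
```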

One way to look at this is to say that carries are king; like routes run, they represent opportunities given to a player, which is a strong indicator of talent. Another interpretation is that YPC is a fickle mistress. It is extremely sensitive to outliers, and two or three carries can completely skew a player’s production. I’m not telling you anything you didn’t already know, but the data here seems overwhelming to me.

- In general, I like using even/odd splits instead of first half/second half splits to avoid biasing the results due to injuries and roster changes. If half the offensive line was injured in the second half of the season, a first half/second half split becomes an apples-to-oranges comparison. Using even/odd splits gets around this issue, although there is a drawback. If, as in this example, RB A has a great YPC average in the first half of the year, it’s possible his workload will increase in the second half of the year; similarly, RB B might see a decrease in workload. An even/odd split can’t capture that. I’m not overly concerned with this issue: since YPC is so inconsistent from period to period, and there’s not much evidence that coaches care much about this statistic, I don’t think switching to a chronological split would change things. But hey, this just gives me another topic for a future post! Let me know your thoughts.

25 comments

Your work is awesome my friend.

Thanks Arthuro. This is a corner of the internet where the comments are generally pretty awesome.

Is yards above average run predictive?

Yards – (carries * 2)

YPC can be traded for more carries.

I’m not sure I’m following what you’re saying, Nick.

Sorry, I was on mobile. I remember a stat being floated around that attempted to measure quality rushes (not sure if it was here or elsewhere). Basically, it was yards above replacement, with a replacement carry being 2 yards. So if a RB rushed for 100 yards on 20 carries, he’d have a score of 60. We set our fantasy football league up like this, with -0.5 points per carry and 1 point per 4 yards rushing. As a result, a rush attempt costs you points if it goes for less than two yards (unless it’s a goal-line TD). First down points are in this year too, so that should alleviate the penalty on short-yardage backs.

What I was looking for was a comparison on that stat between year n and n+1.

For reference, here were the top 10 rushers in that stat category in 2013:

Player      Team  yds-(car*2)
L.McCoy     PHI   979
J.Charles   KC    770
M.Forte     CHI   765
A.Morris    WAS   723
A.Peterson  MIN   708
D.Murray    DAL   690
R.Mathews   SD    685
M.Lynch     SEA   655
E.Lacy      GB    610
F.Gore      SF    576
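The stat in this comment, and the fantasy scoring built on it, can be sketched as follows. The function names are hypothetical, and the 2-yard replacement level is the commenter’s choice rather than an established constant.

```python
def yards_above_replacement(yards: float, carries: float,
                            replacement_ypc: float = 2.0) -> float:
    """Rushing yards beyond what a replacement-level carry would have gained."""
    return yards - carries * replacement_ypc


def fantasy_rush_points(yards: float, carries: float) -> float:
    """League scoring described above: 1 point per 4 yards, -0.5 per carry."""
    return yards / 4.0 - 0.5 * carries


# The commenter's example: 100 yards on 20 carries.
print(yards_above_replacement(100, 20))
print(fantasy_rush_points(100, 20))
```

With those scoring values, fantasy points work out to exactly one quarter of yards - (carries * 2), which is why the league setup mirrors the stat.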

Yeah, I did do that once. I thought it was kind of neat, although I really had no good justification for using 2 yards instead of another number.

Well, it’s about half the average carry.

What about looking at something not so biased by outliers–say median?

When I’ve looked at that in the past, just about every running back had a median carry of 3 yards.

You can cut off the tails of the distribution — say, YPC after you lop off all carries that are greater than 1.5 sig. from mean.
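One reading of this suggestion, sketched in Python: drop every carry more than 1.5 standard deviations from the back’s mean gain, then average what is left. The function name is hypothetical, trimming both tails is my assumption about what the commenter meant, and the carry log is made up.

```python
import statistics


def trimmed_ypc(gains: list[float], max_sigma: float = 1.5) -> float:
    """Mean gain per carry after dropping carries beyond max_sigma std devs."""
    mean = statistics.fmean(gains)
    sigma = statistics.pstdev(gains)
    kept = [g for g in gains if abs(g - mean) <= max_sigma * sigma]
    return statistics.fmean(kept)


# Hypothetical ten-carry game with one 60-yard breakaway.
gains = [3, 4, 2, 5, 3, 1, 4, 3, 2, 60]
print(f"raw YPC:     {statistics.fmean(gains):.2f}")
print(f"trimmed YPC: {trimmed_ypc(gains):.2f}")
```

Note that the long run inflates the standard deviation itself, so with real play-by-play data the cutoff may deserve more thought than this sketch gives it.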

If the 50% split isn’t very informative what about the 75% (or maybe 25%)? One would expect the better, more explosive backs to shine somewhere in the distribution.

True. Although this involves getting into the nitty gritty of PBP data, which can be a bit time consuming. But I like where you’re going with this.

I think RB consistency has value, to be honest. A hypothetical back that averages 5.0 YPC, but with 50% of his runs going for 10 yards and 50% for 0 yards, has much less value than a back that gets exactly 5 yards on every carry.

If there is a way to measure that, say the average of +/- 1 standard deviation carries
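A simple starting point for measuring this is the standard deviation of a back’s gains per carry. The sketch below uses the two hypothetical backs from the comment above; treating standard deviation as the consistency measure is my assumption, and it is only one of several ways to read the suggestion.

```python
import statistics

# The two hypothetical backs from the comment: same 5.0 YPC, different shapes.
boom_or_bust = [10, 0] * 10   # half the runs gain 10 yards, half gain 0
steady = [5] * 20             # exactly 5 yards on every carry

for name, gains in (("boom or bust", boom_or_bust), ("steady", steady)):
    mean = statistics.fmean(gains)
    spread = statistics.pstdev(gains)
    print(f"{name:12s} YPC {mean:.1f}, std dev {spread:.1f}")
```

Both backs have the same average, so YPC alone cannot tell them apart; the spread of 5.0 versus 0.0 does.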

That sounds a lot like what DVOA from FO does; they just use down and distance as well.

That’s exactly what they do.

But I think it’d be pretty awesome to have a non-proprietary version of (D)VOA.

You could probably do it by using expected points curves from one of the EPA models to approximate the VOA tables at FootballOutsiders. As a result, a 1 yard run on 3rd and 1 has a lot more value than a 1 yard run on 3rd and 3.

Using Lesean McCoy’s average carries per game (19.5) and his career YPC (4.8), I get a projection of 1303 yards for him next year. That projection sounds very reasonable, as I think predicting anything over 1300 yards for any RB is hard because so many things can go wrong during a season.

I think you are looking at this the wrong way. YPC isn’t sticky because it is very difficult to break off long runs. Sites like FO practically dismiss long runs because they aren’t predictable. However, just because long runs are rare doesn’t mean they should essentially be discarded (which is what is more or less happening when you dismiss YPC). One could argue that really the only thing that differentiates the great runners is that ability to create an “outlier” run. The median and mode carries are practically identical for all RBs, so what differentiates one from the other? If # carries = talent (which seems to be implied) and YPC doesn’t really mean anything, why are 4 of the top 5 rushing seasons held by the all-time greats: Larry Johnson, Jamal Anderson, James Wilder, and Eddie George? Toss in Gerald Riggs, Ricky Williams, and Barry Foster for 7 of the top 10. The only HOFer to crack the top 10 list is Eric Dickerson.

Here are the top 20 seasons all time (minimum 12 games played to get in those old timers) based on carries/game

Larry Johnson 2006

Jamal Anderson 1998

James Wilder 1984

Eric Dickerson 1986

Shaun Alexander 2006

Eddie George 2000

Jerome Bettis 1997

John Riggins 1983

Earl Campbell 1980

Gerald Riggs 1985

Ricky Williams 2000

Christian Okoye 1989

Terrell Davis 1997

Curtis Martin 1998

Emmitt Smith 1994

Terrell Davis 1998

Ricky Williams 2003

Eric Dickerson 1983

Barry Foster 1992

Rodney Hampton 1993

6 of those seasons are held by current HOFers.

Here is a top 20 YPG list (minimum 20 carries/game)

O.J. Simpson 1973

Jim Brown 1963

Walter Payton 1977

Eric Dickerson 1984

Adrian Peterson 2012

O.J. Simpson 1975

Jamal Lewis 2003

Earl Campbell 1980

Barry Sanders 1997

Jim Brown 1958

Terrell Davis 1998

Chris Johnson 2009

Clinton Portis 2003

Barry Sanders 1994

Ahman Green 2003

Shaun Alexander 2005

Terrell Davis 1997

Tiki Barber 2005

Ricky Williams 2002

Jamal Anderson 1998

9 are held by current HOFers (+1 if, as I can only assume, Peterson is a lock)

Here is a top 20 YPC list (minimum 20 carries/game)

Jim Brown 1963

Barry Sanders 1997

O.J. Simpson 1973

Adrian Peterson 2012

Jim Brown 1958

Barry Sanders 1994

Chris Johnson 2009

Eric Dickerson 1984

O.J. Simpson 1975

Clinton Portis 2003

Walter Payton 1977

Jim Brown 1965

Jamal Lewis 2003

Ahman Green 2003

Emmitt Smith 1993

LaDainian Tomlinson 2006

Tiki Barber 2005

Larry Johnson 2005

Earl Campbell 1980

O.J. Simpson 1976

12 are held by HOFers (+2 for Peterson and Tomlinson)

If you were a team and were randomly assigned a RB from one of these groups, would you want him to come from group 1, 2 or 3? I know which one I would choose. Carries should be viewed more as a threshold to become part of a conversation (best season or best career, with X carries as a baseline), not as a measure of talent. Anything based on playcalling (like carries) can be gamed, as shown in that first list.

Interesting list. But this just might show that over the course of a career, YPC can be meaningful. And is it more appropriate to use yards/game or just yards (or yards/season)?

I think it also shows that just as a snapshot of a season it is pretty meaningful – at least in regards to the talent level of the RB.

If we have a RB this year that averages 20+ carries a game for 5.2+ YPC, I think there would be a strong possibility that he is more talented and more likely to make the HOF than a RB that has more carries/game but a sub-5.0 YPC.

I would be curious to see a comparison of RBs that average maybe 17-18+ carries a game for a season and compare them to other RBs that average 10%+ more carries but 10%+ less YPC.

Maybe with the criteria of 15 games played as well. I am sure we would see some interesting (and likely odd) comparisons.

“Suppose you know that a running back averaged 5.00 YPC in an eight-game subset. What would you project his YPC average for the other eight games? The answer isn’t anywhere near 5.00; in fact, it’s around 4.37. And a running back who averaged 3.5 YPC in one eight-game sample would be projected to average 3.92 YPC in the other eight games.”

I was wondering, isn’t this just a measure of regression towards the mean instead of a measure of stickiness? Any subset with a higher than average YPC will leave the other set with a lower YPC; that’s just normal, not a measure of sticky or unsticky.

Well, regression to the mean implies that there are things other than skill involved, so they’re related concepts. There is not much RTTM, for example, with carries.

I’m new to your site, so I’m behind on understanding, but what does the R stand for in the equations? It’s not part of the equation, right? Also, in an equation like this one, 2.875 + 0.300 * Prior_YPC, why are the numbers 2.875 and 0.300 chosen?

Thanks for stopping by, Bants. The R^2 explains how “good” a regression equation is at fitting the data. You can read a bit about it here — http://en.wikipedia.org/wiki/Coefficient_of_determination — but an R^2 of 0.09 means there’s very little relationship, while an R^2 of 0.64 is quite strong.

The neat thing about regression analysis is that it analyzes the past data and comes up with variables to best explain the data. So 2.875 and 0.300 aren’t chosen; the regression formula tells us that if we’re looking for a linear relationship to describe how the data has appeared, those are the best numbers to use.

New to the site as well. Awesome analysis, but I was hoping you can clarify some things for me. What’s the equation for stickiness here? Also, what is the sample for your regression?