Sometimes the best blog posts are ones that remind you of things you’ve forgotten. Seven years ago, Doug wrote about Benford’s Law. Also known as the First Digit Law, it has been observed across many data sets, from street addresses to lengths of rivers to stock prices to the number of followers people have on Twitter. A new Applied Economics Letters article states that “nonconformity with Benford’s law can be a useful indicator of poor data quality, which may be a result of fraud or manipulation.”

So what the heck is it? According to Wikipedia, this phenomenon

“refers to the frequency distribution of digits in many (but not all) real-life sources of data. In this distribution, the number 1 occurs as the leading digit about 30% of the time, while larger numbers occur in that position less frequently: 9 as the first digit less than 5% of the time. Benford’s Law also concerns the expected distribution for digits beyond the first, which approach a uniform distribution.”
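
For reference, the percentages in that description come from a simple formula: the expected share of numbers with leading digit d is log10(1 + 1/d). Here is a minimal Python sketch of it; the function name `benford_expected` is just an illustrative choice, not something from the Wikipedia entry.

```python
import math

def benford_expected(digit):
    """Expected share of numbers whose leading digit is `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)

# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

The digit 1 comes out at roughly 30.1% and the digit 9 at roughly 4.6%, which lines up with the "about 30%" and "less than 5%" figures quoted above.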

For example, 131 players have caught a touchdown this year. As it turns out, the distribution pretty closely matches what Benford’s Law would predict:

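| Digit | Players | Percent |
|-------|---------|---------|
| 1 | 39 | 29.8% |
| 2 | 34 | 26.0% |
| 3 | 29 | 22.1% |
| 4 | 11 | 8.4% |
| 5 | 6 | 4.6% |
| 6 | 4 | 3.1% |
| 7 | 5 | 3.8% |
| 8 | 2 | 1.5% |
| 9 | 1 | 0.8% |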

You might think that part of that is just an artifact of where we are in the year, and that may be true: a bunch of players have only one touchdown reception. Then again, Jimmy Graham is the only player with double-digit touchdowns, and that’s likely to change, too. But as Doug noted, one of the neat things about Benford’s Law is that it is (subject to some caveats) unit-agnostic. For example, what if we look at receiving touchdowns per *minute of game time*? Graham has played in eight games; if we assume 60 minutes for each game, that is 480 minutes, which means Graham has scored 0.0208 receiving touchdowns per minute. That counts as a two (ignore the leading zeroes); if we do that for all 131 players, we get the following distribution:

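| Digit | Players | Percent |
|-------|---------|---------|
| 1 | 32 | 24.4% |
| 2 | 21 | 16.0% |
| 3 | 12 | 9.2% |
| 4 | 19 | 14.5% |
| 5 | 9 | 6.9% |
| 6 | 20 | 15.3% |
| 7 | 7 | 5.3% |
| 8 | 9 | 6.9% |
| 9 | 2 | 1.5% |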

Far from perfect, but 1 is by far the most common leading digit, and 9 brings up the rear. What about other football statistics? Sixty-two players have accumulated passing yards so far this year. Now, the variance in passing yards is pretty small, so that’s not going to follow a Benford distribution. At least, not if we think of passing production in terms of yards. Sure, Peyton Manning leads with 2,919 yards, but that means he’s also thrown for 1.66 miles. If we measure all 62 players by how many passing *miles* they’ve accumulated, the leading digit is 1 about 40% of the time:

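| Digit | Players | Percent |
|-------|---------|---------|
| 1 | 25 | 40.3% |
| 2 | 6 | 9.7% |
| 3 | 5 | 8.1% |
| 4 | 2 | 3.2% |
| 5 | 3 | 4.8% |
| 6 | 6 | 9.7% |
| 7 | 4 | 6.5% |
| 8 | 4 | 6.5% |
| 9 | 7 | 11.3% |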

What if we instead measure passing production in centimeters?

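| Digit | Players | Percent |
|-------|---------|---------|
| 1 | 31 | 50.0% |
| 2 | 15 | 24.2% |
| 3 | 2 | 3.2% |
| 4 | 5 | 8.1% |
| 5 | 4 | 6.5% |
| 6 | 1 | 1.6% |
| 7 | 1 | 1.6% |
| 8 | 0 | 0.0% |
| 9 | 3 | 4.8% |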
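
If you want to play with the unit switching yourself, here is a minimal Python sketch of how the leading digit can be pulled out of a number, including the "ignore the leading zeroes" case. The function and constant names are my own illustrative choices, and the 10 touchdowns for Graham is inferred from the 0.0208 per-minute figure rather than stated outright.

```python
YARDS_PER_MILE = 1760
CM_PER_YARD = 91.44

def leading_digit(value):
    """Return the first nonzero digit of a positive number."""
    if value <= 0:
        raise ValueError("value must be positive")
    # Scientific notation puts the first significant digit up front,
    # e.g. 0.0208 -> '2.080000e-02', so leading zeroes are skipped.
    return int(f"{value:e}"[0])

# Graham: assumed 10 touchdowns over 8 games of 60 minutes each.
graham_td_per_minute = 10 / (8 * 60)

# Manning: 2,919 passing yards, expressed in three different units.
manning_yards = 2919
manning_miles = manning_yards / YARDS_PER_MILE
manning_cm = manning_yards * CM_PER_YARD

print(leading_digit(graham_td_per_minute))
print(leading_digit(manning_yards))
print(leading_digit(manning_miles))
print(leading_digit(manning_cm))
```

The same 2,919 yards has a leading digit of 2 in yards but 1 in miles, which is the whole trick: changing the unit reshuffles which leading digit each player lands on.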

Benford’s Law isn’t just focused on quarterbacks. Karlos Dansby leads the NFL in tackles. When it comes to tackles, 37.2% of all players with at least one tackle have recorded 1 — or 10 through 19 — tackles this year:

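| Digit | Players | Percent |
|-------|---------|---------|
| 1 | 269 | 37.2% |
| 2 | 163 | 22.5% |
| 3 | 97 | 13.4% |
| 4 | 58 | 8.0% |
| 5 | 43 | 5.9% |
| 6 | 27 | 3.7% |
| 7 | 20 | 2.8% |
| 8 | 24 | 3.3% |
| 9 | 22 | 3.0% |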
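
To put a rough number on how close that is, one common approach is to compare the observed counts against the counts Benford’s Law would predict, where the expected share for digit d is log10(1 + 1/d). The sketch below does that for the tackle counts above and adds up a chi-square statistic. This is just one way to measure the gap, chosen here for illustration.

```python
import math

# Leading-digit counts for tackles, copied from the table above.
observed = {1: 269, 2: 163, 3: 97, 4: 58, 5: 43, 6: 27, 7: 20, 8: 24, 9: 22}
total = sum(observed.values())

chi_square = 0.0
for digit, count in observed.items():
    # Count we would expect if the data followed Benford's Law exactly.
    expected = total * math.log10(1 + 1 / digit)
    chi_square += (count - expected) ** 2 / expected
    print(f"{digit}: observed {count:3d}, expected {expected:6.1f}")

print(f"chi-square statistic: {chi_square:.1f}")
```

The smaller the statistic, the closer the observed counts sit to the Benford prediction; turning it into a formal pass/fail verdict would require picking a significance level, which I have not done here.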

It’s easy to come up with data sets that don’t follow Benford’s Law (the weights or ages of NFL players, or team winning percentages), but the remarkable part is that it does apply to so many data sets. For example, what if we look at the point differential of all team seasons since 1970? I put that data into this tester; take a look at the results:

Pretty neat, eh?
