For three straight years, NFL Network has produced a list of the Top 100 Players of 20xx. Many people have criticized the results, and this summary from Bill Barnwell hits on some of the main issues. But my issue isn’t with the mistakes the players may be making in the voting booth, but the mistakes made in tabulating the votes. I want to suggest to the fine folks at NFL Network an alternative method for deriving a list of the top 100 players. This method has three big advantages over the current process:
(1) It will take players only a few minutes — or as long as they like — to participate.
(2) More players will be part of the judging, since the time commitment will be lessened.
(3) The results will be more accurate.
Instead of asking players to write down a bunch of names from memory, my suggested method would involve asking them a bunch of simple and straightforward questions. Imagine a player sitting in front of a computer screen, and asked to pick an answer to each of the following:

“Who should be ranked higher: Adrian Peterson or Max Unger?” [Clicks Peterson.]
“Who should be ranked higher: Tyvon Branch or Ben Roethlisberger?” [Clicks Roethlisberger.]
“Who should be ranked higher: Reggie Bush or Andy Dalton?” [Thinks for a second... picks Bush.]
“Who should be ranked higher: Patrick Willis or Joe Flacco?” [Thinks.... picks Willis.]
“Who should be ranked higher: Jimmy Graham or Jacoby Jones?” [Clicks Graham.]
That’s a lot better than the current system, described below by Mike Florio of Pro Football Talk:
“All players are given the opportunity to vote through ballots we send to all 32 teams around Thanksgiving,” NFL Network spokesman Alex Riethmiller told PFT via email. “For convenience sake, we try to time it with Pro Bowl balloting, so they can do them together. In addition to ballots collected that way, we also give ballots to many of the players that we interview for our shows. This year, in total, we received 481 votes.”
To vote, each player lists only his top 20 players in the league. The player listed at No. 1 gets 20 points, the player listed at No. 2 gets 19 points, and the process continues until the player listed at No. 20 gets one point.
So it’s really not a “Top 100” list. It’s the 100 players who received the highest vote totals from players who attempted to list their personal top 20, presumably without the benefit of all 32 rosters or starting lineups or Pro Bowl qualifiers or anything else that would ensure they aren’t accidentally overlooking someone as they pull 20 names out of thin air.
The NFL Network is in a bit of a bind here. You can’t expect players to sit down and rank 100 players; that’s far too time-consuming. But you also subject yourself to criticism if you hand players a pre-selected group and ask them to pick the best from it: whittling the pool down to 200 or so players introduces subjectivity into the equation.
Understanding the NFL Network’s two significant constraints — having to rank the best 100 out of 1,600+ players and having voters (i.e., players) who don’t want to spend more than a few minutes — I can understand why the network settled on the process it chose. Asking players to pick their top 20 from memory and then having the Network total things up on the back end solves both of those problems.
But there is a way to serve both masters and produce a more accurate list. And it can take us much deeper than 100 players, too. It’s called an Elo Rater, named after Arpad Elo. The Elo Rater is a binary system that looks only at wins, losses and, implicitly, strength of schedule. It was originally used to rate chess players, because how else would you rate chess players besides wins, losses, and quality of opponents?
Here’s what Wikipedia has to say about the Elo rating system:
A player’s Elo rating is represented by a number, which increases or decreases based upon the outcome of games between rated players. After every game, the winning player takes points from the losing one. The total number of points gained or lost after a game is determined by the difference between the ratings of the winner and loser. In a game between a high-rated player and a low-rated player, the high-rated player is expected to score more points. If the high-rated player wins, only a few rating points will be taken from the low-rated player. However, if the lower-rated player scores an upset win, many rating points will be transferred…. This makes the rating system self-correcting. A player whose rating is too low should, in the long run, do better than the rating system predicts, and thus gain rating points until the rating reflects the true playing strength.
So here’s my suggestion, NFL Network: use an Elo rating system to come up with the Top 100 list. Instead of having two people face off in a chess match, we run the Elo rating by simply asking an NFL player one question: who is better, Player X or Player Y?
Baseball-Reference has used an Elo Rater, with the general public as judges, to come up with a list of the best players in MLB history. You can read the fine print here, but here is the CliffsNotes version for the NFL Network.
Every active player in the NFL is given an initial rating of 1500 points. These ratings are then updated by randomly selecting pairs of players and having them “play” each other. The judges, of course, are active NFL players. You can allow each player to answer as many questions as he likes: each question can be answered in just a few seconds.
Let’s say the process begins by asking Richard Sherman who is better: Calvin Johnson or Dwight Freeney. Since this is the first of many thousands of ratings, both Johnson and Freeney have ratings of 1500. We begin by calculating the probability of each player winning, according to the following equation:
Probability of Calvin Johnson winning = 1 / (1 + 10^((1500 – 1500) / 400))
Probability of Dwight Freeney winning = 1 / (1 + 10^((1500 – 1500) / 400))
Fortunately, no one needs to do any math here: a computer can do all the hard work instantly. Inside the parentheses, part of the formula reads ’1500 – 1500′ because those are the ratings of the two players. If you do the math, you’ll see that the probability of Johnson winning, just like the probability of Freeney winning, is 50%. That’s because the two players have the same rating.
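Since a computer does all the work anyway, the probability calculation fits in a few lines. Here’s a minimal sketch in Python (the language choice is mine, and the function name is just for illustration):

```python
def expected_score(rating_a, rating_b):
    # Standard Elo expected score: the probability that player A "wins"
    # the matchup, using the conventional 400-point scale.
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Two players at the initial 1500 rating are dead even:
print(expected_score(1500, 1500))  # 0.5
```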
Note that there are no preexisting biases here: the players will be 100% responsible for wherever we land in the ratings. Here is how: After the winner has been determined, the ratings of the two players are adjusted. If Sherman chooses Johnson, then the new ratings become:
Calvin Johnson’s new rating = 1500 (his old rating) + 20 * (.50)
Dwight Freeney’s new rating = 1500 (his old rating) – 20 * (.50)
This means Megatron will now be at 1510, while Freeney will drop to 1490.
Now, let’s say we’ve had several thousand “matchups” judged by NFL players. At that point, let’s say Peyton Manning has a rating of 2000, and Arian Foster has a rating of 1900. Now, we have Aldon Smith sitting in our player rater chair, and he sees this come across the screen: “Peyton Manning or Arian Foster?”
Again, let’s start with the win probability.
Probability Peyton Manning wins = 1 / (1 + 10^((1900 – 2000) / 400)) = 0.640
Probability Arian Foster wins = 1 / (1 + 10^((2000 – 1900) / 400)) = 0.360
If Aldon Smith picks Manning, then the new ratings are:
Peyton Manning’s new rating = 2000 + 20 * 0.360 = 2007 [1]
Arian Foster’s new rating = 1900 – 20 * 0.360 = 1893
If Smith instead picks Foster, then the new ratings become:
Peyton Manning’s new rating = 2000 – 20 * 0.640 = 1987
Arian Foster’s new rating = 1900 + 20 * 0.640 = 1913
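Putting the update rule together, here’s a small Python sketch (my own illustration, not the Network’s actual code) that reproduces both outcomes above with K = 20:

```python
def elo_update(winner, loser, k=20):
    # The winner gains k * (1 - expected) points and the loser gives up
    # the same amount, so total rating points are conserved.
    expected = 1 / (1 + 10 ** ((loser - winner) / 400))
    delta = k * (1 - expected)
    return winner + delta, loser - delta

# Aldon Smith picks Manning (2000) over Foster (1900):
manning, foster = elo_update(2000, 1900)    # roughly 2007 and 1893
# ...or picks Foster instead:
foster2, manning2 = elo_update(1900, 2000)  # roughly 1913 and 1987
```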
The ratings are self-correcting: if a player’s rating gets too high, a bunch of losses will bring him back to where he belongs. Similarly, if a player’s rating is too low, “beating” a bunch of higher-ranked players will shoot him up the rankings. [2]
So how should this be run? I think the NFL Network should start by creating a list of the top 400 players. They can do this however they like: opening an Elo Rater up to fan voting would be a good way to get another measure of player ratings (and hits for NFL.com!). With millions of fan votes, you could put all 1,600+ players into the system and still get a reasonable top 400. Alternatively, the NFL Network could simply choose its top 400 players. Once that’s done, you create a simple website, much like this one, and have NFL players log on and “vote” on each matchup they see. I suspect that in 10 minutes, a player could rate at least 30 matchups. Do that for 600 players, and you can probably get 20,000 ratings. At that point, the computer does the rest, and it will spit out a ranking of the top 100 (or 400) players in the NFL.
A player could probably rate five matchups in a minute if he wanted. And that gives us the real advantage of the Elo system over what NFL Network is using. Instead of having each player rank just his top 20, you end up with over 20,000 ratings, and these are ratings that mean something. The results won’t look silly because a bunch of players simply forgot a name: forgetting won’t be an option, as all of the best 400 names will surface across the 20,000+ ratings. I suspect that with a user-friendly system (players clicking on a computer instead of sitting down with a pen and pad) you might end up with 30,000+ ratings. Then the NFL Network could truly produce a list of the top 100 players.
[1] As you can tell, the number “20” is the key variable here. That variable is known as the K-factor, and the correct value is subject to much discussion. I think 20 works for our purposes, but so would 10, or 24, or some other reasonable number.
[2] One bit of fine print: pairs should not be chosen completely at random. The first player should be randomly selected, but his “opponent” should be someone relatively close to him in rating (say, within 250 points). This will prevent bizarre matchups from distorting the ratings.
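That pairing rule is also easy to automate. Here’s a hypothetical Python sketch (the ratings dictionary, function name, and 250-point window are all illustrative assumptions):

```python
import random

def pick_matchup(ratings, window=250):
    # Pick one player at random, then a random opponent whose rating is
    # within `window` points, so lopsided pairings don't distort things.
    player = random.choice(list(ratings))
    nearby = [p for p in ratings
              if p != player and abs(ratings[p] - ratings[player]) <= window]
    return player, random.choice(nearby) if nearby else None
```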
Comments (18)
I said the same thing weeks ago, an Elo ranking has weaknesses but it’s going to be more accurate and take as much time as the players want to give. Barnwell makes the biggest point though, players just don’t study other players they don’t directly compete against. An offensive guard isn’t going to watch film on opposing RB’s and a DE isn’t studying opposing safeties. So I do trust a QB’s rankings of opposing DB’s and LB’s but I don’t trust him ranking RB’s. I trust a FS’ ranking of QB’s and WR’s, I don’t trust his rankings of DE’s. I think there are plenty of other weaknesses to using players to create an overall list but I’ll get back on topic of the Elo ranker, sorry for the rant haha.
I know that PFR also uses Elo rankings (I’ve contributed myself), and theirs is fantastic since it includes some data as well (found here: http://www.pro-football-reference.com/friv/elo.cgi). Great article, thanks!
Thanks Topher. My only point is that an Elo Rater is a big improvement on the current system.
Totally agree, like you said it’s faster, can be done in bursts, and would likely create a more balanced Top 100.
ELO has its problems too. The pro-football-reference one has Jeff Garcia and Donovan McNabb ahead of Peyton Manning??
Yeah, that’s pretty odd, but not an ELO issue as far as I can tell. I think that’s just a judges issue.
The thing with the PFR one is it has a low activity level, I can sit down for an hour and totally screw it up since not many people use it, despite the high traffic PFR gets. At least that’s the issue I’ve found while fiddling with the PFR ranker.
That’s because it’s a fan vote, and can be gamed by certain fanbases (or antifanbases) if they simply stuff the ballot box. I can only assume NFL players wouldn’t engage in the same abuse of the system, and even if you think they would, you could limit the number of votes any given player can cast, or put a limit on how many total votes can come from any one roster.
I do think the players would have biases, but not in the same way fans do. Players would obviously be limited to not voting for their teammates (clear bias there), but they do have friends on opposing teams, especially since most players spend their careers on two or three teams and form friendships across the league. If I see a close vote, I’m going to vote for the player I’m friends with over the one I don’t know or may not like. You may also not vote for divisional opponents you don’t particularly like; a player like Cortland Finnegan probably wouldn’t have gotten a lot of votes from Colts, Texans and Jags WRs.
Players likely would eliminate that team bias but they would likely introduce a whole new set of biases.
Reading about the current voting method is interesting. (Each player votes for only 20 players.) So it gets me wondering how Dennis Pitta makes the Top 100 but Jimmy Graham does not.
But I think I might know the answer. Just about everybody has Peyton Manning, Brady, Adrian Peterson, etc. on their list. And then some guys probably put their own name at #20.
So the voting totals probably wind up with the top 30 or 40 players getting tons of voting points, and by the time you get to the 90s, you are dealing with guys who got just a few points. The difference between 1st and 20th on the list is probably equal to the distance between 20th and 1,000th.
Good points.
As a programming lesson, I created an Elo rater for QBs in 2013 a few weeks ago. If anybody is interested in placing a few votes, go for it: http://random.tooshay.us/qb/qbrate.php
Very cool. Maybe we could put an Elo Rater up here, if you want to contribute your programming skills!
I am sure you know that PFR has an Elo rater. (Though I was surprised you linked to the Baseball-Reference version.) Or are you talking about a current-season rater?
An Elo Rater could be fun based on anything. For example, we could do one to predict team power rankings in the preseason.
I am very much an amateur, but I would be willing to give it a shot if you have something you’d like to put up.
What’s funny to me is this is the NFL and they are conducting a Top 100 players poll for their own purposes. You would think they would have an incentive to do it the right way.
Thus, you would think they would have done at least a modicum of research into the subject.
I don’t know, but saying, “Just give us your top 20 players off the top of your head and we’ll give them points based on where you rank them” seems like something I would have come up with in middle school.
How can such incompetence exist on such a simple task?
I wonder how ELO scores as a voting system. Specifically, I wonder how it would compare to Condorcet.
If we could just get our hands on the top20 ballots people sent in, we could do a test. Compare the overall rankings they came up with, with two other scorings:
1) ELO – Note that each judge would pick every player ahead of every other player ranked lower, as well as every other player not in their list that is in someone else’s list. Judges wouldn’t record any ELO votes between two players not on their top20 list. Process, then sort by ELO.
2) Condorcet – Same scoring; all players on a top20 list are preferred over all players not on their list, and each player is preferred over a lower ranked player on the same top20 list. Process. Any “ties” (Smith/Schwartz sets) would just be displayed as ties.
Just a quick question (sorry to bring up an older thread): you mention 20 as the K-factor. Where does the 400 come from? Is it from the initial list of 400 that the poll providers come up with? Besides the obvious 1s and 10s, those are the only two magic numbers in the equations, which is why I’m asking.