The Simple Rating System is a many-splendored thing, but a known bug of the process is that huge outlier scoring margins can have undue influence on the rankings. Take the 2009 NFL season, for instance, during which the Patriots led the NFL in SRS in no small part because they annihilated the Titans 59-0 in a snowy October game that tied for the second-most lopsided margin of victory in NFL history. Outside of that single game, the Patriots’ PPG margin was +5.2, which wouldn’t even have ranked among the league’s top ten teams, but the SRS (precisely because it minimizes squared prediction errors between actual outcomes and those expected from team ratings) gave the 59-0 win a lot of weight, enough to propel New England to the #1 ranking. (A placement that looked downright laughable, I might add, when the Pats were crushed at home by Baltimore on Wild Card Weekend.)
One solution that is commonly proposed for this problem is to cap the margin of victory in a given game at a certain fixed number. This is especially popular in college football (in fact, Chase sort of uses a cap in his college SRS variant) because nonconference schedules will often see matchups between teams of incredibly disparate talent levels, games in which the powerhouse team can essentially choose the margin by which they want to steamroll their opponent. Within that context, it doesn’t really matter whether Florida State beats Idaho by 46 or by 66, because there’s a 0% chance Idaho is a better team than FSU — no new information is conveyed when they pile more and more points onto the game’s margin.
But what’s the right number to cap margin of victory at in the NFL? These are all professional teams, after all, and there’s plenty of evidence that in the NFL, blowing opponents out — even when they’re bad teams — says a lot about how good you are. Where do we draw the line, then, to find the point at which a team has clearly proven they’re better than the opponent, beyond which any extra MOV stops giving us information?
This is far from the definitive answer, but here goes… I looked at real-life SRS ratings for every team since the NFL expanded to 32 teams in 2002. The average was (obviously) 0.0, with a standard deviation of 6.4. I also looked at game predictions using SRS over that span; the error in the scoring margin of a given game between what actually happened and what SRS would have predicted had a standard deviation of 12.1, centered around an average of 0.0.
Knowing those key pieces of information, I generated fake, random ratings (a la Doug’s 10,000 Seasons post) for every 2013 team from a normal distribution with a mean of 0.0 and a standard deviation of 6.4. I also generated fake scoring margins for every game on the 2013 schedule, using the fake ratings to set each game’s expected margin and randomly varying the results around that expectation using the game-to-game standard deviation of 12.1. (Oh, and home field was worth 2.5 PPG in this exercise.)
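For the curious, the simulation above can be sketched in a few lines of Python. (The function and variable names are my own, and I’ve used a simple round-robin schedule as a stand-in for the actual 2013 slate; the ratings spread, game noise, and home-field numbers come straight from the figures above.)

```python
import random

random.seed(0)  # make the fake season reproducible

N_TEAMS = 32
RATING_SD = 6.4   # standard deviation of "true" team ratings (from 2002-onward SRS)
GAME_SD = 12.1    # game-to-game noise around the expected margin
HFA = 2.5         # home-field advantage in PPG

# Fake "true" ratings for every team, drawn from N(0, 6.4)
ratings = [random.gauss(0.0, RATING_SD) for _ in range(N_TEAMS)]

def simulate_game(home, away):
    """Return a simulated home-team scoring margin for one game:
    expected margin from the ratings plus HFA, plus N(0, 12.1) noise."""
    expected = ratings[home] - ratings[away] + HFA
    return random.gauss(expected, GAME_SD)

# Stand-in schedule: every team hosts every other team once
margins = [(h, a, simulate_game(h, a))
           for h in range(N_TEAMS) for a in range(N_TEAMS) if h != a]
```

Because we know each team’s “true” rating here, we can check after the fact how often the team with the bigger simulated margin was actually the better team — which is exactly the relationship explored next.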
Finally, I looked at the relationship between those fake MOVs and who the “true” better team was in each matchup (since, being omniscient, I knew which team had the higher fake rating in any given game). It turns out you can estimate the probability that Team A is truly better than Team B from their scoring margin in a given game using the following formula:
p(better) ≈ 1 / (1 + exp(-0.0859135 * adj_MOV))
where adj_MOV is your point margin in the game plus 2.5 if on the road, or minus 2.5 if at home.
So if I win by 5 at home, there’s only a 55.3% chance I’m truly the better team. If I win by 21 on the road, there’s an 88.3% chance I’m the better team. Et cetera.
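Translated into code, the formula looks like this (the `p_better` name and the `site` argument are my own; the site adjustment follows the definition of adj_MOV above, adding 2.5 on the road and subtracting 2.5 at home):

```python
import math

def p_better(margin, site):
    """Probability the winning team is truly the better team,
    per the fitted logistic formula above."""
    site_adj = {"home": -2.5, "road": 2.5, "neutral": 0.0}[site]
    adj_mov = margin + site_adj
    return 1.0 / (1.0 + math.exp(-0.0859135 * adj_mov))

print(round(p_better(5, "home"), 3))   # 5-point home win  -> 0.553
print(round(p_better(21, "road"), 3))  # 21-point road win -> 0.883
```

The two printed values match the worked examples in the text: a 5-point home win gives about a 55.3% chance you’re truly better, while a 21-point road win gives about 88.3%.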
How does this help us determine our proper MOV cap? Well, plugging numbers into the formula above, what would a team have to win by on a neutral field for us to be 95% certain they were the better team? The answer is 34.27; in the simulated games I ran, when a team won by more than 34 points of adjusted MOV, there was a 95% chance they also had the higher “true” rating… meaning they were almost certainly the better team in a moral, true-talent sense. Beyond that threshold, extra points of scoring margin really didn’t tell us anything we didn’t already know, which is exactly what we’re looking for in a MOV cap for real-life games.
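That 34.27 figure falls straight out of inverting the logistic formula: set p(better) = 0.95 and solve for adj_MOV, which gives ln(0.95/0.05) / 0.0859135. A quick sketch (the `threshold` and `cap_margin` names are mine, and `cap_margin` is just one way to apply the cap before feeding margins into SRS):

```python
import math

SLOPE = 0.0859135  # fitted coefficient from the logistic formula

def threshold(p):
    """Adjusted MOV at which p(better) reaches probability p
    (the inverse of the logistic formula)."""
    return math.log(p / (1.0 - p)) / SLOPE

def cap_margin(adj_mov, cap=34.27):
    """Clamp an adjusted margin to +/- cap, so e.g. the Patriots'
    59-point adjusted blowout counts only as +34.27."""
    return max(-cap, min(cap, adj_mov))

print(round(threshold(0.95), 2))  # -> 34.27
```

So any adjusted margin beyond about 34 points gets clamped before the ratings are computed, which is what trims the influence of games like the 59-0 Titans blowout.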
Plugging 34.27 as the adjusted-MOV threshold into the SRS for 2009, we see the Patriots fall to #2 behind the eventual Super Bowl champion Saints, because the impact of the 59-0 destruction of Tennessee is reduced. Doing the same for 2013 (going into MNF), here are the standings:
We can also combine this with Wayne Winston’s method of weighting games for recency to get perhaps the best set of SRS ratings to judge how good a team is right now: