
The Simple Rating System is a many-splendored thing, but a known bug of the process is that huge outlier scoring margins can have undue influence on the rankings. Take the 2009 NFL season, for instance, during which the Patriots led the NFL in SRS in no small part because they annihilated the Titans 59-0 in a snowy October game that tied for the second-most lopsided margin of victory in NFL history. Outside of that single game, the Patriots’ PPG margin was +5.2, which wouldn’t have even ranked among the league’s top ten teams, but the SRS (particularly because it minimizes squared prediction errors between actual outcomes and those expected from team ratings) gave the 59-0 win a lot of weight, enough to propel New England to the #1 ranking. (A placement that looked downright laughable, I might add, when the Pats were crushed at home by Baltimore on Wild Card Weekend.)

One solution that is commonly proposed for this problem is to cap the margin of victory in a given game at a certain fixed number. This is especially popular in college football (in fact, Chase sort of uses a cap in his college SRS variant) because nonconference schedules will often see matchups between teams of incredibly disparate talent levels, games in which the powerhouse team can essentially choose the margin by which they want to steamroll their opponent. Within that context, it doesn’t really matter whether Florida State beats Idaho by 46 or by 66, because there’s a 0% chance Idaho is a better team than FSU — no new information is conveyed when they pile more and more points onto the game’s margin.

But what’s the right number to cap margin of victory at in the NFL? These are all professional teams, after all, so there’s plenty of evidence that in the NFL, blowing opponents out — even when they’re bad teams — says a lot about how good you are. Where do we draw the line, then, to find the point at which a team has clearly proven they’re better than the opponent, beyond which any extra MOV stops giving us information?

This is far from the definitive answer, but here goes… I looked at real-life SRS ratings for every team since the NFL expanded to 32 teams in 2002. The average was (obviously) 0.0, with a standard deviation of 6.4. I also looked at game predictions using SRS over that span; the error in the scoring margin of a given game between what actually happened and what SRS would have predicted had a standard deviation of 12.1, centered around an average of 0.0.

Knowing those key pieces of information, I generated fake, random ratings (a la Doug’s 10,000 Seasons post) for every 2013 team from a normal distribution with a mean of 0.0 and a standard deviation of 6.4. I also generated fake scoring margins for every game on the 2013 schedule, using the fake ratings to set the expected margin and randomly varying those around the expected mean using my game-to-game standard deviation of 12.1. (Oh, and home field was 2.5 PPG in this exercise.)
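A minimal sketch of that setup in Python (the random pairings below are a stand-in for the real 2013 schedule, and the seed and game count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

N_TEAMS = 32
RATING_SD = 6.4   # SD of team SRS ratings since the 2002 expansion
GAME_SD = 12.1    # SD of actual margins around the SRS-predicted margin
HFA = 2.5         # home-field advantage in points

# Fake "true" ratings for every team
true_ratings = rng.normal(0.0, RATING_SD, N_TEAMS)

# A stand-in schedule: 256 random home/away pairings
n_games = 256
home = rng.integers(0, N_TEAMS, n_games)
away = (home + rng.integers(1, N_TEAMS, n_games)) % N_TEAMS  # never equal to home

# Expected margin from the home team's perspective, plus game-to-game noise
expected = true_ratings[home] - true_ratings[away] + HFA
margins = rng.normal(expected, GAME_SD)
```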

Finally, I looked at the relationship between those fake MOVs and who the “true” better team was in each matchup (since, being omniscient, I knew which team had the higher fake rating in any given game). It turns out you can estimate the probability that Team A is truly better than Team B from their scoring margin in a given game using the following formula:

p(better) ≈ 1 / (1 + exp(-0.0859135 × adj_MOV))

where adj_MOV is your point margin in the game plus 2.5 if on the road, or minus 2.5 if at home.

So if I win by 5 at home, there’s only a 55.3% chance I’m truly the better team. If I win by 21 on the road, there’s an 88.3% chance I’m the better team. Et cetera.
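Those figures are easy to check by plugging into the formula directly; a quick sketch (the function name is mine):

```python
import math

def p_better(margin, at_home):
    # Adjusted MOV: subtract 2.5 at home, add 2.5 on the road
    adj_mov = margin - 2.5 if at_home else margin + 2.5
    return 1.0 / (1.0 + math.exp(-0.0859135 * adj_mov))

print(round(p_better(5, at_home=True), 3))    # 0.553
print(round(p_better(21, at_home=False), 3))  # 0.883
```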

How does this help us determine our proper MOV cap? Well, plugging numbers into the formula above, what would a team have to win by on a neutral field for us to be 95% certain they were the better team? The answer is 34.27; in the simulated games I ran, when a team won by more than 34 points of adjusted MOV, there was a 95% chance they also had the higher “true” rating… meaning they were almost certainly the better team in a moral, true-talent sense. Beyond that threshold, extra points of scoring margin really didn’t tell us anything we didn’t already know, which is exactly what we’re looking for in a MOV cap for real-life games.
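Because the logistic curve inverts in closed form, the threshold for any confidence level falls out directly; a sketch, with a hypothetical helper for applying the cap:

```python
import math

K = 0.0859135  # logistic coefficient fitted on the simulated games

def mov_threshold(confidence):
    # Invert p = 1/(1 + exp(-K * m)) for the adjusted MOV m
    return math.log(confidence / (1.0 - confidence)) / K

print(round(mov_threshold(0.95), 2))  # 34.27

def capped_margin(adj_mov, cap=34.27):
    # Clip adjusted MOV at the cap in both directions before feeding it to SRS
    return max(-cap, min(cap, adj_mov))
```

At 95% the cap comes out to 34.27 points of adjusted MOV, matching the figure in the text; a lower confidence level would shrink the cap accordingly.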

Plugging 34.27 as the adjusted-MOV threshold into the SRS for 2009, we see the Patriots fall to #2 behind the eventual Super Bowl champion Saints, because the impact of the 59-0 destruction of Tennessee is reduced. Doing the same for 2013 (going into MNF), here are the standings:

Team                   Division    G    SRS
San Francisco 49ers    NFC West    14   10.4
New Orleans Saints     NFC South   15    8.1
Kansas City Chiefs     AFC West    15    6.1
New England Patriots   AFC East    15    5.5
St. Louis Rams         NFC West    15    2.6
San Diego Chargers     AFC West    15    2.4
Tampa Bay Buccaneers   NFC South   15   -2.0
Green Bay Packers      NFC North   15   -3.5
New York Giants        NFC East    15   -5.8
New York Jets          AFC East    15   -7.2

We can also combine this with Wayne Winston’s method of weighting games for recency to get perhaps the best set of SRS ratings to judge how good a team is right now:

Team                   Division    G    SRS
San Francisco 49ers    NFC West    14   11.8
New Orleans Saints     NFC South   15    7.8
New England Patriots   AFC East    15    6.0
Kansas City Chiefs     AFC West    15    5.5
St. Louis Rams         NFC West    15    4.1
San Diego Chargers     AFC West    15    3.2
Tampa Bay Buccaneers   NFC South   15   -1.3
Green Bay Packers      NFC North   15   -5.3
New York Giants        NFC East    15   -5.7
New York Jets          AFC East    15   -7.3
  • Dave

    Interesting, but I can’t imagine this changing things much in most seasons. Looking at the last ten seasons, there are only 5-10 games per season that go over the 34-point threshold, and a number of those only clear it by a point or two. And we are still only 95% sure.
    So it sort of seems like unneeded work to get very similar results.

    Especially when there is a lot of luck from things like fumbles already boiled into the SRS rating.

    • James

      Glancing between “normal” SRS and capped SRS it seems like most teams’ ratings are changed by less than 0.2 points, and the biggest change I found was 0.4 points.

      That said, it’s still handy to know what the cutoff point is since I expected it to be closer to 21 points.

  • Kostya

    I largely agree with Dave here. I think this might be too conservative an adjustment. I know you picked 95% because that’s two standard deviations, but ultimately that’s a somewhat arbitrary choice, yes? I’m curious how hard it would be to run it for one standard deviation, or at various other points in between.

  • George

    Just a quick question: are those weighted numbers SRS based or Winston based? I know the difference isn’t/shouldn’t be huge, and I did my numbers after the Monday night game (which shouldn’t make a large difference), but I have Arizona in first (on weighted numbers at 95%) at 15.03, Seattle in second at 14.08, and the 49ers in third at 11.70, which is substantially different to what you have above. I think I might be missing a game from my dataset (or it is giving Arizona way too much credit for that last game). By the way, Happy Christmas to everyone for tomorrow.

  • Nate

    Running a simulation like that is a sensible way to see the implications of the assumptions we make.

    It seems like the 95% confidence cut-off is quite arbitrary, and it is quite opaque what kind of impact, if any, the 95% cut-off figure really has on SRS accuracy. If we can run 10,000 simulations, why not run SRS and SRS-with-cut-offs in parallel and see which one is more accurate on average, using some kind of accuracy metric like least squares or minimum absolute error? If the goal is to make an improved SRS (subject to the assumptions of the model), then setting the cut-off that way seems more appropriate.

    In principle, we should be clipping the MOV in places where we are confident not only that one team is stronger than the other, but also where we are confident the margin over-represents the difference between the teams. And even then it’s risky, because for a particularly weak or strong team it will selectively remove more of the positive or negative noise for that team.

    • George

      I totally agree with this: in an ideal world you would need to run it in parallel with another system to find out what really is a good model (or whether the change generates fewer false positives, e.g. if you are establishing a threshold for a model, which I’d guess would be somewhere in the 60% range, does adding a cut-off lower that threshold?)

      There is an interesting article over on Trey Causey’s site (the Spread) about this kind of thing at the moment, as they are trying to come up with a win probability model (and are testing it against other models using things like receiver operating characteristic curves; I didn’t know about them before this morning and am currently trying to figure out if you can do them in Excel). I had a similar issue when trying to add in team-specific home advantages (as I have a personal belief, which I can’t quite back up mathematically, that Seattle and Baltimore have significant home field advantages, Arizona and Green Bay less so, and Miami actually has a significant home field disadvantage). When I add these into the least squares model that I run, it actually marginally reduces the error (by a small percentage), but I am not sure that it actually makes a better model.

  • Danish

    Is SRS linear in the sense that a touchdown up 21 helps your SRS as much as a touchdown up 3? In that case, maybe it isn’t as much about a cutoff as much as it’s about discounting points after a certain point. I.e. there’s some information in the difference between a MOV of 21 and 28, but maybe not as much as in the difference between MOVs of 3 and 10.

  • Ty

    It would be interesting to know what the MOV cutoff would be in the NBA, MLB, and NHL, and even college. I am only guessing, but I would assume the cutoff for MLB would be 10 runs, and the NHL would be 5-6 goals. The NBA, I’m even less sure about, but I would guess it is somewhere close to the NFL, or a bit lower than that.