This is not the article I planned to write. Before I get into it, then, let me tell you about my original plan, which at the time seemed simple and straightforward. I had read a few articles about NFL strength of schedule (SOS) — such as here and here — and noticed they follow a two-part template. First, they rank all teams according to the hardest and easiest projected schedules. Second, they discuss how these rankings might affect expectations for the highest-profile players at the extremes. Seems reasonable enough.
The typical approach to SOS only tells half of the story, however. It tells us who faced the easiest and hardest schedules at the extremes, which is valuable and actionable information. But it doesn’t tell us which teams face the biggest changes in SOS.
For example, consider a team that faced the sixth hardest rushing defense SOS last year and now gets the sixth easiest this year. Despite a huge 21-spot swing (from 6th hardest to 27th in a 32-team league), this team is likely to remain off the radar of this year's SOS articles. Now compare this to a team that faced the sixth hardest slate of rush defenses last year and the fourth hardest this year. Despite a small two-spot move, this team will be all the rage in this year's SOS articles.
As fantasy managers, we should care much more about the former team. The latter team is basically in the same situation. Almost nothing changed for them. In contrast, all things equal, the former team should be able to run more effectively this year.
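The comparison I had in mind is easy to sketch in code. Here is a minimal Python example with made-up teams and ranks (1 = hardest schedule of 32; none of these numbers are real):

```python
# Hypothetical SOS ranks (1 = hardest of 32 schedules); teams and
# numbers are made up for illustration.
last_year = {"Team A": 6, "Team B": 6, "Team C": 15}
this_year = {"Team A": 27, "Team B": 4, "Team C": 14}

# Positive delta = schedule got easier; negative delta = harder.
deltas = {t: this_year[t] - last_year[t] for t in last_year}

# Sort by magnitude of change, biggest movers first.
movers = sorted(deltas, key=lambda t: abs(deltas[t]), reverse=True)
print(movers)  # Team A moved 21 spots; Team B only 2
```

A typical SOS article would lead with Team B (fourth hardest slate this year), but the sort above surfaces Team A, the team whose situation actually changed.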
This was the original idea for my article. I like to be thorough, however, so I performed some sanity checks before running analyses on my newly built dataset. The first thing I checked is how predictive NFL strength of schedule projections are in the first place. And this is where trouble arose for my seemingly simple and straightforward idea.
The Data: Warren Sharp’s Average Opponent Defensive Efficiency
For my data I chose Warren Sharp’s Pass and Rush Average Opponent Defensive Efficiency 2018 forecasts from his 2018 Football Preview. Sharp makes some of these statistics available on his website here. I deemed this the perfect place to start because Warren Sharp is a trusted football mind who has shaped much of my thinking on the sport. What follows is not intended as a critique of Sharp’s work. I would be excited to make even a tiny contribution toward improving upon his project.
Beginning with basics, I asked a simple question: what is the relationship between Sharp’s 2018 strength of schedule forecasts (published last year) and his actual 2018 SOS results (published a year later)? In other words, how well did Sharp’s forecasts fare according to his own measures? The results shocked me.
The most obvious way to test this relationship is via correlations. Seemingly simple, correlations can actually get pretty complicated. For our purposes, though, the gist is that a correlation ranges from -1 to 1. A correlation of 1 is perfect, meaning every projection was exactly right. A perfect negative correlation, -1, means every projection was exactly wrong (e.g., the projected hardest slate turned out to be the easiest, the second hardest the second easiest, and so on all the way down). Lastly, a correlation of 0 represents no relationship at all.
Thus, correlations closest to zero are weakest, correlations closest to 1 are strongest, and for a forecast anything negative is really bad. The correlation between projected and actual 2018 passing defensive efficiency was -.37. That is moderately strong and, most importantly and surprisingly, negative. Most of the actual results ran opposite to their projections: a schedule projected to be hard was more likely to turn out easy than hard.
This makes no sense. So I wondered whether the problem was Pearson's correlation coefficient, which isn't really designed for rankings. Maybe I should look at Kendall's or Spearman's rank correlation coefficients instead. But these came out to -.23 and -.35, respectively, which is basically the same result.
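For readers who want to check the mechanics, here is a self-contained Python sketch of all three coefficients. The data are toy ranks chosen to show a perfect negative correlation, not Sharp's actual numbers:

```python
from itertools import combinations

def pearson(x, y):
    """Pearson's r: linear association between two numeric series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of the values."""
    def ranks(v):
        # Assign rank positions 1..n (no tie handling; fine for 1-32 rankings).
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for pos, i in enumerate(order, start=1):
            r[i] = pos
        return r
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall's tau: (concordant - discordant) pairs over all pairs."""
    conc = disc = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (conc + disc)

projected = [1, 2, 3, 4, 5]   # toy projected SOS ranks
actual    = [5, 4, 3, 2, 1]   # exact reversal of the projection
print(pearson(projected, actual))   # -1.0 (perfect negative)
print(spearman(projected, actual))  # -1.0
print(kendall(projected, actual))   # -1.0
```

In practice you would use `scipy.stats` for this; the point of writing them out is just to show that all three measures agree on what "exactly wrong" looks like.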
Passing Strength of Schedule
Numbers can be misleading or confusing. This is especially true for statistics like correlations, which are computed from equations that rest on assumptions. Rather than get into all that, let's look at the actual data. Here are Sharp's projected and actual rankings for passing defensive efficiency in graph form:
Figure 1 shows the average passing defensive efficiency for the entire season for each team. Hence the 32 dots – one for each team. The figure plots actual passing defensive efficiency along the x-axis and projected passing defensive efficiency along the y-axis.
There are two patterns to note in Figure 1. First, the dots are distributed somewhat randomly; this is what a weak correlation (closer to zero than to 1) looks like. Second, to the extent that the dots take any discernible shape, they follow a downward slope (a negative correlation).
The upper left and lower right quadrants are more populated than the lower left and upper right quadrants. These data, then, tell exactly the story the correlations suggest. Checking the raw data matters because correlations can sometimes mask meaningful patterns. One possibility, for example, is that something strange (injuries, bad luck, etc.) happened to the defenses in one or more of the divisions. So I added divisional markers to the graph, as you can see here:
Figures 1 and 2 plot the same data: Warren Sharp's projected and actual 2018 passing defensive efficiency. The difference is that Figure 2 color-codes the teams by division. Unfortunately, the results remain mostly random and uninformative. One could argue that the divisions cluster together, but that's a stretch, and it's not clear how fantasy managers could use such information anyway.
Rushing Strength of Schedule
Trying to remain optimistic, I wondered whether something about passing defensive efficiency makes it uniquely hard to predict. Maybe rushing defensive efficiency fares better. Here is what that looks like:
The results in Figure 3 are once again null. If there is a difference it is that the negative correlation is weaker (closer to zero), but it remains negative. Does divisional membership matter for rushing defensive efficiency? You probably know the answer, but in the spirit of being thorough you can look here:
Figure 4 provides more of the same – random noise.
Conclusion: 3 Rays of Hope for NFL Strength of Schedule Truthers
The tale of these four figures is that fantasy managers should not consult projections of pass and rush defensive efficiency. Strength of schedule looks like a sham.
This conclusion might be extreme. I can think of three alternative possibilities. First, perhaps NFL strength of schedule projections are more accurate at the beginning of the season. It might be that after the first few weeks, injuries and scheme adjustments dominate defensive performance. (Hat tip to my colleague Nate Henry for making this suggestion before I had even collected and analyzed the data.)
Second, maybe I chose the wrong metrics from Warren Sharp's methodology for estimating defensive strength of schedule. After all, he provides four passing defensive efficiency metrics and three rushing ones, and I only looked at one of each. Sharp also provides 1SKR passing and rushing success rates of opponents. These metrics attempt to strip out garbage-time noise by limiting the data to quarters 1-3 when the score is within one possession.
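As a rough illustration of that filtering idea, here is a Python sketch. The record format and the 8-point "one possession" cutoff are my assumptions for illustration, not Sharp's actual method:

```python
# Hypothetical play-by-play records; field names are illustrative
# and not Sharp's actual schema.
plays = [
    {"quarter": 1, "score_diff": 3,  "success": True},
    {"quarter": 2, "score_diff": 7,  "success": False},
    {"quarter": 3, "score_diff": 0,  "success": True},
    {"quarter": 4, "score_diff": 21, "success": True},  # garbage time: excluded
]

# Keep only quarters 1-3 with the score within one possession (<= 8 points).
competitive = [p for p in plays
               if p["quarter"] <= 3 and abs(p["score_diff"]) <= 8]

success_rate = sum(p["success"] for p in competitive) / len(competitive)
print(success_rate)  # 2 successes out of 3 competitive plays
```

The fourth play succeeds but is thrown out, so the success rate reflects only competitive game states. Repeating my correlation test on metrics filtered this way would be a natural follow-up.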
Third, perhaps other approaches to strength of schedule projections are more effective. For example, I am curious how well the Pro Football Focus projections would fare under similar scrutiny. Their data and methodology are not as transparent and accessible as Sharp's, though, and Sharp deserves credit for that openness.
Whatever the explanation, make sure you don't put too much stock in NFL strength of schedule projections without testing them first. Based on these data, SOS projections aren't even reliable enough to serve as tie-breakers for close decisions.