Sunday, January 24, 2021

Published January 24, 2021

Regression Toward the Mean in the NFL

I wanted to run some quick tests to see if regression toward the mean shows up clearly in NFL data.


Background

In case you aren't familiar, 'regression toward the mean' roughly means that if one observation of a random variable is extreme (an outlier), the next observation is likely to be closer to the mean. As a really simple model of something like NFL player performance, imagine that each player's performance is X% skill and Y% luck, with X + Y = 100. If X is 100 and Y is 0, then previous years will nearly perfectly predict future years. If Y is 100 and X is 0, then there will be no relationship between performance from one year to the next. If X and Y are both between 0 and 100, there will be some relationship between performance from year to year, but it won't be perfect.
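
To make the skill/luck model concrete, here's a small simulation. It's a sketch under my own assumptions, not the code behind the plots: performance is standardized, skill accounts for a share `skill_share` of the variance and carries over unchanged between seasons, and luck is redrawn independently each season. Under those assumptions the best-fit slope of year 2 on year 1 lands close to the skill share. The function names are hypothetical.

```python
import random


def simulate_two_seasons(n_players, skill_share, seed=0):
    """Simulate two seasons under the toy skill + luck model.

    Assumption: skill has variance `skill_share`, luck has variance
    `1 - skill_share`, skill is identical in both seasons, and luck is
    drawn independently each season.
    """
    rng = random.Random(seed)
    skill_sd = skill_share ** 0.5
    luck_sd = (1 - skill_share) ** 0.5
    year1, year2 = [], []
    for _ in range(n_players):
        skill = rng.gauss(0, skill_sd)
        year1.append(skill + rng.gauss(0, luck_sd))
        year2.append(skill + rng.gauss(0, luck_sd))
    return year1, year2


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


for share in (1.0, 0.5, 0.0):
    y1, y2 = simulate_two_seasons(20000, share)
    print(f"skill share {share:.0%}: slope ~ {ols_slope(y1, y2):.2f}")
```

With 20,000 simulated players the slopes should come out near 1, 0.5 and 0. The all-skill case is exactly 1 because luck is zero, so year 2 simply repeats year 1.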

There are two easy ways for me to look at this phenomenon:
  1. plot one year's performance against the previous year's, along with a reference line with a slope of 1 (the X = 100% case) and a best-fit line

  2. bin the data by the previous year's performance and look at how much each bin's average shifted in the next year
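
The two checks can be sketched in plain Python. This is an illustration with hypothetical helper names, assuming `year1` and `year2` are equal-length lists where position i holds the same player's stat in two consecutive seasons. The plotting itself (the scatter plot with its slope-1 reference line, and the bar chart of bin changes) is left out.

```python
def ols_slope(year1, year2):
    """Slope of the least-squares line through the (year1, year2) points (check 1)."""
    m1 = sum(year1) / len(year1)
    m2 = sum(year2) / len(year2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(year1, year2))
    var = sum((a - m1) ** 2 for a in year1)
    return cov / var


def tercile_changes(year1, year2):
    """Average year-over-year change for the bottom, middle and top thirds (check 2).

    Players are ranked by year-1 performance; when the count isn't divisible
    by 3, the middle and top bins absorb the leftover players.
    """
    order = sorted(range(len(year1)), key=lambda i: year1[i])
    n = len(order)
    bins = [order[: n // 3], order[n // 3 : 2 * n // 3], order[2 * n // 3 :]]
    return [sum(year2[i] - year1[i] for i in b) / len(b) for b in bins]


# Toy data, not NFL numbers: everyone drifts toward the middle.
year1 = [1, 2, 3, 4, 5, 6]
year2 = [2, 3, 3, 4, 4, 5]
print(ols_slope(year1, year2))        # about 0.54, i.e. between 0 and 1
print(tercile_changes(year1, year2))  # [1.0, 0.0, -1.0]
```

The toy data is the regression-toward-the-mean shape: a slope between 0 and 1, the bottom bin moving up and the top bin moving down.
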

What might we see? There are many possibilities, but here are a few examples:
  • "Players that performed well perform even better the next season": plot 1 will show a slope greater than 1 and plot 2 will show the bottom bin doing worse and the top bin doing better

  • "Performance is driven by skill so it's the same year-to-year": plot 1 will show a slope of 1 and plot 2 will show all bins at roughly zero

  • "Performance is a mix of skill and luck so top performers will move back towards average and poor performers will move up towards average (this is the regression toward the mean case)": plot 1 will show a slope between 0 and 1, and plot 2 will show the bottom bin doing better and the top bin doing worse

  • "It's all random/luck": plot 1 will show a slope of ~0 and plot 2 will show the bottom bin doing much better and the top bin doing much worse, since every bin falls all the way back to the average

  • "Poor performers overcompensate and end up better than average next season": plot 1 will show a negative slope (below 0) and plot 2 will show the bottom bin improving past the average and the top bin falling below it
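
To tie the fitted slope from plot 1 back to these scenarios, a rough lookup might look like this. The `tol` value is an arbitrary illustrative cutoff, not a value from the analysis, and the overcompensation case is read as a negative slope, since last year's worst players finishing above average flips the sign of the relationship. The slope alone can't confirm everything, so plot 2 is still worth checking.

```python
def describe_slope(slope, tol=0.1):
    """Map a fitted year-2-on-year-1 slope to one of the scenarios.

    `tol` is a hypothetical tolerance for "roughly equal to"; it is not
    taken from the post.
    """
    if slope > 1 + tol:
        return "top performers get better"
    if abs(slope - 1) <= tol:
        return "all skill"
    if abs(slope) <= tol:
        return "all luck"
    if slope < -tol:
        return "overcompensation"
    return "regression toward the mean"  # slope strictly between 0 and 1


print(describe_slope(0.45))  # regression toward the mean
```
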

To test this, I used 5 different stats for all starters from 2000 to 2020, comparing consecutive seasons. For example, in the 2010-2011 comparison, year 1 is 2010 and year 2 is 2011. You would expect the best performers in 2010 to do a bit worse in 2011, and the worst performers in 2010 to do a bit better in 2011. In the bar plots, the 'bottom third' means the 33% of players who performed worst in year 1 in the plot above.
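
Building the year-1/year-2 pairs from season data could look like the following. It's a sketch assuming a hypothetical list of `(player, season, stat)` tuples already filtered to starters; a player contributes a pair for a season only if he also has a value for the following season, and the player names and numbers are made up.

```python
from collections import defaultdict


def consecutive_season_pairs(records, first=2000, last=2020):
    """Return parallel lists (year1, year2) of a stat in back-to-back seasons.

    `records` is a hypothetical list of (player, season, stat) tuples that
    has already been filtered to starters. Seasons outside first..last are
    ignored, so a 2020 value is never paired with 2021.
    """
    by_player = defaultdict(dict)
    for player, season, stat in records:
        if first <= season <= last:
            by_player[player][season] = stat

    year1, year2 = [], []
    for seasons in by_player.values():
        for season in sorted(seasons):
            if season + 1 in seasons:  # e.g. the 2010 -> 2011 comparison
                year1.append(seasons[season])
                year2.append(seasons[season + 1])
    return year1, year2


records = [
    ("A", 2010, 4.0), ("A", 2011, 5.0), ("A", 2013, 6.0),  # no 2012 season
    ("B", 2010, 7.0), ("B", 2011, 6.5), ("B", 2012, 6.0),
]
print(consecutive_season_pairs(records))  # ([4.0, 7.0, 6.5], [5.0, 6.5, 6.0])
```

Pooling every consecutive-season pair into one pair of lists means each player can appear several times, once per back-to-back pair of seasons he started.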

Results

The data show regression toward the mean. Every stat I tried (all of which obviously involve some luck) followed the regression-toward-the-mean pattern described above.

