Saturday, January 6, 2018

What Value Minimizes The Sum Of Squared Errors For A Group Of Numbers?

I noticed this result at work one day and it caught me off guard...
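Start with the definition of the sum of squared errors, where every y_i is predicted by the same single value b:

$$\mathrm{SSE}(b) = \sum_{i=1}^{n} (y_i - b)^2$$
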
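To find the value of b that minimizes this, we take the derivative with respect to b and set the result to 0:

$$\frac{d\,\mathrm{SSE}}{db} = -2\sum_{i=1}^{n} (y_i - b) = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} b$$
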
Noting that b has no dependence on i, the sum on the right reduces to nb, and we have:

$$\sum_{i=1}^{n} y_i = nb \;\Rightarrow\; b = \frac{1}{n}\sum_{i=1}^{n} y_i$$

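That last equation is just the mean of y. (The second derivative of the SSE with respect to b is 2n, which is positive, so this is indeed a minimum rather than a maximum.) I thought this was kind of cool, even though it's really obvious once you do the math.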
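
A quick way to convince yourself numerically, as a minimal Python sketch: the sample values [1.0, 2.0, 6.0] and the helper names `sse` and `mean` are made up for illustration and are not from any real data. It evaluates the SSE at the mean and then at a few values of b nudged to either side of it; every nudge should make the SSE larger.

```python
def sse(values, b):
    """Sum of squared errors when every value is predicted by the constant b."""
    return sum((y - b) ** 2 for y in values)


def mean(values):
    """Arithmetic mean: the b = (1/n) * sum(y_i) from the derivation."""
    return sum(values) / len(values)


ys = [1.0, 2.0, 6.0]  # hypothetical sample values
b_star = mean(ys)

# Nudge b away from the mean in both directions; the SSE should only go up.
for delta in (-0.5, -0.1, 0.1, 0.5):
    assert sse(ys, b_star + delta) > sse(ys, b_star)

print(b_star, sse(ys, b_star))  # prints "3.0 14.0"
```

For these three values the SSE expands to 3b² - 18b + 41, a parabola whose bottom sits at b = 3, which is exactly the mean (9 / 3).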

I mentioned at the beginning that I ran into this at work. Specifically, I was doing a linear fit where the slope ended up being very close to zero. If we take the derivation of the intercept for that fit (copied from Wikipedia):

$$
\begin{aligned}
&\frac{\partial}{\partial \hat{\alpha}}\,\mathrm{SSE}(\hat{\alpha}, \hat{\beta}) = -2\sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta}x_i\right) = 0 \\
\Rightarrow\ &\sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta}x_i\right) = 0 \\
\Rightarrow\ &\sum_{i=1}^{n} y_i = \sum_{i=1}^{n}\hat{\alpha} + \hat{\beta}\sum_{i=1}^{n} x_i \\
\Rightarrow\ &\sum_{i=1}^{n} y_i = n\hat{\alpha} + \hat{\beta}\sum_{i=1}^{n} x_i \\
\Rightarrow\ &\frac{1}{n}\sum_{i=1}^{n} y_i = \hat{\alpha} + \frac{1}{n}\hat{\beta}\sum_{i=1}^{n} x_i \\
\Rightarrow\ &\bar{y} = \hat{\alpha} + \hat{\beta}\bar{x}
\end{aligned}
$$

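and the slope $\hat{\beta}$ ends up being zero, you get the same equations I walked through above, with $\hat{\alpha}$ playing the role of b. The last line becomes $\bar{y} = \hat{\alpha}$: the intercept is just the mean of y. A simple plot showing it with numbers: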

The best-fit line runs straight through the mean. Neat.
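The same thing can be reproduced in code. This Python sketch uses the standard closed-form least-squares slope, beta = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which the post doesn't derive, and then gets the intercept by rearranging the last line of the Wikipedia derivation. The helper `fit_line` and the four data points are hypothetical, chosen so that x and y have zero covariance and the slope comes out exactly zero; they are not the data behind the plot.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = alpha + beta * x (closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Standard closed-form slope: covariance term over variance term.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    # Rearranged from the last line of the derivation: y_bar = alpha + beta * x_bar
    alpha = y_bar - beta * x_bar
    return alpha, beta


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
ys = [2.0, 4.0, 4.0, 2.0]  # chosen so x and y have zero covariance

alpha, beta = fit_line(xs, ys)
print(alpha, beta)  # prints "3.0 0.0"
```

With a zero slope the intercept comes out as 3.0, which is exactly the mean of ys, matching the result from the first half of the post. When the slope is not zero, the fitted line still passes through the point (x̄, ȳ), since that is what the last line of the derivation says.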


