Friday, March 25, 2022

Ruminating on 'Fitting the line to data'

 In linear regression, we need to find the best fit line over a set of points (data). StatQuest has an excellent video explaining how we fit a line to the data using the principle of 'least squares' - https://www.youtube.com/watch?v=PaFPbb66DxQ

The best fit line is the one where the sum of the squares of the distances from the actual points to the line is the minimum. Hence this becomes an optimization problem in Maths that can be calculated. We square the distances to take care of negative values/diffs. 

In stats, the optimization function that minimizes the sum of the squared residuals is also called as a 'loss function'. 

The equation of any line can be stated as: y = ax + b

where a is the slope of the line and b is the y-intercept. Using derivates we can find the most optimal values of 'a' and 'b' for a given dataset. 

No comments:

Post a Comment