Imagine you are at a new marketing job. You have a set of data in Excel in front of you about sales numbers, and a scatter plot of those data points in a graphing calculator on your desk. Your boss comes by and asks you to give a regression analysis of the data by noon — he needs to know the trend line of the sales. You rack your brain for how to find the line of best fit, remembering that it involves something with finding a straight line on a scatter plot. What do you do?
Least squares regression is a simple linear regression method used to find the equation of the line that best fits, or represents, a set of data points.
A linear equation represents the linear relationship between the x-values and y-values of the points on a graph or chart.
3 Steps to Find the Equation for the Line of Best Fit
Real-world data sets rarely fall along a perfect line. Your job is to find the equation of a line that approximates the data. This is called the line of best fit or the regression line.
You could eyeball the graph, draw a line, and pick some random numbers. Or, you could use the least squares regression to methodically figure out the line of best fit. Here's how.
Step 1: Find the Slope
To find the slope of our line of best fit, arrange your data in the columns of a chart like the one below. Here’s what each symbol represents:
- x: the x-coordinate of a point
- y: the y-coordinate of a point
- x̄: the mean or average value for all the x-coordinates
- ȳ: the mean or average value for all the y-coordinates
We can find x̄ and ȳ by adding up the x-values and the y-values separately and dividing each sum by 5, the number of data points. Here, x̄ = 3 and ȳ = 6.
We figure out the x - x̄ column by subtracting the average x-value from each x-coordinate. For example, the first x-coordinate is 1. x̄ is 3, so x - x̄ = 1 - 3 = -2.
We figure out the y - ȳ column by doing the same for the y-values. For example, the first y-coordinate is 1. ȳ is 6, so y - ȳ = 1 - 6 = -5.
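As a rough sketch, the averages and the two deviation columns can be built in a few lines of Python. The five points below are an assumption: they were chosen only because they agree with the values quoted in this lesson (a first point of (1, 1), an average x of 3, and an average y of 6) and are not necessarily the data from the original chart.

```python
# Illustrative data (an assumption, not the lesson's original chart).
xs = [1, 2, 3, 4, 5]
ys = [1, 5, 6, 7, 11]

n = len(xs)
x_bar = sum(xs) / n  # average of the x-coordinates
y_bar = sum(ys) / n  # average of the y-coordinates

# Deviation columns: each coordinate minus the average for its axis.
x_dev = [x - x_bar for x in xs]
y_dev = [y - y_bar for y in ys]

print("x_bar =", x_bar, "y_bar =", y_bar)
for x, y, dx, dy in zip(xs, ys, x_dev, y_dev):
    print(x, y, dx, dy)
```

A handy check is that each deviation column adds up to zero, since the values above and below the average cancel out.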
To find the slope, we add up all the values in the last column, (x - x̄)(y - ȳ), and then divide by the sum of all the values in the fourth column, (x - x̄)².
In other words, the slope is equal to 2.2.
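The slope calculation can be sketched in Python as the sum of the products of the deviations divided by the sum of the squared x-deviations. The data set here is hypothetical: it was picked to be consistent with the averages (3 and 6) and the slope (2.2) stated in this lesson, and the variable names are illustrative.

```python
# Hypothetical data consistent with the lesson's averages and slope.
xs = [1, 2, 3, 4, 5]
ys = [1, 5, 6, 7, 11]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Numerator: sum of (x - x_bar) * (y - y_bar) over all points.
sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Denominator: sum of (x - x_bar) squared over all points.
sum_squares = sum((x - x_bar) ** 2 for x in xs)

slope = sum_products / sum_squares
print(sum_products, sum_squares, slope)
```

With these numbers the two sums are 22 and 10, which gives the slope of 2.2.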
Step 2: Calculate the Y-Intercept
Now, you may remember that linear equations are in the format y = ax + b, where a stands for the slope, and b stands for the y-intercept.
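Here, we use the same equation, but with x and y replaced by the average values x̄ and ȳ that we calculated earlier: ȳ = a·x̄ + b. This works because the line of best fit always passes through the point (x̄, ȳ).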
Since we know the slope = 2.2, we can plug in the values and solve for b: 6 = 2.2(3) + b, so b = 6 - 6.6 = -0.6.
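The same rearrangement, b = ȳ - a·x̄, can be checked with a short Python sketch. The averages (3 and 6) and the slope (2.2) are the values used in this lesson; the variable names are only illustrative.

```python
x_bar = 3.0   # average x-value from Step 1
y_bar = 6.0   # average y-value from Step 1
slope = 2.2   # slope from Step 1

# Rearranging y_bar = slope * x_bar + intercept to isolate the intercept.
intercept = y_bar - slope * x_bar

# Rounding hides the tiny floating-point error in 2.2 * 3.
print(round(intercept, 2))
```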
Step 3: Put It Together
Now we have both parts, the slope and the y-intercept, and can put them together. Our line of best fit is written just like any other linear equation.
Our line of best fit is: y = 2.2x - 0.6
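Once the line is known, it can be used to estimate y for any x. The sketch below assumes the slope of 2.2 from Step 1 and the y-intercept of -0.6 that follows from the averages (6 - 2.2 × 3); the function name `predict` is hypothetical.

```python
SLOPE = 2.2
INTERCEPT = -0.6


def predict(x):
    """Return the y-value on the line of best fit for a given x."""
    return SLOPE * x + INTERCEPT


# Evaluate the line at a few x-values, rounding away floating-point noise.
for x in [1, 3, 6]:
    print(x, round(predict(x), 2))
```

Note that plugging in the average x-value, 3, returns the average y-value, 6, which is a quick sanity check on the equation.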
How to Find the Line of Best Fit
Least squares regression is one common way to find the equation of the line of best fit for any set of data you might come across in the real world.
- Step 1 is to calculate the average x-value and the average y-value. From there, you do some computations to find the slope of the line of best fit.
- Step 2 is to use that slope to find the y-intercept.
- Step 3 is to put it all together.
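The three steps above can be wrapped into one small Python function. This is a minimal sketch rather than a production routine: the name `fit_line` and the sample data are assumptions made for illustration, with the data chosen to match the averages and slope used in this lesson.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two points and equal-length lists")

    # Step 1: averages, then the slope from the deviation sums.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_squares = sum((x - x_bar) ** 2 for x in xs)
    if sum_squares == 0:
        raise ValueError("all x-values are identical, so the slope is undefined")
    sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sum_products / sum_squares

    # Step 2: the line passes through (x_bar, y_bar), which gives the intercept.
    intercept = y_bar - slope * x_bar

    # Step 3: return both parts of the equation y = slope * x + intercept.
    return slope, intercept


# Hypothetical data consistent with the lesson's numbers.
slope, intercept = fit_line([1, 2, 3, 4, 5], [1, 5, 6, 7, 11])
print(f"y = {slope:.1f}x + ({intercept:.1f})")
```

The two checks at the top guard against inputs for which no unique line exists, such as a single point or points stacked on one vertical line.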
Whether you are in class or at a job, now you can say with confidence how to find the line of best fit for any set of data!