Example: Linear Regression
Use the polyfitc, line, slope, and intercept functions to find the least-squares line of best fit through a set of x-y data. Use the stderr function to calculate the error in fitted parameters. Calculate confidence limits around the line of best fit and form confidence intervals.
Line of Best Fit
Create a linear function to estimate how long it takes to drive various distances.
1. Define a set of distances in miles, and the time it takes in minutes, to drive these distances.
2. Define a univariate linear regression equation.
3. Call polyfitc to calculate the coefficients of regression a and b.
The coefficients minimize the sum of the squared differences between the values in T and the values calculated by the regression equation f, taken over all x values. You can check this by using a solve block and the minimize function to minimize the sum of squares:
4. Define the line of best fit, which minimizes the sum of the squares of the vertical distances from each point to the line.
Use a parametrized equation from linear or any other type of regression only for values near the original observed data. The line of best fit for the above data predicts that it takes the following time to travel a distance of 0 miles:
This does not make sense if the measured time is strictly travel time at constant velocity. However, this kind of result can sometimes represent a particular physical phenomenon. In this case, the time needed to drive zero miles may be interpreted as the average waiting time at traffic lights.
5. Plot the data points and the line of best fit.
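The least-squares property described in the steps above can be sketched outside the worksheet. The following Python example is a minimal illustration, not the worksheet's own expressions: the data values are hypothetical, the helper names fit_line and sum_of_squares are invented for this sketch, and the regression equation is assumed to have the form f(x) = a + b·x with a as the intercept and b as the slope.

```python
# Hypothetical data, invented for this sketch (not the worksheet's values).
X = [5.0, 12.0, 20.0, 33.0, 45.0]   # distance in miles
T = [14.0, 25.0, 31.0, 52.0, 64.0]  # time in minutes


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept
    return a, b


def sum_of_squares(a, b, xs, ys):
    """Sum of squared vertical distances between the data and a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_line(X, T)
best = sum_of_squares(a, b, X, T)

# Nudging either coefficient away from the fit should only increase the sum.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sum_of_squares(a + da, b + db, X, T) > best

print(f"intercept a = {a:.3f} min, slope b = {b:.3f} min/mile")
```

The perturbation loop plays the role of the solve-block check: it does not search for the minimum, it only confirms that nearby coefficient values give a larger sum of squares.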
Alternative Methods to Calculate the Slope and Intercept
There are several methods to calculate the slope and intercept for the line of best fit. For example, the line function combines the slope and intercept functions. Other methods include matrix calculations or statistical relationships.
1. Call the intercept and slope functions.
2. Call the line function.
3. Use matrix calculation by utilizing the augment function.
4. Use statistical relationships by utilizing the stdev, corr, mean and slope functions.
5. Use a plot to show that the least-squares line always passes through the (mean(X), mean(T)) point:
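As a rough cross-check of the alternative methods listed above, the sketch below computes the slope and intercept two ways in Python: from the normal equations of the design matrix [1, x], and from the statistical relationship slope = corr · stdev(T) / stdev(X). The data are hypothetical and every function here is a plain-Python stand-in written for this example; none of them is the worksheet's built-in function of the same name.

```python
import math

# Hypothetical data, invented for this sketch (not the worksheet's values).
X = [5.0, 12.0, 20.0, 33.0, 45.0]   # distance in miles
T = [14.0, 25.0, 31.0, 52.0, 64.0]  # time in minutes


def mean(v):
    return sum(v) / len(v)


def stdev(v):
    """Population standard deviation (the ratio used below is the same
    for the sample version, since the normalizing factors cancel)."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))


def corr(u, v):
    """Pearson correlation coefficient."""
    mu, mv = mean(u), mean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v)) / len(u)
    return cov / (stdev(u) * stdev(v))


def fit_by_statistics(xs, ys):
    b = corr(xs, ys) * stdev(ys) / stdev(xs)  # slope
    a = mean(ys) - b * mean(xs)               # intercept
    return a, b


def fit_by_normal_equations(xs, ys):
    """Solve (A^T A) [a, b]^T = A^T y for A = [1, x] using Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


a1, b1 = fit_by_statistics(X, T)
a2, b2 = fit_by_normal_equations(X, T)
print(f"statistics:       a = {a1:.6f}, b = {b1:.6f}")
print(f"normal equations: a = {a2:.6f}, b = {b2:.6f}")

# The fitted line evaluated at mean(X) reproduces mean(T).
print(f"f(mean X) - mean T = {a1 + b1 * mean(X) - mean(T):.2e}")
```

Both routes should agree to within rounding error, and the last line illustrates why the least-squares line passes through the (mean(X), mean(T)) point: the intercept is defined as mean(T) minus the slope times mean(X).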
Standard Errors
Calculate the standard error in the estimate (often called simply the standard error) to measure how well the above line fits the data. Also calculate the standard errors in the slope and in the intercept.
1. Define the degrees of freedom (the number of data points minus the number of fitted parameters).
2. Call the stderr function to calculate the standard error in the estimate for the line of best fit defined above.
This is the square root of the mean squared error, MSE, or σ²:
3. Compare the calculated standard error with the standard error returned by the polyfitstat function.
4. Calculate the standard errors in the slope and in the intercept.
5. Repeat the above calculation using matrix calculation.
6. Use the augment function to show that the standard errors for each regression coefficient are recorded in the matrix returned by the polyfitc function.
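The standard-error calculations in this section follow the usual textbook formulas for a straight-line fit. The Python sketch below shows one way to write them, assuming the line has the form a + b·x, two fitted parameters, and hypothetical data; the function name fit_with_errors and its result keys are invented for this example.

```python
import math

# Hypothetical data, invented for this sketch (not the worksheet's values).
X = [5.0, 12.0, 20.0, 33.0, 45.0]   # distance in miles
T = [14.0, 25.0, 31.0, 52.0, 64.0]  # time in minutes


def fit_with_errors(xs, ys):
    """Least-squares line a + b*x with its standard errors."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx

    df = n - 2  # data points minus the two fitted parameters
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / df                # mean squared error
    s = math.sqrt(mse)            # standard error in the estimate

    se_b = s / math.sqrt(sxx)                        # error in the slope
    se_a = s * math.sqrt(1.0 / n + mx ** 2 / sxx)    # error in the intercept
    return {"a": a, "b": b, "df": df, "s": s, "se_a": se_a, "se_b": se_b}


result = fit_with_errors(X, T)
for name, value in result.items():
    print(f"{name:5s} = {value:.4f}")
```

Dividing by the degrees of freedom rather than by the number of points is what distinguishes the standard error in the estimate from the root-mean-square residual.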
Confidence Intervals for Each Coefficient
Use the above estimates, together with percentile points from the Student's t distribution, to form a confidence interval for the estimates of the slope and the intercept.
1. Define the significance level for a 98% confidence interval and use the qt function to calculate the t-factor.
2. Calculate the confidence limits for the slope.
There is a 98% chance that the actual slope value falls between SL and SU.
3. Calculate the confidence limits for the intercept.
The wide range on this value reflects the high level of scatter in the data.
4. Call the confidence function to repeat steps 1 to 3.
The confidence function returns the confidence interval widths in its first column and the t-factor in its second column. When you divide the widths by the t-factor, you get back the standard errors in both parameters:
5. To find the confidence limits, add the width to, or subtract it from, the relevant parameter:
6. Use the augment function to show that the confidence limits for each regression coefficient are recorded in the matrix returned by the polyfitc function.
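The relationship between standard errors, the t-factor, and the confidence limits can be illustrated with a short Python sketch. It assumes hypothetical data with five points, so that there are 3 degrees of freedom, and it takes the t-factor from a standard Student's t table (4.541 for a two-sided 98% interval with 3 degrees of freedom) instead of computing it. The names fit_with_errors, confidence_limits, and T_FACTOR_98 are invented for this example.

```python
import math

# Hypothetical data, invented for this sketch (not the worksheet's values).
X = [5.0, 12.0, 20.0, 33.0, 45.0]   # distance in miles
T = [14.0, 25.0, 31.0, 52.0, 64.0]  # time in minutes

# Assumption: tabulated t value for 1 - 0.02/2 = 0.99 with n - 2 = 3
# degrees of freedom. A different data size needs a different value.
T_FACTOR_98 = 4.541


def fit_with_errors(xs, ys):
    """Return (a, b, se_a, se_b) for the least-squares line a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_b = s / math.sqrt(sxx)
    se_a = s * math.sqrt(1.0 / n + mx ** 2 / sxx)
    return a, b, se_a, se_b


def confidence_limits(estimate, std_error, t_factor):
    """Return (lower, upper) = estimate -/+ t_factor * std_error."""
    width = t_factor * std_error
    return estimate - width, estimate + width


a, b, se_a, se_b = fit_with_errors(X, T)
slope_low, slope_high = confidence_limits(b, se_b, T_FACTOR_98)
icpt_low, icpt_high = confidence_limits(a, se_a, T_FACTOR_98)

print(f"slope:     {slope_low:.3f} .. {slope_high:.3f} min/mile")
print(f"intercept: {icpt_low:.3f} .. {icpt_high:.3f} min")

# Dividing the interval width by the t-factor recovers the standard error.
print(f"recovered se_b = {(slope_high - b) / T_FACTOR_98:.6f}")
```

Passing the t-factor in as an argument keeps the sketch free of third-party dependencies; a statistics library with a Student's t quantile function could supply the value for other confidence levels or sample sizes.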
Confidence Intervals for the Regression
1. Use the length and mean functions to calculate a confidence interval for the regression itself.
2. Use the above function to calculate the confidence interval for the predicted value at any x:
3. Use matrix calculation:
4. Plot data, the line of best fit, and the confidence interval for the entire regression region.
The confidence region for predicted values has a waist near the center of the measured values. This is because the formulas used to calculate the regression are mean-based, so the values predicted closer to the mean of the data are more accurate.
5. Calculate the confidence limits on the measured values. These limits are slightly different from the limits for predicted values.
6. Use matrix calculation:
7. Plot the confidence limits as an error trace.
You can use the graphs as a form of outlier detection: measured values that fall outside the confidence limits may be outliers.
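The two kinds of limits discussed in this section can be sketched in Python using the standard textbook formulas. This is an illustration under stated assumptions, not the worksheet's expressions: the data are hypothetical, the t-factor 4.541 is the tabulated two-sided 98% value for 3 degrees of freedom, the limits on measured values are assumed to be the usual prediction limits (with an extra 1 under the square root), and the names fit_summary, half_width, and find_outliers are invented here.

```python
import math

# Hypothetical data, invented for this sketch (not the worksheet's values).
X = [5.0, 12.0, 20.0, 33.0, 45.0]   # distance in miles
T = [14.0, 25.0, 31.0, 52.0, 64.0]  # time in minutes
T_FACTOR_98 = 4.541  # assumption: t table value, 98% two-sided, 3 d.o.f.


def fit_summary(xs, ys):
    """Return (a, b, s, mx, sxx) for the least-squares line a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return a, b, s, mx, sxx


def half_width(x, n, s, mx, sxx, t_factor, measured=False):
    """Half-width of the interval at x.

    measured=False: limits for the predicted (mean) value on the line.
    measured=True:  limits for an individual measured value.
    """
    extra = 1.0 if measured else 0.0
    return t_factor * s * math.sqrt(extra + 1.0 / n + (x - mx) ** 2 / sxx)


def find_outliers(xs, ys, t_factor):
    """Return the (x, y) points lying outside the limits on measured values."""
    a, b, s, mx, sxx = fit_summary(xs, ys)
    n = len(xs)
    return [
        (x, y)
        for x, y in zip(xs, ys)
        if abs(y - (a + b * x)) > half_width(x, n, s, mx, sxx, t_factor, True)
    ]


a, b, s, mx, sxx = fit_summary(X, T)
n = len(X)
for x in X:
    w_line = half_width(x, n, s, mx, sxx, T_FACTOR_98)
    w_meas = half_width(x, n, s, mx, sxx, T_FACTOR_98, measured=True)
    print(f"x = {x:5.1f}  fit = {a + b * x:6.2f}  "
          f"+/- {w_line:5.2f} (line)  +/- {w_meas:5.2f} (measured)")

print("points outside the limits:", find_outliers(X, T, T_FACTOR_98))
```

The (x - mx)² term is what produces the waist in the confidence region: the half-width is smallest at the mean of the x values and grows as x moves away from it.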