
For a given dataset $(x_i, y_i)$, $i = 1, 2, \ldots, n$, where $x$ is the independent variable and $y$ is the dependent variable, polynomial regression fits the data to a model of the following form:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon \tag{1}$$
where $k$ is the polynomial order. In Origin, $k$ is a positive integer less than 10. The error term $\varepsilon$ is assumed to be independent and normally distributed, $\varepsilon \sim N(0, \sigma^2)$.
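To fit the model, assume that the residuals:
$$r_i = y_i - (\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k) \tag{2}$$
are normally distributed with mean 0 and variance $\sigma_i^2$. Then the maximum likelihood estimates of the parameters $\beta_j$ can be obtained by minimizing the chi-square, which is defined as:
$$\chi^2 = \sum_{i=1}^{n} \frac{\left[y_i - (\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k)\right]^2}{\sigma_i^2} \tag{3}$$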
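If the error is treated as a weight, the chi-square to be minimized can be written as:
$$\chi^2 = \sum_{i=1}^{n} w_i \left[y_i - (\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k)\right]^2 \tag{4}$$
and
$$w_i = \frac{1}{\sigma_i^2} \tag{5}$$
where $\sigma_i$ are the measurement errors. If they are unknown, they should all be set to 1.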
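As a minimal sketch of formulas (3) to (5), the following Python function evaluates the weighted chi-square for a candidate coefficient vector. The function name `chi_square`, its arguments, and the sample data are illustrative assumptions and are not part of Origin.

```python
def chi_square(x, y, beta, sigma=None):
    """Weighted chi-square for a polynomial with coefficients beta[0..k].

    sigma holds the measurement errors; when None, every sigma_i is set to 1.
    """
    if sigma is None:
        sigma = [1.0] * len(x)
    total = 0.0
    for xi, yi, si in zip(x, y, sigma):
        # Model value beta_0 + beta_1*x + ... + beta_k*x^k
        model = sum(b * xi ** j for j, b in enumerate(beta))
        weight = 1.0 / si ** 2              # w_i = 1 / sigma_i^2
        total += weight * (yi - model) ** 2
    return total

# Hypothetical sample data scattered around y = 1 + 2x + 3x^2
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [8.0, 4.0, 1.0, 4.0, 18.0]
chi2_unit = chi_square(x, y, [1.0, 2.0, 3.0])                      # all sigma_i = 1
chi2_sigma2 = chi_square(x, y, [1.0, 2.0, 3.0], sigma=[2.0] * 5)   # all sigma_i = 2
```

Doubling every $\sigma_i$ divides the chi-square by 4 but does not change which coefficients minimize it; only the relative weights matter for the estimates.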
The fit-related formulas are summarized here:
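The estimated coefficients are obtained by matrix calculation. First, we can rewrite the regression model in matrix form:
$$Y = XB + E \tag{6}$$
where
$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},\quad X = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^k \\ 1 & x_2 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^k \end{bmatrix},\quad B = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix},\quad E = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} \tag{7}$$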
The estimate of the vector B is the solution to the linear equations, and can be expressed as:
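$$\hat{B} = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{bmatrix} = (X'X)^{-1} X'Y \tag{8}$$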
where X' is the transpose of X.
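The matrix calculation in formulas (6) to (8) can be sketched with NumPy as follows. This is an illustration under stated assumptions (unit weights, an intercept term, and a hypothetical helper named `fit_polynomial`), not Origin's implementation. It solves the normal equations directly rather than forming the inverse explicitly.

```python
import numpy as np

def fit_polynomial(x, y, k):
    """Estimate B for an order-k polynomial by solving (X'X) B = X'Y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x^2, ..., x^k, as in formula (7)
    X = np.vander(x, k + 1, increasing=True)
    # Solving the linear system is numerically safer than inverting X'X
    B_hat = np.linalg.solve(X.T @ X, X.T @ y)
    return X, B_hat

# Hypothetical sample data scattered around y = 1 + 2x + 3x^2
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [8.0, 4.0, 1.0, 4.0, 18.0]
X, B_hat = fit_polynomial(x, y, k=2)
print(B_hat)   # approximately [1, 2, 3]
```

For high orders or widely spread x values, $X'X$ becomes ill-conditioned; a least-squares solver such as `np.linalg.lstsq(X, y, rcond=None)` is a common alternative in that case.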
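For each parameter, the standard error can be obtained by:
$$s_{\hat\beta_j} = s_\varepsilon \sqrt{c_{jj}} \tag{9}$$
where $c_{jj}$ is the jth diagonal element of $(X'X)^{-1}$. The residual standard deviation $s_\varepsilon$ (also called "std dev", "standard error of estimate", or "root MSE") is computed as:
$$s_\varepsilon = \sqrt{\frac{RSS}{df_{Error}}} = \sqrt{\frac{RSS}{n^* - k}} \tag{10}$$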
| Note: Please read the ANOVA Table for more details about the degrees of freedom, dfError. |
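The residual sum of squares (RSS), also called the sum of squared errors (SSE), is the sum of the squares of the vertical deviations from each data point to the fitted curve. It can be computed as:
$$RSS = \sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} w_i \left[y_i - (\hat\beta_0 + \hat\beta_1 x_i + \hat\beta_2 x_i^2 + \cdots + \hat\beta_k x_i^k)\right]^2 \tag{11}$$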
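Formulas (9) to (11) can be combined into one short NumPy routine. The sketch below assumes unit weights and an included intercept (so $n^* = n - 1$); the helper name `residual_statistics` and the sample data are hypothetical.

```python
import numpy as np

def residual_statistics(x, y, k):
    """Return B_hat, RSS, dfError, s_eps and the standard errors (unit weights)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, k + 1, increasing=True)   # columns 1, x, ..., x^k
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ y                  # formula (8)
    residuals = y - X @ B_hat
    rss = float(residuals @ residuals)         # formula (11) with w_i = 1
    df_error = (len(x) - 1) - k                # n* - k, with n* = n - 1
    s_eps = np.sqrt(rss / df_error)            # formula (10)
    se = s_eps * np.sqrt(np.diag(XtX_inv))     # formula (9)
    return B_hat, rss, df_error, s_eps, se

# Hypothetical sample data scattered around y = 1 + 2x + 3x^2
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [8.0, 4.0, 1.0, 4.0, 18.0]
B_hat, rss, df_error, s_eps, se = residual_statistics(x, y, k=2)
```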
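If the regression assumptions hold, we can perform t-tests on the regression coefficients with the null hypotheses and the alternative hypotheses:
$$H_0: \beta_j = 0 \qquad H_1: \beta_j \neq 0$$
The t-values can be computed as:
$$t = \frac{\hat\beta_j - 0}{s_{\hat\beta_j}} \tag{12}$$
where $s_{\hat\beta_j}$ is the standard error of $\hat\beta_j$ from formula (9).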
With the t-value, we can decide whether or not to reject the corresponding null hypothesis. Usually, for a given significance level $\alpha$, we can reject $H_0$ when $|t| > t_{\alpha/2}$. Equivalently, $H_0$ is rejected when the p-value is less than $\alpha$.
The p-value is the probability, assuming $H_0$ in the t-test above is true, of obtaining a t-value at least as extreme as the one observed:
$$prob = 2\left(1 - tcdf(|t|, df_{Error})\right) \tag{13}$$
where $tcdf(t, df)$ computes the lower-tail probability for Student's t distribution with $df$ degrees of freedom.
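To make formulas (12) and (13) concrete, here is a dependency-free Python sketch. The `tcdf` below is an assumed stand-in that integrates the Student's t density numerically with Simpson's rule; it is not Origin's routine, and a statistics library would normally be used instead. The coefficient and standard error values are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def tcdf(t, df, steps=2000):
    """Lower-tail probability via composite Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def t_test(beta_hat, se, df_error):
    """t-value (12) and two-sided p-value (13) for H0: beta_j = 0."""
    t_value = beta_hat / se
    p_value = 2 * (1 - tcdf(abs(t_value), df_error))
    return t_value, p_value

# Hypothetical estimate: beta_hat = 2 with standard error sqrt(0.5), dfError = 2
t_value, p_value = t_test(2.0, math.sqrt(0.5), 2)
```

With these inputs the p-value is well above 0.05, so $H_0$ would not be rejected at the 5% significance level despite a t-value near 2.83, because only 2 error degrees of freedom are available.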
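Using the critical t-value, we can calculate the $(1 - \alpha) \times 100\%$ confidence interval for each parameter by:
$$\hat\beta_j - t_{(\alpha/2,\, n^* - k)}\, s_{\hat\beta_j} \le \beta_j \le \hat\beta_j + t_{(\alpha/2,\, n^* - k)}\, s_{\hat\beta_j} \tag{14}$$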
The Confidence Interval Half Width is:
$$CI\ Half\ Width = \frac{UCL - LCL}{2} \tag{15}$$
where UCL and LCL are the upper and lower confidence limits, respectively.
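A small Python sketch of formulas (14) and (15) follows. The critical value `t_crit` is passed in by the caller (for example, read from a t table); the function name and the numbers used are hypothetical.

```python
def parameter_interval(beta_hat, se, t_crit):
    """Confidence limits (14) and half width (15) for one coefficient."""
    lcl = beta_hat - t_crit * se       # lower confidence limit
    ucl = beta_hat + t_crit * se       # upper confidence limit
    half_width = (ucl - lcl) / 2
    return lcl, ucl, half_width

# Hypothetical values: estimate 2.0, standard error 0.5,
# and t_crit = 4.303 (two-sided 95% critical value for 2 degrees of freedom)
lcl, ucl, half_width = parameter_interval(2.0, 0.5, 4.303)
```

Because the interval is symmetric about the estimate, the half width is simply `t_crit * se`.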
Some fit statistics formulas are summarized here:
The error degrees of freedom, dfError. Please refer to the ANOVA Table for more details.
The residual sum of squares, see formula (11).
The goodness of fit can be evaluated by the coefficient of determination, $R^2$, which is given by:
$$R^2 = \frac{SS_{reg}}{TSS} = 1 - \frac{RSS}{TSS} \tag{16}$$
The adjusted $R^2$ is used to adjust the $R^2$ value for the degrees of freedom. It can be computed as:
$$\bar{R}^2 = 1 - \frac{RSS / df_{Error}}{TSS / df_{Total}} \tag{17}$$
Then we can compute the R-value, which is simply the square root of $R^2$:
$$R = \sqrt{R^2} \tag{18}$$
The root mean square of the error (Root MSE), or residual standard deviation, is equal to:
$$Root\ MSE = \sqrt{\frac{RSS}{df_{Error}}} \tag{19}$$
The norm of residuals is equal to the square root of RSS:
$$Norm\ of\ Residuals = \sqrt{RSS} \tag{20}$$
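The statistics in formulas (16) to (20) can be computed together from the observed and fitted values. The following Python sketch assumes unit weights and an included intercept, so the total sum of squares is the corrected one and $df_{Total} = n - 1$; the function name and sample values are hypothetical.

```python
import math

def fit_statistics(y, y_hat, k):
    """Fit statistics for an order-k polynomial with intercept and unit weights."""
    n = len(y)
    y_mean = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - y_mean) ** 2 for yi in y)   # corrected total sum of squares
    df_total = n - 1                            # n*
    df_error = df_total - k                     # n* - k
    r2 = 1 - rss / tss                          # formula (16)
    return {
        "r2": r2,
        "adj_r2": 1 - (rss / df_error) / (tss / df_total),   # formula (17)
        "r": math.sqrt(r2),                     # formula (18); needs r2 >= 0
        "root_mse": math.sqrt(rss / df_error),  # formula (19)
        "norm_resid": math.sqrt(rss),           # formula (20)
    }

# Hypothetical observations and fitted values from y_hat = 1 + 2x + 3x^2
y = [8.0, 4.0, 1.0, 4.0, 18.0]
y_hat = [9.0, 2.0, 1.0, 6.0, 17.0]
stats = fit_statistics(y, y_hat, k=2)
```

In this example the adjusted $R^2$ is noticeably lower than $R^2$, which is the expected effect of spending 2 of only 4 total degrees of freedom on the model.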
The ANOVA table of polynomial fitting is:
| | df | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | k | SSreg = TSS - RSS | MSreg = SSreg / k | MSreg / MSE | p-value |
| Error | n* - k | RSS | MSE = RSS / (n* - k) | | |
| Total | n* | TSS | | | |
| Note: If intercept is included in the model, n*=n-1. Otherwise, n*=n and the total sum of squares is uncorrected. |
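where the total sum of squares, TSS, is:
$$TSS = \begin{cases} \sum_{i=1}^{n} w_i \left( y_i - \dfrac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i} \right)^2 & \text{(corrected, intercept included)} \\ \sum_{i=1}^{n} w_i y_i^2 & \text{(uncorrected, intercept excluded)} \end{cases} \tag{21}$$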
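As a hedged illustration of how the ANOVA entries are derived from TSS and RSS, the sketch below fills in the degrees of freedom, sums of squares, mean squares, and F value. The p-value column is omitted because it requires an F distribution routine. The function name and input numbers are hypothetical.

```python
def anova_table(tss, rss, n, k, intercept=True):
    """Build the ANOVA rows for an order-k polynomial fit to n points."""
    n_star = n - 1 if intercept else n    # n* as defined in the note above
    ss_reg = tss - rss                    # SSreg = TSS - RSS
    ms_reg = ss_reg / k                   # MSreg = SSreg / k
    mse = rss / (n_star - k)              # MSE = RSS / (n* - k)
    return {
        "Model": {"df": k, "SS": ss_reg, "MS": ms_reg, "F": ms_reg / mse},
        "Error": {"df": n_star - k, "SS": rss, "MS": mse},
        "Total": {"df": n_star, "SS": tss},
    }

# Hypothetical values: 5 points, order 2, TSS = 176, RSS = 10
table = anova_table(tss=176.0, rss=10.0, n=5, k=2)
```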
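The covariance matrix of the polynomial regression can be calculated as:
$$Cov(\hat\beta_i, \hat\beta_j) = \sigma^2 \left[(X'X)^{-1}\right]_{ij} \tag{22}$$
where $\sigma^2$ is estimated by the mean square error, $RSS / df_{Error}$.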
The correlation between any two parameters is:
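$$\rho(\hat\beta_i, \hat\beta_j) = \frac{Cov(\hat\beta_i, \hat\beta_j)}{\sqrt{Cov(\hat\beta_i, \hat\beta_i)}\,\sqrt{Cov(\hat\beta_j, \hat\beta_j)}} \tag{23}$$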
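Formulas (22) and (23) translate into a few NumPy lines. This sketch assumes unit weights, an included intercept, and estimates $\sigma^2$ by $RSS / df_{Error}$; the helper name and sample data are hypothetical.

```python
import numpy as np

def covariance_and_correlation(x, y, k):
    """Covariance (22) and correlation (23) matrices of the coefficients."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, k + 1, increasing=True)   # columns 1, x, ..., x^k
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ y
    residuals = y - X @ B_hat
    s2 = residuals @ residuals / (len(x) - 1 - k)   # estimate of sigma^2
    cov = s2 * XtX_inv
    sd = np.sqrt(np.diag(cov))                 # standard errors of the coefficients
    corr = cov / np.outer(sd, sd)              # divide each entry by sd_i * sd_j
    return cov, corr

# Hypothetical sample data scattered around y = 1 + 2x + 3x^2
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [8.0, 4.0, 1.0, 4.0, 18.0]
cov, corr = covariance_and_correlation(x, y, k=2)
```

For this data the intercept and the quadratic coefficient are strongly negatively correlated, which is typical when x is not scaled to make the polynomial terms orthogonal.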
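The confidence interval for the fitting function indicates how good the estimate of the fitting function's value is at particular values of the independent variable. You can claim with $100(1 - \alpha)\%$ confidence that the correct value of the fitting function lies within the confidence interval, where $1 - \alpha$ is the desired confidence level. This confidence interval for the fitting function is computed as:
$$\hat{y} \pm t_{(\alpha/2,\, df_{Error})} \left[ \mathbf{x}\, C\, \mathbf{x}' \right]^{1/2} \tag{24}$$
where
$$\mathbf{x} = \left[ \frac{\partial y}{\partial \beta_0}, \frac{\partial y}{\partial \beta_1}, \ldots, \frac{\partial y}{\partial \beta_k} \right] = \left[ 1, x, x^2, \ldots, x^k \right] \tag{25}$$
and $C$ is the covariance matrix given by formula (22).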
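The prediction interval for the desired confidence level $1 - \alpha$ is the interval within which $100(1 - \alpha)\%$ of all the experimental points in a series of repeated measurements are expected to fall at particular values of the independent variable. This prediction interval for the fitting function is computed as:
$$\hat{y} \pm t_{(\alpha/2,\, df_{Error})} \left[ s_\varepsilon^2 + \mathbf{x}\, C\, \mathbf{x}' \right]^{1/2} \tag{26}$$
where $s_\varepsilon$ is the residual standard deviation from formula (10), $\mathbf{x}$ is the row vector from formula (25), and $C$ is the covariance matrix from formula (22).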
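The confidence and prediction bands can be sketched with NumPy as below. The code assumes unit weights, an included intercept, and that $C$ is the covariance matrix scaled by the estimated error variance. The critical value `t_crit` is supplied by the caller, and the helper name `bands` and all sample values are hypothetical.

```python
import numpy as np

def bands(x, y, k, x_new, t_crit):
    """Fitted values with confidence and prediction half widths at x_new."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, k + 1, increasing=True)
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ y
    residuals = y - X @ B_hat
    s2 = residuals @ residuals / (len(x) - 1 - k)   # s_eps squared
    C = s2 * XtX_inv                                # covariance matrix
    # One row [1, x, ..., x^k] per evaluation point, as in formula (25)
    Xn = np.vander(np.asarray(x_new, dtype=float), k + 1, increasing=True)
    y_hat = Xn @ B_hat
    var_fit = np.einsum("ij,jk,ik->i", Xn, C, Xn)   # x C x' for each row
    ci_half = t_crit * np.sqrt(var_fit)             # half width of the confidence band
    pi_half = t_crit * np.sqrt(s2 + var_fit)        # half width of the prediction band
    return y_hat, ci_half, pi_half

# Hypothetical sample data; t_crit = 4.303 is the two-sided 95% value for 2 df
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [8.0, 4.0, 1.0, 4.0, 18.0]
y_hat, ci_half, pi_half = bands(x, y, k=2, x_new=[0.0, 1.0], t_crit=4.303)
```

The prediction band is always wider than the confidence band at the same point, because it adds the variance of a single new observation to the variance of the fitted value.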