Polynomial Regression Results

How Origin Fits a Polynomial Curve

The Fitting Model

For a given dataset (X_i, Y_i), i = 1, 2, ..., n, where X is the independent variable and Y is the dependent variable, a polynomial regression fits the data to a model of the following form:

 y=\beta _0+\beta _1x+\beta_2x^2+\beta_3x^3+...+\beta_kx^k+\varepsilon

(1)

where k is the polynomial order. In Origin, k is a positive integer less than 10. The error term \varepsilon is assumed to be independent and normally distributed, \varepsilon \sim N(0, \sigma^2).

To fit the model, assume that the residuals:

 res_i=y_i-(\beta _0+\beta _1x_i+\beta_2x_i^2+\beta_3x_i^3+...+\beta_kx_i^k)\,\!

(2)

are normally distributed with mean 0 and variance \sigma_i^2. The maximum likelihood estimates of the parameters \beta_j can then be obtained by minimizing the chi-square, which is defined as:

\chi ^2=\sum_{i=1}^n \frac{(y_i-\hat y_i)^2} {\sigma _i^2}

(3)

Weighted Fitting

If the error is treated as weight, the chi-square minimization equation can be written as:

\chi ^2=\sum_{i=1}^n w_i\left (y_i-\hat y_i\right )^2

(4)

and

w_i=\frac{1}{\sigma _i^2}

(5)

where \sigma_i are the measurement errors. If they are unknown, they should all be set to 1, so that every point has the same weight.
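
As a minimal sketch of equations (3) to (5), the weighted chi-square can be computed directly from the observations, the fitted values, and the measurement errors. The function name `weighted_chi_square` and the sample numbers are illustrative assumptions and are not part of Origin.

```python
def weighted_chi_square(y, y_hat, sigma=None):
    """Return chi^2 = sum(w_i * (y_i - y_hat_i)^2) with w_i = 1 / sigma_i^2."""
    if sigma is None:
        # Unknown measurement errors: every sigma_i is set to 1 (no weighting).
        sigma = [1.0] * len(y)
    return sum((yi - fi) ** 2 / si ** 2 for yi, fi, si in zip(y, y_hat, sigma))


# Hypothetical observations, fitted values, and measurement errors.
y = [1.0, 2.0, 4.0]
y_hat = [1.5, 2.0, 3.0]
sigma = [0.5, 1.0, 1.0]

chi2_weighted = weighted_chi_square(y, y_hat, sigma)
chi2_unweighted = weighted_chi_square(y, y_hat)
```

A point with a small measurement error contributes more to the sum, so the first point dominates the weighted value here.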

Parameters

The fit-related formulas are summarized here:

Image:Polynomial Regression Results1.png

The Fitted Values

The estimated coefficients are obtained by matrix calculation. First, we rewrite the regression model in matrix form:

Y=XB+E\,\!

(6)

where

Y=\begin{bmatrix}
         \frac{y_1}{\sigma _1}\\    
         \frac{y_2}{\sigma _2}\\
         \vdots \\ 
         \frac{y_n}{\sigma_n}\\
         \end{bmatrix}
X=\begin{bmatrix}
        \frac{1}{\sigma_1}&\frac{x_1}{\sigma_1}&\frac{x_1^2}{\sigma_1} & \cdots & \frac{x_1^k}{\sigma_1}\\
        \frac{1}{\sigma_2}&\frac{x_2}{\sigma_2}&\frac{x_2^2}{\sigma_2}&  \cdots &\frac{x_2^k}{\sigma_2}\\
        \vdots &           \vdots   &              \vdots    &              \ddots &\vdots \\ 
        \frac{1}{\sigma_n}&\frac{x_n}{\sigma_n}&\frac{x_n^2}{\sigma_n}  &\cdots &\frac{x_n^k}{\sigma_n}\\
        \end{bmatrix}
B=\begin{bmatrix}
         \beta_0\\    
         \beta_1\\
         \vdots \\ 
         \beta_k\\
         \end{bmatrix}
E=\begin{bmatrix}
         \varepsilon_1\\    
         \varepsilon_2\\
         \vdots \\ 
         \varepsilon_n\\
         \end{bmatrix}

(7)

The estimate of the vector B is the solution to the linear equations, and can be expressed as:

\hat{B}=\begin{bmatrix}
               \hat{\beta_0}\\
               \hat{\beta_1}\\
               \vdots \\
               \hat{\beta_k}\\
               \end{bmatrix} = \left ( X'X \right )^{-1}X'Y\,\!

(8)

where X' is the transpose of X.
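
The matrix calculation can be sketched in plain Python as follows. The sketch builds the weighted design matrix with columns 1, x, ..., x^k (each divided by \sigma_i), forms X'X and X'Y, and solves the normal equations (X'X)B = X'Y by Gaussian elimination, which yields the same estimate as (8) without forming the inverse explicitly. The function names are assumptions made for illustration, and this is not a description of how Origin implements the fit.

```python
def solve_linear_system(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def fit_polynomial(x, y, order, sigma=None):
    """Return [beta_0, ..., beta_k] minimizing the weighted chi-square."""
    if sigma is None:
        sigma = [1.0] * len(x)
    # Rows of the weighted design matrix X and entries of the vector Y.
    X = [[xi ** j / si for j in range(order + 1)] for xi, si in zip(x, sigma)]
    Y = [yi / si for yi, si in zip(y, sigma)]
    p = order + 1
    xtx = [[sum(row[r] * row[c] for row in X) for c in range(p)] for r in range(p)]
    xty = [sum(row[r] * yi for row, yi in zip(X, Y)) for r in range(p)]
    return solve_linear_system(xtx, xty)


# Noise-free hypothetical data generated from y = 1 + 2x + 3x^2.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi + 3.0 * xi ** 2 for xi in x]
beta_hat = fit_polynomial(x, y, order=2)
```

Because the data are noise-free, the recovered coefficients agree with 1, 2, and 3 up to floating-point rounding. For high orders or widely spread x values, X'X becomes ill-conditioned, and a numerical library routine is usually a better choice than this hand-written solver.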

The Parameter Standard Errors

For each parameter, the standard error can be obtained by:

s_{\hat{\beta_j}}=s_{\varepsilon}\sqrt{c_{jj}}

(9)

where c_{jj} is the jth diagonal element of (X'X)^{-1}. The residual standard deviation s_{\varepsilon} (also called "std dev", "standard error of estimate", or "root MSE") is computed as:

s_{\varepsilon}=\sqrt{\frac{RSS}{df_{Error}}}=\sqrt{\frac{RSS}{n^*-k}}

(10)

Note: Please read the ANOVA Table section for more details about the degrees of freedom, df_{Error}.

The residual sum of squares (RSS), also called the sum of squared errors (SSE), is the weighted sum of the squares of the vertical deviations from each data point to the fitted curve. It can be computed as:

RSS=\sum_{i=1}^n e_i^2=\sum_{i=1}^n w_i\left (y_i-\hat{y_i}\right )^2

(11)

t Value

If the regression assumptions hold, we can perform the t-tests for the regression coefficients with the null hypotheses and the alternative hypotheses:

H_0: \beta_j = 0\,\!
H_{\alpha}: \beta_j \neq 0

The t-values can be computed as:

t=\frac{{\hat{\beta_j}}-0}{s_{\hat{\beta_j}}}

(12)

With the t-value, we can decide whether or not to reject the corresponding null hypothesis. Usually, for a given significance level α, we reject H_0 when |t| > t_{\alpha/2}. Equivalently, we reject H_0 when the p-value is less than α.
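
Equations (9), (10), and (12) chain together in a few lines. The sketch below assumes that the coefficient estimates, the diagonal of (X'X)^{-1}, the RSS, and the error degrees of freedom are already available; the function name and all input numbers are hypothetical.

```python
import math


def standard_errors_and_t(beta_hat, c_diag, rss, df_error):
    """Return (s_eps, standard errors, t-values) following (9), (10), (12)."""
    s_eps = math.sqrt(rss / df_error)                 # residual standard deviation
    se = [s_eps * math.sqrt(cjj) for cjj in c_diag]   # s_beta_j = s_eps * sqrt(c_jj)
    t_values = [b / s for b, s in zip(beta_hat, se)]  # t = (beta_j - 0) / s_beta_j
    return s_eps, se, t_values


# Hypothetical inputs: two coefficients, the diagonal of (X'X)^-1, RSS, and df.
beta_hat = [1.2, 0.5]
c_diag = [0.25, 0.01]
s_eps, se, t_values = standard_errors_and_t(beta_hat, c_diag, rss=8.0, df_error=8)
```

With these inputs the residual standard deviation is 1, so each standard error reduces to the square root of the corresponding diagonal element.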

Prob>|t|

The p-value of the t-test above, that is, the probability of obtaining a t-value at least as extreme as the one observed if H_0 is true.

prob=2\left ( 1-tcdf(|t|,df_{Error})\right )

(13)

where tcdf(t, df) computes the lower-tail probability for Student's t distribution with df degrees of freedom.
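
To make equation (13) concrete, the sketch below approximates tcdf by integrating the Student's t density with Simpson's rule, using only the Python standard library. This numerical approach is an assumption chosen for illustration and is not how Origin evaluates the distribution. The helper names are hypothetical, and `math.gamma` limits the sketch to moderate degrees of freedom.

```python
import math


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)


def tcdf(t, df, steps=2000):
    """Lower-tail probability, integrating the density with Simpson's rule."""
    upper = abs(t)
    h = upper / steps                 # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3              # integral of the density from 0 to |t|
    return 0.5 + half if t >= 0 else 0.5 - half


def two_sided_p_value(t, df_error):
    """Prob>|t| as in equation (13)."""
    return 2 * (1 - tcdf(abs(t), df_error))


# 2.228 is the tabulated two-sided 5% critical value for 10 degrees of freedom,
# so the resulting p-value should be close to 0.05.
p = two_sided_p_value(2.228, 10)
```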

Confidence Intervals

From the t-value, we can calculate the (1 - α) * 100% confidence interval for each parameter by:

\hat{\beta_j}-t_{(\alpha/2,n^*-k)}s_{\hat{\beta_j}}\le \beta_j \le \hat{\beta_j}+t_{(\alpha/2,n^*-k)}s_{\hat{\beta_j}}

(14)

CI Half Width

The Confidence Interval Half Width is:

CI=\frac{UCL-LCL}{2}

(15)

where UCL and LCL are the upper and lower confidence limits, respectively.
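
The confidence limits and the half width follow directly from an estimate, its standard error, and a critical t-value. In this sketch the critical value is passed in as a number rather than computed; the function name and the inputs are hypothetical.

```python
def confidence_interval(beta_hat_j, se_j, t_crit):
    """Return (LCL, UCL, half width) for one parameter, following (14) and (15)."""
    lcl = beta_hat_j - t_crit * se_j
    ucl = beta_hat_j + t_crit * se_j
    half_width = (ucl - lcl) / 2      # algebraically equal to t_crit * se_j
    return lcl, ucl, half_width


# Hypothetical estimate and standard error; 2.228 is the tabulated two-sided
# 95% critical value of Student's t with 10 degrees of freedom.
lcl, ucl, half_width = confidence_interval(beta_hat_j=3.0, se_j=0.5, t_crit=2.228)
```

The half width is simply the critical value times the standard error, so the interval is symmetric about the estimate.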

Statistics

Some fit statistics formulas are summarized here:

Image:Polynomial Regression Results2.png

Degree of Freedom

The error degrees of freedom. Please refer to the ANOVA Table section for more details.

Residual Sum of Squares

The residual sum of squares, see formula (11).

R-Square (COD)

The goodness of fit can be evaluated by the coefficient of determination, R^2, which is given by:

R^2=\frac{Explained\, variation}{Total\, variation}=1-\frac{RSS}{TSS}

(16)

Adj. R-Square

The adjusted R^2 corrects the R^2 value for the degrees of freedom. It can be computed as:

\bar{R}^2=1-\frac{RSS/df_{Error}}{TSS/df_{Total}}

(17)

R Value

The R-value is simply the square root of R^2:

R=\sqrt{R^2}

(18)

Root-MSE (SD)

The root mean square of the error, which is equal to:

Root\, MSE=\sqrt{\frac{RSS}{df_{Error}}}

(19)

Norm of Residuals

The norm of residuals is equal to the square root of RSS:

Norm\, of\,Residuals=\sqrt{RSS}

(20)
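
Equations (16) to (20) all derive from RSS, TSS, and the degrees of freedom, so they can be gathered in one small function. This is a sketch with hypothetical names and inputs; it assumes R^2 is non-negative, which may not hold for a fit without an intercept.

```python
import math


def fit_statistics(rss, tss, df_error, df_total):
    """Return the statistics of equations (16) to (20) as a dictionary."""
    r_square = 1 - rss / tss
    return {
        "r_square": r_square,
        "adj_r_square": 1 - (rss / df_error) / (tss / df_total),
        "r_value": math.sqrt(r_square),   # assumes r_square >= 0
        "root_mse": math.sqrt(rss / df_error),
        "norm_of_residuals": math.sqrt(rss),
    }


# Hypothetical sums of squares for n = 11 points and a quadratic fit with
# intercept: df_total = n* = 10 and df_error = n* - k = 8.
stats = fit_statistics(rss=4.0, tss=100.0, df_error=8, df_total=10)
```

Here R^2 is 0.96 and the adjusted R^2 is 0.95, which shows the penalty the adjusted value applies for the two fitted slope terms.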

ANOVA Table

The ANOVA table for polynomial fitting is:

       | df     | Sum of Squares    | Mean Square          | F Value     | Prob > F
Model  | k      | SSreg = TSS - RSS | MSreg = SSreg / k    | MSreg / MSE | p-value
Error  | n* - k | RSS               | MSE = RSS / (n* - k) |             |
Total  | n*     | TSS               |                      |             |
Note: If the intercept is included in the model, n* = n - 1. Otherwise, n* = n and the total sum of squares is uncorrected.

The total sum of squares, TSS, is:

TSS=\sum_{i=1}^n w_i\left(y_i-\bar{y}\right)^2 (corrected)
TSS=\sum_{i=1}^n w_iy_i^2 (uncorrected)

(21)
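
The ANOVA quantities can be assembled from the observations and the fitted values as sketched below. The function name and the data are hypothetical, and the use of a weighted mean for \bar{y} when weights are supplied is an assumption of this sketch. The Prob > F column is omitted because it requires the F distribution.

```python
def anova_table(y, y_hat, order, weights=None, intercept=True):
    """Return the ANOVA quantities for a polynomial fit of the given order."""
    n = len(y)
    w = weights if weights is not None else [1.0] * n
    rss = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))
    if intercept:
        # Corrected TSS; a weighted mean is assumed here when weights are given.
        y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        tss = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
        n_star = n - 1
    else:
        tss = sum(wi * yi ** 2 for wi, yi in zip(w, y))   # uncorrected TSS
        n_star = n
    df_model, df_error = order, n_star - order
    ss_reg = tss - rss
    ms_reg, mse = ss_reg / df_model, rss / df_error
    return {"df_model": df_model, "df_error": df_error, "df_total": n_star,
            "ss_reg": ss_reg, "rss": rss, "tss": tss,
            "ms_reg": ms_reg, "mse": mse, "f_value": ms_reg / mse}


# Hypothetical data at x = 0..4 and the fitted values of the least-squares
# straight line y = 1.4 + 0.8x (a first-order polynomial).
y = [1.0, 3.0, 2.0, 5.0, 4.0]
y_hat = [1.4, 2.2, 3.0, 3.8, 4.6]
table = anova_table(y, y_hat, order=1)
```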

Covariance and Correlation Matrix

The covariance matrix of the polynomial regression parameters can be calculated element by element as:

Cov\left(\beta_i,\beta_j \right)=\sigma^2\left[\left( X'X \right)^{-1}\right]_{ij}

(22)

The correlation between any two parameters is:

\rho\left(\beta_i,\beta_j \right)=\frac{Cov\left(\beta_i,\beta_j \right)}
{\sqrt{Cov\left(\beta_i,\beta_i \right)}\sqrt{Cov\left(\beta_j,\beta_j \right)}}

(23)
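
Equation (23) amounts to dividing each covariance by the product of the two parameter standard deviations. A minimal sketch, with a hypothetical function name and a made-up covariance matrix:

```python
import math


def correlation_matrix(cov):
    """Convert a parameter covariance matrix into a correlation matrix (23)."""
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]   # parameter standard deviations
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


# Hypothetical 2x2 covariance matrix of (beta_0, beta_1).
cov = [[4.0, -1.0],
       [-1.0, 1.0]]
corr = correlation_matrix(cov)
```

The diagonal of the result is always 1, and the off-diagonal entry here is -0.5, indicating that the two estimates are negatively correlated.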

Confidence and Prediction Bands

Confidence Band

The confidence interval for the fitting function indicates how good your estimate of the value of the fitting function is at particular values of the independent variable. You can claim with 100(1 - α)% confidence that the true value of the fitting function lies within the confidence interval, where α is the significance level. The confidence interval for the fitting function is computed as:

\hat{Y}\pm t_{(\alpha/2,dof)}{\left [\chi^2\mathbf{x}\mathbf{C}\mathbf{x}' \right ]}^{\frac{1}{2}}

(24)

where

\mathbf{x}=\left [ \frac{\partial y}{\partial \beta_0},\frac{\partial y}{\partial \beta_1},\frac{\partial y}{\partial \beta_2},...,\frac{\partial y}{\partial \beta_k} \right ]

(25)

and C is the Covariance Matrix.

Prediction Band

The prediction interval for the significance level α is the interval within which 100(1 - α)% of all the experimental points in a series of repeated measurements are expected to fall at particular values of the independent variable. The prediction interval for the fitting function is computed as:

\hat{Y}\pm t_{(\alpha/2,dof)}{\left [\chi^2(1+\mathbf{x}\mathbf{C}\mathbf{x}') \right ]}^{\frac{1}{2}}

(26)
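
For a polynomial model, the partial derivatives in (25) are simply the powers of x, so \mathbf{x} = [1, x, x^2, ..., x^k]. The sketch below evaluates the half widths of both bands at one point, taking \chi^2, C, and the critical t-value as given inputs exactly as they appear in (24) and (26). The function name and all numbers are hypothetical.

```python
import math


def band_half_widths(x0, order, C, chi2, t_crit):
    """Return (confidence, prediction) half widths at x0 per (24) and (26)."""
    # For a polynomial, dy/dbeta_j = x^j, so x = [1, x0, x0^2, ..., x0^k].
    grad = [x0 ** j for j in range(order + 1)]
    # Quadratic form x C x'.
    xcx = sum(grad[i] * C[i][j] * grad[j]
              for i in range(order + 1) for j in range(order + 1))
    conf = t_crit * math.sqrt(chi2 * xcx)
    pred = t_crit * math.sqrt(chi2 * (1 + xcx))
    return conf, pred


# Hypothetical inputs for a first-order fit; t_crit = 2.228 is the tabulated
# two-sided 95% value of Student's t with 10 degrees of freedom.
C = [[0.5, -0.1],
     [-0.1, 0.05]]
conf, pred = band_half_widths(x0=2.0, order=1, C=C, chi2=1.0, t_crit=2.228)
```

The prediction band is always wider than the confidence band at the same point, because it adds the scatter of an individual measurement to the uncertainty of the fitted curve.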