OriginLab Corporation - Scientific Graphing and Data Analysis Software - 2D graphs, 3D graphs, Contour Plots, Statistical Charts, Data Exploration, Statistics, Curve Fitting, Signal Processing, and Peak Analysis

Linear Regression Results


How Origin Fits a Line

The Fitting Model

For a given dataset (x_i,y_i), i=1,2,\ldots,n, where X is the independent variable and Y is the dependent variable, linear regression fits the data to a model of the following form:

y=\beta _0+\beta _1x+\varepsilon

(1)

To fit the model, assume that the residuals

res_i=y_i-(\beta _0+\beta _1x_i)\,\!

(2)

conform to a normal (Gaussian) distribution with the mean equal to 0 and the variance equal to \sigma _i^2\,\!. Then the maximum likelihood estimates of the parameters \beta _0\,\! and \beta _1\,\! can be obtained by minimizing the chi-square value, defined as:

\chi ^2=\sum_{i=1}^n \frac 1{\sigma _i^2} (y_i-\hat y_i)^2

(3)

Weighted Fitting

If the errors are used as weights, the chi-square to be minimized can be written as:

\chi ^2=\sum_{i=1}^n w_i (y_i-\hat y_i)^2=\sum_{i=1}^n w_i [y_i-(\beta _0+\beta _1x_i)]^2

(4)

and

 w_i=\frac 1{\sigma _i^2}

(5)

where \sigma _i\,\! are the measurement errors. If they are unknown, they should all be set to 1, which reduces equation (4) to an ordinary (unweighted) least-squares fit.
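As a minimal sketch of equations (4) and (5), the Python function below evaluates the weighted chi-square for a candidate pair of parameters. The name `chi_square` and its signature are hypothetical illustrations, not part of Origin; when no measurement errors are passed, it sets every \sigma_i to 1 as described above.

```python
from typing import Optional, Sequence


def chi_square(x: Sequence[float], y: Sequence[float], beta0: float, beta1: float,
               sigma: Optional[Sequence[float]] = None) -> float:
    """Weighted chi-square of equation (4) for the line y = beta0 + beta1 * x."""
    if sigma is None:
        # Unknown measurement errors: set every sigma_i to 1, so every w_i = 1.
        sigma = [1.0] * len(x)
    total = 0.0
    for xi, yi, si in zip(x, y, sigma):
        w = 1.0 / si ** 2                      # equation (5): w_i = 1 / sigma_i^2
        residual = yi - (beta0 + beta1 * xi)   # equation (2)
        total += w * residual ** 2
    return total


# Points lying exactly on y = 1 + 2x give a chi-square of 0.
print(chi_square([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], beta0=1.0, beta1=2.0))
```

A smaller \sigma_i gives point i a larger weight, so the minimizing line is pulled toward the more precise measurements.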

Parameters

The fit-related formulas are summarized here:

[Figure: summary of the parameter formulas (Linear Regression Results 01.png)]

The Fitted Values

When \chi ^2 is minimized, the estimated parameters of the linear model can be computed as:

\hat\beta _1=\frac{SXY}{SXX}

(6)

\hat\beta _0=\bar y-\hat\beta _1\bar x

(7)

where:

\bar x=\frac {1}{n}\sum_{i=1}^nx_i,\bar y=\frac {1}{n}\sum_{i=1}^ny_i

(8)

and

SXY=\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y),SXX=\sum_{i=1}^n(x_i-\bar x)^2 (corrected)

(9)

SXY=\sum_{i=1}^nx_iy_i,SXX=\sum_{i=1}^nx_i^2 (uncorrected)

(10)

Note: When the intercept is excluded from the model, the coefficients are calculated using the uncorrected formula.
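The closed-form estimates (6) to (10) can be sketched in Python as follows, assuming an unweighted fit (all w_i = 1) and treating an excluded intercept as fixing \hat\beta_0 = 0. The function `fit_line` is a hypothetical helper, not Origin's API.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float],
             intercept: bool = True) -> Tuple[float, float]:
    """Unweighted least-squares estimates (beta0_hat, beta1_hat), equations (6)-(10)."""
    n = len(x)
    if intercept:
        x_bar = sum(x) / n                                   # equation (8)
        y_bar = sum(y) / n
        # Corrected sums, equation (9).
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        beta1 = sxy / sxx                                    # equation (6)
        beta0 = y_bar - beta1 * x_bar                        # equation (7)
    else:
        # Intercept excluded (model y = beta1 * x): uncorrected sums, equation (10).
        sxy = sum(xi * yi for xi, yi in zip(x, y))
        sxx = sum(xi * xi for xi in x)
        beta1 = sxy / sxx
        beta0 = 0.0
    return beta0, beta1


print(fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))  # (1.0, 2.0)
```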

The Parameter Standard Errors

For each parameter, the standard error can be obtained by:

\varepsilon _{\hat \beta _0}=s_\varepsilon \sqrt{\frac{\sum x_i^2}{nSXX}}

(11)

\varepsilon _{\hat \beta _1}=\frac{s_\varepsilon }{\sqrt{SXX}}

(12)

where the sample variance s_\varepsilon ^2 can be estimated as follows:

s_\varepsilon ^2=\frac{RSS}{df_{Error}}=\frac{\sum_{i=1}^n (y_i-\hat y_i)^2}{n^{*}-1}

(13)

RSS is the residual sum of squares (or error sum of squares, SSE): the weighted sum of the squared vertical deviations of the data points from the fitted line. It can be computed as:

RSS=\sum_{i=1}^n w_i (y_i-\hat y_i)^2=\sum_{i=1}^n w_i [y_i-(\hat\beta _0+\hat\beta _1x_i)]^2

(14)

Note: If the intercept is included in the model, n^{*} = n - 1; otherwise, n^{*} = n.
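A minimal sketch of equations (11) to (14), assuming an unweighted fit (w_i = 1) with the intercept included, so n^* = n - 1 and df_Error = n - 2. The helper `standard_errors` is hypothetical; it takes the fitted parameters from equations (6) and (7) as input.

```python
import math
from typing import Sequence, Tuple


def standard_errors(x: Sequence[float], y: Sequence[float],
                    beta0: float, beta1: float) -> Tuple[float, float, float]:
    """Return (se_beta0, se_beta1, s2) per equations (11)-(14).

    Assumes w_i = 1 and an intercept model: n* = n - 1, df_Error = n* - 1 = n - 2.
    """
    n = len(x)
    n_star = n - 1
    df_error = n_star - 1
    # Equation (14) with w_i = 1: residual sum of squares about the fitted line.
    rss = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / df_error                                   # equation (13)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)              # corrected SXX, equation (9)
    sum_x2 = sum(xi * xi for xi in x)
    se_beta0 = math.sqrt(s2) * math.sqrt(sum_x2 / (n * sxx))   # equation (11)
    se_beta1 = math.sqrt(s2) / math.sqrt(sxx)                  # equation (12)
    return se_beta0, se_beta1, s2
```

For x = [0, 1, 2, 3] and y = [1, 4, 5, 8], the least-squares estimates are \hat\beta_0 = 1.2 and \hat\beta_1 = 2.2, so `standard_errors(x, y, 1.2, 2.2)` gives s_\varepsilon^2 = 0.4 (RSS = 0.8 over 2 degrees of freedom).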

t Value

If the regression assumptions hold, we have:

\frac{{\hat \beta _0}-\beta _0}{\varepsilon _{\hat \beta _0}}\sim t_{n^{*}-1} and \frac{{\hat \beta _1}-\beta _1}{\varepsilon _{\hat \beta _1}}\sim t_{n^{*}-1}

(15)

The t-test can be used to examine whether the fitting parameters are significantly different from zero, which means that we can test whether \beta _0= 0\,\! (if true, this means that the fitted line passes through the origin) or \beta _1= 0\,\!. The hypotheses of the t-tests are:

H_0 : \beta _0= 0\,\! versus H_1 : \beta _0 \neq 0\,\!
H_0 : \beta _1= 0\,\! versus H_1 : \beta _1 \neq 0\,\!

The t-values can be computed by:

t=\frac{{\hat \beta _0}-0}{\varepsilon _{\hat \beta _0}} and t=\frac{{\hat \beta _1}-0}{\varepsilon _{\hat \beta _1}}

(16)

With the computed t-value, we can decide whether or not to reject the corresponding null hypothesis. For a given significance level \alpha\,\!, we reject H_0 \,\! when |t|>t_{(\frac \alpha 2,n^{*}-1)}. The p-value of the t-test is also reported; equivalently, we reject H_0 \,\! if the p-value is less than \alpha\,\!.

Prob>|t|

The two-sided p-value of the t-test above: the probability, assuming H_0 \,\! is true, of obtaining a |t| at least as large as the one computed.

prob=2(1-tcdf(|t|,df_{Error}))\,\!

(17)

where tcdf(t, df) computes the lower-tail probability of Student's t distribution with df degrees of freedom.
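The source does not show how Origin evaluates tcdf, so the sketch below substitutes a numerical integration of the t density (composite Simpson's rule) to illustrate equation (17). It is adequate for moderate |t| but is not a production implementation; a library routine such as `scipy.stats.t.sf` would normally be used instead. The names `t_pdf`, `tcdf`, and `prob_gt_abs_t` are hypothetical.

```python
import math

SIMPSON_STEPS = 2000  # even number of sub-intervals for Simpson's rule


def t_pdf(t: float, df: float) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + t * t / df) ** (-(df + 1) / 2)


def tcdf(t: float, df: float) -> float:
    """Lower-tail probability P(T <= t), integrating the density on [0, |t|]."""
    a = abs(t)
    h = a / SIMPSON_STEPS
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, SIMPSON_STEPS):
        area += (4 if k % 2 else 2) * t_pdf(k * h, df)
    half = area * h / 3                    # P(0 <= T <= |t|); the density is symmetric
    return 0.5 + half if t >= 0 else 0.5 - half


def prob_gt_abs_t(t: float, df_error: float) -> float:
    """Two-sided p-value, equation (17): 2 * (1 - tcdf(|t|, df_Error))."""
    return 2 * (1 - tcdf(abs(t), df_error))
```

As a check, with df = 1 (the Cauchy distribution) tcdf(1, 1) is exactly 0.75, so `prob_gt_abs_t(1.0, 1)` should be very close to 0.5.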

Confidence Intervals

Using the critical t-value, the (1-\alpha )\times 100\% confidence interval for the intercept is:

\hat \beta _0-t_{(\frac \alpha 2,n^{*}-1)}\varepsilon _{\hat \beta _0}\leq \beta _0\leq \hat \beta _0+t_{(\frac \alpha 2,n^{*}-1)}\varepsilon _{\hat \beta _0}

(18)

And the (1-\alpha )\times 100\% confidence interval for the slope is:

\hat \beta _1-t_{(\frac \alpha 2,n^{*}-1)}\varepsilon _{\hat \beta _1}\leq \beta _1\leq \hat \beta _1+t_{(\frac \alpha 2,n^{*}-1)}\varepsilon _{\hat \beta _1}

(19)

CI Half Width

The Confidence Interval Half Width is:

CI=\frac{UCL-LCL}2

(20)

where UCL and LCL are the upper and lower confidence limits, respectively.
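Equations (18) to (20) reduce to a few lines once the critical value is known. The sketch below assumes the caller supplies t_crit = t_{(\alpha/2, n^*-1)} (for example, about 4.303 for 95% confidence with df_Error = 2, from a standard t table); `confidence_interval` is a hypothetical name.

```python
from typing import Tuple


def confidence_interval(estimate: float, std_err: float,
                        t_crit: float) -> Tuple[float, float, float]:
    """Return (LCL, UCL, half width) per equations (18)-(20).

    t_crit is the critical value t_(alpha/2, n*-1), assumed to be supplied by the caller.
    """
    lcl = estimate - t_crit * std_err     # lower confidence limit
    ucl = estimate + t_crit * std_err     # upper confidence limit
    half_width = (ucl - lcl) / 2          # equation (20); equals t_crit * std_err
    return lcl, ucl, half_width
```

The same function serves both the intercept (pass \hat\beta_0 and its standard error) and the slope.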

Statistics

Some fit statistics formulas are summarized here:

[Figure: summary of the fit statistics formulas (Linear Regression Results 02.png)]

Degrees of Freedom

The error degrees of freedom, df_{Error}=n^{*}-1. Please refer to the ANOVA table for more details.

Residual Sum of Squares

The residual sum of squares, see formula (14).

R-Square (COD)

The quality of a linear regression fit can be measured by the coefficient of determination (COD), or R^2, which can be computed as:

R^2=\frac{SXY^2}{SXX\cdot TSS}=1-\frac{RSS}{TSS}

(21)

where TSS is the total sum of squares (formula (27)) and RSS is the residual sum of squares (formula (14)). R^2 lies between 0 and 1. Generally speaking, the closer it is to 1, the larger the share of the variation in Y explained by the fitted line, and the stronger the linear relationship between X and Y.

Adj. R-Square

We can further calculate the adjusted R^2 as:

{\bar R}^2=1-\frac{RSS/df_{Error}}{TSS/df_{Total}}

(22)

R Value

The R value is the square root of R^2:

R=\sqrt{R^2}

(23)

Pearson's r

In simple linear regression, the correlation coefficient between x and y, denoted by r, equals:

r=R\,\! if \beta _1\,\! is positive

r=-R\,\! if \beta _1\,\! is negative

(24)

Root-MSE (SD)

The root mean square of the error, which equals:

RootMSE=\sqrt{\frac{RSS}{df_{Error}}}

(25)

Norm of Residuals

The square root of RSS:

Norm \,of \,Residuals=\sqrt{RSS}

(26)
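The statistics (21) to (26) can be sketched together, assuming an unweighted fit with the intercept included (so TSS is corrected, df_Total = n - 1 and df_Error = n - 2). The function `fit_statistics` and its dictionary keys are hypothetical; the keys simply mirror the headings above.

```python
import math
from typing import Dict, Sequence


def fit_statistics(y: Sequence[float], y_hat: Sequence[float],
                   beta1: float) -> Dict[str, float]:
    """Fit statistics (21)-(26) for an unweighted fit that includes the intercept."""
    n = len(y)
    df_total = n - 1                     # n* = n - 1 when the intercept is included
    df_error = df_total - 1
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)                   # corrected TSS, (27)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))      # (14) with w_i = 1
    r2 = 1 - rss / tss                                         # (21)
    r = math.sqrt(r2)                                          # (23)
    return {
        "R-Square": r2,
        "Adj. R-Square": 1 - (rss / df_error) / (tss / df_total),   # (22)
        "R Value": r,
        "Pearson's r": r if beta1 > 0 else -r,                     # (24)
        "Root-MSE": math.sqrt(rss / df_error),                     # (25)
        "Norm of Residuals": math.sqrt(rss),                       # (26)
    }
```

With y = [1, 4, 5, 8] and fitted values [1.2, 3.4, 5.6, 7.8] (the line 1.2 + 2.2x at x = 0, 1, 2, 3), TSS = 25 and RSS = 0.8, so R^2 = 0.968. This matches SXY^2/(SXX\cdot TSS) = 11^2/(5\cdot 25).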

ANOVA Table

The ANOVA table of linear fitting is:

        df       Sum of Squares      Mean Square            F Value       Prob > F
Model   1        SSreg = TSS - RSS   MSreg = SSreg / 1      MSreg / MSE   p-value
Error   n* - 1   RSS                 MSE = RSS / (n* - 1)
Total   n*       TSS
Note: If the intercept is included in the model, n* = n - 1. Otherwise, n* = n and the total sum of squares is uncorrected. If the slope is fixed, df_{Model} = 0.

where the total sum of squares, TSS, is:

TSS=\sum_{i=1}^n w_i(y_i-\bar y)^2 (corrected)
TSS=\sum_{i=1}^n w_iy_i^2 (uncorrected)

(27)
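A minimal sketch of the Model and Error rows of the ANOVA table, again assuming an unweighted fit with the intercept included (corrected TSS, df_Model = 1). The helper `anova_table` is hypothetical, and the Prob > F column is omitted because it needs an F-distribution CDF.

```python
from typing import Dict, Sequence


def anova_table(y: Sequence[float], y_hat: Sequence[float]) -> Dict[str, float]:
    """Model and Error rows of the ANOVA table (unweighted fit, intercept included)."""
    n = len(y)
    n_star = n - 1                                       # intercept included
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)             # corrected TSS, equation (27)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_reg = tss - rss
    ms_reg = ss_reg / 1                                  # df_Model = 1
    mse = rss / (n_star - 1)                             # df_Error = n* - 1
    return {"SSreg": ss_reg, "MSreg": ms_reg, "MSE": mse, "F": ms_reg / mse}
```

In simple linear regression with df_Model = 1, F equals the square of the slope's t-value, so Prob > F matches Prob>|t| for the slope. For y = [1, 4, 5, 8] with fitted values [1.2, 3.4, 5.6, 7.8], F = 24.2 / 0.4 = 60.5 = (2.2/\sqrt{0.08})^2.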

Covariance and Correlation Matrix

The covariance matrix of the estimated parameters is calculated by:


\begin{pmatrix}
Cov(\beta _0,\beta _0) & Cov(\beta _0,\beta _1)\\
Cov(\beta _1,\beta _0) & Cov(\beta _1,\beta _1)
\end{pmatrix}=\sigma ^2\frac 1{SXX}\begin{pmatrix} \sum \frac{x_i^2}n & -\bar x \\-\bar x & 1 \end{pmatrix}

(28)

The correlation between any two parameters is:


\rho (\beta _i,\beta _j)=\frac{Cov(\beta _i,\beta _j)}{\sqrt{Cov(\beta _i,\beta _i)}\sqrt{Cov(\beta _j,\beta _j)}}

(29)
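Equations (28) and (29) can be sketched as follows, assuming \sigma^2 is replaced by its estimate s_\varepsilon^2 from equation (13) and the intercept is included (corrected SXX). `covariance_matrix` and `correlation` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def covariance_matrix(x: Sequence[float], s2: float) -> List[List[float]]:
    """2x2 covariance matrix of (beta0_hat, beta1_hat), equation (28), with sigma^2 ~ s2."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    mean_x2 = sum(xi * xi for xi in x) / n           # (sum of x_i^2) / n
    return [[s2 * mean_x2 / sxx, -s2 * x_bar / sxx],
            [-s2 * x_bar / sxx, s2 / sxx]]


def correlation(cov: List[List[float]], i: int, j: int) -> float:
    """Correlation between parameters i and j, equation (29)."""
    return cov[i][j] / (math.sqrt(cov[i][i]) * math.sqrt(cov[j][j]))
```

The diagonal entries reproduce the squared standard errors of equations (11) and (12): with x = [0, 1, 2, 3] and s_\varepsilon^2 = 0.4 they are 0.28 and 0.08.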

Confidence and Prediction Band

For a particular value x_p\,\!, the 100(1-\alpha )\% confidence interval for the mean value of y\,\! at x=x_p\,\! is:

\hat y\pm t_{(\frac \alpha 2,n^{*}-1)}s_\varepsilon \sqrt{\frac 1n+\frac{(x_p-\bar x)^2}{SXX}}

(30)

The 100(1-\alpha )\% prediction interval for a single new observation of y\,\! at x=x_p\,\! is:

\hat y\pm t_{(\frac \alpha 2,n^{*}-1)}s_\varepsilon \sqrt{1+\frac 1n+\frac{(x_p-\bar x)^2}{SXX}}

(31)
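A sketch of equations (30) and (31) at a single point x_p, assuming an intercept model and a caller-supplied critical value t_crit = t_{(\alpha/2, n^*-1)}; `bands_at` is a hypothetical name and its argument `s` stands for s_\varepsilon.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def bands_at(x: Sequence[float], beta0: float, beta1: float, s: float,
             t_crit: float, x_p: float) -> Tuple[Interval, Interval]:
    """Return (confidence interval, prediction interval) for y at x = x_p."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    y_p = beta0 + beta1 * x_p                        # fitted value at x_p
    h = 1 / n + (x_p - x_bar) ** 2 / sxx             # shared term under the root
    conf_hw = t_crit * s * math.sqrt(h)              # equation (30)
    pred_hw = t_crit * s * math.sqrt(1 + h)          # equation (31)
    return (y_p - conf_hw, y_p + conf_hw), (y_p - pred_hw, y_p + pred_hw)
```

Evaluating this over a grid of x_p values traces the confidence and prediction bands; the prediction band is always the wider of the two because of the extra 1 under the square root.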

Confidence Ellipses

Assuming the pair of variables (X, Y) conforms to a bivariate normal distribution, we can examine the correlation between the two variables using a confidence ellipse. The confidence ellipse is centered at (\bar x,\bar y), and, with \sigma _x^2 and \sigma _y^2 denoting the variances of X and Y and r their correlation coefficient, the major semiaxis a and minor semiaxis b are:

a=c\sqrt{\frac{\sigma _x^2+\sigma _y^2+\sqrt{(\sigma _x^2-\sigma _y^2)^2+4r^2\sigma _x^2\sigma _y^2}}2}
b=c\sqrt{\frac{\sigma _x^2+\sigma _y^2-\sqrt{(\sigma _x^2-\sigma _y^2)^2+4r^2\sigma _x^2\sigma _y^2}}2}

(32)

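For a given confidence level (1-\alpha )\,\!, the scaling factor c of the confidence ellipse for the population mean is:

 c=\sqrt{\frac{2(n-1)}{n(n-2)}(\alpha ^{\frac 2{2-n}}-1)}

(33)

and the scaling factor c of the prediction ellipse (for a new observation) is:

 c=\sqrt{\frac{2(n+1)(n-1)}{n(n-2)}(\alpha ^{\frac 2{2-n}}-1)}

(34)

The inclination angle \beta of the ellipse's major axis is:

\beta =\frac 12\arctan \frac{2r\sqrt{\sigma _x^2\sigma _y^2}}{\sigma _x^2-\sigma _y^2}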

(35)
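The ellipse parameters (32) to (35) can be sketched as below, assuming `sx` and `sy` are the sample standard deviations \sigma_x and \sigma_y, `r` is Pearson's r, and n > 2. The flag `prediction=True` selects the (n + 1) factor of equation (34). One deliberate change from (35): the sketch uses `math.atan2` rather than a plain arctan of the ratio, which keeps the angle on the major axis when \sigma_x < \sigma_y and avoids dividing by zero when \sigma_x = \sigma_y. `ellipse_params` is a hypothetical name.

```python
import math
from typing import Tuple


def ellipse_params(sx: float, sy: float, r: float, n: int, alpha: float,
                   prediction: bool = False) -> Tuple[float, float, float]:
    """Return (a, b, angle): semiaxes per (32)-(34) and major-axis angle per (35)."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    factor = 2 * (n - 1) / (n * (n - 2))
    if prediction:
        factor *= n + 1                                  # equation (34)
    c = math.sqrt(factor * (alpha ** (2 / (2 - n)) - 1))  # equation (33) or (34)
    vx, vy = sx ** 2, sy ** 2
    root = math.sqrt((vx - vy) ** 2 + 4 * r ** 2 * vx * vy)
    a = c * math.sqrt((vx + vy + root) / 2)              # major semiaxis
    # Clamp at 0 so rounding cannot produce a negative argument when |r| = 1.
    b = c * math.sqrt(max(0.0, (vx + vy - root) / 2))    # minor semiaxis
    angle = 0.5 * math.atan2(2 * r * sx * sy, vx - vy)   # quadrant-aware form of (35)
    return a, b, angle
```

Two identities make quick sanity checks: a^2 + b^2 = c^2(\sigma_x^2 + \sigma_y^2) and ab = c^2\sigma_x\sigma_y\sqrt{1-r^2}, since the two terms under the outer roots are the eigenvalues of the covariance matrix of (X, Y).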