OriginLab Corporation - Scientific Graphing and Data Analysis Software - 2D graphs, 3D graphs, Contour Plots, Statistical Charts, Data Exploration, Statistics, Curve Fitting, Signal Processing, and Peak Analysis

The Fit Results

How Origin Fits the Curve

The Fitting Model

A general nonlinear model can be expressed as follows:

Y=f(X, \boldsymbol{\theta})+\varepsilon (1)

where X = (x_1, x_2, \cdots , x_k)' is the vector of independent variables and \boldsymbol{\theta} = (\theta_1, \theta_2, \cdots , \theta_p)' is the vector of parameters.

The aim of nonlinear fitting is to estimate the parameter values which best describe the data. The standard way of finding the best fit is to choose the parameters that would minimize the deviations of the theoretical curve(s) from the experimental points. This method is also called chi-square minimization, defined as follows:

\chi ^2=\sum_{i=1}^n \left [ \frac{Y_i-f(x_i^{\prime },\hat{\theta }) } {\sigma _i} \right ]^2 (2)

where x_i^{\prime } is the row vector for the ith (i = 1, 2, ... , n) observation.

To estimate \hat{\theta} with the least squares method, we need to solve the normal equations, which are obtained by setting the partial derivatives of \chi^2 with respect to each \hat{\theta}_p to zero:

\frac{\partial \chi ^2}{\partial \hat \theta _p}=-2\sum_{i=1}^n\frac 1{\sigma _i^2}[Y_i-f(x_i^{\prime },\hat \theta )][\frac{\partial f(x_i^{\prime },\hat \theta )}{\partial \hat \theta _p}]=0

(3)

Since there are generally no explicit solutions to the normal equations, we employ an iterative strategy to estimate the parameter values. This process starts with some initial values, \theta_0. With each iteration, a \chi^2 value is computed and then the parameter values are adjusted to reduce \chi^2. When the difference between the \chi^2 values computed in two successive iterations is small enough (compared with the tolerance), we can say that the fitting procedure has converged. In the NLFit output messages, you can see the reduced chi-square, which is \chi^2 divided by the degrees of freedom (dof), where n is the number of data points and p is the number of parameters:

reduced\,\chi ^2=\frac{\chi ^2}{dof}=\frac{\chi ^2}{n-p}

(4)
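As a minimal sketch of equations (2) and (4), the following computes chi-square and reduced chi-square for a given parameter vector. The exponential model, the data values, and all function names are illustrative assumptions, not part of Origin.

```python
import math

def model(x, theta):
    """Hypothetical example model: a * exp(-b * x)."""
    a, b = theta
    return a * math.exp(-b * x)

def chi_square(xs, ys, sigmas, theta):
    """Equation (2): sum of squared residuals, each scaled by sigma_i."""
    return sum(((y - model(x, theta)) / s) ** 2
               for x, y, s in zip(xs, ys, sigmas))

def reduced_chi_square(xs, ys, sigmas, theta):
    """Equation (4): chi-square divided by dof = n - p."""
    dof = len(xs) - len(theta)
    return chi_square(xs, ys, sigmas, theta) / dof

# Made-up data; sigma_i = 1 corresponds to fitting without weighting.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.1, 1.2, 0.7, 0.5]
sigmas = [1.0] * len(xs)
theta = (2.0, 0.5)

chi2_value = chi_square(xs, ys, sigmas, theta)
reduced_value = reduced_chi_square(xs, ys, sigmas, theta)
print(chi2_value, reduced_value)
```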

Origin uses the Levenberg-Marquardt (L-M) algorithm to adjust the parameter values in the iterative procedure. This algorithm, which combines the Gauss-Newton method and the steepest descent method, works for most cases. You may wish to consult other sources for details of the L-M algorithm. Origin's fitter additionally offers the Simplex method.
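The iteration described above can be sketched for a two-parameter model as follows. This is a textbook-style Levenberg-Marquardt loop written for illustration only: the model, the damping schedule (factor of 10), the tolerances, and the synthetic noise-free data are all assumptions, and it is not Origin's implementation.

```python
import math

def model(x, a, b):
    """Hypothetical example model: a * exp(-b * x)."""
    return a * math.exp(-b * x)

def chi2(xs, ys, a, b):
    """Unweighted chi-square (sigma_i = 1 for all i)."""
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))

def lm_fit(xs, ys, a, b, lam=1e-3, tol=1e-12, max_iter=200):
    """Minimal L-M loop: solve (J'J + lam*diag(J'J)) delta = J'r each step."""
    cur = chi2(xs, ys, a, b)
    for _ in range(max_iter):
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(-b * x)
            j1, j2 = e, -a * x * e          # df/da, df/db
            r = y - a * e                   # residual
            h11 += j1 * j1
            h12 += j1 * j2
            h22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        # Damped 2x2 system, solved in closed form.
        a11, a22 = h11 * (1.0 + lam), h22 * (1.0 + lam)
        det = a11 * a22 - h12 * h12
        da = (a22 * g1 - h12 * g2) / det
        db = (a11 * g2 - h12 * g1) / det
        new = chi2(xs, ys, a + da, b + db)
        if new < cur:
            # Accept the step and move toward Gauss-Newton behaviour.
            improvement = cur - new
            a, b, cur = a + da, b + db, new
            lam /= 10.0
            if improvement < tol:
                break
        else:
            # Reject the step and move toward steepest descent behaviour.
            lam *= 10.0
            if lam > 1e10:
                break
    return a, b, cur

# Synthetic noise-free data generated from a = 2, b = 0.5.
xs = [0.5 * i for i in range(9)]
ys = [model(x, 2.0, 0.5) for x in xs]
a_fit, b_fit, chi2_final = lm_fit(xs, ys, a=1.5, b=0.8)
print(a_fit, b_fit, chi2_final)
```

A small damping value makes the step close to a Gauss-Newton step, while a large one shortens it toward the steepest descent direction, which is the combination the paragraph above refers to.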

Weighted Fitting

When the measurement errors are unknown, \sigma _i\,\! are set to 1 for all i, and the curve fitting is performed without weighting. However, when the experimental errors are known, we can treat these errors as weights and use weighted fitting. In this case, the chi-square can be written as:

\chi ^2=\sum_{i=1}^nw_i[Y_i-f(x_i^{\prime },\hat \theta )]^2

(5)
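As a small illustration of equation (5), the sketch below evaluates the weighted chi-square. The choice w_i = 1/\sigma_i^2 is an assumption made here so that the result coincides with equation (2); Origin offers several weighting methods, and the data values are made up.

```python
def weighted_chi_square(ys, fitted, weights):
    """Equation (5): sum of w_i * (Y_i - f_i)^2."""
    return sum(w * (y - f) ** 2 for y, f, w in zip(ys, fitted, weights))

ys = [2.1, 1.2, 0.7]        # observed values (made up)
fitted = [2.0, 1.3, 0.8]    # model values at the same points (made up)
sigmas = [0.1, 0.2, 0.4]    # known measurement errors (made up)

# Assumed weighting: w_i = 1 / sigma_i^2.
weights = [1.0 / s ** 2 for s in sigmas]

chi2_weighted = weighted_chi_square(ys, fitted, weights)
chi2_sigma = sum(((y - f) / s) ** 2 for y, f, s in zip(ys, fitted, sigmas))
print(chi2_weighted, chi2_sigma)
```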

There are a number of weighting methods available in Origin. Please read Fitting with Errors and Weighting in the Origin Help file for more details.

Parameters

The fit-related formulas are summarized here:

Image:The Fit Results.png

The Fitted Value

Computing the fitted values in nonlinear regression is an iterative procedure. You can read a brief introduction in the above section (How Origin Fits the Curve), or see the below-referenced material for more detailed information.

Parameter Standard Errors

During L-M iteration, we need to calculate the partial derivatives matrix F, whose element in the ith row and jth column is:

F_{ij}=\frac 1{\sigma _i}\frac{\partial f(x_i^{\prime },\theta )}{\partial \theta _j}

(6)

Then we can get the Variance-Covariance Matrix for the parameters \hat{\theta} by:

C=(F'F)^{-1}s^2\,\!

(7)

where s^2 is the mean residual variance, or the Deviation of the Model, and can be calculated as follows:

s^2=\frac{RSS}{n-p}

(8)

The square root of a main diagonal value of this matrix is the Standard Error of the corresponding parameter:

s_{\theta _i}=\sqrt{c_{ii}}\,\!

(9)

where c_{ii} is the element in the ith row and ith column of the matrix C, and c_{ij} is the covariance between \theta_i and \theta_j.

You can choose whether to exclude s^2 when calculating the covariance matrix. This will affect the Standard Error values. To exclude s^2, clear the Use Reduced Chi-Sqr check box on the Advanced page. The covariance is then calculated by:

c=(F'F)^{-1}\,\!

(10)

So the Standard Error now becomes:

s_{\theta _i}^{\prime }=\frac{s_{\theta _i}}s\,\!

(11)
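The steps in equations (6) through (11) can be sketched on a small example. A straight-line model with \sigma_i = 1 is assumed here so that the derivatives are simply (1, x) and the scaling in equation (6) drops out; the data and helper names are illustrative.

```python
import math

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]   # made-up data

# Assumed model f = t1 + t2 * x with sigma_i = 1, so each row of F is (1, x).
F = [(1.0, x) for x in xs]

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

FtF = [[sum(row[i] * row[j] for row in F) for j in range(2)] for i in range(2)]
FtF_inv = inverse_2x2(FtF)                      # equation (10)

# For a linear model the least-squares estimate is (F'F)^-1 F'y.
Fty = [sum(row[i] * y for row, y in zip(F, ys)) for i in range(2)]
theta = [sum(FtF_inv[i][j] * Fty[j] for j in range(2)) for i in range(2)]

n, p = len(xs), 2
rss = sum((y - (theta[0] + theta[1] * x)) ** 2 for x, y in zip(xs, ys))
s2 = rss / (n - p)                              # equation (8)

C = [[FtF_inv[i][j] * s2 for j in range(2)] for i in range(2)]   # equation (7)
std_err = [math.sqrt(C[i][i]) for i in range(2)]                 # equation (9)
std_err_unscaled = [math.sqrt(FtF_inv[i][i]) for i in range(2)]  # equation (11)
print(theta, std_err, std_err_unscaled)
```

The last two lists differ by the factor s, which is the relationship stated in equation (11).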

The parameter standard errors can give us an idea of the precision of the fitted values. Typically, the magnitude of the standard error values should be lower than the fitted values. If the standard error values are much greater than the fitted values, the fitting model may be overparameterized.

The Standard Error for Derived Parameter

Origin estimates the standard errors for the derived parameters according to the Error Propagation formula, which is an approximate formula.

Let z = f\left (\theta _1, \theta _2, ..., \theta _p \right ) be the function with a combination (linear or non-linear) of p\, variables \theta _1, \theta _2, ..., \theta _p \,.

The general law of error propagation is:

\sigma_z^2 = \sum_i^p \sum_j^p \frac {\partial z}{\partial \theta_i} COV_{\theta_i \theta_j} \frac {\partial z}{\partial \theta_j}

where COV_{\theta_i \theta_j}\, is the covariance value for \left (\theta_i, \theta_j \right ), and \left (i = 1, 2, ..., p \right ), \left (j = 1, 2, ..., p \right ).

For example, using three variables

z = f\left (\theta_1, \theta_2, \theta_3 \right )

we get:

\sigma_z^2 = \left (\frac {\partial z}{\partial \theta_1} \right )^2 \sigma_{\theta_1}^2 + \left (\frac {\partial z}{\partial \theta_2} \right )^2 \sigma_{\theta_2}^2 + \left (\frac {\partial z}{\partial \theta_3} \right )^2 \sigma_{\theta_3}^2 + 2 \left (\frac {\partial z}{\partial \theta_1} \frac {\partial z}{\partial \theta_2} \right ) COV_{\theta_1 \theta_2} + 2 \left (\frac {\partial z}{\partial \theta_1} \frac {\partial z}{\partial \theta_3} \right ) COV_{\theta_1 \theta_3} + 2 \left (\frac {\partial z}{\partial \theta_2} \frac {\partial z}{\partial \theta_3} \right ) COV_{\theta_2 \theta_3}


Now, let the derived parameter be z\,, and let the fitting parameters be \theta_1, \theta_2, ..., \theta_p\,. The standard error for the derived parameter z\, is \sigma_z\,.
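A minimal sketch of the error propagation formula is shown below. The derived parameter z = \theta_1 / \theta_2, the parameter values, and the covariance matrix are made-up examples; the gradient is supplied analytically.

```python
import math

def propagate_variance(grad, cov):
    """General law of error propagation: sum_i sum_j dz/dti * COV_ij * dz/dtj."""
    p = len(grad)
    return sum(grad[i] * cov[i][j] * grad[j]
               for i in range(p) for j in range(p))

# Hypothetical fitted parameters and their covariance matrix.
theta = [4.0, 2.0]
cov = [[0.04, 0.01],
       [0.01, 0.09]]

# Derived parameter z = theta1 / theta2, with its partial derivatives.
grad = [1.0 / theta[1], -theta[0] / theta[1] ** 2]

var_z = propagate_variance(grad, cov)
sigma_z = math.sqrt(var_z)
print(sigma_z)
```

Because the double sum runs over all (i, j) pairs, each off-diagonal covariance is counted twice, which matches the factor of 2 in the three-variable expansion above.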

Confidence Intervals

One assumption in regression analysis is that data is normally distributed, so we can use the standard error values to construct the Parameter Confidence Intervals. For a given significance level, α, the (1-α)×100% confidence interval for the parameter is:

\hat \theta _j-t_{(\frac \alpha 2,n-p)}s_{\theta _j}\leq \theta _j\leq \hat \theta _j+t_{(\frac \alpha 2,n-p)}s_{\theta _j}

(12)
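The asymptotic interval can be sketched as below. The critical t value is passed in rather than computed; 2.228 is the tabulated two-sided value for α = 0.05 with n - p = 10, and the estimate and standard error are made up.

```python
def asymptotic_ci(estimate, std_err, t_crit):
    """Equation (12): estimate -/+ t * standard error."""
    half = t_crit * std_err
    return estimate - half, estimate + half

# Hypothetical fit result: theta_hat = 2.0, standard error = 0.1.
lcl, ucl = asymptotic_ci(2.0, 0.1, t_crit=2.228)
print(lcl, ucl)
```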

The parameter confidence interval is the range that, at the chosen confidence level, is expected to contain the true parameter value.

The confidence interval illustrated above is Asymptotic, which is the most frequently used method to calculate the confidence interval. The "Asymptotic" here means it is an approximate value. If you need more accurate values, you can use the Model Comparison Based method to estimate the confidence interval in the Advanced page.

If the Model Comparison method is used, the upper and lower confidence limits are found by searching for the values of each parameter \theta_j at which RSS(θj) (minimized over the remaining parameters) is greater than RSS by a factor of (1+F/(n-p)):

RSS(\theta _j)=RSS(1+F\frac 1{n-p})

(13)

where F = Ftable(α, 1, n-p) and RSS is the minimum residual sum of squares found during the fitting session.
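The search described above can be illustrated with a one-parameter model, where there are no remaining parameters to minimize over. The model y = \theta x, the data, the bisection search, and the tabulated value F(0.05; 1, 4) ≈ 7.709 are assumptions for this sketch; it is not Origin's search routine.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up data

def rss(theta):
    """Residual sum of squares for the assumed model y = theta * x."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

def find_limit(theta_hat, target, direction):
    """Bisect for the theta on one side of theta_hat where rss == target."""
    inside, outside = theta_hat, theta_hat + direction
    while rss(outside) < target:          # expand until the target is bracketed
        outside += direction
    for _ in range(100):
        mid = 0.5 * (inside + outside)
        if rss(mid) < target:
            inside = mid
        else:
            outside = mid
    return 0.5 * (inside + outside)

sxx = sum(x * x for x in xs)
theta_hat = sum(x * y for x, y in zip(xs, ys)) / sxx   # least-squares estimate
rss_min = rss(theta_hat)
n, p = len(xs), 1
f_crit = 7.709                                         # F table value, alpha = 0.05, (1, 4)

target = rss_min * (1.0 + f_crit / (n - p))
lcl = find_limit(theta_hat, target, -1.0)
ucl = find_limit(theta_hat, target, +1.0)
print(lcl, theta_hat, ucl)
```

For a model that is linear in its parameter, as here, the limits coincide with the asymptotic interval; the two methods differ only when the model is nonlinear in the parameters.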

t Value

You can choose to perform a t-test on each parameter to see whether its value is equal to 0. The null hypothesis of the t-test on the jth parameter is:

H_0: \theta_j = 0 \,

And the alternative hypothesis is:

H_a : \theta_j \ne 0

The t-value can be computed as:

t=\frac{\hat \theta _j-0}{s_{\theta _j}}

(14)

Prob>|t|

The p-value of the t-test above: the probability, assuming H0 is true, of obtaining a t-value at least as extreme as the one observed.

prob=2(1-tcdf(|t|,df_{Error}))\,\!

(15)

where tcdf(t, df) computes the lower tail probability for Student's t distribution with df degrees of freedom.
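Equations (14) and (15) can be sketched with the standard library alone. The tcdf below is an illustrative stand-in that integrates the Student's t density with Simpson's rule; it is not Origin's function, and the step count is an arbitrary assumption. A statistics library would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def tcdf(t, df, steps=2000):
    """Lower-tail probability via Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_test(estimate, std_err, df_error):
    """Equation (14) for the t value and equation (15) for Prob>|t|."""
    t = (estimate - 0.0) / std_err
    prob = 2 * (1 - tcdf(abs(t), df_error))
    return t, prob

# Hypothetical parameter: estimate 2.228 with standard error 1.0, df = 10.
t_value, prob = t_test(2.228, 1.0, 10)
print(t_value, prob)
```

With 10 degrees of freedom, a t value of about 2.228 sits at the two-sided 5% critical point, so the printed probability is close to 0.05.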

Dependency

If the equation is overparameterized, there will be mutual dependency between parameters. The dependency for the ith parameter is defined as:

1-\frac 1{c_{ii}(c^{-1})_{ii}}

(16)

where c_{ii} is the ith diagonal element of the matrix C and (C^{-1})_{ii} is the ith diagonal element of the inverse of C. If this value is close to 1, there is strong dependency.
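A short sketch of equation (16) for the two-parameter case follows. The covariance matrix is made up, with a correlation of 0.9 between the two parameters; for a 2x2 matrix the dependency reduces to the squared correlation coefficient, which the example confirms numerically.

```python
def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def dependencies(C):
    """Equation (16): 1 - 1 / (c_ii * (C^-1)_ii) for each parameter."""
    inv = inverse_2x2(C)
    return [1 - 1 / (C[i][i] * inv[i][i]) for i in range(2)]

# Hypothetical covariance matrix: variances 0.04 and 0.09, covariance 0.054.
C = [[0.04, 0.054],
     [0.054, 0.09]]

dep = dependencies(C)
correlation = C[0][1] / (C[0][0] * C[1][1]) ** 0.5
print(dep, correlation ** 2)
```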

CI Half Width

The Confidence Interval Half Width is:

CI=\frac{UCL-LCL}2

(17)

where UCL and LCL are the Upper Confidence Limit and Lower Confidence Limit, respectively.

Statistics

Several fit statistics formulas are summarized below:

Image:The Fit Results 02.png

Degree of Freedom

The Error degree of freedom. Please refer to the ANOVA Table for more details.

Residual Sum of Squares

The residual sum of squares:

RSS(X,\hat \theta )=\sum_{i=1}^n w_i[Y_i-f(x_i^{\prime },\hat \theta )]^2

(18)

Reduced Chi-Sqr

The Reduced Chi-square value, which equals the residual sum of squares divided by the degrees of freedom:

Reduced\chi ^2=\frac{\chi ^2}{df_{Error}}=\frac{RSS}{df_{Error}}

(19)

R-Square (COD)

The R^2 value shows the goodness of a fit, and can be computed by:

R^2=\frac{Explained\,variation}{Total\,variation}=\frac{TSS-RSS}{TSS}=1-\frac{RSS}{TSS}

(20)

where TSS is the total sum of square, and RSS is the residual sum of square.

Adj. R-Square

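The adjusted R^2 value, which accounts for the degrees of freedom (df_{Total} is the total degrees of freedom; see the ANOVA Table):

\bar R^2=1-\frac{RSS/df_{Error}}{TSS/df_{Total}}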

(21)

R Value

The R value is the square root of R^2:

R=\sqrt{R^2}

(22)

For more information on R^2, adjusted R^2 and R, please see Goodness of Fit.

Root-MSE (SD)

Root mean square of the error, or the Standard Deviation of the model, equal to the square root of the reduced \chi^2:

Root\,MSE=\sqrt{Reduced \,\chi ^2}

(23)
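The statistics in this section can be sketched together as follows. The sketch assumes an unweighted fit, the corrected total sum of squares, and df_Total = n - 1; the observed and fitted values are made up.

```python
import math

def fit_statistics(ys, fitted, p):
    """RSS, reduced chi-square, R^2, adjusted R^2, R and root-MSE (unweighted)."""
    n = len(ys)
    df_error = n - p
    df_total = n - 1                      # assumption: corrected total
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    y_mean = sum(ys) / n
    tss = sum((y - y_mean) ** 2 for y in ys)
    reduced_chi2 = rss / df_error
    r2 = 1 - rss / tss
    return {
        "rss": rss,
        "reduced_chi2": reduced_chi2,
        "r2": r2,
        "adj_r2": 1 - (rss / df_error) / (tss / df_total),
        "r": math.sqrt(r2),               # assumes r2 >= 0
        "root_mse": math.sqrt(reduced_chi2),
    }

ys = [1.0, 2.0, 3.0, 4.0]          # observed values (made up)
fitted = [1.1, 1.9, 3.2, 3.8]      # fitted values from a 2-parameter model (made up)
stats = fit_statistics(ys, fitted, p=2)
print(stats)
```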

ANOVA Table

The ANOVA Table:

                   df      Sum of Squares      Mean Square           F Value       Prob > F
Model              p       SSreg = TSS - RSS   MSreg = SSreg / p     MSreg / MSE   p-value
Error              n - p   RSS                 MSE = RSS / (n - p)
Uncorrected Total  n       TSS
Corrected Total    n - 1   TSScorrected

Note: In nonlinear fitting, Origin outputs both corrected and uncorrected total sum of squares:

Corrected model:

TSS_{corrected}=\sum_{i=1}^nw_i(y_i-\bar y)^2

(24)

Uncorrected model:

TSS=\sum_{i=1}^nw_iy_i^2

(25)
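The table entries can be computed as in the sketch below, which follows the table literally: the regression sum of squares uses the uncorrected TSS of equation (25). Weights are taken as 1, the data are made up, and the Prob > F column is left out because it needs an F distribution function.

```python
def anova_table(ys, fitted, p):
    """Sums of squares, mean squares and F value for an unweighted fit."""
    n = len(ys)
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    tss_uncorrected = sum(y * y for y in ys)                 # equation (25)
    y_mean = sum(ys) / n
    tss_corrected = sum((y - y_mean) ** 2 for y in ys)       # equation (24)
    ss_reg = tss_uncorrected - rss
    ms_reg = ss_reg / p
    mse = rss / (n - p)
    return {
        "model": {"df": p, "ss": ss_reg, "ms": ms_reg, "f": ms_reg / mse},
        "error": {"df": n - p, "ss": rss, "ms": mse},
        "uncorrected_total": {"df": n, "ss": tss_uncorrected},
        "corrected_total": {"df": n - 1, "ss": tss_corrected},
    }

ys = [1.0, 2.0, 3.0, 4.0]          # observed values (made up)
fitted = [1.1, 1.9, 3.2, 3.8]      # fitted values from a 2-parameter model (made up)
table = anova_table(ys, fitted, p=2)
print(table)
```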

Confidence and Prediction Bands

Confidence Band

The confidence interval for the fitting function says how good your estimate of the value of the fitting function is at particular values of the independent variables. You can claim with 100(1-α)% confidence that the correct value for the fitting function lies within the confidence interval, where α is the significance level. This confidence interval for the fitting function is computed as:

f(x_{1i},x_{2i},\ldots ;\theta _{1i},\theta _{2i},\ldots )\pm t_{(\frac \alpha 2,dof)}[\chi ^2fcf^{\prime }]^{\frac 12}

(26)

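where \chi^2 is taken to be the reduced chi-square, c is the matrix of equation (10), and f is the row vector of partial derivatives of the fitting function with respect to the parameters: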

f=[\frac{\partial f}{\partial \theta _1},\frac{\partial f}{\partial \theta _2},\cdots ,\frac{\partial f}{\partial \theta _p}]

(27)

Prediction Band

The prediction interval for the significance level α is the interval within which 100(1-α)% of all the experimental points in a series of repeated measurements are expected to fall at particular values of the independent variables. This prediction interval for the fitting function is computed as:

f(x_{1i},x_{2i},\ldots ;\theta _{1i},\theta _{2i},\ldots )\pm t_{(\frac \alpha 2,dof)}[\chi ^2(1+fcf^{\prime })]^{\frac 12}

(28)
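Equations (26) and (28) can be sketched as below, reading \chi^2 as the reduced chi-square and c as the unscaled matrix (F'F)^{-1}, which is an interpretation rather than something the formulas state explicitly. The numbers correspond to a hypothetical straight-line fit over x = 0, 1, 2, 3, 4 evaluated at x = 2, with a tabulated t value of 3.182 for α = 0.05 and 3 degrees of freedom.

```python
import math

def band_half_widths(f, c, reduced_chi2, t_crit):
    """Half-widths of the confidence band (26) and prediction band (28)."""
    p = len(f)
    quad = sum(f[i] * c[i][j] * f[j] for i in range(p) for j in range(p))  # f c f'
    confidence = t_crit * math.sqrt(reduced_chi2 * quad)
    prediction = t_crit * math.sqrt(reduced_chi2 * (1 + quad))
    return confidence, prediction

# Partial derivatives of f = t1 + t2 * x at x = 2 are (1, x) = (1, 2).
f = [1.0, 2.0]
# (F'F)^-1 for a straight line over x = 0..4 with sigma_i = 1.
c = [[0.6, -0.2],
     [-0.2, 0.1]]

conf_half, pred_half = band_half_widths(f, c, reduced_chi2=0.05, t_crit=3.182)
print(conf_half, pred_half)
```

The prediction band is always wider than the confidence band because of the extra 1 inside the square root, which accounts for the scatter of an individual measurement.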

Reference

  1. William H. Press, et al. Numerical Recipes in C++. Cambridge University Press, 2002.
  2. Norman R. Draper, Harry Smith. Applied Regression Analysis, Third Edition. John Wiley & Sons, Inc. 1998.
  3. George Casella, et al. Applied Regression Analysis: A Research Tool, Second Edition. Springer-Verlag New York, Inc. 1998.
  4. G. A. F. Seber, C. J. Wild. Nonlinear Regression. John Wiley & Sons, Inc. 2003.
  5. David A. Ratkowsky. Handbook of Nonlinear Regression Models. Marcel Dekker, Inc. 1990.
  6. Douglas M. Bates, Donald G. Watts. Nonlinear Regression Analysis & Its Applications. John Wiley & Sons, Inc. 1988.
  7. Marko Ledvij. Curve Fitting Made Easy. The Industrial Physicist. Apr./May 2003. 9:24-27.