Statistics in Analytical Chemistry: Part 46—R² (Concluded)

Wednesday, February 8, 2012

Over the course of the past two articles (Part 44, Oct 2011, and Part 45, Nov/Dec 2011), R² has been defined, its components have been explained verbally and mathematically, and the statistic’s formula has been presented. Also included has been a discussion of the limitations of R², and the traps that lie in wait for those who rely exclusively or too heavily on this number. Is there a way to cast this often-used statistic in a more favorable light? This installment will address that question, as well as offer a more robust path for evaluating candidate models.

To review the bidding, the formula for R² is:

R² = SS_Model/SS_Total, or (1)
R² = 1 – (SS_Error/SS_Total) (2)

Recall, though, that SS_Error includes the random noise inherent in the data, as well as the variation the model fails to capture. Furthermore, the value of R² will increase (or remain unchanged) every time another term is added to the model. As was explained earlier, these two facts can lead the user down the primrose path.
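As an illustration of Eqs. (1) and (2), the following minimal Python sketch fits a straight line by ordinary least squares and computes R² both ways. The data values and function names are hypothetical, and the sketch assumes the usual definitions of the SS terms (squared deviations of the predictions from the mean for SS_Model, of the responses from the predictions for SS_Error, and of the responses from the mean for SS_Total); see Parts 44 and 45 for the formulas used in this series.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sums_of_squares(y, y_pred):
    """Return (SS_Model, SS_Error, SS_Total) under the assumed definitions."""
    y_bar = sum(y) / len(y)
    ss_model = sum((pi - y_bar) ** 2 for pi in y_pred)
    ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return ss_model, ss_error, ss_total


# Hypothetical calibration data (concentration, response)
conc = [1.0, 2.0, 3.0, 4.0, 5.0]
resp = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(conc, resp)
pred = [b0 + b1 * c for c in conc]
ss_model, ss_error, ss_total = sums_of_squares(resp, pred)

r2_eq1 = ss_model / ss_total        # Eq. (1)
r2_eq2 = 1 - ss_error / ss_total    # Eq. (2)
print(r2_eq1, r2_eq2)
```

For a least-squares fit that includes an intercept, SS_Model + SS_Error = SS_Total, so the two forms give the same value apart from rounding.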

Fortunately, there is a statistic known as R²_adj (where “adj” stands for “adjusted”); its value gives a more honest assessment of the model’s adequacy. Below are the details.

In a sentence, R²_adj includes a penalty for each term used in the regression. If the additional term(s) is not needed, R²_adj will generally decline relative to its value for the previous (simpler) model. Mathematically, R²_adj is a modification of Eq. (2) above; the new formula includes degrees-of-freedom (DOF) terms:

R²_adj = 1 – (MS_Error/MS_Total) (3)

where:

MS_Error = Mean Square Error = SS_Error/DOF_E
DOF_E = degrees of freedom for Mean Square Error
MS_Total = Mean Square Total = SS_Total/DOF_T
DOF_T = degrees of freedom for Mean Square Total

Two questions jump to mind. First, how are the DOF terms calculated? Second, why is “Mean” used to describe the adjusted “Square” terms? The answers are as follows.

In general, DOF terms for a statistic are computed by starting with the number of data points that are in the data set under discussion. Every time a calculation is made using this original set, a degree of freedom is lost for any statistic that depends on the calculation.

SS_Total is calculated using the entire set of raw responses; the total number of data points in a set is typically designated as n. However, to calculate SS_Total, one must first calculate the average of all the responses, thereby sacrificing a degree of freedom. (See Part 44 or Part 45 for the formulas for the SS terms.) Thus, the associated DOF term is (n-1).

For SS_Error, the starting point is the same as above. However, this time, a model must first be fitted to the data, since the predicted responses are needed in the calculation of this statistic. Each parameter in a model is a coefficient that must be calculated, and a degree of freedom is lost for each one; for example, a straight-line model requires an intercept as well as a coefficient for the x term, so two degrees of freedom are lost. With p denoting the number of parameters, the general expression for DOF_E is (n-p).

The use of “Mean Square” to describe the terms in R²_adj can be understood by thinking about what happens when one calculates the mean (i.e., average) of a set of data. The formula is the sum of all the values, divided by the total number of data points. In other words, the sum is divided by the degrees of freedom. In this case, no degrees of freedom were lost beforehand, since this determination is based solely on the original data. Thus, n is the appropriate DOF value. Since MS_Error and MS_Total also divide a sum by the associated DOF term, the use of “Mean” in the names is logical.

The stage is now set for deriving a more useful formula for R²_adj.

Incorporating the two DOF expressions into Eq. (3) results in the following:

R²_adj = 1 – [SS_Error/(n-p)]/[SS_Total/(n-1)] (4)

Regrouping yields:

R²_adj = 1 – [(SS_Error/SS_Total)] * [(n-1)/(n-p)] (5)

Rearranging Eq. (2) gives:

SS_Error/SS_Total = 1 – R² (6)

Combining Eqs. (5) and (6) yields:

R²_adj = 1 – {(1-R²) * [(n-1)/(n-p)]} (7)

In Eq. (7), the last expression, [(n-1)/(n-p)], can be considered a “penalty factor” that keeps R²_adj honest. In other words, the inclusion of DOF terms levels the playing field somewhat when different models are compared using R²_adj.
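The effect of the penalty factor can be seen in a short Python sketch. It computes R²_adj from the sums of squares, per Eq. (4), and from R², per Eq. (7), and then holds R² fixed while the number of parameters grows. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
def r2_adj_from_ss(ss_error, ss_total, n, p):
    """Eq. (4): one minus the ratio of the two mean squares."""
    ms_error = ss_error / (n - p)   # DOF_E = n - p
    ms_total = ss_total / (n - 1)   # DOF_T = n - 1
    return 1 - ms_error / ms_total


def r2_adj_from_r2(r2, n, p):
    """Eq. (7): the same quantity written in terms of R²."""
    return 1 - (1 - r2) * (n - 1) / (n - p)


# Hypothetical values: 10 data points, R² = 0.90
n = 10
ss_total = 100.0
ss_error = 10.0
r2 = 1 - ss_error / ss_total

for p in (2, 3, 4):
    via_ss = r2_adj_from_ss(ss_error, ss_total, n, p)
    via_r2 = r2_adj_from_r2(r2, n, p)
    print(p, round(via_ss, 4), round(via_r2, 4))
```

With R² held at 0.90, R²_adj falls from 0.8875 (p = 2) to about 0.8714 (p = 3) and 0.85 (p = 4). An added term must therefore lower SS_Error enough to offset the larger penalty factor before R²_adj will rise.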

Keep in mind, though, that even R²_adj must be used with caution. Recall the example in Part 45. There, a data set was fitted with four different models: 1) quadratic, 2) cubic, 3) quartic, and 4) quadratic + phases-of-the-moon (POM) term. The values for R²_adj stack up as follows:

Model              R²_adj
Quadratic          0.8735
Cubic              0.8688
Quartic            0.8652
Quadratic + POM    0.8703

The progression from the first through the third models is accompanied by a decrease in R²_adj, thereby signaling the inclusion of inappropriate terms. (Note that this comparison is for illustrative purposes; one is splitting hairs by looking at essentially the third decimal place of R²_adj.) Comparison of the POM-containing option with the cubic might lead the casual observer to think that he or she was getting somewhere, and that connecting a cubic with the moon might lead to victory!

It is time to turn to a more reliable (although more complex) alternative to either R² or R²_adj. The focus in model evaluation should be on the tools of: 1) the p-value for any term that was just added and 2) the residual plot and the related lack-of-fit (LOF) test. (For details related to this alternative, see Parts 9, 10, 22, and 23 of this series—American Laboratory, Feb 2004, Mar 2004, Jun/Jul 2006, and Oct 2006, respectively.)

First, if the p-value of the new term is insignificant (i.e., >0.01), then the term is not needed and its inclusion will result in overfitting. In the case of a straight line, the coefficient of the x-term is the line’s slope. Because raw responses typically increase with concentration, the slope will be significant unless there is a major problem with the instrument.

Second, the residuals pattern will help the user determine if the model exhibits lack of fit; random scatter about the zero line suggests an adequate model. The LOF diagnostic is based on the residual values and does separate SS_Error into its parts. Thus, the door is open for distinguishing between random noise and “leftovers” from the model, and for producing a p-value that will reflect the sufficiency of the chosen curve.
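The splitting of SS_Error can be sketched in Python using the classical lack-of-fit partition, which requires replicate responses at some or all concentrations. This is a textbook formulation offered under that assumption, not necessarily the exact procedure used in earlier parts of this series; the data and function names are hypothetical.

```python
from collections import defaultdict


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def lack_of_fit_f(x, y, y_pred, p):
    """Split SS_Error into pure error and lack of fit; return (F, DOF_LOF, DOF_PE)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    n, m = len(y), len(groups)          # n points at m distinct x levels

    ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))

    # Pure error: scatter of replicates about their own level mean (random noise)
    ss_pure = 0.0
    for reps in groups.values():
        level_mean = sum(reps) / len(reps)
        ss_pure += sum((r - level_mean) ** 2 for r in reps)

    # Lack of fit: what remains of SS_Error (the model's "leftovers")
    ss_lof = ss_error - ss_pure
    df_lof, df_pure = m - p, n - m
    f_stat = (ss_lof / df_lof) / (ss_pure / df_pure)
    return f_stat, df_lof, df_pure


# Hypothetical curved data with duplicate responses at each concentration
conc = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
resp = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9, 16.2, 15.8]

b0, b1 = fit_line(conc, resp)
pred = [b0 + b1 * c for c in conc]
f_stat, df_lof, df_pure = lack_of_fit_f(conc, resp, pred, p=2)
print(f_stat, df_lof, df_pure)
```

For these made-up data, a straight line leaves an F ratio of about 80 on 2 and 4 degrees of freedom, which points to strong lack of fit. The LOF p-value is the upper-tail probability of the F distribution with those degrees of freedom; it can be obtained from a statistics package (for example, scipy.stats.f.sf, if SciPy is available).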

Figure 1 - Residuals plots for a simulated data set fit with a) a straight-line and b) a quadratic model. See text for details.

The usefulness of this alternative approach can be shown by returning to the moon example. In Part 45, the scatterplot displayed curvature, so a quadratic was selected as the first candidate. The wisdom of this decision can be seen in the results of the LOF test; the p-values for a straight line and a quadratic were 0.0276 and 0.9363, respectively. The residual patterns in Figure 1 agree with the LOF test. Furthermore, the p-value for the quadratic term is 0.0008, indicating that x² is needed in the model.

Addition of either an x³ or POM term results in insignificant p-values for the new member (0.8935 and 0.5729, respectively). Thus, there is confirmation that a quadratic model is adequate; inclusion of additional terms will result in overfitting.

One additional matter is the choice of fitting technique (see Part 8, Nov 2003, for details on this topic). Neither R² nor R²_adj can help here, either. The proper way to evaluate ordinary least squares versus weighted least squares is to model the standard deviation of the responses; if there is trending with concentration, then the latter technique is needed.
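A rough Python sketch of the standard-deviation check follows. It computes the standard deviation of the replicate responses at each concentration and then fits a straight line to those values versus concentration. The data and function names are hypothetical, and the sketch stops short of a formal significance test on the slope; Part 8 should be consulted for the full procedure.

```python
from statistics import stdev


def sd_by_level(x, y):
    """Standard deviation of replicate responses at each concentration."""
    levels = {}
    for xi, yi in zip(x, y):
        levels.setdefault(xi, []).append(yi)
    # stdev needs at least two replicates per level
    return [(c, stdev(reps)) for c, reps in sorted(levels.items()) if len(reps) > 1]


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical triplicate responses whose scatter grows with concentration
conc = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 10.0, 10.0, 10.0]
resp = [1.0, 1.1, 0.9, 5.0, 5.5, 4.5, 10.0, 11.0, 9.0]

sd_table = sd_by_level(conc, resp)
levels = [c for c, _ in sd_table]
sds = [s for _, s in sd_table]
sd_intercept, sd_slope = fit_line(levels, sds)
print(sd_table)
print(sd_intercept, sd_slope)
```

Here the standard deviation rises by roughly 0.1 response units per unit of concentration, which is the kind of trending that calls for weighted least squares. A slope indistinguishable from zero would support ordinary least squares.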

The final take-home message is that even R²_adj is not a strong tool for evaluating model selection or fitting-technique choice, and should be employed only when accompanied by a very large grain of salt. Instead, users should depend on residual patterns and the LOF test for model selection, and on standard-deviation modeling for the choice of fitting technique. The authors cannot overemphasize the importance of these two recommendations!

David Coleman is an Applied Statistician, and Lynn Vanatta is an Analytical Chemist; e-mail:[email protected].
