Over the course of the past two articles (Part 44, Oct 2011, and Part 45, Nov/Dec 2011), R^{2} has been defined, its components have been explained verbally and mathematically, and the statistic’s formula has been presented. Also included has been a discussion of the limitations of R^{2}, and the traps that lie in wait for those who rely exclusively or too heavily on this number. Is there a way to cast this often-used statistic in a more favorable light? This installment will address that question, as well as offer a more robust path for evaluating candidate models.

To review the bidding, the formula for R^{2} is:

R^{2} = SS_{Model}/SS_{Total}, or (1)

R^{2} = 1 – (SS_{Error}/SS_{Total}) (2)

Recall, though, that SS_{Error} includes the random noise inherent in the data, as well as the variation the model fails to capture. Furthermore, the value of R^{2} will increase (or remain unchanged) every time another term is added to the model. As was explained earlier, these two facts can lead the user down the primrose path.
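This behavior is easy to demonstrate numerically. The following sketch (with simulated, truly linear data; the numbers are illustrative, not from the article) fits polynomials of increasing degree and shows that R^{2} never decreases as terms are added:

```python
import numpy as np

# Simulated data from a truly straight-line relationship plus noise.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 12)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

ss_total = np.sum((y - y.mean()) ** 2)

r2_by_degree = {}
for degree in (1, 2, 3, 4):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    resid = y - np.polyval(coeffs, x)
    ss_error = np.sum(resid ** 2)
    r2_by_degree[degree] = 1 - ss_error / ss_total   # Eq. (2)

# Each added term can only lower (or leave unchanged) SS_Error,
# so R^2 is non-decreasing even though the extra terms are spurious.
```

Since the degree-4 model nests the degree-3 model (and so on down the line), each fit's SS_{Error} can be no larger than its predecessor's, which is exactly why R^{2} alone cannot flag overfitting.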

Fortunately, there is a statistic known as R^{2}_{adj} (where “adj” stands for “adjusted”); its value gives a more honest assessment of the model’s adequacy. Below are the details.

In a sentence, R^{2}_{adj} includes a penalty for each term used in the regression. If an additional term is not needed, R^{2}_{adj} will generally decline relative to its value for the previous (simpler) model. Mathematically, R^{2}_{adj} is a modification of Eq. (2) above; the new formula includes degrees-of-freedom (DOF) terms:

R^{2}_{adj} = 1 – (MS_{Error}/MS_{Total}) (3)

where:

MS_{Error} = Mean Square Error = SS_{Error}/DOF_{E}

DOF_{E} = degrees of freedom for Mean Square Error

MS_{Total} = Mean Square Total = SS_{Total}/DOF_{T}

DOF_{T} = degrees of freedom for Mean Square Total

Two questions jump to mind. First, how are the DOF terms calculated? Second, why is “Mean” used to describe the adjusted “Square” terms? The answers are as follows.

In general, DOF terms for a statistic are computed by starting with the number of data points that are in the data set *under discussion*. Every time a calculation is made using this original set, a degree of freedom is lost for any statistic that depends on the calculation.

SS_{Total} is calculated using the entire set of raw responses; the total number of data points in a set is typically designated as *n*. However, to calculate SS_{Total}, one must first calculate the average of all the responses, thereby sacrificing a degree of freedom. (See Part 44 or Part 45 for the formulas for the SS terms.) Thus, the associated DOF term is (*n*-1).

For SS_{Error}, the starting point is the same as above. However, this time, a model must first be fitted to the data, since the predicted responses are needed in the calculation of this statistic. A model contains some number of parameters (*p*), each with a coefficient that must be calculated; for example, a straight-line model requires the calculation of an intercept, as well as a coefficient for the *x* term. A degree of freedom is lost for each parameter. As a result, the general expression for DOF_{E} is (*n-p*).

The use of “Mean Square” to describe the terms in R^{2}_{adj} can be understood by thinking about what happens when one calculates the mean (i.e., average) of a set of data. The formula is the sum of all the values, divided by the total number of data points. In other words, the sum is divided by the degrees of freedom. In this case, no degrees of freedom were lost beforehand, since this determination is based solely on the original data. Thus, n is the appropriate DOF value. Since MS_{Error} and MS_{Total} also divide a sum by the associated DOF term, the use of “Mean” in the names is logical.

The stage is now set for deriving a more useful formula for R^{2}_{adj}.

Incorporating the two DOF expressions into Eq. (3) results in the following:

R^{2}_{adj} = 1 – [SS_{Error}/(*n-p*)]/[SS_{Total}/(*n-1*)] (4)

Regrouping yields:

R^{2}_{adj} = 1 – (SS_{Error}/SS_{Total}) * [(*n-1*)/(*n-p*)] (5)

Rearranging Eq. (2) gives:

SS_{Error}/SS_{Total} = 1 – R^{2} (6)

Combining Eqs. (5) and (6) yields:

R^{2}_{adj} = 1 – {(1-R^{2}) * [(*n-1*)/(*n-p*)]} (7)

In Eq. (7), the last expression, [(*n-1*)/(*n-p*)], can be considered a “penalty factor” that keeps R^{2}_{adj} honest. In other words, the inclusion of DOF terms levels the playing field somewhat when different models are compared using R^{2}_{adj}.
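The equivalence of the two routes to R^{2}_{adj} can be verified numerically. The sketch below (again with simulated straight-line data, not the article's) computes the statistic both from the mean squares of Eq. (4) and from the penalty-factor form of Eq. (7):

```python
import numpy as np

# Simulated straight-line calibration data.
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 10)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

n, p = x.size, 2                       # straight line: intercept + slope
coeffs = np.polyfit(x, y, 1)
resid = y - np.polyval(coeffs, x)

ss_error = np.sum(resid ** 2)
ss_total = np.sum((y - y.mean()) ** 2)

r2 = 1 - ss_error / ss_total                                  # Eq. (2)
r2_adj_ms = 1 - (ss_error / (n - p)) / (ss_total / (n - 1))   # Eq. (4)
r2_adj_pen = 1 - (1 - r2) * (n - 1) / (n - p)                 # Eq. (7)

# The two formulas agree, and the penalty factor (n-1)/(n-p) > 1
# guarantees R^2_adj < R^2 whenever the model has more than one parameter.
```

Because (*n*-1)/(*n*-*p*) exceeds 1 for any model with more than one parameter, R^{2}_{adj} is always smaller than R^{2} (unless the fit is perfect), which is the source of the "honesty."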

Keep in mind, though, that even R^{2}_{adj} must be used with caution. Recall the example in Part 45. There, a data set was fitted with four different models: 1) quadratic, 2) cubic, 3) quartic, and 4) quadratic + phases-of-the-moon (POM) term. The values for R^{2}_{adj} stack up as follows:

Quadratic 0.8735

Cubic 0.8688

Quartic 0.8652

Quadratic + POM 0.8703

The progression from the first through the third models is accompanied by a decrease in R^{2}_{adj}, thereby signaling the inclusion of inappropriate terms. (Note that this comparison is for illustrative purposes; one is splitting hairs by looking at essentially the third decimal place of R^{2}_{adj}.) However, the POM-containing option edges out the cubic, which might lead the casual observer to think that he or she was getting somewhere, and that connecting a quadratic with the moon might lead to victory!

It is time to turn to a more reliable (although more complex) alternative to either R^{2} or R^{2}_{adj}. The focus in model evaluation should be on two tools: 1) the *p*-value for any term that was just added and 2) the residual plot and the related lack-of-fit (LOF) test. (For details related to this alternative, see Parts 9, 10, 22, and 23 of this series—*American Laboratory*, Feb 2004, Mar 2004, Jun/Jul 2006, and Oct 2006, respectively.)

First, if the *p*-value of the new term is insignificant (i.e., >0.01), then the term is not needed and its inclusion will result in overfitting. In the case of a straight line, the coefficient of the *x*-term is the line's slope; since raw responses typically rise with concentration, the slope will almost always be significant unless there is a major problem with the instrument.
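The *p*-value for an added term can be obtained from any regression package; as a minimal sketch (using simulated, truly linear data and standard least-squares formulas, not the article's data), one can compute it directly from the coefficient's *t*-statistic:

```python
import numpy as np
from scipy import stats

# Simulated straight-line data; a candidate x^2 term is then tested.
rng = np.random.default_rng(2)
x = np.linspace(1, 10, 12)
y = 1.0 + 2.0 * x + rng.normal(scale=0.05, size=x.size)

# Design matrix: intercept, x, and the candidate quadratic term.
X = np.column_stack([np.ones_like(x), x, x ** 2])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
mse = resid @ resid / (n - p)                 # MS_Error, DOF_E = n - p
cov = mse * np.linalg.inv(X.T @ X)            # coefficient covariance matrix
se = np.sqrt(np.diag(cov))

t_stat = beta / se
p_vals = 2 * stats.t.sf(np.abs(t_stat), df=n - p)

# p_vals[1] is for the slope (expected to be highly significant);
# p_vals[2] is for the x^2 term (expected to be insignificant here,
# since the underlying relationship is a straight line).
```

With linear data, the slope's *p*-value is tiny while the quadratic term's *p*-value will generally be large, signaling that *x*^{2} should be dropped.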

Second, the residuals pattern will help the user determine if the model exhibits lack of fit; random scatter about the zero line suggests an adequate model. The LOF diagnostic is based on the residual values and does separate SS_{Error} into its parts. Thus, the door is open for distinguishing between random noise and "leftovers" from the model, and for producing a *p*-value that will reflect the sufficiency of the chosen curve.
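When replicate responses are available at each concentration, the classical LOF *F*-test splits SS_{Error} into pure error (scatter of replicates about their level means) and lack of fit (the leftovers). A sketch, using simulated triplicate data with genuine curvature (the data and numbers are illustrative, not the article's):

```python
import numpy as np
from scipy import stats

# Simulated calibration: 6 concentration levels, 3 replicates each,
# with a genuinely quadratic response.
rng = np.random.default_rng(3)
conc = np.repeat(np.arange(1.0, 7.0), 3)
y = 0.5 + 1.2 * conc + 0.15 * conc ** 2 + rng.normal(scale=0.2, size=conc.size)

def lack_of_fit_pvalue(x, y, degree):
    """F-test splitting SS_Error into pure error and lack of fit."""
    n = y.size
    p = degree + 1                              # parameters in the fit
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    ss_error = resid @ resid

    # Pure error: scatter of replicates about their own level means.
    levels = np.unique(x)
    c = levels.size                             # number of distinct levels
    ss_pe = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in levels)

    ss_lof = ss_error - ss_pe                   # the model's "leftovers"
    f_stat = (ss_lof / (c - p)) / (ss_pe / (n - c))
    return stats.f.sf(f_stat, c - p, n - c)

# A straight line should show lack of fit (small p-value);
# a quadratic typically should not.
p_line = lack_of_fit_pvalue(conc, y, 1)
p_quad = lack_of_fit_pvalue(conc, y, 2)
```

A small LOF *p*-value for the straight line flags the model's inadequacy, mirroring the nonrandom residual pattern one would see in the plot.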

*Figure 1 - Residuals plots for a simulated data set fit with a) a straight-line and b) a quadratic model. See text for details.*

The usefulness of this alternative approach can be shown by returning to the moon example. In Part 45, the scatterplot displayed curvature, so a quadratic was selected as the first candidate. The wisdom of this decision can be seen in the results of the LOF test; the *p*-values for a straight line and a quadratic were 0.0276 and 0.9363, respectively. The residual patterns in *Figure 1* agree with the LOF test. Furthermore, the *p*-value for the quadratic term is 0.0008, indicating that *x*^{2} is needed in the model.

Addition of either an *x*^{3} or POM term results in insignificant *p*-values for the new member (0.8935 and 0.5729, respectively). Thus, there is confirmation that a quadratic model is adequate; inclusion of additional terms will result in overfitting.

One additional matter is the choice of fitting technique (see Part 8, Nov 2003, for details on this topic). Neither R^{2} nor R^{2}_{adj} can help here, either. The proper way to evaluate ordinary least squares versus weighted least squares is to model the standard deviation of the responses; if there is trending with concentration, then the latter technique is needed.
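The standard-deviation check just described can be sketched as follows (simulated replicate data with noise proportional to concentration; the data and model are illustrative, not the article's):

```python
import numpy as np

# Simulated replicates at each concentration level, with noise that
# grows proportionally with concentration (heteroscedastic responses).
rng = np.random.default_rng(4)
levels = np.arange(1.0, 7.0)
replicates = 10
sd_true = 0.05 * levels

sds = []
for lv, s in zip(levels, sd_true):
    y = 2.0 * lv + rng.normal(scale=s, size=replicates)
    sds.append(y.std(ddof=1))          # sample SD of the replicates

# Fit a line to SD vs. concentration; a clearly nonzero slope signals
# trending, i.e., weighted least squares is warranted.
slope, intercept = np.polyfit(levels, sds, 1)
```

If the fitted slope is indistinguishable from zero (no trending), ordinary least squares suffices; a clear upward trend, as here, calls for the weighted technique.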

**The final take-home message is that even R ^{2}_{adj} is not a strong tool for evaluating model selection or fitting-technique choice, and should be employed only when accompanied by a very large grain of salt. Instead, users should depend on residual patterns and the LOF test, and standard-deviation modeling, respectively**. The authors cannot overemphasize the importance of these two emboldened statements!

*David Coleman is an Applied Statistician, and Lynn Vanatta is an Analytical Chemist; e-mail:*statistics@americanlaboratory.com.