# Chromatography and Linear Regression: An Inseparable Pair

Chromatographs are everywhere; many analytical labs may now contain more such instruments than analytical balances. Indeed, gas chromatographs and liquid chromatographs are used to solve separation challenges in almost every field of endeavor. However, unless production-scale work is being done, the separation is typically not an end in itself. With very few exceptions, customers want to know how much of each analyte is present in a given sample. Answering that question means entering the world of statistics, especially simple linear regression.

Unlike instruments such as balances, gas chromatographs (GCs) and liquid chromatographs (LCs) do not provide raw data in user-friendly units; peak heights or peak areas are the norm. Thus, the analyst must calibrate the instrument externally, by preparing standard solutions of known concentration and analyzing them. A calibration curve can then be generated to link the responses to the original, known concentrations.
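As a minimal sketch of that calibration step, the following fits a straight line to a handful of standards by ordinary least squares and then converts a sample's peak area back into a concentration. The concentrations, peak areas, and function names are hypothetical, invented only for illustration; a real calibration would also need the uncertainty estimates discussed later in this article.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def concentration_from_response(response, a, b):
    """Invert the calibration line to estimate concentration from a response."""
    return (response - a) / b


# Hypothetical standards: known concentrations (mg/L) and measured peak areas.
std_conc = [1.0, 2.0, 5.0, 10.0, 20.0]
peak_area = [52.0, 101.0, 255.0, 498.0, 1003.0]

intercept, slope = fit_line(std_conc, peak_area)

# Hypothetical sample whose peak area falls inside the calibrated range.
unknown_conc = concentration_from_response(400.0, intercept, slope)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"estimated concentration = {unknown_conc:.2f} mg/L")
```

Note that the sample response (400) lies within the range spanned by the standards, so no extrapolation is involved.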

This calibration process often requires the use of simple linear regression, a statistical technique that can be performed easily by appropriate software packages. However, the user needs to understand the underpinnings of the process, or he or she may end up reporting numbers of uncertain quality.

What to do if you operate such instrumentation? The following suggestions are offered for consideration.

Learn the basics of how to: 1) design a study (calibration or recovery), 2) select a fitting technique (e.g., ordinary least squares) and a model (e.g., y = a + bx) to explain the regression data, and 3) determine the uncertainty associated with any resulting curve.
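The three steps can be sketched in a few lines. The example below assumes a hypothetical design (duplicate standards at five concentrations), fits y = a + bx by ordinary least squares, and then estimates the residual standard deviation and a 95% prediction-interval half-width for a single new response. All data values are invented, and the formula assumes constant, normally distributed noise, which should be checked rather than taken for granted.

```python
import math

# Step 1 (design): hypothetical duplicate standards at five concentrations.
conc = [1.0, 1.0, 2.0, 2.0, 5.0, 5.0, 10.0, 10.0, 20.0, 20.0]
resp = [50.0, 54.0, 99.0, 103.0, 251.0, 258.0, 495.0, 503.0, 998.0, 1006.0]

# Step 2 (fit): ordinary least squares for the model y = a + b*x.
n = len(conc)
x_mean = sum(conc) / n
y_mean = sum(resp) / n
sxx = sum((x - x_mean) ** 2 for x in conc)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(conc, resp))
b = sxy / sxx
a = y_mean - b * x_mean

# Step 3 (uncertainty): residual standard deviation, n - 2 degrees of freedom.
residuals = [y - (a + b * x) for x, y in zip(conc, resp)]
s_resid = math.sqrt(sum(r * r for r in residuals) / (n - 2))

# Two-sided 95% Student's t value for 8 degrees of freedom (standard tables).
T_CRIT_95_DF8 = 2.306


def prediction_half_width(x0):
    """Half-width of the 95% prediction interval for one new response at x0."""
    return T_CRIT_95_DF8 * s_resid * math.sqrt(
        1 + 1 / n + (x0 - x_mean) ** 2 / sxx
    )


for x0 in (x_mean, 20.0):
    print(f"x = {x0:5.1f}: predicted y = {a + b * x0:7.1f} "
          f"+/- {prediction_half_width(x0):.1f}")
```

The interval is narrowest at the mean of the standard concentrations and widens toward the ends of the calibrated range, which is one reason the design step deserves as much thought as the fit itself.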

Realize that the last step (knowing this uncertainty) is often ignored, but is all-important for the results to be useful. If the noise level is high, then the ability to discriminate between two pieces of data may be very low. Having a sound estimate of the uncertainty will allow you to make intelligent decisions.

Since there is always variability in data, remember that you (or your customer) must decide how confident you want to be in a reported result; frequent choices are 95% or 99% confidence, but the needs of your specific project will be the determining factor. Statistics cannot make this decision for you. Thus, if someone asks, “How good a number can you give me?” reply with, “How confident would you like to be?”
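To see what that choice costs, consider a rough sketch that uses large-sample normal multipliers (a small data set would call for Student's t values instead). The standard uncertainty of 0.40 mg/L and the function name are hypothetical.

```python
from statistics import NormalDist


def coverage_factor(confidence):
    """Two-sided normal multiplier for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


std_uncertainty = 0.40  # hypothetical standard uncertainty of a result, mg/L

for confidence in (0.95, 0.99):
    half_width = coverage_factor(confidence) * std_uncertainty
    print(f"{confidence:.0%} confidence: result +/- {half_width:.2f} mg/L")
```

The multipliers are roughly 1.96 and 2.58, so moving from 95% to 99% confidence widens the reported interval by about a third for the same data.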

Even if regulations require you to calculate and report results a certain way, and these procedures are not as statistically sound as they should be, collect enough data (e.g., replicates at each of multiple concentrations) to be able to draw sound conclusions. Otherwise, you may not be able to defend your results if controversial values arise. He who has reliable data has power.

Realize that collecting enough data for sound statistical analyses need not be overly complicated or expensive. Typically, such results can be collected via routine, in-place quality-control (QC) procedures.

Do not allow extrapolation of a regression curve. You simply do not know how the data are behaving in regions where no results have been obtained and modeled. The only exception is when trace-level work is being done and a true blank cannot be obtained; here, the only alternative may be the use of some version of standard addition, which inherently involves extrapolation. In such cases, proceed with caution, and with your eyes open.
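For readers unfamiliar with standard addition, here is a minimal sketch of why it involves extrapolation. Known amounts of analyte are added to aliquots of the sample, the response is regressed on the added concentration, and the fitted line is extended back to zero response. The data are hypothetical, and the sketch assumes a linear response, a constant final volume, and no signal from the matrix itself.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_mean - b * x_mean, b


# Hypothetical spikes: concentration added to the sample (ug/L) and response.
added_conc = [0.0, 5.0, 10.0, 15.0]
response = [120.0, 221.0, 318.0, 422.0]

intercept, slope = fit_line(added_conc, response)

# The line crosses zero response at x = -intercept/slope; the magnitude of
# that x-intercept is the estimated concentration in the unspiked sample.
sample_conc = intercept / slope
print(f"estimated sample concentration = {sample_conc:.2f} ug/L")
```

The estimate lies outside the range of added concentrations, so it rests entirely on the assumption that the line stays straight where no data exist.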

Be aware that obtaining “perfect” chromatography may be overkill. If a less-than-ideal separation can deliver results with an acceptable level of uncertainty, then don’t fret over the lack of perfection (unless a regulation contains such requirements as meeting peak-shape criteria). Remember, a result reported as x ± 30% of x may be precise enough for the purpose at hand.

When trace analysis is involved, try to steer clear of detection limits (DLs) and focus instead on the overall uncertainty associated with results. Make decisions according to how acceptable the uncertainty is for the situation at hand.

If you must deal with detection limits, understand the relationship among: 1) the detection limit, 2) the probability of false positives, and 3) the probability of false negatives. The mathematical expression is one equation in three unknowns, meaning that you cannot select values for all three variables. Basic algebra dictates that you can only specify any two of the three; the equation will determine the remaining value. Thus, if you select an unrealistically low DL, then the value for at least one of the two probabilities must rise to maintain the equality. The result may not be pretty.
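The trade-off can be illustrated with one common simplified formulation, which is an assumption here rather than the specific equation the paragraph above has in mind: with normally distributed noise of constant, known standard deviation sigma, DL = (z(1 - alpha) + z(1 - beta)) × sigma, where alpha is the false-positive probability and beta is the false-negative probability. The function names below are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def detection_limit(alpha, beta, sigma):
    """DL implied by chosen false-positive (alpha) and false-negative (beta)
    probabilities, assuming constant normal noise with known sigma."""
    return (STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(1 - beta)) * sigma


def false_negative_rate(dl, alpha, sigma):
    """Beta implied by a chosen DL and alpha under the same assumptions."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - alpha) - dl / sigma)


sigma = 1.0  # hypothetical noise standard deviation, in concentration units

# Choose alpha and beta; the equation then fixes the detection limit.
dl = detection_limit(0.05, 0.05, sigma)
print(f"alpha = beta = 5%  ->  DL = {dl:.2f} sigma")

# Insist on a lower DL at the same alpha; beta must rise to compensate.
beta = false_negative_rate(2.0, 0.05, sigma)
print(f"DL = 2.00 sigma, alpha = 5%  ->  beta = {beta:.0%}")
```

Under these assumptions, 5% false positives and 5% false negatives imply a detection limit of about 3.3 sigma; forcing the limit down to 2 sigma while holding false positives at 5% pushes the false-negative probability above one in three.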

Don’t get trapped into thinking that detection-limit estimates are necessary for every procedure. If your samples will always contain high levels of the analyte(s) (i.e., instrument sensitivity is not an issue), then such limits are irrelevant.

Do everything you can to educate upper management about the need for statistically sound results. True support from these folks is needed to instill a culture of sound data. To win your case, emphasize that not obtaining such data can be very expensive in the long run; mention the “free” option of using available QC results, as discussed above.

Along the same lines, urge instrument manufacturers to incorporate sound statistical analysis into their software. While excellent standalone programs exist for performing these types of calculations, life is much simpler if both data collection and data processing can be done within a single package.

If you are associated with the chemistry department of a university, work to establish an ongoing class or workshop in sound regression analysis; education is the key. As you get to know faculty members who teach analytical chemistry, encourage them to become proficient in the subject and then to use the techniques in their research.

Even if you only receive results from chromatographs, learn the basics of simple linear regression. Then “interview” candidate testing labs (or your in-house lab) to be sure they understand (and will deliver) sound results. After all, it’s your money! (You wouldn’t shop for a house without asking for details about each possible property.)

If by now you are asking, “Where do I go for specifics on regression analysis in chemistry?” you might try the following references. Statistics for Analytical Chemists by Caulcutt and Boddy includes a useful discussion of the subject; unfortunately, the book is out of print, so you will need to search for a used copy. Statistical Methods in Analytical Chemistry by Meier and Zünd also contains material on regression, but this book is fairly heavy-duty. Since its beginnings in September 2002, the ongoing American Laboratory column, “Statistics in Analytical Chemistry” (http://new.americanlaboratory.com/1403-Statistics-in-Analytical-Chemistry/), has been devoted to the details of simple linear regression (and related topics). These articles are the authors’ gratis contribution to help fulfill their mission: providing scientists with easy access to sound statistical procedures.

If you have any doubts about how to perform a statistical analysis, consult a statistician who is knowledgeable in the applicable field. Grabbing a formula that looks correct (i.e., has all of the necessary terms) does not guarantee a sound result.

Finally, in both chromatography and statistics (and all of science, for that matter), avoid the “black-box syndrome,” which can be explained with the following true story. Many years ago, a group of chromatographers were discussing all of the automatic features that are standard in modern instrumentation and software. The overall atmosphere was one of satisfaction that such equipment was now very user-friendly and accessible even to the novice, requiring little hands-on attention. Larry Taylor (at the time, Professor of Chemistry at Virginia Tech, and now Professor Emeritus) was in attendance and had been listening silently throughout. As the discussion began to die down, he spoke up. “Yes,” he said, “but you still have to think.”

The message is clear. Do not accept chromatographs or their data at face value. Just because something is new, expensive, and computer-controlled does not mean that it is necessarily behaving the way you think it is. Chromatographs and their accessories are highly reliable, typically stable pieces of laboratory equipment. However, you need to keep tabs on everything; the gnomes may have been tinkering during the manufacturing process or while you were running samples. The data that these instruments generate can be sound as well, but only if proper statistical techniques are understood and used.

Lynn Vanatta is an Analytical Chemist and long-time contributor of the Statistics in Analytical Chemistry articles with co-author David Coleman. She can be reached at statistics@americanlaboratory.com.