# Statistics in Analytical Chemistry: Part 50—Sampling Theory

Throughout this series of articles, the emphasis has been on calibration and recovery curves, which are necessary for the quantitative analysis of samples via analytical methods that involve simple linear regression. Once these curves are available, the analyst can conduct measurements on actual samples. Sometimes laboratory personnel are also involved in collecting such specimens. Almost always, however, analysts must handle the samples to prepare them for introduction into the instrument.

From a chemical and physical standpoint, much has been written about proper techniques to preserve the integrity of individual specimens. Does statistics have anything to say about sample handling? The answer is “Yes.” A brief introduction to this statistical theory is appropriate, to give some guidance on how samples can be collected and handled in a statistically sound manner.

The “father” of statistical sampling is a Frenchman named Pierre Gy. Beginning in the 1950s, he developed and published his structured sampling approach, which centers on seven basic sources of sampling variation. In 2001, Patricia Smith distilled the details into an introductory book (ref. 1). Following is an outline of the highlights of her discussion of these seven errors.

## Error 1: Fundamental variation

The makeup of any solid, liquid, or gas is fundamentally heterogeneous. Gy called this property “constitution heterogeneity.” As a result, no sample will be truly representative of the whole, and sampling variation will occur.

Deionized (DI) water certainly is more homogeneous than is a solids-containing sludge. However, even the DI water is not perfectly uniform when it comes to a given analyte.
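The effect of constitution heterogeneity can be illustrated with a small simulation. The sketch below is not taken from Gy or Smith; it assumes a hypothetical lot in which each particle independently has a 5% chance of carrying the analyte, and the function name, the sample sizes, and the number of repeats are illustrative choices only. It compares the spread of the measured analyte fraction for small and large samples.

```python
import random
import statistics


def sampled_fraction_spread(sample_size, analyte_fraction=0.05,
                            n_repeats=500, seed=0):
    """Standard deviation of the analyte fraction over repeated samples.

    Assumption: each particle independently carries the analyte with
    probability `analyte_fraction` (a deliberately simple model of a lot).
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_repeats):
        # Count analyte particles in one sample of `sample_size` particles.
        hits = sum(rng.random() < analyte_fraction for _ in range(sample_size))
        fractions.append(hits / sample_size)
    return statistics.stdev(fractions)


small = sampled_fraction_spread(sample_size=20)
large = sampled_fraction_spread(sample_size=2000)
print(f"spread with 20 particles:   {small:.4f}")
print(f"spread with 2000 particles: {large:.4f}")
```

Under these assumptions, the spread shrinks as the sample grows but never reaches zero, which is consistent with the statement that no sample is perfectly representative of the whole.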

## Error 2: Grouping and segregation variation

Figure 1 – Illustration of Errors 2 and 6. See text for details.

Within a lot of material, particles of a specific nature may actually be grouped together or segregated from other types of particles. Gy called this property “distribution heterogeneity.” Again, no sample will be truly representative of the whole.

Consider the container shown in Figure 1. The darker the color, the more concentrated is the substance that is distributed throughout. Thus, there is a grouping that is graded as one moves from top to bottom.
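A graded container of this kind can be mimicked numerically. The following sketch assumes a hypothetical linear gradient, from 0 concentration units at the top to 10 at the bottom; the gradient, the 20% depth limit, and the number of increments are all illustrative assumptions rather than values from the figure. It compares increments taken only near the top with increments drawn at random from the full depth.

```python
import random
import statistics


def concentration_at(depth):
    """Hypothetical linear gradient: 0 at the top (depth 0.0), 10 at the bottom (depth 1.0)."""
    return 10.0 * depth


rng = random.Random(1)
true_mean = 5.0  # average of the assumed linear gradient over the whole depth

# Increments taken only from the top 20% of the container.
top_only = [concentration_at(rng.uniform(0.0, 0.2)) for _ in range(200)]
# Increments taken at random depths throughout the container.
full_depth = [concentration_at(rng.uniform(0.0, 1.0)) for _ in range(200)]

top_mean = statistics.mean(top_only)
full_mean = statistics.mean(full_depth)
print(f"true mean:          {true_mean:.2f}")
print(f"top-only estimate:  {top_mean:.2f}")
print(f"full-depth estimate: {full_mean:.2f}")
```

In this simplified model, the top-only estimate is biased low no matter how many increments are taken, whereas increments spread over the full depth scatter around the true mean.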

## Error 3: Shifts and trends

Gy’s actual name for this variation is “long-range nonperiodic heterogeneity fluctuation variation.” Almost inevitably, processes vary over time. Consequently, sampling variation occurs, since samples taken at different times will have at least slightly different compositions.

## Error 4: Cycles

Gy’s name for this variation is again much longer than the heading above: “long-range periodic heterogeneity fluctuation variation.” Some processes shift when, for example, the temperature rises and falls during the day, or when ingredients are added on a periodic basis. Such changes will affect the product’s composition, and thus introduce sampling variation.
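A short simulation shows why the timing of samples matters when a process cycles. The sketch below assumes a hypothetical process with a mean level of 100 and a sinusoidal daily swing of ±5; these values, the 06:00 sampling time, and the 30-day window are illustrative assumptions. It compares sampling at the same clock time every day with sampling at random times.

```python
import math
import random
import statistics

PERIOD_H = 24.0  # assumed daily cycle, in hours


def composition(t_hours):
    """Hypothetical process: mean level 100 with a daily sinusoidal swing of +/-5."""
    return 100.0 + 5.0 * math.sin(2.0 * math.pi * t_hours / PERIOD_H)


# Sampling once a day at 06:00 always lands on the same phase of the cycle.
fixed_times = [6.0 + PERIOD_H * day for day in range(30)]
fixed_mean = statistics.mean(composition(t) for t in fixed_times)

# Sampling at random times over the same 30 days covers many phases.
rng = random.Random(2)
random_times = [rng.uniform(0.0, 30 * PERIOD_H) for _ in range(30)]
random_mean = statistics.mean(composition(t) for t in random_times)

print(f"fixed-time mean:  {fixed_mean:.2f}")
print(f"random-time mean: {random_mean:.2f}")
```

With this assumed cycle, the fixed 06:00 schedule sees only the peak of the swing and so overstates the average level, while random timing scatters around the process mean.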

## Error 5: Delimitation variation

Figure 2 – Depiction of Error 5. See text for details.

To obtain a random sample, every part of the lot of material must have an equal chance of being selected. However, if the boundaries of the sample are not defined properly, sampling variation will occur.

Consider Figure 2, which represents a field that has been contaminated with a rust-colored liquid. However, the substance has been absorbed into the soil and thus is no longer visible from the surface. Depending on how the sampling areas are defined, there may be a high risk of missing the contamination.
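The risk can be made concrete with a toy version of such a field. The sketch below assumes a hypothetical 10 × 10 grid of cells with a 3 × 3 contaminated patch in one corner, and a poorly delimited sampling frame that covers only the half of the field nearest an access road; the layout, the ten cells per campaign, and the number of trials are all illustrative assumptions, not details of Figure 2. It estimates how often a sampling campaign includes at least one contaminated cell.

```python
import random

GRID = 10  # field divided into a hypothetical 10 x 10 grid of cells

# Assumed contaminated patch: a 3 x 3 block of cells in the far corner.
contaminated = {(r, c) for r in range(7, 10) for c in range(7, 10)}
all_cells = [(r, c) for r in range(GRID) for c in range(GRID)]
# Poorly delimited frame: only the half of the field nearest the access road.
near_road = [(r, c) for (r, c) in all_cells if r < 5]

rng = random.Random(3)


def hit_rate(frame, n_cells=10, n_trials=2000):
    """Fraction of campaigns in which at least one chosen cell is contaminated."""
    hits = 0
    for _ in range(n_trials):
        chosen = rng.sample(frame, n_cells)  # cells drawn without replacement
        if any(cell in contaminated for cell in chosen):
            hits += 1
    return hits / n_trials


restricted_rate = hit_rate(near_road)
whole_field_rate = hit_rate(all_cells)
print(f"hit rate, restricted frame: {restricted_rate:.2f}")
print(f"hit rate, whole field:      {whole_field_rate:.2f}")
```

In this toy layout, the restricted frame can never include the contaminated patch, whereas giving every cell an equal chance of selection finds it in a majority of campaigns.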

## Error 6: Extraction variation

Even if the sample boundaries are defined correctly, it may not be possible, in practice, to extract the sample from the lot exactly as it was defined.

Refer again to Figure 1. Assume the container is large and that it houses solid material. If the tool used to remove a sample will not reach the bottom of the vessel, then it will be impossible to extract a representative aliquot.

## Error 7: Handling variation

This variation source is probably the one that is most familiar to the analytical chemist who works with samples. Even if steps are taken to minimize the above six variations, incorrect analyses can still occur. If the integrity of the samples is compromised in any way (e.g., they become contaminated, are not stored properly, or are processed in such a way that analytes are lost), then this final type of sampling variation will occur.

Figure 3 – Example of Error 7. See text for details.

An example of this type of variation is shown in Figure 3. The transparent container on the left is submitted to the laboratory for testing of the liquid inside. In order to prepare the material for quantifying the analyte(s) in question, an aliquot is poured into the opaque container on the right. However, if this new receptacle contains significant levels of any of the analytes, then handling variation will result.

How should a chemist use the above information? The first goal is to remain aware that these variations can occur. If the situation at hand is critical, it would be wise to consult the literature for more information, or to contact a statistician who is well versed in sampling theory. Otherwise, even the most well-constructed analytical method could provide results that are unreliable estimates of the analyte level(s).

## Reference

1. Smith, P.L. A Primer for Sampling Solids, Liquids, and Gases: Based on the Seven Sampling Errors of Pierre Gy; Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001.

Lynn Vanatta is an Analytical Chemist; e-mail: statistics@americanlaboratory.com.