Statistics in the Laboratory: Confidence Interval of the Mean

Estimated means are often used for making quantitative decisions (e.g., to decide if something meets specifications). The more certain the mean, the more certain the decision. In this column we’ll explore a way to describe and visualize the uncertainty in estimated means.

As we saw in the last column, the uncertainty in a mean is less than the uncertainty in the raw data by a factor of 1/√n, both for an infinite population of data (on Saturn):

σx̄ = σ/√n

and for a sample of that data (on Earth):

sx̄ = s/√n

where σ (a population parameter) is the true standard deviation of the raw data; s (a sample statistic) is an estimate of σ; σx̄ is the true standard deviation of the mean; sx̄ is an estimate of σx̄; and n is the number of measurements that go into calculating the mean x̄.
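This 1/√n behavior is easy to demonstrate with a short simulation. The sketch below (hypothetical values, using the μ and σ from the last column) draws many samples of n = 3 from a normal population and checks that the standard deviation of the sample means comes out close to σ/√n:

```python
import math
import random
import statistics

# Illustrative simulation: draw many samples of n = 3 from a normal
# population with sigma = 0.30 and compare the standard deviation of
# the sample means against the predicted sigma / sqrt(n).
random.seed(1)  # arbitrary seed, for reproducibility only
mu, sigma, n = 4.76, 0.30, 3

means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(20_000)
]

sd_of_means = statistics.stdev(means)
print(f"observed  sd of the means: {sd_of_means:.4f}")
print(f"predicted sigma/sqrt(n):   {sigma / math.sqrt(n):.4f}")
```

With 20,000 simulated means the observed value lands within a few thousandths of the predicted σ/√3 ≈ 0.1732.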

A confidence interval places boundaries around an estimated mean x̄ so that the true mean μ would be expected to lie within those boundaries a certain percentage of the times the method is applied. If the uncertainty is large, then the interval between the boundaries must be wide; if the uncertainty is small, then the interval can be narrow.

The method for constructing confidence intervals uses a set of equations to calculate the boundaries. Using the abbreviations LCB for “lower confidence boundary” and UCB for “upper confidence boundary,” the equations (on Earth) are:

LCB = x̄ − tsx̄
UCB = x̄ + tsx̄

where t is a tabular value of Student’s t. For a two-sided interval, these two boundaries are often “folded together” to give the expression:

μ = x̄ ± tsx̄

which is read, “The true value for μ is contained within a confidence interval bounded by x̄ − tsx̄ and x̄ + tsx̄.” Statisticians usually write this two-sided interval more explicitly as:

x̄ − t s/√n ≤ μ ≤ x̄ + t s/√n

Curiously, statisticians seldom write these equations in their fundamentally more meaningful form that uses sx̄. Instead, they employ the more obscure form that uses s/√n. From now on, whenever you see s/√n in an equation, you’ll recognize that it’s simply the estimated standard deviation of the mean sx̄.

The value of Student’s t is obtained from tables. It’s a bit tricky to pull out the correct value—it depends on the “sidedness” of the interval (one-sided or two-sided), it depends on the level of confidence, and it depends on the degrees of freedom (df, usually equal to n – 1). Once you catch on to it, it becomes pretty easy to find the correct value.
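If you don’t have a table handy, the three t values used in this column can be checked in a few lines of code. This sketch exploits a closed form that holds only for df = 2 (for other degrees of freedom you would consult a table or a statistics library); the formula itself is the only assumption here:

```python
import math

def t_quantile_df2(p: float) -> float:
    """Student's t quantile, valid ONLY for 2 degrees of freedom.

    Closed form for df = 2:  t_p = (2p - 1) * sqrt(2 / (4p(1 - p))).
    For other df, use a t table or a statistics library.
    """
    return (2 * p - 1) * math.sqrt(2 / (4 * p * (1 - p)))

# Two-sided 95% confidence puts alpha/2 = 0.025 in each tail,
# so the lookup uses the 0.975 quantile; one-sided 95% uses 0.95.
print(f"{t_quantile_df2(0.975):.4f}")  # 4.3027 (two-sided 95%)
print(f"{t_quantile_df2(0.95):.4f}")   # 2.9200 (one-sided 95%, two-sided 90%)
print(f"{t_quantile_df2(0.995):.4f}")  # 9.9248 (two-sided 99%)
```

Note how the “sidedness” enters: a two-sided 95% interval and a one-sided 97.5% bound use the same quantile.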

It’s fundamentally important to understand that the confidence is in the method that calculates the interval, not in the interval itself. This subtlety is often ignored when the intervals are used in practice, and we say, for example, “The probability is 95% that μ lies within this interval,” when we should say, “There is 95% probability that the method has produced an interval that contains μ.” (See Moore and McCabe1 for a more complete discussion of this subtlety.)

So how does this work in practice? Suppose (as in the last column) μ = 4.76 units, and σ = 0.30 units. For a two-sided interval with n = 3 (df = n – 1 = 2) and 95% confidence, t = 4.3027. Figure 1 shows the 95% confidence interval of the mean for 100 sets of three data points drawn at random from that infinite population of data.

Figure 1 – One hundred two-sided 95% confidence intervals based on random draws of n = 3 measurements from an infinite population of data for which μ = 4.76 (vertical line) and σ = 0.30. Student’s t = 4.3027 for two degrees of freedom and 95% confidence. Five intervals (shown in red) do not include μ: result numbers 40 (narrow and hard to see), 43, 63, 67, and 100.

Note that five of the 100 results have confidence intervals (shown in red) that do not include the true mean μ, results 40 (narrow and hard to see), 43, 63, 67, and 100. If the method has 95% confidence of producing an interval that will contain μ, then there is a fractional risk α = (100% – 95%)/100% = 0.05 that it will not produce an interval that contains μ, and α × 100 results = 0.05 × 100 results = 5 results.
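The experiment behind Figure 1 can be sketched in a few lines. The code below (hypothetical seed; a single batch of 100 intervals will rarely miss exactly five times) runs many such intervals and confirms that the coverage settles near 95%:

```python
import math
import random
import statistics

# Sketch of the Figure 1 experiment: repeatedly draw n = 3 points,
# build a two-sided 95% confidence interval, and test whether it
# captures the true mean mu.
random.seed(42)  # arbitrary seed for reproducibility
mu, sigma, n, t = 4.76, 0.30, 3, 4.3027  # t for df = 2, two-sided 95%

def interval_covers_mu() -> bool:
    """One random draw of n points; True if the interval contains mu."""
    data = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(data)
    s_xbar = statistics.stdev(data) / math.sqrt(n)  # estimated sd of the mean
    return xbar - t * s_xbar <= mu <= xbar + t * s_xbar

hits = sum(interval_covers_mu() for _ in range(10_000))
print(f"coverage over 10,000 intervals: {hits / 10_000:.3f}")  # near 0.950
```

Changing t to 2.9200 or 9.9248 reproduces the 90% and 99% behavior shown in Figures 2 and 3.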

In practice, you won’t be looking at 100 confidence intervals as shown in Figure 1. You will have calculated only one interval. You won’t know if your s is smaller or larger than σ, you won’t know if your sx̄ is smaller or larger than σx̄, and thus you won’t know if your confidence interval is smaller or larger than it “should” be—all you know is that the interval the method has given you has a 95% chance of including the true mean μ.

Suppose you didn’t need 95% confidence, only 90% confidence. What would those intervals look like? Using the same data as was used in Figure 1, Figure 2 shows 90% confidence intervals. Student’s t needs to be only 2.9200 for 90% confidence, so these intervals are narrower than they were for 95% confidence (see Figure 1). Note that 10 of these intervals (shown in red) don’t include the true mean μ, as expected.

Figure 2 – Same data as Figure 1. One hundred two-sided 90% confidence intervals based on random draws of n = 3 measurements from an infinite population of data for which μ = 4.76 (vertical line) and σ = 0.30. Student’s t = 2.9200 for two degrees of freedom and 90% confidence. Ten intervals (shown in red) do not include μ: result numbers 17, 33, 40 (narrow and hard to see), 43, 62, 63, 67, 76, 88, and 100.

Suppose you’re more conservative and want 99% confidence. Figure 3 uses the same data as Figure 1 but shows 99% confidence intervals (t = 9.9248). As expected, only one of these intervals doesn’t include the true mean μ.

In the last column we discussed the advantages of using a more precise measurement method. Figure 4 shows 95% confidence intervals for the simulation shown in Figure 1 but this time using σ = 0.15 rather than 0.30. Clearly, the confidence intervals are narrower, as expected.

Figure 3 – Same data as Figure 1. One hundred two-sided 99% confidence intervals based on random draws of n = 3 measurements from an infinite population of data for which μ = 4.76 (vertical line) and σ = 0.30. Student’s t = 9.9248 for two degrees of freedom and 99% confidence. One interval (shown in red) does not include μ: result number 40 (narrow and hard to see).
Figure 4 – One hundred two-sided 95% confidence intervals based on random draws of n = 3 measurements from an infinite population of data for which μ = 4.76 (vertical line) and σ = 0.15. Compare with Figure 1 for which σ = 0.30. Student’s t = 4.3027 for two degrees of freedom and 95% confidence. Five intervals (shown in red) do not include μ: result numbers 40 (narrow and hard to see), 43, 63, 67, and 100.

Two-sided confidence intervals are used when you’re trying to answer a question that contains the words “between,” “outside,” or “different from” (or their equivalents). If you’re trying to answer a question that contains “less than” or “greater than” (but not both at the same time, or you’re back to “different from”), then a one-sided confidence interval is appropriate.

Figure 5 – Same data as Figure 1. One hundred one-sided 95% confidence intervals truncated at the LCB based on random draws of n = 3 measurements from an infinite population of data for which μ = 4.76 (vertical line) and σ = 0.30. Student’s t = 2.9200 for two degrees of freedom and 95% confidence. Five intervals (shown in red) do not include μ: result numbers 17, 33, 43, 67, and 76.

Figure 5 shows one-sided confidence intervals that are appropriate for “greater than” questions. These intervals are truncated on the low side but go off to infinity on the right side. The equation that defines LCB shown earlier in this column can be used (with a one-sided t value of 2.9200 for 95% confidence). Statisticians usually write a one-sided interval as:

x̄ − t s/√n ≤ μ.

Figure 6 – Same data as Figure 1. One hundred one-sided 95% confidence intervals truncated at the UCB based on random draws of n = 3 measurements from an infinite population of data for which μ = 4.76 (vertical line) and σ = 0.30. Student’s t = 2.9200 for two degrees of freedom and 95% confidence. Five intervals (shown in red) do not include μ: result numbers 40, 62, 63, 88, and 100.

Figure 6 is fairly straightforward. These intervals would be used to show that something is “less than” some specified value. We could use the earlier equation to define UCB; statisticians write the interval as:

μ ≤ x̄ + t s/√n
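Both one-sided bounds drop out of the same arithmetic. A minimal sketch, using three made-up measurements (the data values are illustrative only) and the one-sided t = 2.9200 for df = 2 and 95% confidence:

```python
import math
import statistics

# Illustrative one-sided 95% bounds for a sample of n = 3 measurements.
data = [4.51, 4.84, 4.92]  # hypothetical values, not from the column
t_one_sided = 2.9200       # one-sided 95%, df = 2

xbar = statistics.fmean(data)
s_xbar = statistics.stdev(data) / math.sqrt(len(data))  # estimated sd of the mean

lcb = xbar - t_one_sided * s_xbar  # for "greater than" questions: mu >= LCB
ucb = xbar + t_one_sided * s_xbar  # for "less than" questions:    mu <= UCB
print(f"LCB = {lcb:.3f}   UCB = {ucb:.3f}")
```

You would quote only one of the two bounds, depending on which question you are asking; quoting both at once turns the statement back into a two-sided 90% interval.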

Finally, it’s a bit unnerving to see how the widths of the confidence intervals vary from random draw to random draw (see Figure 1, for example), but that’s the way it is with small statistical samples—the estimate of the standard deviation is quite variable when the number of degrees of freedom is small (we’ll discuss this in a future column). We’ve seen that the pooled standard deviation σm is a better measure of the standard deviation of the measurement process, and we can substitute this in our equations. Because σm is based on a very large number of degrees of freedom (probably big enough that we can consider df = ∞), the t statistic is replaced with the z statistic, and we can write, for example:

x̄ − z σm/√n ≤ μ ≤ x̄ + z σm/√n

In this form, even though z and σm are based on an approximately infinite number of degrees of freedom, n is still the number of measurements that have gone into calculating x̄. Assuming n is always constant (as it might be, for example, for routine product release measurements), this gives rise to a constant width for the confidence interval. Assuming μ = 4.76, σm = 0.30, n = 3, and 95% confidence (z = 1.9600), Figure 7 shows confidence intervals for 100 sets of three data points drawn at random from the infinite population. The widths of the confidence intervals are all identical, and they are positioned to the left or to the right depending on the estimate of x̄.
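A sketch of the σm-based interval, assuming the values just given (the observed mean x̄ below is a made-up example). Because z and σm are fixed, the half-width is the same for every sample; only the center moves:

```python
import math
from statistics import NormalDist

# z-based interval using a pooled standard deviation sigma_m with
# effectively infinite degrees of freedom: z replaces t, and every
# interval has the same width.
sigma_m, n = 0.30, 3
z = NormalDist().inv_cdf(0.975)  # two-sided 95% -> 1.9600

half_width = z * sigma_m / math.sqrt(n)  # identical for every sample
print(f"z = {z:.4f}, half-width = {half_width:.4f}")

xbar = 4.70  # hypothetical observed mean of n = 3 measurements
print(f"{xbar - half_width:.3f} <= mu <= {xbar + half_width:.3f}")
```

Compare this constant half-width (≈0.34) with the t-based intervals in Figure 1, whose widths swing widely because each one rests on a 2-df estimate of the standard deviation.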

Figure 7 – Same data as Figure 1. One hundred two-sided 95% confidence intervals based on random draws of n = 3 measurements from an infinite population of data for which μ = 4.76 (vertical line) and σ = 0.30. Confidence intervals based on σm = 0.30 with z = 1.9600 for an infinite number of degrees of freedom and 95% confidence. Five intervals (shown in red) do not include μ: result numbers 43, 51, 67, 72, and 80.

Full disclosure: Figures 1–7 are based on random draws from infinite populations, but because the figures are used for teaching purposes, they are made to show the exact expectations of events for the various levels of confidence (e.g., five of the 100 intervals in Figure 1 don’t include the true mean). In real life, simulations don’t work out this nicely (e.g., only four of the 100 intervals in Figure 1 might not include the true mean). I had to try 445 seeds for the random number generator I used before I found a value that would give the expectation values for all seven figures.

While on the subject of teaching, can you figure out why the t values are the same (2.9200) for Figures 2, 5, and 6? And why is the set of intervals that don’t include the true mean in Figure 2 the composite of the sets of intervals that don’t include the true mean in Figures 5 and 6?

In the next column, we’ll see how these confidence intervals can be used to make statistical decisions with known levels of risk of being wrong. Confidence intervals offer an alternative to the usual algebraic statistical tests, an alternative that is much safer to use in practice and offers a visual interpretation of what is going on in these tests.

Reference

  1. Moore, D.S. and McCabe, G.P. Introduction to the Practice of Statistics, 2nd ed.; Freeman: New York, NY, 1993; p. 440.

Stanley N. Deming, Ph.D., is an analytical chemist masquerading as a statistician at Statistical Designs, El Paso, Texas, U.S.A.; e-mail: [email protected]; www.statisticaldesigns.com