In the world of statistics, one of our primary goals is to understand a large population by examining a small sample. A classic example is trying to figure out the average height of all adults in a country. We can’t measure everyone, so we take a sample and calculate the sample mean, $\bar{X}$. But how confident can we be that our sample mean is close to the true population mean, $\mu$?

This is where statistical inference comes in, and it leads us directly to the t-distribution.

The Ideal World: When We Know Everything

Let’s start in an ideal world. If we draw a sample $X_1, \dots, X_n$ from a normal population with a known mean $\mu$ and a known standard deviation $\sigma$, we can perfectly describe the behavior of our sample mean $\bar{X}$.

Because the population is normal, the sample mean is itself exactly normally distributed, so the standardized quantity known as the Z-statistic follows a standard normal distribution (mean 0, variance 1); the Central Limit Theorem tells us the same holds approximately even for non-normal populations:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

This is a powerful tool. It allows us to calculate the probability of our sample mean being a certain distance from the true mean. We could use it to build precise confidence intervals and perform hypothesis tests.
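
For instance, here is a minimal sketch of that ideal-world calculation (in Python with NumPy and SciPy, my own choice since the article contains no code; the heights, the known $\sigma$, and the 95% level are all invented for illustration): a Z-based confidence interval when $\sigma$ really is known.

```python
import numpy as np
from scipy import stats

sigma = 7.5      # population standard deviation, assumed known exactly (rare in practice)
mu_true = 170.0  # true mean height in cm, used only to simulate the data

rng = np.random.default_rng(0)
sample = rng.normal(loc=mu_true, scale=sigma, size=25)  # one simulated sample
n, x_bar = sample.size, sample.mean()

# Z-based 95% confidence interval: x_bar +/- z_{0.975} * sigma / sqrt(n)
z_crit = stats.norm.ppf(0.975)
half_width = z_crit * sigma / np.sqrt(n)
print(f"95% CI for mu: ({x_bar - half_width:.2f}, {x_bar + half_width:.2f})")
```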

The Real World: The Problem of the Unknown Sigma ($\sigma$)

There’s just one problem: in virtually all real-world scenarios, we don’t know the true population standard deviation $\sigma$. This single missing piece of information prevents us from using the Z-statistic.

A natural and simple idea is to just replace the unknown population value $\sigma$ with our best guess for it: the sample standard deviation, $S$. Our new statistic looks like this:

$$\text{New Statistic} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

But is this new statistic still normally distributed? The answer is no. By substituting a fixed, known constant ($\sigma$) with a random variable ($S$), we’ve introduced a new source of uncertainty. Our new distribution should be more spread out than a standard normal distribution to account for the extra randomness that comes from estimating $\sigma$. This is especially true when our sample size $n$ is small, making our estimate $S$ less reliable.

This is precisely the problem the t-distribution was invented to solve.
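
To get a feel for how much wider the corrected distribution must be, here is a small sketch (Python with SciPy; the sample sizes are arbitrary) comparing the 97.5th-percentile critical value of the t-distribution, introduced next, with the familiar normal value of about 1.96:

```python
from scipy import stats

# 97.5th-percentile critical values: t with n-1 degrees of freedom vs. standard normal.
z_crit = stats.norm.ppf(0.975)
for n in (3, 5, 10, 30, 100):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:>3}: t critical = {t_crit:.3f}   z critical = {z_crit:.3f}")
```

For small $n$ the t critical value is noticeably larger, reflecting the extra uncertainty from estimating $\sigma$ with $S$; it approaches 1.96 as $n$ grows.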

The Solution: Defining the t-Distribution

Let’s look at the formal definition of a t-distribution. A random variable $T$ is said to follow a t-distribution with $d$ degrees of freedom if it can be constructed as the ratio of two independent random variables:

  1. A standard normal random variable, $Z \sim N(0,1)$.
  2. The square root of a chi-square random variable $U \sim \chi^2_d$ divided by its degrees of freedom $d$, i.e. $\sqrt{U/d}$.

$$T = \frac{Z}{\sqrt{U/d}} \sim t_d$$

Our goal now is to show that our new statistic, $\frac{\bar{X} - \mu}{S/\sqrt{n}}$, has exactly this structure.
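
Before doing the algebra, a quick simulation sketch (Python with NumPy and SciPy; the choice of $d = 5$ and the number of draws are arbitrary) can make the definition concrete: draw $Z$ and $U$ independently, form $Z/\sqrt{U/d}$, and compare the result with the exact $t_d$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d, reps = 5, 200_000                    # degrees of freedom and number of draws

z = rng.standard_normal(reps)           # Z ~ N(0, 1)
u = rng.chisquare(df=d, size=reps)      # U ~ chi-square with d degrees of freedom
t_constructed = z / np.sqrt(u / d)      # T = Z / sqrt(U / d)

# A few empirical quantiles of the constructed variable vs. the exact t_d quantiles.
for q in (0.90, 0.95, 0.99):
    print(f"q = {q}: constructed {np.quantile(t_constructed, q):.3f}   "
          f"exact t_{d} {stats.t.ppf(q, df=d):.3f}")
```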

Building the t-Statistic from Our Sample

We need to show that our statistic can be split into a standard normal numerator and a denominator of the form $\sqrt{U/d}$ with $U$ chi-square, and that these two parts are independent.

1. The Numerator: A Standard Normal Variable (Z)

This part is straightforward. We can take the expression for our statistic and just divide the top and bottom by $\sigma/\sqrt{n}$:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu) / (\sigma/\sqrt{n})}{(S/\sqrt{n}) / (\sigma/\sqrt{n})}$$

The numerator is our old friend, the Z-statistic, which we know is distributed as $N(0,1)$.

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

2. The Denominator: A Chi-Square Variable (U)

Now let’s look at the denominator of our rewritten statistic:

$$\text{Denominator} = \frac{S}{\sigma} = \sqrt{\frac{S^2}{\sigma^2}}$$

A crucial theorem from statistics (Theorem B in Section 6.3 of the textbook, proved as Proof 2 in the appendix below) states that for a sample from a normal population:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

This is our chi-square variable, $U$, with $d = n-1$ degrees of freedom. We can rewrite our denominator to isolate this term:

$$\sqrt{\frac{S^2}{\sigma^2}} = \sqrt{\frac{(n-1)S^2/\sigma^2}{n-1}} = \sqrt{\frac{U}{d}}$$
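
This theorem is easy to check empirically. The sketch below (Python with NumPy and SciPy; the particular $n$, $\mu$, and $\sigma$ are invented) repeatedly draws normal samples, forms $(n-1)S^2/\sigma^2$, and compares its empirical quantiles with those of $\chi^2_{n-1}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, mu, sigma, reps = 8, 170.0, 7.5, 100_000   # illustrative values

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)              # sample variance with divisor n-1
u = (n - 1) * s2 / sigma**2                   # the statistic of interest

# Empirical quantiles should match a chi-square with n-1 degrees of freedom.
for q in (0.10, 0.50, 0.90):
    print(f"q = {q}: empirical {np.quantile(u, q):.3f}   "
          f"chi2_{n-1} {stats.chi2.ppf(q, df=n - 1):.3f}")
```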

3. The Final Piece: Independence

The definition of the t-distribution requires that $Z$ and $U$ be independent. Here, we rely on another foundational theorem of statistics: for a sample drawn from a normal distribution, the sample mean $\bar{X}$ and the sample variance $S^2$ are independent random variables.

Since our numerator $Z$ is a function of $\bar{X}$ and our denominator’s core component $U$ is a function of $S^2$, they are also independent.
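
This independence is a special property of the normal family, and it is easy to probe numerically. The sketch below (Python with NumPy; the sample size and repetition count are arbitrary) estimates the correlation between $\bar{X}$ and $S^2$ across many normal samples and, for contrast, across skewed exponential samples, where the two quantities are not independent.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 100_000                      # arbitrary sample size and repetitions

def mean_var_corr(draws):
    """Correlation between sample mean and sample variance across many samples."""
    x_bar = draws.mean(axis=1)
    s2 = draws.var(axis=1, ddof=1)
    return np.corrcoef(x_bar, s2)[0, 1]

normal_data = rng.normal(loc=170.0, scale=7.5, size=(reps, n))
exp_data = rng.exponential(scale=1.0, size=(reps, n))   # skewed, non-normal

print(f"normal data:      corr(X_bar, S^2) = {mean_var_corr(normal_data):.4f}")  # near 0
print(f"exponential data: corr(X_bar, S^2) = {mean_var_corr(exp_data):.4f}")     # clearly positive
```

Zero correlation alone does not prove independence, but the contrast illustrates that the exact independence used here is not something we get for free from arbitrary populations.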

Putting It All Together

We have successfully shown that our statistic has the required structure:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\frac{(n-1)S^2/\sigma^2}{n-1}}} = \frac{Z}{\sqrt{U/d}}$$

Since it perfectly matches the definition, we can conclude:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

This is why we use the t-distribution. It is the mathematically correct distribution for the standardized sample mean $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ when the population standard deviation is unknown and must be estimated from the data. It properly accounts for the extra uncertainty introduced by estimating $\sigma$, giving us a reliable tool for constructing confidence intervals and conducting hypothesis tests in real-world situations.
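
As a closing sanity check on the main result, the sketch below (Python with NumPy and SciPy; the parameters are invented) simulates the statistic $\frac{\bar{X}-\mu}{S/\sqrt{n}}$ many times and compares its tail probability with what $t_{n-1}$ and $N(0,1)$ each predict.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, mu, sigma, reps = 6, 170.0, 7.5, 200_000     # illustrative values

samples = rng.normal(mu, sigma, size=(reps, n))
x_bar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)
t_stat = (x_bar - mu) / (s / np.sqrt(n))        # the statistic from the article

threshold = 2.5
empirical = np.mean(np.abs(t_stat) > threshold)
print(f"P(|T| > {threshold}) empirical    : {empirical:.4f}")
print(f"P(|T| > {threshold}) under t_{n-1} : {2 * stats.t.sf(threshold, df=n - 1):.4f}")
print(f"P(|T| > {threshold}) under N(0, 1): {2 * stats.norm.sf(threshold):.4f}")
```

The empirical tail probability matches the $t_{n-1}$ prediction, while the normal approximation understates it; that gap is exactly the extra spread the t-distribution was designed to capture.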

Appendix: Detailed Proofs of Two Key Theorems

In the main text, we concluded that for a sample from a normal distribution, the statistic $\frac{\bar{X}-\mu}{S/\sqrt{n}}$ follows a t-distribution with $n-1$ degrees of freedom. This conclusion relies on two fundamental and non-obvious theorems. This appendix provides the detailed proofs for these theorems, following the logic presented in Section 6.3 of Mathematical Statistics and Data Analysis.

Proof 1: The Independence of the Sample Mean ($\bar{X}$) and Sample Variance ($S^2$)

Theorem: For an independent and identically distributed (i.i.d.) sample $X_1, \dots, X_n$ from a normal distribution $N(\mu, \sigma^2)$, the sample mean $\bar{X}$ and the sample variance $S^2$ are independent random variables.

Proof Strategy:
Following the textbook’s logic (Corollary A, page 197), we can prove this by showing that $\bar{X}$ is independent of the vector of deviations, $(X_1-\bar{X}, X_2-\bar{X}, \dots, X_n-\bar{X})$. This is sufficient because the sample variance, $S^2 = \frac{1}{n-1}\sum(X_i-\bar{X})^2$, is purely a function of this deviation vector.

We will use the Moment-Generating Function (MGF) method. If the joint MGF of a set of random variables can be factored into a product of functions, each involving only one of the variables (or vectors), then those variables (or vectors) are independent.

Detailed Steps:

  1. Define the Joint MGF
    We define the joint MGF of $\bar{X}$ and the deviation vector as:

    $$M(s, t_1, \dots, t_n) = E\left[\exp\left\{s\bar{X} + \sum_{i=1}^n t_i(X_i - \bar{X})\right\}\right]$$

  2. Simplify the Exponent
    Our goal is to re-express the exponent as a linear combination of the original sample variables, $X_i$.

    $$\begin{aligned} s\bar{X} + \sum_{i=1}^n t_i(X_i - \bar{X}) &= \frac{s}{n}\sum_{i=1}^n X_i + \sum_{i=1}^n t_i X_i - \bar{X}\sum_{i=1}^n t_i \\ &= \sum_{i=1}^n \left(\frac{s}{n} + t_i - \bar{t}\right)X_i \quad \text{where } \bar{t}=\frac{1}{n}\sum t_i \end{aligned}$$

    Let’s define the coefficient $a_i = \frac{s}{n} + t_i - \bar{t}$. The exponent is now simply $\sum_{i=1}^n a_i X_i$.

  3. Use Properties of MGFs
    Since the $X_i$ are i.i.d., the expectation factors into a product of their individual MGFs, each evaluated at its own coefficient $a_i$:

    $$M(s, \mathbf{t}) = E\left[\exp\left\{\sum_{i=1}^n a_i X_i\right\}\right] = \prod_{i=1}^n M_{X_i}(a_i)$$

    For each $X_i \sim N(\mu, \sigma^2)$, its MGF is $M_{X_i}(u) = \exp\left(\mu u + \frac{1}{2}\sigma^2 u^2\right)$. Therefore:

    $$M(s, \mathbf{t}) = \prod_{i=1}^n \exp\left(\mu a_i + \frac{1}{2}\sigma^2 a_i^2\right) = \exp\left(\mu\sum_{i=1}^n a_i + \frac{1}{2}\sigma^2\sum_{i=1}^n a_i^2\right)$$

  4. Calculate the Sums of the Coefficients
    Following the algebraic steps on page 196 of the textbook:

    • $\sum a_i = \sum\left(\frac{s}{n} + t_i - \bar{t}\right) = n\left(\frac{s}{n}\right) + \sum t_i - n\bar{t} = s + n\bar{t} - n\bar{t} = s$
    • $\sum a_i^2 = \sum\left(\frac{s}{n} + (t_i - \bar{t})\right)^2 = \sum\left(\frac{s^2}{n^2} + \frac{2s}{n}(t_i-\bar{t}) + (t_i-\bar{t})^2\right)$. Since $\sum(t_i-\bar{t}) = 0$, this simplifies to $\sum a_i^2 = \frac{s^2}{n} + \sum(t_i-\bar{t})^2$.
  5. Factor the MGF
    Substituting these sums back into the exponent of the MGF:

    $$\begin{aligned} M(s, \mathbf{t}) &= \exp\left(\mu s + \frac{1}{2}\sigma^2\left[\frac{s^2}{n} + \sum(t_i-\bar{t})^2\right]\right) \\ &= \exp\left(\mu s + \frac{\sigma^2 s^2}{2n}\right) \cdot \exp\left(\frac{\sigma^2}{2}\sum_{i=1}^n(t_i-\bar{t})^2\right) \end{aligned}$$

  6. Conclusion
    The joint MGF has successfully factored into two separate parts:

    • The first part, $\exp\left(\mu s + \frac{\sigma^2 s^2}{2n}\right)$, depends only on $s$ and is the MGF of $\bar{X}$.
    • The second part, $\exp\left(\frac{\sigma^2}{2}\sum_{i=1}^n(t_i-\bar{t})^2\right)$, depends only on the $t_i$ variables and is the MGF of the deviation vector.

    So, in the end, we have

    $$M_{\bar{X},\,X_1-\bar{X},\dots,X_n-\bar{X}}(s,\mathbf{t}) = M_{\bar{X}}(s) \cdot M_{X_1-\bar{X},\dots,X_n-\bar{X}}(\mathbf{t})$$

    Because the joint MGF factors, $\bar{X}$ is independent of the deviation vector $(X_1-\bar{X}, \dots, X_n-\bar{X})$. Consequently, $\bar{X}$ is also independent of $S^2$, which is a function of that deviation vector. ✅
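
A crude Monte Carlo check of this factorization is sketched below (Python with NumPy; the specific $n$, $\mu$, $\sigma$, and evaluation point $(s, \mathbf{t})$ are chosen arbitrarily): estimate the joint MGF directly from simulated samples and compare it with the product of the two closed-form factors derived above.

```python
import numpy as np

rng = np.random.default_rng(5)
n, mu, sigma, reps = 5, 1.0, 2.0, 1_000_000     # illustrative values

# An arbitrary small evaluation point (s, t_1, ..., t_n).
s = 0.3
t = np.array([0.10, -0.05, 0.00, 0.05, -0.10])

samples = rng.normal(mu, sigma, size=(reps, n))
x_bar = samples.mean(axis=1)
deviations = samples - x_bar[:, None]

# Direct Monte Carlo estimate of E[exp(s*X_bar + sum_i t_i (X_i - X_bar))].
joint_mgf = np.mean(np.exp(s * x_bar + deviations @ t))

# The two closed-form factors from the proof.
factor_mean = np.exp(mu * s + sigma**2 * s**2 / (2 * n))          # MGF of X_bar
factor_devs = np.exp(sigma**2 / 2 * np.sum((t - t.mean())**2))    # MGF of deviation vector

print(f"Monte Carlo joint MGF : {joint_mgf:.4f}")
print(f"Product of factors    : {factor_mean * factor_devs:.4f}")
```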


Proof 2: The Distribution of $\frac{(n-1)S^2}{\sigma^2}$

Theorem: For an i.i.d. sample from a normal distribution $N(\mu, \sigma^2)$, the statistic $\frac{(n-1)S^2}{\sigma^2}$ follows a chi-square distribution with $n-1$ degrees of freedom ($\chi^2_{n-1}$).

Proof Strategy:
The proof, as detailed on page 197 of the textbook, uses a clever algebraic decomposition. We start with a known $\chi^2_n$ random variable and split it into two components. By using the independence result from Proof 1 and properties of MGFs, we can deduce the distribution of the component we are interested in.

Detailed Steps:

  1. Starting Point: A Known Chi-Square Distribution
    We know that if $X_i \sim N(\mu, \sigma^2)$, then $\frac{X_i - \mu}{\sigma} \sim N(0, 1)$. The sum of squares of $n$ independent standard normal variables follows a chi-square distribution with $n$ degrees of freedom. Let’s call this variable $W$:

    $$W = \sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n$$

    The MGF of this variable is $M_W(t) = (1-2t)^{-n/2}$.

  2. The Key Algebraic Decomposition
    We introduce the sample mean $\bar{X}$ into the sum of squares term:

    $$\begin{aligned} \sum_{i=1}^n (X_i - \mu)^2 &= \sum_{i=1}^n [(X_i - \bar{X}) + (\bar{X} - \mu)]^2 \\ &= \sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^n (\bar{X} - \mu)^2 + 2\sum_{i=1}^n (X_i - \bar{X})(\bar{X} - \mu) \end{aligned}$$

    Since the sum of deviations from the sample mean is zero ($\sum(X_i - \bar{X}) = 0$), the cross-product term vanishes, and the middle term is simply $n(\bar{X} - \mu)^2$. This gives us the crucial identity:

    $$\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$$

  3. Standardize and Re-interpret the Terms
    Divide the entire identity by $\sigma^2$:

    $$\frac{\sum(X_i - \mu)^2}{\sigma^2} = \frac{\sum(X_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2}$$

    Let’s identify these three parts:

    • Left side: This is our starting variable, $W \sim \chi^2_n$.
    • Right side, term 1: Using the definition $S^2 = \frac{1}{n-1}\sum(X_i-\bar{X})^2$, this term is exactly $\frac{(n-1)S^2}{\sigma^2}$. Let’s call this $U$. This is the quantity whose distribution we want to find.
    • Right side, term 2: This term can be rewritten as $\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right)^2$. The term inside the parentheses is a standard normal variable $Z$, so this is $Z^2$. By definition, the square of a standard normal variable follows a $\chi^2_1$ distribution. Let’s call this $V$.

    Our identity now becomes the relationship:

    $$W = U + V$$

  4. Use Independence and MGFs

    • From Proof 1, we established that $\bar{X}$ and $S^2$ are independent.
    • Therefore, $V$ (a function of $\bar{X}$) and $U$ (a function of $S^2$) are also independent.
    • For independent random variables, the MGF of their sum is the product of their MGFs: $M_W(t) = M_U(t) \cdot M_V(t)$.
  5. Solve for $M_U(t)$
    We can now algebraically solve for the MGF of $U$:

    $$M_U(t) = \frac{M_W(t)}{M_V(t)}$$

    We know the distributions of $W$ and $V$, so we know their MGFs:

    • $M_W(t) = (1-2t)^{-n/2}$ (since $W \sim \chi^2_n$)
    • $M_V(t) = (1-2t)^{-1/2}$ (since $V \sim \chi^2_1$)

    Substituting these in gives:

    $$M_U(t) = \frac{(1-2t)^{-n/2}}{(1-2t)^{-1/2}} = (1-2t)^{-(n/2 - 1/2)} = (1-2t)^{-(n-1)/2}$$

  6. Conclusion
    The MGF we found, $M_U(t) = (1-2t)^{-(n-1)/2}$, is the unique MGF of a chi-square distribution with $n-1$ degrees of freedom. Therefore, we conclude that:

    $$U = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
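
As a numerical companion to this proof, the sketch below (Python with NumPy and SciPy; all parameters are arbitrary) checks the decomposition $W = U + V$ sample by sample and compares a Monte Carlo estimate of $M_W(t)$ with the product $M_U(t)\,M_V(t)$ at one test point.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, mu, sigma, reps = 6, 0.0, 1.0, 1_000_000     # illustrative values

samples = rng.normal(mu, sigma, size=(reps, n))
x_bar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

w = np.sum(((samples - mu) / sigma) ** 2, axis=1)   # W ~ chi-square with n df
u = (n - 1) * s2 / sigma**2                         # U, the quantity of interest
v = n * (x_bar - mu) ** 2 / sigma**2                # V = Z^2 ~ chi-square with 1 df

# 1) The algebraic identity W = U + V holds for every simulated sample.
print("max |W - (U + V)| =", np.max(np.abs(w - (u + v))))   # ~ 0 up to rounding error

# 2) At a test point t < 1/2, M_W(t) = M_U(t) * M_V(t), matching (1 - 2t)^(-n/2).
t = 0.2
print("M_W(t) Monte Carlo estimate :", np.mean(np.exp(t * w)))
print("M_U(t) * M_V(t) estimate    :", np.mean(np.exp(t * u)) * np.mean(np.exp(t * v)))
print("Closed form (1 - 2t)^(-n/2) :", (1 - 2 * t) ** (-n / 2))

# 3) U itself matches the chi-square distribution with n-1 degrees of freedom.
print(f"90th percentile of U        : {np.quantile(u, 0.9):.3f}")
print(f"90th percentile of chi2_{n-1}: {stats.chi2.ppf(0.9, df=n - 1):.3f}")
```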