Statistics: Why We Use the t-Distribution to Estimate the Population Mean
In the world of statistics, one of our primary goals is to understand a large population by examining a small sample. A classic example is trying to figure out the average height of all adults in a country. We can’t measure everyone, so we take a sample and calculate the sample mean, $\bar{X}$. But how confident can we be that our sample mean is close to the true population mean, $\mu$?
This is where statistical inference comes in, and it leads us directly to the t-distribution.
The Ideal World: When We Know Everything
Let’s start in an ideal world. If we draw a sample $X_1, X_2, \dots, X_n$ from a normal population with a known mean $\mu$ and a known standard deviation $\sigma$, we can perfectly describe the behavior of our sample mean $\bar{X}$.
Because the population is normal, $\bar{X}$ is itself exactly normally distributed (and the Central Limit Theorem guarantees this approximately even for non-normal populations), so the standardized quantity known as the Z-statistic follows a standard normal distribution (mean 0, variance 1):

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
This is a powerful tool. It allows us to calculate the probability of our sample mean being a certain distance from the true mean. We could use it to build precise confidence intervals and perform hypothesis tests.
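To make this concrete, here is a minimal Python sketch (assuming NumPy and SciPy are available, with made-up values for $\mu$, $\sigma$, and $n$) of how the Z-statistic turns into a probability statement and a confidence interval when $\sigma$ is known:

```python
import numpy as np
from scipy import stats

# Hypothetical setup: a normal population with known parameters.
mu, sigma, n = 170.0, 8.0, 25          # population mean, population sd, sample size
se = sigma / np.sqrt(n)                # standard error of the sample mean

# Probability that the sample mean lands within 2 cm of the true mean.
prob = stats.norm.cdf(2.0 / se) - stats.norm.cdf(-2.0 / se)
print(f"P(|Xbar - mu| < 2) = {prob:.3f}")

# A 95% confidence interval built from one simulated sample (sigma known).
rng = np.random.default_rng(0)
sample = rng.normal(mu, sigma, size=n)
xbar = sample.mean()
z_crit = stats.norm.ppf(0.975)         # roughly 1.96 for a 95% interval
print(f"95% z-interval: ({xbar - z_crit * se:.2f}, {xbar + z_crit * se:.2f})")
```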
The Real World: The Problem of the Unknown Sigma ($\sigma$)
There’s just one problem: in virtually all real-world scenarios, we don’t know the true population standard deviation $\sigma$. This single missing piece of information prevents us from using the Z-statistic.
A natural and simple idea is to just replace the unknown population value $\sigma$ with our best guess for it: the sample standard deviation, $S$. Our new statistic looks like this:

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
But is this new statistic still normally distributed? The answer is no. By replacing a fixed, known constant ($\sigma$) with a random variable ($S$), we have introduced a new source of uncertainty. Our new distribution should be more spread out than a standard normal distribution to account for the extra randomness that comes from estimating $\sigma$. This is especially true when our sample size is small, which makes our estimate of $\sigma$ less reliable.
This is precisely the problem the t-distribution was invented to solve.
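A quick simulation sketch (with an arbitrary sample size of $n = 5$) makes the extra spread visible: the statistic built with $S$ exceeds $\pm 1.96$ far more often than the 5% a standard normal would predict, and the t-distribution with $n-1$ degrees of freedom matches the excess.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 1.0, 5, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)        # sample standard deviation S

t_stat = (xbar - mu) / (s / np.sqrt(n))

# Tail probability beyond +/- 1.96: ~5% for N(0,1), noticeably larger here.
print("Empirical P(|t| > 1.96):", np.mean(np.abs(t_stat) > 1.96))
print("Normal prediction      :", 2 * stats.norm.sf(1.96))
print("t(n-1) prediction      :", 2 * stats.t.sf(1.96, df=n - 1))
```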
The Solution: Defining the t-Distribution
Let’s look at the formal definition of a t-distribution. A random variable $T$ is said to follow a t-distribution with $k$ degrees of freedom if it can be constructed as the ratio of two independent random variables:
- A standard normal random variable, $Z \sim N(0, 1)$.
- The square root of a chi-square random variable, $U \sim \chi^2_k$, divided by its degrees of freedom, $\sqrt{U/k}$.

$$T = \frac{Z}{\sqrt{U/k}} \sim t_k$$
Our goal now is to show that our new statistic, $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$, has exactly this structure, with $k = n - 1$.
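As a sanity check of this definition, the following sketch (arbitrary choice of $k = 7$) builds the ratio from independent normal and chi-square draws and compares its quantiles with `scipy.stats.t`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
k, reps = 7, 200_000

z = rng.standard_normal(reps)          # Z ~ N(0, 1)
u = rng.chisquare(k, reps)             # U ~ chi-square with k degrees of freedom
t_built = z / np.sqrt(u / k)           # the ratio in the definition

for q in (0.90, 0.95, 0.99):
    print(f"q={q}: simulated {np.quantile(t_built, q):.3f}, "
          f"t({k}) quantile {stats.t.ppf(q, df=k):.3f}")
```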
Building the t-Statistic from Our Sample
We need to show that our statistic can be split into a standard normal numerator and a denominator built from a chi-square variable, and that these two parts are independent.
1. The Numerator: A Standard Normal Variable (Z)
This part is straightforward. We can take the expression for our statistic and divide both the top and the bottom by $\sigma/\sqrt{n}$:

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\dfrac{S}{\sigma}} = \frac{Z}{S/\sigma}$$

The numerator is our old friend, the Z-statistic, which we know is distributed as $N(0, 1)$.
2. The Denominator: A Chi-Square Variable (U)
Now let’s look at the denominator of our rewritten statistic, $S/\sigma$.
A crucial theorem from statistics (Theorem B in Section 8.8.1 of the textbook) states that for a sample from a normal population:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

This is our chi-square variable, $U$, with $n-1$ degrees of freedom. We can rewrite our denominator to isolate this term:

$$\frac{S}{\sigma} = \sqrt{\frac{S^2}{\sigma^2}} = \sqrt{\frac{(n-1)S^2/\sigma^2}{n-1}} = \sqrt{\frac{U}{n-1}}$$
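This theorem is easy to check empirically; a short sketch (arbitrary $\mu$, $\sigma$, and $n = 10$) compares simulated quantiles of $(n-1)S^2/\sigma^2$ with those of $\chi^2_{n-1}$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)       # sample variance S^2
u = (n - 1) * s2 / sigma**2            # the statistic of interest

for q in (0.25, 0.50, 0.75, 0.95):
    print(f"q={q}: simulated {np.quantile(u, q):.3f}, "
          f"chi2({n - 1}) quantile {stats.chi2.ppf(q, df=n - 1):.3f}")
```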
3. The Final Piece: Independence
The definition of the t-distribution requires that $Z$ and $U$ be independent. Here, we rely on another foundational theorem of statistics: for a sample drawn from a normal distribution, the sample mean $\bar{X}$ and the sample variance $S^2$ are independent random variables.
Since our numerator $Z$ is a function of $\bar{X}$ and our denominator’s core component $U$ is a function of $S^2$, they are also independent.
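This independence is special to the normal distribution. As a rough illustration (not a proof), the sketch below estimates the correlation between $\bar{X}$ and $S^2$ for normal samples, where it is essentially zero, and for right-skewed exponential samples, where it is clearly positive:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 8, 100_000

normal = rng.normal(0.0, 1.0, size=(reps, n))
expo = rng.exponential(1.0, size=(reps, n))

for name, data in [("normal", normal), ("exponential", expo)]:
    xbar = data.mean(axis=1)
    s2 = data.var(axis=1, ddof=1)
    corr = np.corrcoef(xbar, s2)[0, 1]
    print(f"{name:12s} corr(Xbar, S^2) = {corr:+.3f}")
```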
Putting It All Together
We have successfully shown that our statistic has the required structure:

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{Z}{\sqrt{U/(n-1)}}, \qquad Z \sim N(0,1), \quad U = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \quad Z \text{ and } U \text{ independent}$$

Since it perfectly matches the definition, we can conclude:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
This is why we use the t-distribution. It is the mathematically correct distribution for describing the behavior of the sample mean when the population standard deviation is unknown and must be estimated from the data. It properly accounts for the extra uncertainty introduced by estimating $\sigma$ with $S$, giving us a reliable tool for constructing confidence intervals and conducting hypothesis tests in real-world situations.
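In code, this simply means using t critical values instead of z critical values. A minimal sketch with a small, made-up sample (SciPy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical sample of adult heights (cm); sigma is NOT assumed known.
heights = np.array([172.1, 168.4, 181.0, 175.3, 169.9, 177.6, 171.2, 174.8])
n = len(heights)
xbar, s = heights.mean(), heights.std(ddof=1)
se = s / np.sqrt(n)

# 95% confidence interval using the t-distribution with n-1 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=n - 1)
print("t-interval:", (xbar - t_crit * se, xbar + t_crit * se))

# The same interval via scipy's helper.
print("scipy     :", stats.t.interval(0.95, df=n - 1, loc=xbar, scale=se))

# For comparison, the interval that pretends S is the true sigma.
z_crit = stats.norm.ppf(0.975)
print("z-interval:", (xbar - z_crit * se, xbar + z_crit * se))
```

The z-interval in the last line is shown only for contrast; with only eight observations it is noticeably narrower than it should be.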
Appendix: Detailed Proofs of Two Key Theorems
In the main text, we concluded that for a sample from a normal distribution, the statistic $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a t-distribution with $n - 1$ degrees of freedom. This conclusion relies on two fundamental and non-obvious theorems. This appendix provides the detailed proofs for these theorems, following the logic presented in Section 6.3 of Mathematical Statistics and Data Analysis.
Proof 1: The Independence of the Sample Mean ($\bar{X}$) and Sample Variance ($S^2$)
Theorem: For an independent and identically distributed (i.i.d.) sample $X_1, \dots, X_n$ from a normal distribution $N(\mu, \sigma^2)$, the sample mean $\bar{X}$ and the sample variance $S^2$ are independent random variables.
Proof Strategy:
Following the textbook’s logic (Corollary A, page 197), we can prove this by showing that $\bar{X}$ is independent of the vector of deviations, $(X_1 - \bar{X}, X_2 - \bar{X}, \dots, X_n - \bar{X})$. This is sufficient because the sample variance, $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, is purely a function of this deviation vector.
We will use the Moment-Generating Function (MGF) method. If the joint MGF of a set of random variables can be factored into a product of functions, each involving only one of the variables (or vectors), then those variables (or vectors) are independent.
Detailed Steps:
1. Define the Joint MGF
We define the joint MGF of $\bar{X}$ and the deviation vector $(X_1 - \bar{X}, \dots, X_n - \bar{X})$ as:

$$M(s, t_1, \dots, t_n) = E\left[\exp\left(s\bar{X} + \sum_{i=1}^{n} t_i (X_i - \bar{X})\right)\right]$$

2. Simplify the Exponent
Our goal is to re-express the exponent as a linear combination of the original sample variables, $X_i$. Writing $\bar{t} = \frac{1}{n}\sum_{j=1}^{n} t_j$ and expanding $\bar{X}$, let’s define the coefficient

$$a_i = \frac{s}{n} + (t_i - \bar{t}).$$

The exponent is now simply $\sum_{i=1}^{n} a_i X_i$.

3. Use Properties of MGFs
Since the $X_i$ are i.i.d., the MGF of their linear combination is the product of their individual MGFs:

$$M(s, t_1, \dots, t_n) = \prod_{i=1}^{n} M_{X_i}(a_i)$$

For each $X_i \sim N(\mu, \sigma^2)$, its MGF is $M_{X_i}(a) = \exp\left(\mu a + \frac{\sigma^2 a^2}{2}\right)$. Therefore:

$$M(s, t_1, \dots, t_n) = \exp\left(\mu \sum_{i=1}^{n} a_i + \frac{\sigma^2}{2} \sum_{i=1}^{n} a_i^2\right)$$

4. Calculate the Sums of the Coefficients
Following the algebraic steps on page 196 of the textbook:
- $\sum_{i=1}^{n} a_i = s + \sum_{i=1}^{n}(t_i - \bar{t})$. Since $\sum_{i=1}^{n}(t_i - \bar{t}) = 0$, this simplifies to $\sum_{i=1}^{n} a_i = s$.
- $\sum_{i=1}^{n} a_i^2 = \frac{s^2}{n} + \frac{2s}{n}\sum_{i=1}^{n}(t_i - \bar{t}) + \sum_{i=1}^{n}(t_i - \bar{t})^2 = \frac{s^2}{n} + \sum_{i=1}^{n}(t_i - \bar{t})^2$, again because the middle sum is zero.

5. Factor the MGF
Substituting these sums back into the exponent of the MGF:

$$M(s, t_1, \dots, t_n) = \exp\left(\mu s + \frac{\sigma^2 s^2}{2n}\right) \cdot \exp\left(\frac{\sigma^2}{2} \sum_{i=1}^{n}(t_i - \bar{t})^2\right)$$

6. Conclusion
The joint MGF has successfully factored into two separate parts:
- The first part, $\exp\left(\mu s + \frac{\sigma^2 s^2}{2n}\right)$, depends only on $s$ and is the MGF of $\bar{X} \sim N(\mu, \sigma^2/n)$.
- The second part, $\exp\left(\frac{\sigma^2}{2} \sum_{i=1}^{n}(t_i - \bar{t})^2\right)$, depends only on the $t_i$ variables and is the MGF of the deviation vector.

So, in the end, we have

$$M(s, t_1, \dots, t_n) = M_{\bar{X}}(s) \cdot M_{(X_1 - \bar{X}, \dots, X_n - \bar{X})}(t_1, \dots, t_n)$$

Because the joint MGF factors, $\bar{X}$ is independent of the deviation vector $(X_1 - \bar{X}, \dots, X_n - \bar{X})$. Consequently, $\bar{X}$ is also independent of $S^2$, which is a function of that deviation vector. (A rough Monte Carlo check of this factorization follows this list.) ✅
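As promised above, here is a rough Monte Carlo check of the factorization (a sketch with small, arbitrary MGF arguments $s$ and $t$): the empirical joint MGF of $\bar{X}$ and one coordinate of the deviation vector should approximately equal the product of the two marginal MGFs.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, reps = 0.0, 1.0, 6, 500_000
s, t = 0.4, 0.3                         # small, arbitrary MGF arguments

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
dev1 = x[:, 0] - xbar                   # one coordinate of the deviation vector

joint = np.mean(np.exp(s * xbar + t * dev1))
product = np.mean(np.exp(s * xbar)) * np.mean(np.exp(t * dev1))
print(f"joint MGF = {joint:.4f}")
print(f"product   = {product:.4f}")     # approximately equal, as independence predicts
```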
 
Proof 2: The Distribution of $(n-1)S^2/\sigma^2$
Theorem: For an i.i.d. sample $X_1, \dots, X_n$ from a normal distribution $N(\mu, \sigma^2)$, the statistic $\frac{(n-1)S^2}{\sigma^2}$ follows a chi-square distribution with $n - 1$ degrees of freedom ($\chi^2_{n-1}$).
Proof Strategy:
The proof, as detailed on page 197 of the textbook, uses a clever algebraic decomposition. We start with a known $\chi^2_n$ random variable and split it into two components. By using the independence result from Proof 1 and properties of MGFs, we can deduce the distribution of the component we are interested in.
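Before the detailed steps, it may help to verify the central algebraic identity used in the decomposition, $\sum_i (X_i - \mu)^2 = \sum_i (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$, numerically on one arbitrary sample (a throwaway sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n = 3.0, 1.5, 12
x = rng.normal(mu, sigma, size=n)
xbar = x.mean()

lhs = np.sum((x - mu) ** 2)
rhs = np.sum((x - xbar) ** 2) + n * (xbar - mu) ** 2
print(lhs, rhs, np.isclose(lhs, rhs))   # identical up to floating-point error
```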
Detailed Steps:
1. Starting Point: A Known Chi-Square Distribution
We know that if $X_i \sim N(\mu, \sigma^2)$, then $\frac{X_i - \mu}{\sigma} \sim N(0, 1)$. The sum of squares of $n$ independent standard normal variables follows a chi-square distribution with $n$ degrees of freedom. Let’s call this variable $W$:

$$W = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n$$

The MGF of this variable is $M_W(t) = (1 - 2t)^{-n/2}$.

2. The Key Algebraic Decomposition
We introduce the sample mean $\bar{X}$ into the sum of squares term:

$$\sum_{i=1}^{n}(X_i - \mu)^2 = \sum_{i=1}^{n}\left[(X_i - \bar{X}) + (\bar{X} - \mu)\right]^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2 + 2(\bar{X} - \mu)\sum_{i=1}^{n}(X_i - \bar{X}) + n(\bar{X} - \mu)^2$$

Since the sum of deviations from the sample mean is zero ($\sum_{i=1}^{n}(X_i - \bar{X}) = 0$), the cross-product term vanishes. This gives us the crucial identity:

$$\sum_{i=1}^{n}(X_i - \mu)^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$$

3. Standardize and Re-interpret the Terms
Divide the entire identity by $\sigma^2$:

$$\sum_{i=1}^{n}\frac{(X_i - \mu)^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2}$$

Let’s identify these three parts:
- Left side: This is our starting variable, $W \sim \chi^2_n$.
- Right side, term 1: Using the definition $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, this term is exactly $\frac{(n-1)S^2}{\sigma^2}$. Let’s call this $U$. This is the quantity whose distribution we want to find.
- Right side, term 2: This term can be rewritten as $\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$. The term inside the parentheses is a standard normal variable $Z$, so this is $Z^2$. By definition, the square of a standard normal variable follows a $\chi^2_1$ distribution. Let’s call this $V$.

Our identity now becomes the relationship:

$$W = U + V$$
4. Use Independence and MGFs
- From Proof 1, we established that $\bar{X}$ and $S^2$ are independent.
- Therefore, $U$ (a function of $S^2$) and $V$ (a function of $\bar{X}$) are also independent.
- For independent random variables, the MGF of their sum is the product of their MGFs: $M_W(t) = M_U(t) \cdot M_V(t)$.

5. Solve for $M_U(t)$
We can now algebraically solve for the MGF of $U$:

$$M_U(t) = \frac{M_W(t)}{M_V(t)}$$

We know the distributions of $W$ and $V$, so we know their MGFs:
- $M_W(t) = (1 - 2t)^{-n/2}$ (since $W \sim \chi^2_n$)
- $M_V(t) = (1 - 2t)^{-1/2}$ (since $V \sim \chi^2_1$)

Substituting these in gives:

$$M_U(t) = \frac{(1 - 2t)^{-n/2}}{(1 - 2t)^{-1/2}} = (1 - 2t)^{-(n-1)/2}$$

6. Conclusion
The MGF we found, $(1 - 2t)^{-(n-1)/2}$, is the unique MGF of a chi-square distribution with $n - 1$ degrees of freedom. Therefore, we conclude that (a simulation check of this MGF appears right after this list):

$$U = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \quad ✅$$
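Finally, as noted in the conclusion above, a simulation sketch (arbitrary $n = 10$ and $t = 0.2$; any $t < 1/2$ works) can compare the empirical MGF of $(n-1)S^2/\sigma^2$ with the chi-square($n-1$) formula $(1 - 2t)^{-(n-1)/2}$; the two agree up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 0.0, 1.0, 10, 500_000
t = 0.2                                 # any t < 1/2 where the MGF exists

x = rng.normal(mu, sigma, size=(reps, n))
u = (n - 1) * x.var(axis=1, ddof=1) / sigma**2

empirical = np.mean(np.exp(t * u))      # Monte Carlo estimate of E[exp(tU)]
theoretical = (1 - 2 * t) ** (-(n - 1) / 2)
print(f"empirical MGF at t={t}: {empirical:.3f}")
print(f"(1 - 2t)^(-(n-1)/2)  : {theoretical:.3f}")
```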