In the world of statistics, one of our primary goals is to understand a large population by examining a small sample. A classic example is trying to figure out the average height of all adults in a country. We can’t measure everyone, so we take a sample and calculate the sample mean, $\bar{X}$. But how confident can we be that our sample mean is close to the true population mean, $\mu$?

This is where statistical inference comes in, and it leads us directly to the t-distribution.

The Ideal World: When We Know Everything

Let’s start in an ideal world. If we draw a sample $X_1, \dots, X_n$ from a normal population with a known mean $\mu$ and a known standard deviation $\sigma$, we can perfectly describe the behavior of our sample mean $\bar{X}$.

Because the population is normal, the sample mean is itself exactly normally distributed, so the standardized quantity known as the Z-statistic follows a standard normal distribution (mean 0, variance 1); the Central Limit Theorem tells us the same holds approximately even for non-normal populations:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

This is a powerful tool. It allows us to calculate the probability of our sample mean being a certain distance from the true mean. We could use it to build precise confidence intervals and perform hypothesis tests.
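
For instance, here is a minimal sketch of that ideal-world calculation (in Python with NumPy and SciPy, my own choice since the article contains no code; the heights, the known $\sigma$, and the 95% level are all invented for illustration): a Z-based confidence interval when $\sigma$ really is known.

```python
import numpy as np
from scipy import stats

sigma = 7.5      # population standard deviation, assumed known exactly (rare in practice)
mu_true = 170.0  # true mean height in cm, used only to simulate the data

rng = np.random.default_rng(0)
sample = rng.normal(loc=mu_true, scale=sigma, size=25)  # one simulated sample
n, x_bar = sample.size, sample.mean()

# Z-based 95% confidence interval: x_bar +/- z_{0.975} * sigma / sqrt(n)
z_crit = stats.norm.ppf(0.975)
half_width = z_crit * sigma / np.sqrt(n)
print(f"95% CI for mu: ({x_bar - half_width:.2f}, {x_bar + half_width:.2f})")
```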

The Real World: The Problem of the Unknown Sigma ($\sigma$)

There’s just one problem: in virtually all real-world scenarios, we don’t know the true population standard deviation $\sigma$. This single missing piece of information prevents us from using the Z-statistic.

A natural and simple idea is to just replace the unknown population value $\sigma$ with our best guess for it: the sample standard deviation, $S$. Our new statistic looks like this:

$$\text{New Statistic} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

But is this new statistic still normally distributed? The answer is no. By substituting a fixed, known constant ($\sigma$) with a random variable ($S$), we’ve introduced a new source of uncertainty. Our new distribution should be more spread out than a standard normal distribution to account for the extra randomness that comes from estimating $\sigma$. This is especially true when our sample size $n$ is small, making our estimate $S$ less reliable.

This is precisely the problem the t-distribution was invented to solve.
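
To get a feel for how much wider the corrected distribution must be, here is a small sketch (Python with SciPy; the sample sizes are arbitrary) comparing the 97.5th-percentile critical value of the t-distribution, introduced next, with the familiar normal value of about 1.96:

```python
from scipy import stats

# 97.5th-percentile critical values: t with n-1 degrees of freedom vs. standard normal.
z_crit = stats.norm.ppf(0.975)
for n in (3, 5, 10, 30, 100):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:>3}: t critical = {t_crit:.3f}   z critical = {z_crit:.3f}")
```

For small $n$ the t critical value is noticeably larger, reflecting the extra uncertainty from estimating $\sigma$ with $S$; it approaches 1.96 as $n$ grows.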

The Solution: Defining the t-Distribution

Let’s look at the formal definition of a t-distribution. A random variable $T$ is said to follow a t-distribution with $d$ degrees of freedom if it can be constructed as the ratio of two independent random variables:

  1. A standard normal random variable, $Z \sim N(0,1)$.
  2. The square root of a chi-square random variable $U \sim \chi^2_d$ divided by its degrees of freedom $d$, i.e. $\sqrt{U/d}$.

$$T = \frac{Z}{\sqrt{U/d}} \sim t_d$$

Our goal now is to show that our new statistic, $\frac{\bar{X} - \mu}{S/\sqrt{n}}$, has exactly this structure.
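
Before doing the algebra, a quick simulation sketch (Python with NumPy and SciPy; the choice of $d = 5$ and the number of draws are arbitrary) can make the definition concrete: draw $Z$ and $U$ independently, form $Z/\sqrt{U/d}$, and compare the result with the exact $t_d$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d, reps = 5, 200_000                    # degrees of freedom and number of draws

z = rng.standard_normal(reps)           # Z ~ N(0, 1)
u = rng.chisquare(df=d, size=reps)      # U ~ chi-square with d degrees of freedom
t_constructed = z / np.sqrt(u / d)      # T = Z / sqrt(U / d)

# A few empirical quantiles of the constructed variable vs. the exact t_d quantiles.
for q in (0.90, 0.95, 0.99):
    print(f"q = {q}: constructed {np.quantile(t_constructed, q):.3f}   "
          f"exact t_{d} {stats.t.ppf(q, df=d):.3f}")
```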

Building the t-Statistic from Our Sample

We need to show that our statistic can be split into a standard normal numerator and a denominator of the form $\sqrt{U/d}$ with $U$ chi-square, and that these two parts are independent.

1. The Numerator: A Standard Normal Variable (Z)

This part is straightforward. We can take the expression for our statistic and just divide the top and bottom by $\sigma/\sqrt{n}$:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu) / (\sigma/\sqrt{n})}{(S/\sqrt{n}) / (\sigma/\sqrt{n})}$$

The numerator is our old friend, the Z-statistic, which we know is distributed as $N(0,1)$.

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

2. The Denominator: A Chi-Square Variable (U)

Now let’s look at the denominator of our rewritten statistic:

$$\text{Denominator} = \frac{S}{\sigma} = \sqrt{\frac{S^2}{\sigma^2}}$$

A crucial theorem from statistics (Theorem B in Section 6.3 of the textbook, proved as Proof 2 in the appendix below) states that for a sample from a normal population:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

This is our chi-square variable, $U$, with $d = n-1$ degrees of freedom. We can rewrite our denominator to isolate this term:

$$\sqrt{\frac{S^2}{\sigma^2}} = \sqrt{\frac{(n-1)S^2/\sigma^2}{n-1}} = \sqrt{\frac{U}{d}}$$
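
This theorem is easy to check empirically. The sketch below (Python with NumPy and SciPy; the particular $n$, $\mu$, and $\sigma$ are invented) repeatedly draws normal samples, forms $(n-1)S^2/\sigma^2$, and compares its empirical quantiles with those of $\chi^2_{n-1}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, mu, sigma, reps = 8, 170.0, 7.5, 100_000   # illustrative values

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)              # sample variance with divisor n-1
u = (n - 1) * s2 / sigma**2                   # the statistic of interest

# Empirical quantiles should match a chi-square with n-1 degrees of freedom.
for q in (0.10, 0.50, 0.90):
    print(f"q = {q}: empirical {np.quantile(u, q):.3f}   "
          f"chi2_{n-1} {stats.chi2.ppf(q, df=n - 1):.3f}")
```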

3. The Final Piece: Independence

The definition of the t-distribution requires that $Z$ and $U$ be independent. Here, we rely on another foundational theorem of statistics: for a sample drawn from a normal distribution, the sample mean $\bar{X}$ and the sample variance $S^2$ are independent random variables.

Since our numerator $Z$ is a function of $\bar{X}$ and our denominator’s core component $U$ is a function of $S^2$, they are also independent.
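
This independence is a special property of the normal family, and it is easy to probe numerically. The sketch below (Python with NumPy; the sample size and repetition count are arbitrary) estimates the correlation between $\bar{X}$ and $S^2$ across many normal samples and, for contrast, across skewed exponential samples, where the two quantities are not independent.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 100_000                      # arbitrary sample size and repetitions

def mean_var_corr(draws):
    """Correlation between sample mean and sample variance across many samples."""
    x_bar = draws.mean(axis=1)
    s2 = draws.var(axis=1, ddof=1)
    return np.corrcoef(x_bar, s2)[0, 1]

normal_data = rng.normal(loc=170.0, scale=7.5, size=(reps, n))
exp_data = rng.exponential(scale=1.0, size=(reps, n))   # skewed, non-normal

print(f"normal data:      corr(X_bar, S^2) = {mean_var_corr(normal_data):.4f}")  # near 0
print(f"exponential data: corr(X_bar, S^2) = {mean_var_corr(exp_data):.4f}")     # clearly positive
```

Zero correlation alone does not prove independence, but the contrast illustrates that the exact independence used here is not something we get for free from arbitrary populations.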

Putting It All Together

We have successfully shown that our statistic has the required structure:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\frac{(n-1)S^2/\sigma^2}{n-1}}} = \frac{Z}{\sqrt{U/d}}$$

Since it perfectly matches the definition, we can conclude:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

This is why we use the t-distribution. It is the mathematically correct distribution for the standardized sample mean $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ when the population standard deviation is unknown and must be estimated from the data. It properly accounts for the extra uncertainty introduced by estimating $\sigma$, giving us a reliable tool for constructing confidence intervals and conducting hypothesis tests in real-world situations.
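
As a closing sanity check on the main result, the sketch below (Python with NumPy and SciPy; the parameters are invented) simulates the statistic $\frac{\bar{X}-\mu}{S/\sqrt{n}}$ many times and compares its tail probability with what $t_{n-1}$ and $N(0,1)$ each predict.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, mu, sigma, reps = 6, 170.0, 7.5, 200_000     # illustrative values

samples = rng.normal(mu, sigma, size=(reps, n))
x_bar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)
t_stat = (x_bar - mu) / (s / np.sqrt(n))        # the statistic from the article

threshold = 2.5
empirical = np.mean(np.abs(t_stat) > threshold)
print(f"P(|T| > {threshold}) empirical    : {empirical:.4f}")
print(f"P(|T| > {threshold}) under t_{n-1} : {2 * stats.t.sf(threshold, df=n - 1):.4f}")
print(f"P(|T| > {threshold}) under N(0, 1): {2 * stats.norm.sf(threshold):.4f}")
```

The empirical tail probability matches the $t_{n-1}$ prediction, while the normal approximation understates it; that gap is exactly the extra spread the t-distribution was designed to capture.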

Appendix: Detailed Proofs of Two Key Theorems

In the main text, we concluded that for a sample from a normal distribution, the statistic $\frac{\bar{X}-\mu}{S/\sqrt{n}}$ follows a t-distribution with $n-1$ degrees of freedom. This conclusion relies on two fundamental and non-obvious theorems. This appendix provides the detailed proofs for these theorems, following the logic presented in Section 6.3 of Mathematical Statistics and Data Analysis.

Proof 1: The Independence of the Sample Mean ($\bar{X}$) and Sample Variance ($S^2$)

Theorem: For an independent and identically distributed (i.i.d.) sample $X_1, \dots, X_n$ from a normal distribution $N(\mu, \sigma^2)$, the sample mean $\bar{X}$ and the sample variance $S^2$ are independent random variables.

Proof Strategy:
Following the textbook’s logic (Corollary A, page 197), we can prove this by showing that $\bar{X}$ is independent of the vector of deviations, $(X_1-\bar{X}, X_2-\bar{X}, \dots, X_n-\bar{X})$. This is sufficient because the sample variance, $S^2 = \frac{1}{n-1}\sum(X_i-\bar{X})^2$, is purely a function of this deviation vector.

We will use the Moment-Generating Function (MGF) method. If the joint MGF of a set of random variables can be factored into a product of functions, each involving only one of the variables (or vectors), then those variables (or vectors) are independent.

Detailed Steps:

  1. Define the Joint MGF
    We define the joint MGF of $\bar{X}$ and the deviation vector as:

    $$M(s, t_1, \dots, t_n) = E\left[\exp\left\{s\bar{X} + \sum_{i=1}^n t_i(X_i - \bar{X})\right\}\right]$$

  2. Simplify the Exponent
    Our goal is to re-express the exponent as a linear combination of the original sample variables, $X_i$.

    $$\begin{aligned} s\bar{X} + \sum_{i=1}^n t_i(X_i - \bar{X}) &= \frac{s}{n}\sum_{i=1}^n X_i + \sum_{i=1}^n t_i X_i - \bar{X}\sum_{i=1}^n t_i \\ &= \sum_{i=1}^n \left(\frac{s}{n} + t_i - \bar{t}\right)X_i \quad \text{where } \bar{t}=\frac{1}{n}\sum t_i \end{aligned}$$

    Let’s define the coefficient $a_i = \frac{s}{n} + t_i - \bar{t}$. The exponent is now simply $\sum_{i=1}^n a_i X_i$.

  3. Use Properties of MGFs
    Since the $X_i$ are i.i.d., the expectation factors into a product of their individual MGFs, each evaluated at its own coefficient $a_i$:

    $$M(s, \mathbf{t}) = E\left[\exp\left\{\sum_{i=1}^n a_i X_i\right\}\right] = \prod_{i=1}^n M_{X_i}(a_i)$$

    For each $X_i \sim N(\mu, \sigma^2)$, its MGF is $M_{X_i}(u) = \exp\left(\mu u + \frac{1}{2}\sigma^2 u^2\right)$. Therefore:

    $$M(s, \mathbf{t}) = \prod_{i=1}^n \exp\left(\mu a_i + \frac{1}{2}\sigma^2 a_i^2\right) = \exp\left(\mu\sum_{i=1}^n a_i + \frac{1}{2}\sigma^2\sum_{i=1}^n a_i^2\right)$$

  4. Calculate the Sums of the Coefficients
    Following the algebraic steps on page 196 of the textbook:

    • $\sum a_i = \sum\left(\frac{s}{n} + t_i - \bar{t}\right) = n\left(\frac{s}{n}\right) + \sum t_i - n\bar{t} = s + n\bar{t} - n\bar{t} = s$
    • $\sum a_i^2 = \sum\left(\frac{s}{n} + (t_i - \bar{t})\right)^2 = \sum\left(\frac{s^2}{n^2} + \frac{2s}{n}(t_i-\bar{t}) + (t_i-\bar{t})^2\right)$. Since $\sum(t_i-\bar{t}) = 0$, this simplifies to $\sum a_i^2 = \frac{s^2}{n} + \sum(t_i-\bar{t})^2$.
  5. Factor the MGF
    Substituting these sums back into the exponent of the MGF:

    $$\begin{aligned} M(s, \mathbf{t}) &= \exp\left(\mu s + \frac{1}{2}\sigma^2\left[\frac{s^2}{n} + \sum(t_i-\bar{t})^2\right]\right) \\ &= \exp\left(\mu s + \frac{\sigma^2 s^2}{2n}\right) \cdot \exp\left(\frac{\sigma^2}{2}\sum_{i=1}^n(t_i-\bar{t})^2\right) \end{aligned}$$

  6. Conclusion
    The joint MGF has successfully factored into two separate parts:

    • The first part, $\exp\left(\mu s + \frac{\sigma^2 s^2}{2n}\right)$, depends only on $s$ and is the MGF of $\bar{X}$.
    • The second part, $\exp\left(\frac{\sigma^2}{2}\sum_{i=1}^n(t_i-\bar{t})^2\right)$, depends only on the $t_i$ variables and is the MGF of the deviation vector.

    So, in the end, we have

    $$M_{\bar{X},\,X_1-\bar{X},\dots,X_n-\bar{X}}(s,\mathbf{t}) = M_{\bar{X}}(s) \cdot M_{X_1-\bar{X},\dots,X_n-\bar{X}}(\mathbf{t})$$

    Because the joint MGF factors, $\bar{X}$ is independent of the deviation vector $(X_1-\bar{X}, \dots, X_n-\bar{X})$. Consequently, $\bar{X}$ is also independent of $S^2$, which is a function of that deviation vector. ✅
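
A crude Monte Carlo check of this factorization is sketched below (Python with NumPy; the specific $n$, $\mu$, $\sigma$, and evaluation point $(s, \mathbf{t})$ are chosen arbitrarily): estimate the joint MGF directly from simulated samples and compare it with the product of the two closed-form factors derived above.

```python
import numpy as np

rng = np.random.default_rng(5)
n, mu, sigma, reps = 5, 1.0, 2.0, 1_000_000     # illustrative values

# An arbitrary small evaluation point (s, t_1, ..., t_n).
s = 0.3
t = np.array([0.10, -0.05, 0.00, 0.05, -0.10])

samples = rng.normal(mu, sigma, size=(reps, n))
x_bar = samples.mean(axis=1)
deviations = samples - x_bar[:, None]

# Direct Monte Carlo estimate of E[exp(s*X_bar + sum_i t_i (X_i - X_bar))].
joint_mgf = np.mean(np.exp(s * x_bar + deviations @ t))

# The two closed-form factors from the proof.
factor_mean = np.exp(mu * s + sigma**2 * s**2 / (2 * n))          # MGF of X_bar
factor_devs = np.exp(sigma**2 / 2 * np.sum((t - t.mean())**2))    # MGF of deviation vector

print(f"Monte Carlo joint MGF : {joint_mgf:.4f}")
print(f"Product of factors    : {factor_mean * factor_devs:.4f}")
```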


Proof 2: The Distribution of $\frac{(n-1)S^2}{\sigma^2}$

Theorem: For an i.i.d. sample from a normal distribution $N(\mu, \sigma^2)$, the statistic $\frac{(n-1)S^2}{\sigma^2}$ follows a chi-square distribution with $n-1$ degrees of freedom ($\chi^2_{n-1}$).

Proof Strategy:
The proof, as detailed on page 197 of the textbook, uses a clever algebraic decomposition. We start with a known $\chi^2_n$ random variable and split it into two components. By using the independence result from Proof 1 and properties of MGFs, we can deduce the distribution of the component we are interested in.

Detailed Steps:

  1. Starting Point: A Known Chi-Square Distribution
    We know that if $X_i \sim N(\mu, \sigma^2)$, then $\frac{X_i - \mu}{\sigma} \sim N(0, 1)$. The sum of squares of $n$ independent standard normal variables follows a chi-square distribution with $n$ degrees of freedom. Let’s call this variable $W$:

    $$W = \sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n$$

    The MGF of this variable is $M_W(t) = (1-2t)^{-n/2}$.

  2. The Key Algebraic Decomposition
    We introduce the sample mean $\bar{X}$ into the sum of squares term:

    $$\begin{aligned} \sum_{i=1}^n (X_i - \mu)^2 &= \sum_{i=1}^n [(X_i - \bar{X}) + (\bar{X} - \mu)]^2 \\ &= \sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^n (\bar{X} - \mu)^2 + 2\sum_{i=1}^n (X_i - \bar{X})(\bar{X} - \mu) \end{aligned}$$

    Since the sum of deviations from the sample mean is zero ($\sum(X_i - \bar{X}) = 0$), the cross-product term vanishes, and the middle term is simply $n(\bar{X} - \mu)^2$. This gives us the crucial identity:

    $$\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$$

  3. Standardize and Re-interpret the Terms
    Divide the entire identity by $\sigma^2$:

    $$\frac{\sum(X_i - \mu)^2}{\sigma^2} = \frac{\sum(X_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2}$$

    Let’s identify these three parts:

    • Left side: This is our starting variable, $W \sim \chi^2_n$.
    • Right side, term 1: Using the definition $S^2 = \frac{1}{n-1}\sum(X_i-\bar{X})^2$, this term is exactly $\frac{(n-1)S^2}{\sigma^2}$. Let’s call this $U$. This is the quantity whose distribution we want to find.
    • Right side, term 2: This term can be rewritten as $\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right)^2$. The term inside the parentheses is a standard normal variable $Z$, so this is $Z^2$. By definition, the square of a standard normal variable follows a $\chi^2_1$ distribution. Let’s call this $V$.

    Our identity now becomes the relationship:

    $$W = U + V$$

  4. Use Independence and MGFs

    • From Proof 1, we established that $\bar{X}$ and $S^2$ are independent.
    • Therefore, $V$ (a function of $\bar{X}$) and $U$ (a function of $S^2$) are also independent.
    • For independent random variables, the MGF of their sum is the product of their MGFs: $M_W(t) = M_U(t) \cdot M_V(t)$.
  5. Solve for $M_U(t)$
    We can now algebraically solve for the MGF of $U$:

    $$M_U(t) = \frac{M_W(t)}{M_V(t)}$$

    We know the distributions of $W$ and $V$, so we know their MGFs:

    • $M_W(t) = (1-2t)^{-n/2}$ (since $W \sim \chi^2_n$)
    • $M_V(t) = (1-2t)^{-1/2}$ (since $V \sim \chi^2_1$)

    Substituting these in gives:

    $$M_U(t) = \frac{(1-2t)^{-n/2}}{(1-2t)^{-1/2}} = (1-2t)^{-(n/2 - 1/2)} = (1-2t)^{-(n-1)/2}$$

  6. Conclusion
    The MGF we found, $M_U(t) = (1-2t)^{-(n-1)/2}$, is the unique MGF of a chi-square distribution with $n-1$ degrees of freedom. Therefore, we conclude that:

    $$U = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
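
As a numerical companion to this proof, the sketch below (Python with NumPy and SciPy; all parameters are arbitrary) checks the decomposition $W = U + V$ sample by sample and compares a Monte Carlo estimate of $M_W(t)$ with the product $M_U(t)\,M_V(t)$ at one test point.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, mu, sigma, reps = 6, 0.0, 1.0, 1_000_000     # illustrative values

samples = rng.normal(mu, sigma, size=(reps, n))
x_bar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

w = np.sum(((samples - mu) / sigma) ** 2, axis=1)   # W ~ chi-square with n df
u = (n - 1) * s2 / sigma**2                         # U, the quantity of interest
v = n * (x_bar - mu) ** 2 / sigma**2                # V = Z^2 ~ chi-square with 1 df

# 1) The algebraic identity W = U + V holds for every simulated sample.
print("max |W - (U + V)| =", np.max(np.abs(w - (u + v))))   # ~ 0 up to rounding error

# 2) At a test point t < 1/2, M_W(t) = M_U(t) * M_V(t), matching (1 - 2t)^(-n/2).
t = 0.2
print("M_W(t) Monte Carlo estimate :", np.mean(np.exp(t * w)))
print("M_U(t) * M_V(t) estimate    :", np.mean(np.exp(t * u)) * np.mean(np.exp(t * v)))
print("Closed form (1 - 2t)^(-n/2) :", (1 - 2 * t) ** (-n / 2))

# 3) U itself matches the chi-square distribution with n-1 degrees of freedom.
print(f"90th percentile of U        : {np.quantile(u, 0.9):.3f}")
print(f"90th percentile of chi2_{n-1}: {stats.chi2.ppf(0.9, df=n - 1):.3f}")
```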