Hypothesis Testing

Model Inference

Learning Outcomes

  • Hypothesis Tests

  • Model Inference

Hypothesis Tests

Hypothesis Tests

Hypothesis tests are used to test whether claims are valid or not. This is conducted by collecting data, setting the Null and Alternative Hypothesis.

Null Hypothesis \(H_0\)

The null hypothesis is the claim that is initially believed to be true. For the most part, it is always equal to the hypothesized value.

Alternative Hypothesis \(H_a\)

The alternative hypothesis contradicts the null hypothesis.

Example of Null and Alternative Hypothesis

We want to see if \(\mu\) is different from \(\mu_0\)

Null Hypothesis Alternative Hypothesis
\(H_0: \mu=\mu_0\) \(H_a: \mu\ne\mu_0\)
\(H_0: \mu\le\mu_0\) \(H_a: \mu>\mu_0\)
\(H_0: \mu\ge\mu_0\) \(H_0: \mu<\mu_0\)

One-Side vs Two-Side Hypothesis Tests

Notice how there are 3 types of null and alternative hypothesis, The first type of hypothesis (\(H_a:\mu\ne\mu_0\)) is considered a 2-sided hypothesis because the rejection region is located in 2 regions. The remaining two hypotheses are considered 1-sided because the rejection region is located on one side of the distribution.

Null Hypothesis Alternative Hypothesis Side
\(H_0: \mu=\mu_0\) \(H_a: \mu\ne\mu_0\) 2-sided
\(H_0: \mu\le\mu_0\) \(H_a: \mu>\mu_0\) 1-sided
\(H_0: \mu\ge\mu_0\) \(H_0: \mu<\mu_0\) 1-sided

Common Hypothesis Tests

Test # of Samples Null Hypothesis Distribution DF Test Statistics Notes
t-test 1 \(\mu=\mu_0\) \(t_{DF}\) \(DF=n-1\) \(\frac{\bar X-\mu_0}{\sqrt{s^2/n}}\) \(n<30\)
z-test 1 \(\mu=\mu_0\) \(N(0,1)\) None \(\frac{\bar X-\mu_0}{\sqrt{\sigma^2/n}}\) \(n>30\) \(\sigma^2\) known
z-test 1 \(\mu=\mu_0\) \(N(0,1)\) None \(\frac{\bar X-\mu_0}{\sqrt{s^2/n}}\) \(n>30\) \(\sigma^2\) unknown
paired t-test 2 \(\mu_1-\mu_2=D\) \(t_{DF}\) \(DF=n-1\) \(\frac{\bar X_d-D}{\sqrt{s_d^2/n}}\) \(n\) is the number of pairs
independent t-test 2 \(\mu_1-\mu_2=D\) \(t_{DF}\) \(DF=n_1+n_2-2\) \(\frac{\bar X_1-\bar X_2-D}{\sqrt{s_p^2(1/n_1+1/n_2)}}\) \(s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}\)
independent t-test 2 \(\mu_1-\mu_2=D\) \(t_{DF}\) \(DF=\frac{(s^2_1/n_1+s^2_2/n_2)^2}{(s^2_1/n_1)^2/(n_1-1)+(s^2_2/n_2)^2/(n_2-1)}\) \(\frac{\bar X_1-\bar X_2-D}{\sqrt{s_1^2/n_1+s^2_2/n_2}}\)
ANOVA >2 \(\mu_1=\mu_2=\cdots=\mu_k\) \(F_{DF1,DF2}\)

Decision Making: P-Value

The p-value approach is one of the most common methods to report significant results. It is easier to interpret the p-value because it provides the probability of observing our test statistics, or something more extreme, given that the null hypothesis is true. Depending on the type of test, your p-value may be constructed as:

Alternative Hypothesis p-value
\(\mu>\mu_0\) \(P(X>T(x))=p\)
\(\mu<\mu_0\) \(P(X<T(x))=p\)
\(\mu\ne\mu_0\) \(2\times P(X>|T(X)|)=p\)

If \(p < \alpha\), then you reject \(H_0\); otherwise, you will fail to reject \(H_0\).

Decision Making: Confidence Interval Approach

The confidence interval approach can evaluate a hypothesis test where the alternative hypothesis is \(\mu\ne\mu_0\). For this approach you will construct a \((1-\alpha)100\%\) confidence interval as

\[ PE \pm CV *SE \]

where PE is the point estimate, CV is the critical value based on \(\alpha\) and SE is the standard error. This will result in a lower and upper bound denoted as: \((LB, UB)\). The confidence intervals provides a range values to capture the parameter \(\mu\) such that if you repeat this process \(n\) times, \((1-\alpha)100\%\) of \(n\) will capture the true value of \(\mu\). If \(\mu_0 \in (LB,UB)\), then you fail to reject \(H_0\). If \(\mu_0\notin (LB,UB)\), then you reject \(H_0\).

Model Inference

Testing \(\beta_j\)

\[ \frac{\hat\beta_j-\theta}{\mathrm{se}(\hat\beta_j)} \sim t_{n-p^\prime} \]

Confidence Intervals

\[ \hat \beta_j \pm CV \times se(\hat\beta_j) \]

  • CV: Critical Value \(P(X<CV) = 1-\alpha/2\)

  • \(\alpha\): significance level

  • SE: Standard Error

Model inference

We conduct model inference to determine if different models are better at explaining variation. A common example is to compare a linear model (\(\hat Y=\hat\beta_0 + \hat\beta_1 X\)) to the mean of Y (\(\hat \mu_y\)). We determine the significance of the variation explained using an Analysis of Variance (ANOVA) table and F test.

ANOVA Table

Source DF SS MS F
Model \(DFR=k-1\) \(SSR\) \(MSR=\frac{SSM}{DFR}\) \(\hat F=\frac{MSR}{MSE}\)
Error \(DFE=n-k\) \(SSE\) \(MSE=\frac{SSE}{DFE}\)
Total \(TDF=n-1\) \(TSS=SSR+SSE\)

\[ \hat F \sim F(DFR, DFE) \]

Model Inference

Given:

\[ M1:\ \hat y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]

\[ M2:\ \hat y = \beta_0 + \beta_1 X_1 \]

Let \(M1\) be the FULL (larger) model, and let \(M2\) be the RED (Reduced, smaller) model.

Model Inference

He can test the following Hypothesis:

  • \(H_0\): The error variations between the FULL and RED model are not different.
  • \(H_1\): The error variations between the FULL and RED model are different.

Test Statistic

\[ \hat F = \frac{[SSE(RED) - SSE(FULL)]/[DFE(RED)-DFE(FULL)]}{MSE(FULL)} \]

\[ \hat F \sim F[DFE(RED) - DFE(FULL), DFE(FULL)] \]

Linear Models Equivalent of Common Statistical Tests

1-Sample t-test

This can be conducted as \(Y=\beta_0\)

Test the following hypothesis: \(H_0:\ \beta_0 = 205\) OR \(H_0: \mu = 205\)

library(palmerpenguins)
library(tidyverse)
penguins <- penguins |> drop_na()
xlm <- penguins  |>  lm(flipper_length_mm ~ 1, data = _)
coef(xlm)
confint(xlm)

## OR
t.test(penguins$flipper_length_mm, mu = 205)

2-Sample t-test

This can be conducted as \(Y=\beta_0 + \beta_1 X\), where \(X\) is a binary variable for 2 samples.

Test the following hypothesis: \(H_0:\ \beta_1 = 0\) OR \(H_0: \mu_1-\mu_2 = 0\)

mtcars |> lm(mpg ~ vs, data = _) |> summary()

## OR

mtcars |> t.test(mpg ~ vs, data = _)

Paired t-test

This can be conducted as \(Y_1 - Y_2=\beta_0\)

Test the following hypothesis: \(H_0:\ \beta_0 = 0\) OR \(H_0: \mu_1 - \mu_2 = 0\)

# DATA PREP
before <-c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)
after <-c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)
df <- data.frame(before, after)

df |> lm(after - before ~ 1, data = _) |> (\(x) c(coef(x), confint(x)))()
## OR
df |> with(t.test(before, after, paired = T))

ANOVA

This can be conducted as \(Y=\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k\), where \(X_k\) are binary variable for k samples.

Test the following hypothesis: \(H_0:\ \beta_1 =\beta_2 =\cdots=\beta_k = 0\) OR \(H_0: \mu_1 =\mu_2 =\cdots=\mu_k = 0\)

penguins |> lm(flipper_length_mm ~ species, data = _) |> anova()

## OR

penguins |> aov(flipper_length_mm ~ species, data = _) |> anova()

Applications in R

Individual Hypothesis Testing

penguins |> 
  lm(body_mass_g ~ flipper_length_mm + bill_length_mm + bill_depth_mm + species, data = _) |> 
  summary()

Model Inference

xlm <- penguins |> 
  lm(body_mass_g ~ flipper_length_mm + bill_length_mm + bill_depth_mm + species, data = _) 

summary(xlm)
anova(xlm)

## OR

xlm_null <- penguins |> lm(body_mass_g ~ 1, data = _)
anova(xlm_null, xlm)

## OR

library(supernova)
supernova(xlm)

Comparing Different Models

xlm1 <- penguins |> 
  lm(body_mass_g ~ flipper_length_mm + bill_length_mm + bill_depth_mm + species, data = _) 

xlm2 <- penguins |> 
  lm(body_mass_g ~ flipper_length_mm + species, data = _) 

anova(xlm2, xlm1)