Linear Regression

Estimation Procedures

Learning Objectives

Estimation
Ordinary Least Squares
Matrix Formulation
Standard Errors
Conduct in R

Updated Quarto HW Document

---
title: "Title"
author: "Name Here"
date: "`r format(Sys.time(),'%m-%d-%Y')`"
format: 
  html:
    toc: true
    toc-depth: 2
    code-fold: true
    code-tools: true
    code-line-numbers: true
    embed-resources: true
knitr:
  opts_chunk:
    echo: true
    message: false
    warning: false
    error: true
    tidy: styler
    R.options:
      digits: 3
      max.print: 100
---

## Problem 1

## Problem 2

## Problem 3

Resources

Linear Algebra for Data Science

Estimation

Ordinary Least Squares
Maximum Likelihood Approach
Method of Moments

Standard Errors

Find the variance of the estimate
Find the information matrix
Use for Inference

Ordinary Least Squares

For data pairs \((x_i,y_i)\), \(i=1,\ldots,n\), the ordinary least squares estimator finds the values \(\hat\beta_0\) and \(\hat\beta_1\) of \(\beta_0\) and \(\beta_1\) that minimize the following function:

\[ \sum^n_{i=1}\{y_i-(\beta_0+\beta_1x_i)\}^2 \]

Estimates

\[ \hat\beta_0 = \bar y - \hat\beta_1\bar x \]

\[ \hat\beta_1 = \frac{\sum^n_{i=1}(y_i-\bar y)(x_i-\bar x)}{\sum^n_{i=1}(x_i-\bar x)^2} \]

\[ \hat\sigma^2 = \frac{1}{n-2}\sum^n_{i=1}(y_i-\hat y_i)^2 \]
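The closed-form estimates can be checked directly in R. The sketch below borrows the simulated data used later in these slides; the seed and the object names (b0, b1, sigma2_hat, fit) are assumptions made for illustration only.

```r
# Simulated data (same generating model as the later slides; seed is arbitrary)
set.seed(1)
x <- rpois(500, 6)
y <- -9 * x + 32 + rnorm(500, sd = sqrt(2))
n <- length(y)

# Closed-form OLS estimates of the slope and intercept
b1 <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Residual variance estimate with n - 2 degrees of freedom
y_hat <- b0 + b1 * x
sigma2_hat <- sum((y - y_hat)^2) / (n - 2)

c(b0 = b0, b1 = b1, sigma2 = sigma2_hat)

# Cross-check against lm(); both comparisons should return TRUE
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))
all.equal(sigma(fit)^2, sigma2_hat)
```

The estimates should land near the generating values (intercept 32, slope -9, variance 2), with the usual sampling noise.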

Matrix Formulation

Matrix Version of Model

\[ y_i = \boldsymbol X_i^\mathrm T \boldsymbol \beta + \epsilon_i \]

\(y_i\): Outcome Variable
\(\boldsymbol X_i=(1, x_i)^\mathrm T\): Predictors
\(\boldsymbol \beta = (\beta_0, \beta_1)^\mathrm T\): Coefficients
\(\epsilon_i\): error term

Data Matrix Formulation

For \(n\) data points

\[ \boldsymbol Y = \boldsymbol X\boldsymbol \beta + \boldsymbol \epsilon \]

\(\boldsymbol Y = (y_1, \cdots, y_n)^\mathrm T\): Outcome Variable
\(\boldsymbol X=(\boldsymbol X_1, \cdots, \boldsymbol X_n)^\mathrm T\): Predictors (an \(n \times 2\) matrix with one row \(\boldsymbol X_i^\mathrm T\) per observation)
\(\boldsymbol \beta = (\beta_0, \beta_1)^\mathrm T\): Coefficients
\(\boldsymbol \epsilon = (\epsilon_1, \cdots, \epsilon_n)^\mathrm T\): Error terms

Least Squares Formula

\[ (\boldsymbol Y - \boldsymbol X\boldsymbol \beta)^\mathrm T(\boldsymbol Y - \boldsymbol X\boldsymbol \beta) \]

Estimates

\[ \hat{\boldsymbol \beta} = (\boldsymbol X ^\mathrm T\boldsymbol X)^{-1}\boldsymbol X ^\mathrm T\boldsymbol Y \]
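
A short sketch of where this estimate comes from, taking \(\boldsymbol X\) to be the \(n \times 2\) matrix whose rows are \(\boldsymbol X_i^\mathrm T\) and assuming \(\boldsymbol X^\mathrm T\boldsymbol X\) is invertible (the \(x_i\) are not all equal). Expanding the least squares criterion gives

\[ (\boldsymbol Y - \boldsymbol X\boldsymbol \beta)^\mathrm T(\boldsymbol Y - \boldsymbol X\boldsymbol \beta) = \boldsymbol Y^\mathrm T\boldsymbol Y - 2\boldsymbol \beta^\mathrm T\boldsymbol X^\mathrm T\boldsymbol Y + \boldsymbol \beta^\mathrm T\boldsymbol X^\mathrm T\boldsymbol X\boldsymbol \beta \]

Setting the derivative with respect to \(\boldsymbol \beta\) to zero yields the normal equations

\[ -2\boldsymbol X^\mathrm T\boldsymbol Y + 2\boldsymbol X^\mathrm T\boldsymbol X\hat{\boldsymbol \beta} = \boldsymbol 0 \quad\Longrightarrow\quad \boldsymbol X^\mathrm T\boldsymbol X\hat{\boldsymbol \beta} = \boldsymbol X^\mathrm T\boldsymbol Y \]

and multiplying both sides by \((\boldsymbol X^\mathrm T\boldsymbol X)^{-1}\) gives the estimate above.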

Standard Errors

Estimate for \(\sigma^2\)

\[ \hat \sigma^2 = \frac{1}{n-2} \sum^n_{i=1} (y_i-\boldsymbol X_i^\mathrm T\hat{\boldsymbol \beta})^2 \]

Standard Errors of \(\beta\)’s

\[ SE(\hat\beta_0)=\sqrt{\frac{\sum^n_{i=1}x_i^2\hat\sigma^2}{n\sum^n_{i=1}(x_i-\bar x)^2}} \]

\[ SE(\hat\beta_1)=\sqrt\frac{\hat\sigma^2}{\sum^n_{i=1}(x_i-\bar x)^2} \]

Standard Errors Matrix Form

\[ Var(\hat {\boldsymbol \beta}) = (\boldsymbol X ^\mathrm T\boldsymbol X)^{-1} \hat \sigma^2 \]
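
The scalar and matrix forms of the standard errors should agree with each other and with lm(). The sketch below checks this on simulated data; the seed and the object names (var_beta, se_matrix, se_b0, se_b1, se_lm) are assumptions made for illustration.

```r
# Simulated data (same generating model as the later slides; seed is arbitrary)
set.seed(1)
x <- rpois(500, 6)
y <- -9 * x + 32 + rnorm(500, sd = sqrt(2))
n <- length(y)

# Design matrix, OLS estimate, and residual variance estimate
X <- cbind(1, x)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
resid <- y - X %*% beta_hat
sigma2_hat <- sum(resid^2) / (n - 2)

# Matrix form: the diagonal of (X'X)^{-1} * sigma^2 holds the variances
var_beta <- solve(t(X) %*% X) * sigma2_hat
se_matrix <- unname(sqrt(diag(var_beta)))

# Scalar formulas for the intercept and slope
sxx <- sum((x - mean(x))^2)
se_b0 <- sqrt(sum(x^2) * sigma2_hat / (n * sxx))
se_b1 <- sqrt(sigma2_hat / sxx)

# Standard errors reported by lm()
fit <- lm(y ~ x)
se_lm <- unname(coef(summary(fit))[, "Std. Error"])

# Both comparisons should return TRUE
all.equal(se_matrix, c(se_b0, se_b1))
all.equal(se_matrix, se_lm)
```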

R approaches

Built in Functions

You can use the lm() function to fit a linear model, then extract the estimated coefficients and standard errors from the fitted object.

Matrix Formulation

R is capable of conducting matrix operations with the following functions:

%*%: matrix multiplication
t(): transpose a matrix
solve(): computes the inverse matrix
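
As a quick illustration of these three functions, here is a small sketch on an arbitrary 2 x 2 matrix; the matrix A, the vector b, and the other object names are made up for the example.

```r
# An arbitrary invertible 2 x 2 matrix and a vector
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)

t(A)               # transpose
A %*% b            # matrix-vector product, returned as a 2 x 1 matrix
A_inv <- solve(A)  # inverse of A
A_inv %*% A        # identity matrix, up to rounding error

# solve(A, b) solves the linear system A z = b without forming the inverse
z <- solve(A, b)
all.equal(as.vector(A %*% z), b)
```

With two arguments, solve() is generally preferable to inverting and multiplying, since it avoids computing the full inverse.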

Minimization Problem

The least squares criterion can also be minimized numerically in R. The optim() function minimizes a function over a set of parameters. We supply the least squares function together with initial values (0) for the parameters of interest.

Fit a Line using `lm` for the following data

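x <- rpois(500, 6)
y <- -9 * x + 32 + rnorm(500, sd = sqrt(2))
lm_res <- lm(y ~ x)
summary(lm_res)
# Standard errors: square roots of the diagonal of the variance-covariance matrix
sqrt(diag(vcov(lm_res)))
# Estimate of sigma^2
sigma(lm_res)^2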

Fit a linear model using matrix operation

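# Design matrix: a column of ones for the intercept, then the predictor
X <- cbind(rep(1, 500), x)
# OLS estimate: (X'X)^{-1} X'y
solve(t(X) %*% X) %*% t(X) %*% y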

Minimizing a function using `optim`

Find the value of x that minimizes the following function:

\[ f(x) = 2(x - 5)^2 + 11 \]

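d <- function(x){
  2 * (x - 5)^2 + 11
}
# The default Nelder-Mead method warns on one-parameter problems; method = "BFGS" is an alternative
optim(0, d)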

Maximizing a function using `optim`

Find the value of x that maximizes the following function:

\[ f(x) = - 3 (x - 8)^2 + 9 \]

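d <- function(x){
  -3 * (x - 8)^2 + 9
}
# optim() minimizes by default; a negative fnscale turns the problem into a maximization
optim(0, d, control = list(fnscale = -1))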

Minimizing function using `optim`

Find the values of x and y that minimize the following function for any values of a and b.

\[ f(x,y) = \frac{(x-3)^2}{a^2} + \frac{(y+4)^2}{b^2} \]
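
One possible approach, sketched under assumptions: the two unknowns are packed into a single parameter vector, and a and b are passed to the function through the extra arguments of optim(). The values a = 2 and b = 5 and the names f and res are arbitrary choices for illustration.

```r
# Objective: par[1] plays the role of x and par[2] the role of y;
# a and b are fixed constants supplied by the caller
f <- function(par, a, b) {
  x <- par[1]
  y <- par[2]
  (x - 3)^2 / a^2 + (y + 4)^2 / b^2
}

# Start at (0, 0); a and b are forwarded to f() through optim()'s `...`
res <- optim(c(0, 0), f, a = 2, b = 5)

res$par          # estimated minimizer (x, y)
res$value        # function value at the minimizer
res$convergence  # 0 indicates successful convergence
```

Because both terms are squares, the minimum is at x = 3 and y = -4 whenever a and b are nonzero, so the numerical answer should sit close to that point regardless of the constants chosen.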

Maximizing function using `optim`

Find the value of \(\lambda\) that maximizes the following function:

\[ \ell(\lambda) = \sum^n_{i=1} \{-\lambda+X_i\log(\lambda) -\log(X_i!)\} \]

with the following data (the values in z play the role of the \(X_i\)):

z <- rpois(1000, lambda = 2)
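
A minimal sketch of one way to do this, assuming a one-parameter search over a bounded interval. The seed, the function name loglik, and the search bounds 0.01 and 20 are assumptions chosen for illustration; lfactorial() computes \(\log(X_i!)\).

```r
# Simulated Poisson data (seed is arbitrary)
set.seed(1)
z <- rpois(1000, lambda = 2)

# Poisson log-likelihood as a function of lambda
loglik <- function(lambda, x) {
  sum(-lambda + x * log(lambda) - lfactorial(x))
}

# Brent's method handles one-parameter problems and needs finite bounds;
# fnscale = -1 makes optim() maximize instead of minimize
res <- optim(
  par = 1, fn = loglik, x = z,
  method = "Brent", lower = 0.01, upper = 20,
  control = list(fnscale = -1)
)

res$par   # numerical maximizer
mean(z)   # analytical maximizer of the Poisson log-likelihood
```

The sample mean is the known maximizer of this log-likelihood, so the two printed values should agree closely.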

Fit a linear model using `optim`

Minimize:

\[ \sum^n_{i=1}(y_i-\hat y_i)^2 \]

\[ \hat y_i = \hat \beta_0 +\hat\beta_1 x_i \]

Data:

x <- rpois(500, 6)
y <- -9 * x + 32 + rnorm(500, sd = sqrt(2))
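
A sketch of one possible solution, with assumptions flagged: the seed, the function name sse, and the choice of method = "BFGS" are illustrative, and other methods or starting values would also work. The initial values are 0 for both parameters.

```r
# Simulated data (seed is arbitrary)
set.seed(1)
x <- rpois(500, 6)
y <- -9 * x + 32 + rnorm(500, sd = sqrt(2))

# Sum of squared errors; beta[1] is the intercept and beta[2] the slope
sse <- function(beta, x, y) {
  y_hat <- beta[1] + beta[2] * x
  sum((y - y_hat)^2)
}

# Minimize from initial values of 0; x and y are forwarded to sse()
res <- optim(c(0, 0), sse, x = x, y = y, method = "BFGS")

res$par          # numerical estimates of (beta_0, beta_1)
coef(lm(y ~ x))  # estimates from lm() for comparison
```

The numerical estimates should match the lm() coefficients to within the optimizer's tolerance rather than exactly.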
