Neural Networks with Torch

R Packages

# install.packages("torch")
# install.packages("luz")

library(torch)
library(luz) # high-level interface for torch
torch_manual_seed(13)

Data in Python

import plotnine
from plotnine.data import penguins
clean_penguins = (
    penguins
        .dropna()
        .drop(columns=["year"])
)

Machine Learning

Machine Learning
Linear Regression
Neural Networks
Other Topics
Single-Layer Neural Network in R
Training/Validating/Testing Data

What is Machine Learning?

Machine Learning (ML) is the process of characterizing mathematical models, with the help of data, to predict outcomes of interest.

Machine Learning Model

\[ Y = f(\boldsymbol X; \boldsymbol \theta) \]

\(Y\): An outcome of interest that we are trying to predict
\(\boldsymbol X\): A vector of data points used to predict the outcome
\(f\): An unknown function that predicts the outcome
\(\boldsymbol \theta\): A vector of parameters used to define the model

What is going on?

Data helps build a model
Uses mathematical models to predict outcomes
Uses probability theory to model randomness and loss
Uses numerical algorithms to:
- Find numerical values of parameters that minimize the loss
- Model unknown and nonlinear relationships

Main Types of Machine Learning

Supervised Learning

Data includes inputs and outputs
ML models learn to map inputs → outputs

Unsupervised Learning

Data has no labels
Model discovers hidden patterns

Linear Regression

Linear Regression

Linear regression is a tool for predicting a continuous outcome, assumed to follow a normal distribution, from a set of known predictor variables.

Simple Linear Regression

Simple linear regression will model the association between one predictor variable and an outcome:

\[ \hat Y = \beta_0 + \beta_1 X \]

\(\beta_0\): Intercept term
\(\beta_1\): Slope term
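As a minimal sketch of how the intercept and slope can be estimated, the block below uses the standard least-squares formulas for a single predictor. The function name and the toy data are hypothetical and chosen only for illustration; the slides themselves fit models in R.

```python
def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) minimizing the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical toy data lying exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
beta0, beta1 = fit_simple_linear_regression(x, y)
y_hat = [beta0 + beta1 * xi for xi in x]  # fitted values
```

Because the toy data contain no noise, the estimates recover the intercept 1 and slope 2 used to generate them.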

Multivariable Linear Regression (MLR)

Multivariable linear regression models are used when more than one explanatory variable is used to explain the outcome of interest.

Additional Continuous Variable

To fit an additional variable, we only need to add it as another term in the model:

\[ \hat Y = \beta_0 +\beta_1 X_{1} + \beta_2 X_{2} \]

Categorical Variable

A categorical variable can be included in a model, but a reference category must be specified.

Fitting a model with categorical variables

To fit a model with categorical variables, we must utilize dummy (binary) variables that indicate which category is being referenced. We use \(C-1\) dummy variables where \(C\) indicates the number of categories. When coded correctly, each category will be represented by a combination of dummy variables.

Example

If we have 4 categories, we will need 3 dummy variables:

	Cat 1	Cat 2	Cat 3	Cat 4
Dummy 1	1	0	0	0
Dummy 2	0	1	0	0
Dummy 3	0	0	1	0

Which one is the reference category?
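To make the coding concrete, here is a minimal sketch of dummy coding with \(C - 1\) indicators. The helper `dummy_code` and the category labels are hypothetical; in R, `model.matrix()` builds these columns automatically.

```python
def dummy_code(value, categories, reference):
    """Encode one categorical value as C - 1 dummy (0/1) variables."""
    # One dummy per non-reference category, in the order given
    non_reference = [c for c in categories if c != reference]
    return [1 if value == c else 0 for c in non_reference]


# Hypothetical example: 4 categories, with "Cat 4" chosen as the reference
categories = ["Cat 1", "Cat 2", "Cat 3", "Cat 4"]
rows = {c: dummy_code(c, categories, reference="Cat 4") for c in categories}
```

Each non-reference category switches on exactly one dummy, while the reference category is the one whose dummies are all zero.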

Model with categorical variables

Fitting an additional variable with 4 Categories

\[ \hat Y = \beta_0 +\beta_1 X_{1} + \beta_2 X_{2} + \beta_3 D_{1} + \beta_4 D_{2} + \beta_5 D_{3} \]

Sums of Squared Errors

We find the values of \(\beta_0, \cdots, \beta_5\) that minimize the following function over \(n\) data points:

\[ RSS = \sum^n_{i=1}(Y_i-\hat Y_i)^2 \]
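The residual sum of squares can be computed directly from observed and predicted values. The sketch below uses hypothetical numbers purely to show the arithmetic.

```python
def rss(y, y_hat):
    """Residual sum of squares between observed and predicted values."""
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))


# Hypothetical observed outcomes and model predictions
y = [3.0, 5.0, 7.0]
y_hat = [2.5, 5.0, 8.0]
total = rss(y, y_hat)  # 0.5^2 + 0^2 + (-1)^2 = 1.25
```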

Matrix Formulation

\[ Y_i = \boldsymbol X_i^\mathrm T \boldsymbol \beta + \epsilon_i \]

\(Y_i\): Outcome Variable
\(\boldsymbol X_i\): Predictors
\(\boldsymbol \beta\): Coefficients
\(\epsilon_i\): error term

Matrix Data Formulation

\[ \hat{\boldsymbol\beta} = (\boldsymbol X^\mathrm T \boldsymbol X)^{-1} \boldsymbol X^\mathrm T \boldsymbol Y \]
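The closed-form solution can be sketched without any libraries by solving the normal equations \((\boldsymbol X^\mathrm T \boldsymbol X)\boldsymbol\beta = \boldsymbol X^\mathrm T \boldsymbol Y\) with Gauss-Jordan elimination. This assumes \(\boldsymbol X^\mathrm T \boldsymbol X\) is invertible; the function name and data are hypothetical, and in practice a numerical library would normally be used instead.

```python
def solve_normal_equations(X, y):
    """Least-squares coefficients from (X^T X) beta = X^T y."""
    n, p = len(X), len(X[0])
    # Build X^T X (p x p) and X^T y (length p)
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    # Augmented matrix [X^T X | X^T y], reduced by Gauss-Jordan elimination
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry to the pivot row
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[a][p] for a in range(p)]


# Hypothetical data generated from y = 1 + 2*x1 - 1*x2 (first column is the intercept)
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0],
     [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.0, 3.0, 0.0, 2.0, 4.0]
beta = solve_normal_equations(X, y)
```

Since the toy outcomes follow the generating equation exactly, the solver returns coefficients close to 1, 2, and -1.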

Neural Networks

Neural Networks

Neural networks are a type of machine learning algorithm designed to mimic the function of the human brain. They consist of interconnected nodes or “neurons” that process information and generate outputs based on the inputs they receive.

Uses

Neural networks are typically used for tasks such as image recognition, natural language processing, and prediction. They are capable of learning from data and improving their performance over time, which makes them well-suited for complex and dynamic problems.

Neural Network

Neural Network Composition

Inputs: A set of characteristics in the data that we use to predict the outcome of interest
Outputs: A set of variables (may be one) we wish to predict
Hidden Layers: A set of functions that will transform the data such that it can better predict the outputs
- Each hidden layer has nodes, each of which applies one transformation to its inputs

Single Layer Neural Network

Model Setup

\[ Y = f(\boldsymbol X; \boldsymbol \theta) \]

\(\boldsymbol X\): a vector of predictor variables
\(\boldsymbol \theta\): a vector of parameters (\(\boldsymbol \beta, \boldsymbol \alpha\))

Single Layer Neural Networks

A single-layer neural network can be formulated as a linear function of the hidden-node outputs:

\[ f(\boldsymbol X; \boldsymbol \theta) = \beta_0 + \sum^K_{k=1}\beta_kh_k(\boldsymbol X) \]

where \(\boldsymbol X\) is a vector of inputs of length \(p\), \(K\) is the number of nodes (neurons), \(\beta_k\) are parameters, and

\[ h_k(\boldsymbol X) = g\left(\alpha_{k0} + \sum^p_{l=1}\alpha_{kl}X_{l}\right) \]

with \(g(\cdot)\) being a nonlinear activation function and \(\alpha_{kl}\) being the weights (parameters).
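The two formulas above can be traced by hand with a small forward pass. The sketch below assumes ReLU as the activation \(g(\cdot)\); all parameter values are hypothetical and picked so the arithmetic is easy to follow.

```python
def relu(z):
    """ReLU activation: g(z) = max(0, z)."""
    return max(0.0, z)


def single_layer_forward(x, alpha0, alpha, beta0, beta):
    """f(X; theta) = beta0 + sum_k beta_k * g(alpha_k0 + sum_l alpha_kl * x_l)."""
    # Hidden nodes h_k(X): one activated linear combination per node
    hidden = [
        relu(a0 + sum(a_l * x_l for a_l, x_l in zip(a_row, x)))
        for a0, a_row in zip(alpha0, alpha)
    ]
    # Output: linear combination of the hidden-node outputs
    return beta0 + sum(b_k * h_k for b_k, h_k in zip(beta, hidden))


# Hypothetical parameters: p = 2 inputs, K = 3 hidden nodes
alpha0 = [0.0, -1.0, 0.5]                        # alpha_k0 (biases)
alpha = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # alpha_kl (weights)
beta0 = 0.5
beta = [2.0, -1.0, 1.0]
prediction = single_layer_forward([1.0, 2.0], alpha0, alpha, beta0, beta)
```

For the input (1, 2) the hidden nodes evaluate to 1, 1, and 0 (the third is clipped by ReLU), so the output is 0.5 + 2 - 1 + 0 = 1.5.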

Fitting a Neural Network

Fitting a neural network is the process of taking input data (\(\boldsymbol X\)) and finding the numerical values of the parameters that minimize the following loss function, the mean squared error (MSE):

\[ \frac{1}{n}\sum^n_{i=1}\left\{Y_i-f(\boldsymbol X_i; \boldsymbol \theta)\right\}^2 \]
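To illustrate what "finding parameter values that minimize the MSE" means numerically, the sketch below runs plain gradient descent on a one-predictor linear model. This is a deliberately simpler stand-in for a neural network and for the optimizer used later in the slides; the learning rate, step count, and data are hypothetical choices.

```python
def mse(y, y_hat):
    """Mean squared error between observed and predicted values."""
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / len(y)


def gradient_descent(x, y, learning_rate=0.05, steps=2000):
    """Minimize the MSE of y_hat = b0 + b1 * x by gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        # Partial derivatives of the MSE with respect to b0 and b1
        grad_b0 = -2.0 / n * sum(residuals)
        grad_b1 = -2.0 / n * sum(r * xi for r, xi in zip(residuals, x))
        # Step against the gradient to reduce the loss
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Hypothetical toy data lying exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = gradient_descent(x, y)
final_loss = mse(y, [b0 + b1 * xi for xi in x])
```

Each pass over the data plays the role of one iteration: the parameters move a little in the direction that lowers the loss, and the loss shrinks toward zero on this noise-free example.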

Single-Layer Neural Network in R

Build a single-layer neural network that will predict body_mass with the remaining predictors. The hidden layer will contain 20 nodes, and the activation functions will be ReLU.

library(tidyverse)
penguins <- penguins |> drop_na()
px <- penguins |>
  model.matrix(body_mass ~ . - 1, data = _) |> 
  scale() |> 
  torch_tensor(dtype = torch_float())

py <- penguins |> 
  select(body_mass) |> 
  as.matrix() |> 
  torch_tensor(dtype = torch_float())

Model Description

Overall

modnn <- nn_module(
  initialize = function(input_size) {
    self$hidden <- nn_linear(input_size, 20)
    self$activation <- nn_relu()
    self$output <- nn_linear(20, 1)
  },
  forward = function(x) {
    x |> 
      self$hidden() |>  
      self$activation() |>  
      self$output()
  }
)

Initialize

Creates the layers that make up the network: a hidden layer with 20 nodes, a ReLU activation, and an output layer with a single node.

initialize = function(input_size) {
    self$hidden <- nn_linear(input_size, 20)
    self$activation <- nn_relu()
    self$output <- nn_linear(20, 1)
  }

Forward

Defines how the input passes through the network: hidden layer, then activation, then output layer.

forward = function(x) {
    x |> 
      self$hidden() |>  
      self$activation() |>  
      self$output()
  }

Optimizer Set Up

modnn <- modnn |> 
  setup(
    loss = nn_mse_loss(), # Mean squared error, used for a continuous numeric outcome
    optimizer = optim_rmsprop
  ) |>
  set_hparams(input_size = ncol(px))

Fit a Model

Fit

fitted <- modnn |> 
  fit(
    data = list(px, py),
    epochs = 200 # Number of passes through the training data
  )

Plot

plot(fitted)

Training/Validating/Testing Data

Error Rate

When creating a model, we are interested in determining how effective the model will be in predicting a new data point, i.e., one not in our training data.

The error rate is a metric of how often predictions for future data points will be wrong when using our model.

The problem is: how can we get future data to validate our model?

Training/Validating/Testing Data

The Training/Validating/Testing approach takes the original data set and splits it into 3 separate data sets: training, validating, and testing.

Training: This is the data used to create the model.
Validating: This is the data used to evaluate the model during its creation. It is evaluated at each iteration (epoch).
Testing: This is the data used to test the final model and compute the error rate.
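As a language-agnostic illustration of the three-way split, the sketch below shuffles row indices and cuts them into 80% training, 10% validating, and 10% testing. The function name, the seed, and the row count of 100 are hypothetical; the R code later in the slides does the same job with `slice_sample()` and `anti_join()`.

```python
import random


def train_validate_test_split(n_rows, train_prop=0.8, seed=13):
    """Split row indices into training, validating, and testing sets."""
    indices = list(range(n_rows))
    # Shuffle with a fixed seed so the split is reproducible
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train_prop)
    # The remaining rows are divided evenly between validating and testing
    n_validate = (n_rows - n_train) // 2
    train = indices[:n_train]
    validate = indices[n_train:n_train + n_validate]
    test = indices[n_train + n_validate:]
    return train, validate, test


# Hypothetical data set with 100 rows
train, validate, test = train_validate_test_split(100)
```

Every row lands in exactly one of the three sets, so the testing data are never seen while the model is being created.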

Training Error Rate

Training Error Rate is the error rate of the data used to create the model of interest. It describes how well the model predicts the data used to construct it.

Test Error Rate

Test Error Rate is the error rate of predicting a new data point using the current established model.

Penguin Data

Train/Validate/Test Split

# Drop incomplete rows, then split the data 80% / 10% / 10%
penguins <- penguins |> drop_na()
training <- penguins |> slice_sample(prop = .8)  # 80% for training
pre <- penguins |> anti_join(training)           # remaining 20%
validate <- pre |> slice_sample(prop =  0.5)     # half of the remainder
testing <- pre |> anti_join(validate)            # the other half

# Training predictors and outcome as torch tensors
Xtraining <- training |> 
  model.matrix(body_mass ~ . - 1, data = _) |> 
  scale() |> 
  torch_tensor(dtype = torch_float())

Ytraining <- training |> 
  select(body_mass) |> 
  as.matrix() |> 
  torch_tensor(dtype = torch_float())

# Validation predictors and outcome as torch tensors
Xvalidate <- validate |> 
  model.matrix(body_mass ~ . - 1, data = _) |> 
  scale() |> 
  torch_tensor(dtype = torch_float())

Yvalidate <- validate |> 
  select(body_mass) |> 
  as.matrix() |> 
  torch_tensor(dtype = torch_float())

# Testing predictors and outcome as torch tensors
Xtesting <- testing |> 
  model.matrix(body_mass ~ . - 1, data = _) |> 
  scale() |> 
  torch_tensor(dtype = torch_float())

Ytesting <- testing |> 
  select(body_mass) |> 
  as.matrix() |> 
  torch_tensor(dtype = torch_float())

Model Description

modnn <- nn_module(
  initialize = function(input_size) {
    self$hidden <- nn_linear(input_size, 20)
    self$activation <- nn_relu()
    self$output <- nn_linear(20, 1)
  },
  forward = function(x) {
    x |> 
      self$hidden() |>  
      self$activation() |>  
      self$output()
  }
)

Optimizer Set Up

modnn <- modnn |> 
  setup(
    loss = nn_mse_loss(), # Mean squared error, used for a continuous numeric outcome
    optimizer = optim_rmsprop
  ) |>
  set_hparams(input_size = ncol(Xtraining))

Fit a Model

Fit

fitted <- modnn |> 
  fit(
    data = list(Xtraining, Ytraining),
    epochs = 200, # Number of passes through the training data
    valid_data = list(Xvalidate, Yvalidate)
  )

Plot

plot(fitted)

Testing Model

Prediction

npred <- predict(fitted, Xtesting)

MAE

mean(abs(as.matrix(Ytesting) - as.matrix(npred)))
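The mean absolute error is simply the average size of the prediction errors on the testing data. A minimal sketch of the same calculation, using hypothetical body-mass values rather than output from the fitted model:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted body masses
y_true = [3500.0, 4200.0, 5000.0]
y_pred = [3600.0, 4000.0, 5050.0]
mae = mean_absolute_error(y_true, y_pred)  # (100 + 200 + 50) / 3
```

The result is in the same units as the outcome, which makes it easier to interpret than the squared-error loss used for training.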

Plot

plot(as.matrix(Ytesting), as.matrix(npred),
     xlab = "Truth",
     ylab = "Predicted")
