Recurrent
Neural Networks

Time-Series, Document, and Audio Processing

Sequential Data

  • Sequential Data

  • Recurrent Neural Networks

  • Document Classification

  • Time Series

  • R Code Document Classification

  • R Code Time-Series

Sequential Data

Sequential data is data obtained as an ordered series:

\[ X_{(0)}\rightarrow X_{(1)}\rightarrow X_{(2)}\rightarrow X_{(3)}\rightarrow \cdots\rightarrow X_{(J-1)}\rightarrow X_{(J)} \]

Stochastic Processes

A stochastic process is a collection of random variables indexed by a parameter. Sequential data can be thought of as a stochastic process.

The generation of a variable \(X_{(j)}\) may or may not depend on the previous values.
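As a minimal sketch of such dependence (the AR(1) coefficient 0.8 and the noise scale are arbitrary illustrative choices, not values from the slides), each value below is generated from the one before it:

# Simulate a sequence X_(0), ..., X_(J) in which each value depends on the
# previous one: X_(j) = 0.8 * X_(j-1) + noise (a simple AR(1) process).
set.seed(1)
J <- 100
x <- numeric(J + 1)
for (j in 2:(J + 1)) {
  x[j] <- 0.8 * x[j - 1] + rnorm(1)
}
plot(x, type = "l", xlab = "j", ylab = "X(j)")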

Examples of Sequential Data

  • Documents and Books

  • Temperature

  • Stock Prices

  • Speech/Recordings

  • Handwriting

Recurrent Neural Networks

  • Sequential Data

  • Recurrent Neural Networks

  • Document Classification

  • Time Series

  • R Code Document Classification

  • R Code Time-Series

RNN

Recurrent Neural Networks are designed to analyze sequential input data.

An RNN accounts for the position of a data point in the sequence as well as its distance from other data points.

Using the data sequence, we can predict an outcome \(Y\).

RNN

Source: ISLR2

RNN Inputs

\[ \boldsymbol X = (\boldsymbol x_0, \boldsymbol x_1, \boldsymbol x_2, \cdots, \boldsymbol x_{J-1}, \boldsymbol x_J) \] where

\[ \boldsymbol x_{j} = (x_{j1}, x_{j2}, \cdots, x_{jK}) \]

Hidden Layer

\[ h_{j} = f(\bbeta_{hx}\boldsymbol x_{j} + \bbeta_{hh}h_{j-1} + b_h) \]

  • \(\bbeta_{hx}\) and \(\bbeta_{hh}\) are the weights for the input-to-hidden and hidden-to-hidden connections, respectively.

Output Layer

\[ y_{j} = g(\bbeta_{hy}h_{j} + b_y) \]
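The recursions above can be sketched directly in R; the dimensions, the tanh activation for \(f\), the identity for \(g\), and the random weights below are illustrative assumptions, not values from the slides.

# One forward pass of the RNN recursions: h_j depends on x_j and h_{j-1}.
set.seed(1)
K <- 4; H <- 3; J <- 10                     # input dimension, hidden units, sequence length
X <- matrix(rnorm((J + 1) * K), J + 1, K)   # rows are x_0, ..., x_J
B_hx <- matrix(rnorm(H * K), H, K)          # input-to-hidden weights
B_hh <- matrix(rnorm(H * H), H, H)          # hidden-to-hidden weights
B_hy <- matrix(rnorm(H), 1, H)              # hidden-to-output weights
b_h <- rnorm(H); b_y <- rnorm(1)
h <- rep(0, H)                              # initial hidden state
y <- numeric(J + 1)
for (j in 1:(J + 1)) {
  h <- tanh(B_hx %*% X[j, ] + B_hh %*% h + b_h)  # hidden layer h_j with f = tanh
  y[j] <- B_hy %*% h + b_y                       # output y_j with g = identity
}
y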

Document Classification

  • Sequential Data

  • Recurrent Neural Networks

  • Document Classification

  • Time Series

  • R Code Document Classification

  • R Code Time-Series

Document Classification

Document classification is the process of assigning documents to different categories.

Corpus

A corpus refers to a large and structured collection of texts, typically stored electronically, and used for linguistic analysis, natural language processing (NLP), and other computational linguistics tasks.

Corpus

The texts in a corpus can vary widely in nature, ranging from written documents such as books, articles, and websites, to transcribed speech and social media posts.

One-Hot Encoding

One-hot encoding converts categorical variables into a binary format where each category is represented by a binary vector. In this representation, each category corresponds to a unique index in the vector, and only one element in the vector is 1 (hot), while all other elements are 0 (cold).
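As a toy illustration (the animal labels are made up for this example), base R's model.matrix() builds the binary indicator columns directly:

# One-hot encode a small categorical variable: one binary column per level,
# with exactly one 1 in each row.
category <- factor(c("cat", "dog", "bird", "dog"))
model.matrix(~ category - 1)   # drop the intercept so every level gets its own column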

One-Hot Encoding

Bag-of-words Model

The Bag of Words model is a simple and fundamental technique in natural language processing (NLP). It represents text data as a collection of words, disregarding grammar and word order. Each document is represented by a vector where each dimension corresponds to a unique word in the entire corpus, and the value indicates the frequency of that word in the document. This model is widely used for tasks like text classification and sentiment analysis.
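A toy bag-of-words example in base R (the two sentences are invented); each row counts how often each vocabulary word appears in a document:

# Build a tiny document-term (bag-of-words) matrix with base R.
docs <- c("the cat sat on the mat", "the dog sat")
tokens <- strsplit(docs, " ")
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(tok) table(factor(tok, levels = vocab))))
dtm
#>      cat dog mat on sat the
#> [1,]   1   0   1  1   1   2
#> [2,]   0   1   0  0   1   1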

Text Mining

RNN

A recurrent neural network can be used to account for the sequential order of the words.

Source: ISLR2

Time Series

  • Sequential Data

  • Recurrent Neural Networks

  • Document Classification

  • Time Series

  • R Code Document Classification

  • R Code Time-Series

Time Series

A time series is a sequence of data points collected or recorded at successive, evenly spaced intervals of time. These data points can represent any variable that is observed or measured over time, such as temperature readings, stock prices, sales figures, or sensor data.
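For instance (the sales figures below are invented), base R's ts() stores such evenly spaced observations together with their time index:

# A small monthly time series: values recorded at evenly spaced time points.
sales <- ts(c(112, 118, 132, 129, 121, 135), start = c(2020, 1), frequency = 12)
sales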

Autocorrelation

Autocorrelation is the correlation between a series and a lagged copy of itself; in time-series data, nearby values in the chain below are typically related.

\[ X_{(0)}\rightarrow X_{(1)}\rightarrow X_{(2)}\rightarrow X_{(3)}\rightarrow \cdots\rightarrow X_{(J-1)}\rightarrow X_{(J)} \]
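For example, the autocorrelation of the NYSE log trading-volume series (the data set used in the R code section below) can be inspected with acf():

# Autocorrelation of NYSE log trading volume at increasing lags.
library(ISLR2)
acf(NYSE$log_volume, main = "ACF of log_volume")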

More Information

RNN

A recurrent neural network can be used to account for the sequential order of each measurement.

Source: ISLR2

R Code Document Classification

  • Sequential Data

  • Recurrent Neural Networks

  • Document Classification

  • Time Series

  • R Code Document Classification

  • R Code Time-Series

R Packages

library(torch)
library(luz) # high-level interface for torch
library(torchdatasets) # for datasets we are going to use
library(zeallot)
torch_manual_seed(909)

IMDB Classification

set.seed(1)
max_features <- 10000
imdb_train <- imdb_dataset(
  root = ".", 
  download = TRUE,
  split="train",
  num_words = max_features
)
imdb_test <- imdb_dataset(
  root = ".", 
  download = TRUE,
  split="test",
  num_words = max_features
)
imdb_train[1]$x[1:12]
#>  [1]    2   25  124   25   26   11 1113  149    6  211   54    4
word_index <- imdb_train$vocabulary
decode_review <- function(text, word_index) {
   word <- names(word_index)
   idx <- unlist(word_index, use.names = FALSE)
   word <- c("<PAD>", "<START>", "<UNK>", word)
   words <- word[text]
   paste(words, collapse = " ")
}
decode_review(imdb_train[1]$x[1:12], word_index)
#> [1] "<START> you know you are in trouble watching a comedy when the"
library(Matrix)
one_hot <- function(sequences, dimension) {
   seqlen <- sapply(sequences, length)
   n <- length(seqlen)
   rowind <- rep(1:n, seqlen)
   colind <- unlist(sequences)
   sparseMatrix(i = rowind, j = colind,
      dims = c(n, dimension))
}

# collect all values into a list
train <- seq_along(imdb_train) |> 
  lapply(function(i) imdb_train[i]) |>  
  purrr::transpose()
test <- seq_along(imdb_test) |> 
  lapply(function(i) imdb_test[i]) |> 
  purrr::transpose()

# num_words + padding + start + oov token = 10000 + 3
x_train_1h <- one_hot(train$x, 10000 + 3)
x_test_1h <- one_hot(test$x, 10000 + 3)
dim(x_train_1h)
#> [1] 25000 10003
nnzero(x_train_1h) / (25000 * (10000 + 3))
#> [1] 0.01316756
set.seed(3)
ival <- sample(seq(along = train$y), 2000)
itrain <- seq_along(train$y)[-ival]
y_train <- unlist(train$y)

Neural Network

model <- nn_module(
  initialize = function(input_size = 10000 + 3) {
    self$dense1 <- nn_linear(input_size, 16)
    self$relu <- nn_relu()
    self$dense2 <- nn_linear(16, 16)
    self$output <- nn_linear(16, 1)
  },
  forward = function(x) {
    x |> 
      self$dense1() |>  
      self$relu() |> 
      self$dense2() |>  
      self$relu() |> 
      self$output() |> 
      torch_flatten(start_dim = 1)
  }
)
model <- model |> 
  setup(
    loss = nn_bce_with_logits_loss(),
    optimizer = optim_rmsprop,
    metrics = list(luz_metric_binary_accuracy_with_logits())
  ) |> 
  set_opt_hparams(lr = 0.001)
fitted <- model |> 
  fit(
    # we transform the training and validation data into torch tensors
    list(
      torch_tensor(as.matrix(x_train_1h[itrain,]), dtype = torch_float()), 
      torch_tensor(unlist(train$y[itrain]))
    ),
    valid_data = list(
      torch_tensor(as.matrix(x_train_1h[ival, ]), dtype = torch_float()), 
      torch_tensor(unlist(train$y[ival]))
    ),
    dataloader_options = list(batch_size = 512),
    epochs = 5
  )
plot(fitted)  

RNN Document Classification

maxlen <- 500
num_words <- 10000
imdb_train <- imdb_dataset(root = ".", split = "train", num_words = num_words,
                           maxlen = maxlen)
imdb_test <- imdb_dataset(root = ".", split = "test", num_words = num_words,
                           maxlen = maxlen)

vocab <- c(rep(NA, imdb_train$index_from - 1), imdb_train$get_vocabulary())
tail(names(vocab)[imdb_train[1]$x])
#> [1] "compensate" "you"        "the"        "rental"     ""          
#> [6] "d"
model <- nn_module(
  initialize = function() {
    self$embedding <- nn_embedding(10000 + 3, 32)
    self$lstm <- nn_lstm(input_size = 32, hidden_size = 32, batch_first = TRUE)
    self$dense <- nn_linear(32, 1)
  },
  forward = function(x) {
    c(output, c(hn, cn)) %<-% (x  |>  
      self$embedding() |> 
      self$lstm())
    output[,-1,] |>   # keep only the output at the last time step (index -1 counts from the end)
      self$dense() |>  
      torch_flatten(start_dim = 1)
  }
)
model <- model |> 
  setup(
    loss = nn_bce_with_logits_loss(),
    optimizer = optim_rmsprop,
    metrics = list(luz_metric_binary_accuracy_with_logits())
  ) |> 
  set_opt_hparams(lr = 0.001)
fitted <- model |> fit(
  imdb_train, 
  epochs = 5,
  dataloader_options = list(batch_size = 128),
  valid_data = imdb_test
)

RNN Diagnostics

plot(fitted)

predy <- torch_sigmoid(predict(fitted, imdb_test)) > 0.5
evaluate(fitted, imdb_test, dataloader_options = list(batch_size = 512))
#> A `luz_module_evaluation`
#> ── Results ─────────────────────────────────────────────────────────────────────
#> loss: 0.3977
#> acc: 0.8258

R Code Time-Series

  • Sequential Data

  • Recurrent Neural Networks

  • Document Classification

  • Time Series

  • R Code Document Classification

  • R Code Time-Series

Time-Series

library(ISLR2)
xdata <- data.matrix(
 NYSE[, c("DJ_return", "log_volume","log_volatility")]
 )
istrain <- NYSE[, "train"]
xdata <- scale(xdata)
lagm <- function(x, k = 1) {
   # shift the rows of x down by k lags, padding the first k rows with NA
   n <- nrow(x)
   pad <- matrix(NA, k, ncol(x))
   rbind(pad, x[1:(n - k), ])
}

arframe <- data.frame(log_volume = xdata[, "log_volume"],
   L1 = lagm(xdata, 1), L2 = lagm(xdata, 2),
   L3 = lagm(xdata, 3), L4 = lagm(xdata, 4),
   L5 = lagm(xdata, 5)
 )
arframe <- arframe[-(1:5), ]
istrain <- istrain[-(1:5)]
n <- nrow(arframe)
xrnn <- data.matrix(arframe[, -1])  # 15 lagged predictors: 5 lags of 3 series
xrnn <- array(xrnn, c(n, 3, 5))     # reshape to n x 3 variables x 5 lags
xrnn <- xrnn[,, 5:1]                # reverse lag order so time runs oldest to newest
xrnn <- aperm(xrnn, c(1, 3, 2))     # rearrange to n x 5 time steps x 3 variables
dim(xrnn)
#> [1] 6046    5    3

Neural Network

model <- nn_module(
  initialize = function() {
    self$rnn <- nn_rnn(3, 12, batch_first = TRUE)
    self$dense <- nn_linear(12, 1)
    self$dropout <- nn_dropout(0.2)
  },
  forward = function(x) {
    c(output, ...) %<-% (x |> 
      self$rnn())
    output[,-1,] |> 
      self$dropout() |> 
      self$dense() |> 
      torch_flatten(start_dim = 1)
  }
)
model <- model |> 
  setup(
    optimizer = optim_rmsprop,
    loss = nn_mse_loss()
  ) |> 
  set_opt_hparams(lr = 0.001)
fitted <- model |> fit(
    list(xrnn[istrain,, ], arframe[istrain, "log_volume"]),
    epochs = 20, # reduced from 200 to keep the run short
    dataloader_options = list(batch_size = 64),
    valid_data =
      list(xrnn[!istrain,, ], arframe[!istrain, "log_volume"])
  )
kpred <- as.numeric(predict(fitted, xrnn[!istrain,, ]))
V0 <- var(arframe[!istrain, "log_volume"])
# out-of-sample R^2: proportion of test-set variance explained by the predictions
1 - mean((kpred - arframe[!istrain, "log_volume"])^2) / V0
#> [1] 0.3947611