
A layer of trainable low-dimensional delay systems. Each unit buffers its encoded input by internally representing a low-dimensional (i.e., compressed) version of the sliding window. Nonlinear decodings of this representation, expressed by the A and B matrices, provide computations across the window, such as its derivative, energy, median value, etc. (1, 2). Note that these decoder matrices can span across all of the units of an input sequence.
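
As a minimal sketch of the underlying state-space system (following Voelker et al., 2019; legendre_delay_ab below is a hypothetical helper for illustration only, not part of this package's API), the continuous-time A and B matrices that the memory component discretizes internally can be constructed as:

legendre_delay_ab <- function(order) {
  q <- 0:(order - 1)
  r <- 2 * q + 1
  # A[i, j] = (2i + 1) * (-1 if i < j, else (-1)^(i - j + 1)), with zero-based i, j
  A <- outer(q, q, function(i, j) ifelse(i < j, -1, (-1)^(i - j + 1))) * r
  # B[i] = (2i + 1) * (-1)^i
  B <- matrix((-1)^q * r, ncol = 1)
  # the memory state m(t) evolves as theta * dm/dt = A %*% m(t) + B * u(t)
  list(A = A, B = B)
}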

Usage

layer_lmu(
  object,
  memory_d,
  order,
  theta,
  hidden_cell,
  trainable_theta = FALSE,
  hidden_to_memory = FALSE,
  memory_to_memory = FALSE,
  input_to_hidden = FALSE,
  discretizer = "zoh",
  kernel_initializer = "glorot_uniform",
  recurrent_initializer = "orthogonal",
  kernel_regularizer = NULL,
  recurrent_regularizer = NULL,
  use_bias = FALSE,
  bias_initializer = "zeros",
  bias_regularizer = NULL,
  dropout = 0,
  recurrent_dropout = 0,
  return_sequences = FALSE,
  ...
)

Arguments

memory_d

Dimensionality of input to memory component.

order

The number of degrees in the transfer function of the LTI system used to represent the sliding window of history. This parameter sets the number of Legendre polynomials used to orthogonally represent the sliding window.

theta

The number of timesteps in the sliding window that is represented using the LTI system. In this context, the sliding window is a fixed-size range of recent data that will be used to predict the value at the next time step. If this value is smaller than the length of the input sequence, only that number of steps is represented at the time of prediction; however, the entire sequence is still processed so that information can be projected to and from the hidden layer. If trainable_theta is enabled, theta will be updated over the course of training.

hidden_cell

Keras Layer/RNNCell implementing the hidden component.

trainable_theta

If TRUE, theta is learnt over the course of training. Otherwise, it is kept constant.

hidden_to_memory

If TRUE, connect the output of the hidden component back to the memory component (default FALSE).

memory_to_memory

If TRUE, add a learnable recurrent connection (in addition to the static Legendre system) to the memory component (default FALSE).

input_to_hidden

If TRUE, connect the input directly to the hidden component (in addition to the connection from the memory component) (default FALSE).

discretizer

The method used to discretize the A and B matrices of the LMU. Current options are "zoh" (short for Zero-Order Hold) and "euler". "zoh" is more accurate, but training will be slower than with "euler" if trainable_theta=TRUE. Note that a larger theta is needed when discretizing using "euler" (a value larger than 4*order is recommended); see the sketch after this argument list.

kernel_initializer

Initializer for weights from input to memory/hidden component. If NULL, no weights will be used, and the input size must match the memory/hidden size.

recurrent_initializer

Initializer for memory_to_memory weights (if that connection is enabled).

kernel_regularizer

Regularizer for weights from input to memory/hidden component.

recurrent_regularizer

Regularizer for memory_to_memory weights (if that connection is enabled).

use_bias

If TRUE, the memory component includes a bias term.

bias_initializer

Initializer for the memory component bias term. Only used if use_bias=TRUE.

bias_regularizer

Regularizer for the memory component bias term. Only used if use_bias=TRUE.

dropout

Dropout rate on input connections.

recurrent_dropout

Dropout rate on memory_to_memory connection.

return_sequences

If TRUE, return the full output sequence. Otherwise, return just the last output in the output sequence.
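
As a hedged sketch of the discretizer recommendation above (using the same constructors as the Examples section; the hyperparameter values are purely illustrative):

library(keras)
inp <- layer_input(c(128, 3))
# with discretizer = "euler", theta should exceed 4 * order (here 128 > 4 * 16),
# and trainable_theta = TRUE lets theta itself be learnt during training
lmu <- layer_lmu(
  memory_d = 10,
  order = 16,
  theta = 128,
  hidden_cell = layer_lstm_cell(10),
  discretizer = "euler",
  trainable_theta = TRUE
)(inp)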

Output shape

  • if return_state: a list of tensors. The first tensor is the output. The remaining tensors are the last states, each with shape (batch_size, state_size), where state_size could be a high-dimension tensor shape.

  • if return_sequences: N-D tensor with shape (batch_size, timesteps, output_size), where output_size could be a high-dimension tensor shape, or (timesteps, batch_size, output_size) when time_major is TRUE.

  • else: N-D tensor with shape (batch_size, output_size), where output_size could be a high-dimension tensor shape.
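
As a concrete (hedged) illustration using the same constructors as the Examples section below, with 28-timestep inputs and an LSTM hidden cell of 10 units:

# output shape (batch_size, 10): only the last step is returned
lmu_last <- layer_lmu(memory_d = 10, order = 3, theta = 28,
                      hidden_cell = layer_lstm_cell(10))
# output shape (batch_size, 28, 10): one output per timestep
lmu_seq <- layer_lmu(memory_d = 10, order = 3, theta = 28,
                     hidden_cell = layer_lstm_cell(10),
                     return_sequences = TRUE)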

Examples

if (FALSE) {
library(keras)
# 28-timestep sequences with 3 features per step
inp <- layer_input(c(28, 3))
# the hidden component can be any Keras RNN cell
hidden_cell <- layer_lstm_cell(10)
# theta = 28 so the memory window spans the full input sequence
lmu <- layer_lmu(memory_d = 10, order = 3, theta = 28, hidden_cell = hidden_cell)(inp)
model <- keras_model(inp, lmu)
# batch of 32 -> output shape (32, 10)
model(array(1, c(32, 28, 3)))
}
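
A hedged continuation of the example above: compiling and fitting on random data, just to show the expected input and target shapes (the loss, optimizer, and epoch count are illustrative only).

if (FALSE) {
model %>% compile(loss = "mse", optimizer = "adam")
# (batch_size, timesteps, features) inputs and (batch_size, 10) targets,
# matching the model's output shape
x <- array(rnorm(32 * 28 * 3), c(32, 28, 3))
y <- array(rnorm(32 * 10), c(32, 10))
model %>% fit(x, y, epochs = 2, batch_size = 16)
}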