Legendre Memory Unit layer
A layer of trainable low-dimensional delay systems. Each unit buffers its encoded input by internally representing a low-dimensional (i.e., compressed) version of the sliding window. Nonlinear decodings of this representation, expressed by the A and B matrices, provide computations across the window, such as its derivative, energy, median value, etc. (1, 2). Note that these decoder matrices can span across all of the units of an input sequence.
Usage
layer_lmu(
object,
memory_d,
order,
theta,
hidden_cell,
trainable_theta = FALSE,
hidden_to_memory = FALSE,
memory_to_memory = FALSE,
input_to_hidden = FALSE,
discretizer = "zoh",
kernel_initializer = "glorot_uniform",
recurrent_initializer = "orthogonal",
kernel_regularizer = NULL,
recurrent_regularizer = NULL,
use_bias = FALSE,
bias_initializer = "zeros",
bias_regularizer = NULL,
dropout = 0,
recurrent_dropout = 0,
return_sequences = FALSE,
...
)
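For example, a minimal construction (a sketch with hypothetical sizes; only memory_d, order, theta, and hidden_cell are required, and all other arguments keep the defaults above):

library(keras)

# Hypothetical sizes for illustration: a 4-dimensional memory, 8 Legendre
# polynomials, a 64-step sliding window, and a 10-unit LSTM cell as the
# hidden component.
lmu <- layer_lmu(
  memory_d = 4,
  order = 8,
  theta = 64,
  hidden_cell = layer_lstm_cell(units = 10)
)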
Arguments
- memory_d
Dimensionality of input to memory component.
- order
The number of degrees in the transfer function of the LTI system used to represent the sliding window of history. This parameter sets the number of Legendre polynomials used to orthogonally represent the sliding window.
- theta
The number of timesteps in the sliding window that is represented using the LTI system. In this context, the sliding window represents a dynamic range of data, of fixed size, that will be used to predict the value at the next time step. If this value is smaller than the size of the input sequence, only that number of steps will be represented at the time of prediction; however, the entire sequence will still be processed so that information can be projected to and from the hidden layer. If trainable_theta is enabled, then theta will be updated over the course of training.
- hidden_cell
Keras Layer/RNNCell implementing the hidden component.
- trainable_theta
If TRUE, theta is learnt over the course of training. Otherwise, it is kept constant.
- hidden_to_memory
If TRUE, connect the output of the hidden component back to the memory component (default FALSE).
- memory_to_memory
If TRUE, add a learnable recurrent connection (in addition to the static Legendre system) to the memory component (default FALSE).
- input_to_hidden
If TRUE, connect the input directly to the hidden component (in addition to the connection from the memory component) (default FALSE).
- discretizer
The method used to discretize the A and B matrices of the LMU. Current options are "zoh" (short for Zero Order Hold) and "euler". "zoh" is more accurate, but training will be slower than "euler" if trainable_theta=TRUE. Note that a larger theta is needed when discretizing using "euler" (a value larger than 4*order is recommended); see the sketch after this list.
- kernel_initializer
Initializer for weights from input to memory/hidden component. If NULL, no weights will be used, and the input size must match the memory/hidden size.
- recurrent_initializer
Initializer for memory_to_memory weights (if that connection is enabled).
- kernel_regularizer
Regularizer for weights from input to memory/hidden component.
- recurrent_regularizer
Regularizer for memory_to_memory weights (if that connection is enabled).
- use_bias
If TRUE, the memory component includes a bias term.
- bias_initializer
Initializer for the memory component bias term. Only used if use_bias=TRUE.
- bias_regularizer
Regularizer for the memory component bias term. Only used if use_bias=TRUE.
- dropout
Dropout rate on input connections.
- recurrent_dropout
Dropout rate on the memory_to_memory connection.
- return_sequences
If TRUE, return the full output sequence. Otherwise, return just the last output in the output sequence.
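The sketch below illustrates the discretizer guidance above (hypothetical sizes, not taken from the package examples): with discretizer = "euler" and order = 6, theta should exceed 4 * order = 24, so 50 is used here; trainable_theta, memory_to_memory, and return_sequences are enabled to show how these flags combine.

# Sketch only: "euler" discretization with a trainable window length.
# With order = 6, choose theta > 4 * order = 24; here theta = 50.
lmu_euler <- layer_lmu(
  memory_d = 1,
  order = 6,
  theta = 50,
  hidden_cell = layer_lstm_cell(units = 24),
  trainable_theta = TRUE,
  discretizer = "euler",
  memory_to_memory = TRUE,
  return_sequences = TRUE
)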
Output shape
If return_state: a list of tensors. The first tensor is the output. The remaining tensors are the last states, each with shape (batch_size, state_size), where state_size could be a high-dimensional tensor shape.

If return_sequences: N-D tensor with shape [batch_size, timesteps, output_size], where output_size could be a high-dimensional tensor shape, or [timesteps, batch_size, output_size] when time_major is TRUE.

Otherwise: N-D tensor with shape [batch_size, output_size], where output_size could be a high-dimensional tensor shape.
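As a concrete illustration (a sketch assuming a 28-timestep, 3-feature input and a 10-unit LSTM hidden cell, so output_size here is 10):

inp <- layer_input(c(28, 3))

# Default return_sequences = FALSE: only the last output is returned.
out_last <- layer_lmu(memory_d = 4, order = 8, theta = 28,
                      hidden_cell = layer_lstm_cell(10))(inp)
# out_last has shape (batch_size, 10)

# return_sequences = TRUE: one output per timestep.
out_seq <- layer_lmu(memory_d = 4, order = 8, theta = 28,
                     hidden_cell = layer_lstm_cell(10),
                     return_sequences = TRUE)(inp)
# out_seq has shape (batch_size, 28, 10)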
References
1. A. Voelker, I. Kajić, and C. Eliasmith. Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks (2019).
2. A. Voelker and C. Eliasmith. Improving spiking dynamical networks: Accurate delays, higher-order synapses, and time cells. Neural Computation, 30(3): 569-609 (2018).
3. A. Voelker and C. Eliasmith. Methods and systems for implementing dynamic neural networks. U.S. Patent Application No. 15/243,223.
Examples
if (FALSE) {
library(keras)
inp <- layer_input(c(28, 3))
hidden_cell <- layer_lstm_cell(10)
lmu <- layer_lmu(memory_d = 10, order = 3, theta = 28, hidden_cell = hidden_cell)(inp)
model <- keras_model(inp, lmu)
model(array(1, c(32, 28, 3)))
}
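A slightly fuller sketch in the same spirit (the dense head, loss, and optimizer are hypothetical choices, not taken from the package): attach a softmax output layer to the LMU and compile the model for classification.

library(keras)

inp <- layer_input(c(28, 3))
lmu_out <- layer_lmu(memory_d = 10, order = 3, theta = 28,
                     hidden_cell = layer_lstm_cell(10))(inp)
out <- layer_dense(lmu_out, units = 5, activation = "softmax")

model <- keras_model(inp, out)
model %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = "accuracy"
)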