Wolfram Language & System Documentation Center

GatedRecurrentLayer

GatedRecurrentLayer[n]

represents a trainable recurrent layer that takes a sequence of vectors and produces a sequence of vectors, each of size n.

GatedRecurrentLayer[n,opts]

includes options for initial weights and other parameters.

Details and Options

GatedRecurrentLayer[n] represents a net that takes an input matrix representing a sequence of vectors and outputs a sequence of the same length.
Each element of the input sequence is a vector of size k, and each element of the output sequence is a vector of size n.
The size k of the input vectors is usually inferred automatically within a NetGraph, NetChain, etc.
The input and output ports of the net represented by GatedRecurrentLayer[n] are:
"Input" a sequence of vectors of size k

"Output" a sequence of vectors of size n
Given an input sequence {x₁,x₂,…,x_T}, a GatedRecurrentLayer outputs a sequence of states {s₁,s₂,…,s_T} using the following recurrence relation, where f is Tanh by default:

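	input gate	$i_t = \mathrm{LogisticSigmoid}(W_{ix} x_t + W_{is} s_{t-1} + b_i)$
	reset gate	$r_t = \mathrm{LogisticSigmoid}(W_{rx} x_t + W_{rs} s_{t-1} + b_r)$
	memory gate	$m_t = f(W_{mx} x_t + r_t \odot (W_{ms} s_{t-1}) + b_m)$
	state	$s_t = (1 - i_t) \odot m_t + i_t \odot s_{t-1}$

Here $W x$ denotes a matrix-vector product and $\odot$ denotes elementwise multiplication.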

The above definition of GatedRecurrentLayer is based on the variant described in Chung et al., Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014.
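The recurrence can be checked by hand. The following is an illustrative sketch, not part of the original page: it extracts the trainable arrays with NetExtract, applies the four equations step by step with FoldList starting from the zero state s₀, and compares the result with the layer's own output. It assumes the default "Tanh" activation, no dropout, and that the layer computes exactly the recurrence above; the helpers w and step are hypothetical names introduced only for this example.

```wolfram
(* randomly initialized layer: n = 3 output units, k = 2 input features *)
gru = NetInitialize@GatedRecurrentLayer[3, "Input" -> {"Varying", 2}];

(* hypothetical helper: fetch a trainable array as an ordinary list *)
w[name_String] := Normal@NetExtract[gru, name];

(* hypothetical helper: one step of the recurrence, from s_{t-1} and x_t to s_t *)
step[s_, x_] := Module[{i, r, m},
  i = LogisticSigmoid[w["InputGateInputWeights"] . x + w["InputGateStateWeights"] . s + w["InputGateBiases"]];
  r = LogisticSigmoid[w["ResetGateInputWeights"] . x + w["ResetGateStateWeights"] . s + w["ResetGateBiases"]];
  m = Tanh[w["MemoryGateInputWeights"] . x + r*(w["MemoryGateStateWeights"] . s) + w["MemoryGateBiases"]];
  (1 - i)*m + i*s]

xs = {{1, 2}, {0, 1}, {2, 1}, {1, 2}};

(* FoldList returns {s0, s1, ..., sT}; Rest drops the zero initial state *)
manual = Rest@FoldList[step, ConstantArray[0., 3], xs];

(* largest absolute difference from the layer's own output *)
Max@Abs[manual - gru[xs]]
```

Because the layer evaluates in single precision by default, agreement should be judged up to rounding at roughly the 10^-6 level rather than by exact equality.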
GatedRecurrentLayer[n] also has a single state port, "State", which is a vector of size n.
Within a NetGraph, a connection of the form src->NetPort[layer,"State"] can be used to provide the initial value of the state for a GatedRecurrentLayer, corresponding to s₀ in the recurrence relation. The default initial value is a zero vector.
Within a NetGraph, a connection of the form NetPort[layer,"State"]->dst can be used to obtain the final value of the state for a GatedRecurrentLayer, corresponding to s_T in the recurrence relation.
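The two state connections can be combined. As a sketch, assuming a NetGraph may connect both into and out of the same "State" port (the names graph, xs, whole, half1 and half2 are illustrative only), the net below exposes both s₀ and s_T. Feeding a sequence in two halves, with the final state of the first half passed as the initial state of the second, should reproduce the final state obtained from the whole sequence, up to floating-point rounding.

```wolfram
(* expose both the initial state (input port) and the final state (output port) *)
graph = NetInitialize@NetGraph[
   {GatedRecurrentLayer[2, "Input" -> {"Varying", 2}]},
   {NetPort["InitialState"] -> NetPort[1, "State"],
    NetPort[1, "State"] -> NetPort["FinalState"]}];

xs = {{1, 2}, {2, 0}, {3, 1}, {0, 1}};

(* whole sequence in one call, starting from the zero state *)
whole = graph[<|"Input" -> xs, "InitialState" -> {0, 0}|>];

(* same sequence in two halves: the final state of the first half seeds the second *)
half1 = graph[<|"Input" -> Take[xs, 2], "InitialState" -> {0, 0}|>];
half2 = graph[<|"Input" -> Drop[xs, 2], "InitialState" -> half1["FinalState"]|>];

Max@Abs[whole["FinalState"] - half2["FinalState"]]
```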
NetStateObject can be used to create a net that remembers the state of a GatedRecurrentLayer between evaluations, updating it each time the net is applied to inputs.
An initialized GatedRecurrentLayer[…] that operates on vectors of size k contains the following trainable arrays:

"InputGateInputWeights"	W_ix	matrix of size n×k
"InputGateStateWeights"	W_is	matrix of size n×n
"InputGateBiases"	b_i	vector of size n
"ResetGateInputWeights"	W_rx	matrix of size n×k
"ResetGateStateWeights"	W_rs	matrix of size n×n
"ResetGateBiases"	b_r	vector of size n
"MemoryGateInputWeights"	W_mx	matrix of size n×k
"MemoryGateStateWeights"	W_ms	matrix of size n×n
"MemoryGateBiases"	b_m	vector of size n
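The array shapes in the table can be inspected directly. In this sketch (the variable names are illustrative), NetExtract is applied to each named array of a layer with n = 3 and k = 2. According to the table, each "...InputWeights" array should have dimensions {3, 2}, each "...StateWeights" array {3, 3}, and each "...Biases" array {3}.

```wolfram
(* n = 3 output units, k = 2 input features *)
gru = NetInitialize@GatedRecurrentLayer[3, "Input" -> {"Varying", 2}];

names = {"InputGateInputWeights", "InputGateStateWeights", "InputGateBiases",
   "ResetGateInputWeights", "ResetGateStateWeights", "ResetGateBiases",
   "MemoryGateInputWeights", "MemoryGateStateWeights", "MemoryGateBiases"};

(* dimensions of each trainable array, keyed by its name *)
dims = AssociationMap[Dimensions@Normal@NetExtract[gru, #] &, names]
```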

In GatedRecurrentLayer[n,opts], initial values can be given to the trainable arrays using a rule of the form "array"->value.
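For example, the input-gate biases can be fixed while the remaining arrays are left for NetInitialize, which by default initializes only arrays that have no value yet. The large bias value 20 is an arbitrary illustrative choice: it pushes the input gate i_t close to 1, so by the state equation s_t stays close to s_{t-1}, and the outputs should remain near the zero initial state (assuming the random weights are small compared with the bias).

```wolfram
(* fix the input-gate biases; the other eight arrays are initialized randomly *)
gru = NetInitialize@GatedRecurrentLayer[2,
    "InputGateBiases" -> {20., 20.},
    "Input" -> {"Varying", 2}];

(* the specified values are kept *)
Normal@NetExtract[gru, "InputGateBiases"]

(* with i_t approximately 1, each state approximately copies the previous one *)
gru[{{1, 2}, {0, 1}, {2, 1}}] // MatrixForm
```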
The following optional parameters can be included:

"Activation"	"Tanh"	activation function applied to obtain the memory gate output
"Dropout"	None	dropout regularization, in which units are probabilistically set to zero
LearningRateMultipliers	Automatic	learning rate multipliers for trainable arrays

Parameter "Activation" can be set to either "Tanh" or "ReLU".
Specifying "Dropout"->None disables dropout during training.
Specifying "Dropout"->p applies an automatically chosen dropout method with dropout probability p.
Specifying "Dropout"->{"method₁"->p₁,"method₂"->p₂,…} can be used to combine specific methods of dropout with the corresponding dropout probabilities. Possible methods include:

	"VariationalWeights"	dropout applied to the recurrent weight matrices (default)
	"VariationalInput"	dropout applied to the gate contributions from the input, using the same pattern of units at each sequence step
	"VariationalState"	dropout applied to the gate contributions from the previous state, using the same pattern of units at each sequence step
	"StateUpdate"	dropout applied to the state update vector prior to it being added to the previous state, using a different pattern of units at each sequence step

The dropout methods "VariationalInput" and "VariationalState" are based on the Gal et al. 2016 method, while "StateUpdate" is based on the Semeniuta et al. 2016 method and "VariationalWeights" is based on the Merity et al. 2017 method.
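A sketch of combining two of these methods, using the {"method₁"->p₁,…} form described above; the probabilities 0.3 and 0.1 are arbitrary illustrative values. Dropout is only active in training mode, so ordinary evaluations are deterministic, while evaluations with NetEvaluationMode -> "Train" draw new random masks and will usually differ.

```wolfram
(* weight dropout on the recurrent matrices plus dropout on the state update *)
gru = NetInitialize@GatedRecurrentLayer[3,
    "Dropout" -> {"VariationalWeights" -> 0.3, "StateUpdate" -> 0.1},
    "Input" -> {"Varying", 2}];

xs = {{1, 2}, {3, 4}, {1, 0}};

(* default evaluation mode: dropout is off, so repeated calls agree *)
gru[xs] == gru[xs]

(* training mode: new random masks on each call, so results usually differ *)
gru[xs, NetEvaluationMode -> "Train"] == gru[xs, NetEvaluationMode -> "Train"]
```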
GatedRecurrentLayer[n,"Input"->shape] allows the shape of the input to be specified. Possible forms for shape are:

	NetEncoder[…]	encoder producing a sequence of vectors
	{len,k}	sequence of len length-k vectors
	{len,Automatic}	sequence of len vectors whose length is inferred
	{"Varying",k}	varying number of vectors each of length k
	{"Varying",Automatic}	varying number of vectors each of inferred length
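Two of these forms are sketched below (the names fixed and chain are illustrative). A {len,k} shape fixes the sequence length, while {"Varying",Automatic} lets the vector size be inferred; inside a NetChain, the second layer's k is taken from the first layer's output size n = 3. The output should have one length-n vector per input step.

```wolfram
(* fixed-length input: exactly 4 vectors of length 2 *)
fixed = NetInitialize@GatedRecurrentLayer[3, "Input" -> {4, 2}];
Dimensions@fixed[{{1, 2}, {0, 1}, {2, 1}, {1, 2}}]

(* inferred vector length: the second layer receives vectors of size 3 *)
chain = NetInitialize@NetChain[
   {GatedRecurrentLayer[3], GatedRecurrentLayer[2, "Input" -> {"Varying", Automatic}]},
   "Input" -> {"Varying", 5}];
Dimensions@chain[RandomReal[1, {6, 5}]]
```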

When given a NumericArray as input, the output will be a NumericArray.
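A minimal illustration of this behavior, assuming a "Real32" NumericArray input (the variable out is illustrative): per the statement above, the result should also be a NumericArray, here with dimensions {3, 3}.

```wolfram
gru = NetInitialize@GatedRecurrentLayer[3, "Input" -> {"Varying", 2}];

(* pass the input sequence as a single-precision NumericArray *)
out = gru[NumericArray[{{1, 2}, {0, 1}, {2, 1}}, "Real32"]];

(* head and shape of the result *)
{Head[out], Dimensions[out]}
```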
Options[GatedRecurrentLayer] gives the list of default options to construct the layer. Options[GatedRecurrentLayer[…]] gives the list of default options to evaluate the layer on some data.
Information[GatedRecurrentLayer[…]] gives a report about the layer.
Information[GatedRecurrentLayer[…],prop] gives the value of the property prop of GatedRecurrentLayer[…]. Possible properties are the same as for NetGraph.
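As a sketch that ties Information to the table of trainable arrays, the total parameter count should equal three n×k matrices, three n×n matrices and three length-n vectors, that is 3nk + 3n² + 3n (54 for n = 3, k = 2). This assumes the NetGraph property "ArraysTotalElementCount" is available in your version.

```wolfram
n = 3; k = 2;
gru = NetInitialize@GatedRecurrentLayer[n, "Input" -> {"Varying", k}];

(* total number of trainable parameters, as reported by Information *)
count = Information[gru, "ArraysTotalElementCount"]

(* count expected from the table of trainable arrays *)
count == 3 n k + 3 n^2 + 3 n
```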

Examples


Basic Examples (2)

Create a GatedRecurrentLayer that produces a sequence of length-3 vectors:

Wolfram Language code: GatedRecurrentLayer[3]

Create a randomly initialized GatedRecurrentLayer that takes a sequence of length-2 vectors and produces a sequence of length-3 vectors:

Wolfram Language code: gru = NetInitialize@GatedRecurrentLayer[3, "Input" -> {"Varying", 2}]

Apply the layer to an input sequence:

Wolfram Language code: gru[{{1, 2}, {0, 1}, {2, 1}, {1, 2}}]//MatrixForm

Scope (6)

Arguments (1)

Create a GatedRecurrentLayer that produces a sequence of length-2 vectors:

Wolfram Language code: GatedRecurrentLayer[2]

Ports (5)

Create a randomly initialized GatedRecurrentLayer that takes a string and produces a sequence of length-2 vectors:

Wolfram Language code:

gru = 
	NetInitialize@GatedRecurrentLayer[2, "Input" -> NetEncoder[{"Characters", Automatic, "UnitVector"}]]

Apply the layer to an input string:

Wolfram Language code: gru["hello world"]//MatrixForm

Thread the layer over a batch of inputs:

Wolfram Language code: gru[{"good", "bye"}]//Map[MatrixForm]

Create a randomly initialized net that takes a sequence of length-2 vectors and produces a single length-3 vector:

Wolfram Language code:

net = NetInitialize@NetChain[{
	GatedRecurrentLayer[3, "Input" -> {"Varying", 2}], 
	SequenceLastLayer[]
	}]

Apply the net to an input:

Wolfram Language code: net[{{1, 2}, {2, 1}, {3, 0}}]

Thread the net across a batch of inputs:

Wolfram Language code: net[{{{1, 2}, {2, 1}, {3, 0}}, {{3, 3}, {0, 0}}}]

Create a GatedRecurrentLayer that uses the Ramp ("ReLU") function to produce its memory gate output:

Wolfram Language code: GatedRecurrentLayer[3, "Activation" -> "ReLU"]

Create a NetGraph that allows the initial state of a GatedRecurrentLayer to be set:

Wolfram Language code:

graph = NetInitialize@NetGraph[
	{GatedRecurrentLayer[2, "Input" -> {"Varying", 2}]}, 
	{NetPort["InitialState"] -> NetPort[1, "State"]}]

Wolfram Language code: graph[<|"Input" -> {{1, 2}, {2, 0}, {3, 1}}, "InitialState" -> {1, 1}|>]

Create a NetGraph that allows the final state of a GatedRecurrentLayer to be obtained:

Wolfram Language code:

graph = NetInitialize@NetGraph[
	{GatedRecurrentLayer[2, "Input" -> {"Varying", 2}]}, 
	{NetPort[1, "State"] -> NetPort["FinalState"]}]

Wolfram Language code: graph[{{1, 2}, {2, 0}, {3, 1}}]

The final state is the last element of the output sequence:

Wolfram Language code: %[["FinalState"]] == %[["Output", -1]]

Options (2)

"Dropout" (2)

Create a GatedRecurrentLayer with the dropout method specified:

Wolfram Language code: net = GatedRecurrentLayer[2, "Dropout" -> {"VariationalInput" -> 0.2, "VariationalState" -> 0.5}]

Create a randomly initialized GatedRecurrentLayer with specified dropout probability:

Wolfram Language code: gru = NetInitialize@GatedRecurrentLayer[2, "Dropout" -> 0.5, "Input" -> {"Varying", 2}]

Evaluate the layer on a sequence of vectors:

Wolfram Language code:

data = {{1, 2}, {3, 4}, {1, 0}, {2, 2}};
gru[data]//MatrixForm

Dropout has no effect during evaluation:

Wolfram Language code:

gru2 = NetReplacePart[gru, "Dropout" -> 0];
gru[data] == gru2[data]

Use NetEvaluationMode to force the training behavior of dropout:

Wolfram Language code: gru[data, NetEvaluationMode -> "Train"]//MatrixForm

Multiple evaluations on the same input can give different results:

Wolfram Language code:

Table[gru[{{1, 2}, {3, 4}, {5, 6}, {7, 8}}, NetEvaluationMode -> "Train"], 5]//Map[MatrixPlot[#, ImageSize -> 30, FrameTicks -> None]&]

Applications (2)

Create training data consisting of strings that describe the addition of two integers from 0 to 99, paired with the corresponding numeric result:

Wolfram Language code:

enc = NetEncoder[{"Characters", {DigitCharacter, "+"}}];
makeRule[a_, b_] := IntegerString[a] <> "+" <> IntegerString[b] -> a + b;
trainingData = Flatten@Table[makeRule[i, j], {i, 0, 99}, {j, 0, 99}];
RandomSample[trainingData, 4]//InputForm

Create a network that reads the input string with two stacked GatedRecurrentLayer instances and predicts the numeric result:

Wolfram Language code:

net = NetChain[{UnitVectorLayer[], GatedRecurrentLayer[40], GatedRecurrentLayer[30], SequenceLastLayer[], LinearLayer[1]}, "Input" -> enc, "Output" -> "Scalar"];

Train the network:

Wolfram Language code: trained = NetTrain[net, trainingData, BatchSize -> 64, MaxTrainingRounds -> 200, ValidationSet -> Scaled[0.1]]

Apply the trained network to a list of inputs:

Wolfram Language code: Round@trained[{"5+2", "10+25", "9+11", "44+44"}]

Create training data consisting of strings of x's, y's and spaces, each labeled Less, Equal or Greater according to whether the string contains fewer, as many, or more x's than y's. The training data consists of all possible strings of length 1 to 8:

Wolfram Language code:

alphabet = {"x", "y", " "};
inputs = Flatten@Table[StringJoin /@ Tuples[alphabet, n], {n, 1, 8}];
cmpXY[str_] := Switch[Sign[StringCount[str, "x"] - StringCount[str, "y"]], -1, Less, 0, Equal, 1, Greater];
outputs = cmpXY /@ inputs;
trainingData = Thread[inputs -> outputs];
RandomSample[trainingData, 4]//InputForm

Create a network containing a GatedRecurrentLayer to read an input string and predict one of Less, Greater or Equal:

Wolfram Language code:

net = NetChain[{UnitVectorLayer[], GatedRecurrentLayer[5], SequenceLastLayer[], LinearLayer[3], SoftmaxLayer[]}, "Input" -> NetEncoder[{"Characters", alphabet}], "Output" -> NetDecoder[{"Class", {Less, Equal, Greater}}]];

Train the network:

Wolfram Language code: trained = NetTrain[net, trainingData, BatchSize -> 64, MaxTrainingRounds -> 100, ValidationSet -> Scaled[0.2]]

Apply the trained network to a list of inputs:

Wolfram Language code: trained[{"xxy", "xyy", "xyxy"}]

Measure the accuracy on the entire training set:

Wolfram Language code: NetMeasurements[trained, trainingData, "Accuracy"]

Properties & Relations (1)

NetStateObject can be used to create a net that remembers the state of GatedRecurrentLayer:

Wolfram Language code: sobj = NetStateObject[NetInitialize@GatedRecurrentLayer[2, "Input" -> {"Varying", 2}]]

Each evaluation modifies the state stored inside the NetStateObject:

Wolfram Language code: sobj[{{0, 0}, {1, 1}}]
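Because the stored state carries over between calls, evaluating a NetStateObject on consecutive chunks of a sequence should match a single evaluation of the underlying layer on the concatenated sequence, up to floating-point rounding. This sketch assumes the NetStateObject starts from the default zero state; the names layer, part1, part2 and whole are illustrative.

```wolfram
layer = NetInitialize@GatedRecurrentLayer[2, "Input" -> {"Varying", 2}];
sobj = NetStateObject[layer];

(* two successive calls: the second starts from the state left by the first *)
part1 = sobj[{{0, 0}, {1, 1}}];
part2 = sobj[{{1, 0}, {2, 2}}];

(* a single call on the concatenated sequence, starting from the zero state *)
whole = layer[{{0, 0}, {1, 1}, {1, 0}, {2, 2}}];

Max@Abs[Join[part1, part2] - whole]
```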
