Encoding and Decoding Data for Neural Networks

Real-world data comes in a wild variety of types, including numerical, categorical, textual, image, audio and video. Neural networks are a particularly sophisticated way to manipulate numerical arrays, but they cannot deal natively with non-numerical inputs. The Wolfram Language offers dedicated encoders and decoders to automatically and efficiently translate non-numerical data to and from net-compatible NumericArray objects. Custom encoders can be created from special feature extractors or from any generic function.


NetEncoder convert images, categories, etc. to numerical arrays

Audio Encoders

"Audio" encode audio as a sequence of waveform amplitudes

"AudioMelSpectrogram" encode audio as a mel spectrogram

"AudioMFCC" encode audio as a sequence of MFCC vectors

"AudioSpectrogram" encode audio as a spectrogram

"AudioSTFT" encode audio as a sequence of Fourier transforms

Text Encoders

"SubwordTokens" encode subword tokens in a string as a sequence of integer codes

"Characters" encode characters in a string as a sequence of integer codes or one-hot vectors

"Tokens" encode tokens in a string as a sequence of integer codes

"UTF8" encode strings as their UTF8 bytes

Image Encoders

"Image" encode a 2D image as a rank-3 array

"Image3D" encode a 3D image as a rank-4 array

Video Encoders

"VideoFrames" encode frames of a video as a rank-4 array

Other Encoders

"Boolean" encode True and False as 1 and 0

"Class" encode a class label as an integer code or a one-hot vector

"Function" use a custom function to encode an input

"FeatureExtractor" use a FeatureExtractorFunction to encode the input
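As a rough illustration of how the encoders listed above are typically used, the following sketch builds a few of them and applies them directly to data. The class labels, the image size and the helper function passed to the "Function" encoder are made up for this example; they are not taken from any particular model.

```wolfram
(* "Class": integer codes by default, one-hot vectors with "UnitVector".
   The labels here are hypothetical. *)
labels = {"cat", "dog", "bird"};
classEnc = NetEncoder[{"Class", labels}];
classEnc["dog"]

oneHotEnc = NetEncoder[{"Class", labels, "UnitVector"}];
oneHotEnc["dog"]

(* "Boolean": True and False become 1 and 0 *)
NetEncoder["Boolean"][True]

(* "Image": a 2D image is resized to 32x32 and returned as a rank-3 array
   whose first dimension indexes the color channels *)
imgEnc = NetEncoder[{"Image", {32, 32}}];
Dimensions[imgEnc[RandomImage[1, {64, 48}, ColorSpace -> "RGB"]]]

(* "Function": a custom encoder from an ordinary function.
   The output shape {2} is declared explicitly; the function itself
   (string length and number of spaces) is only an illustrative choice. *)
funcEnc = NetEncoder[{"Function",
    Function[s, N[{StringLength[s], StringCount[s, " "]}]], {2}}];
funcEnc["hello neural net"]
```

An encoder can also be attached to a net's input port (for example, "Input" -> classEnc in NetChain or NetGraph), so that raw data can be passed to the net directly.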


NetDecoder interpret numerical arrays as images, probabilities, etc.

Text Decoders

"SubwordTokens" decode probability vectors as a string of subword tokens

"Characters" decode probability vectors as a string of characters

"Tokens" decode probability vectors as a string of tokens

Image Decoders

"Image" decode a rank-3 array as a 2D image

"Image3D" decode a rank-4 array as a 3D image

Other Decoders

"Boolean" decode 1 and 0 as True and False

"Class" decode probability arrays as class labels

"CTCBeamSearch" decode sequences of probability vectors from a net trained with a CTCLossLayer

"Function" decode using a custom function
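A minimal sketch of the decoders in use, assuming a toy net that consists of a single SoftmaxLayer; the labels and array sizes are hypothetical and chosen only to keep the example small.

```wolfram
(* "Class": interpret a probability vector as a class label.
   The labels are hypothetical. *)
labels = {"cat", "dog", "bird"};
classDec = NetDecoder[{"Class", labels}];
classDec[{0.1, 0.7, 0.2}]

(* Attached to a net's output port, the decoder also exposes properties
   such as "Probabilities" when the net is evaluated *)
net = NetChain[{SoftmaxLayer[]}, "Input" -> 3, "Output" -> classDec];
net[{1., 2., 3.}]
net[{1., 2., 3.}, "Probabilities"]

(* "Boolean": interpret a single number as True or False *)
NetDecoder["Boolean"][1]

(* "Image": interpret a rank-3 array (channels, height, width) as a 2D image *)
imgDec = NetDecoder["Image"];
imgDec[RandomReal[1, {3, 16, 16}]]
```

Pairing a decoder with the matching encoder (for instance, the same label list for "Class") keeps the mapping between labels and array positions consistent on both sides of the net.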