Encoding and Decoding Data for Neural Networks

Real-world data comes in a wide variety of types, including numerical, categorical, textual, image, audio and video. Neural networks are a particularly sophisticated way to manipulate numerical arrays, but they cannot natively handle non-numerical inputs. The Wolfram Language offers dedicated encoders and decoders that automatically and efficiently translate non-numerical data to and from net-compatible NumericArray objects. Custom encoders can also be built from feature extractors or from any generic function.
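The division of labor described above can be sketched in a few lines. The Python below is a language-neutral illustration, not Wolfram Language code: a hypothetical encoder turns a word into numbers, a toy stand-in for a net (a fixed linear map, assumed purely for illustration) operates only on those numbers, and a decoder turns the resulting scores back into a label. All names here are hypothetical.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
Vector = List[float]


def make_pipeline(encode: Callable[[A], Vector],
                  net: Callable[[Vector], Vector],
                  decode: Callable[[Vector], B]) -> Callable[[A], B]:
    """Compose encoder -> net -> decoder; the net itself only sees numbers."""
    def run(x: A) -> B:
        return decode(net(encode(x)))
    return run


# Hypothetical encoder: a word -> [letter count, vowel count]
def encode_word(word: str) -> Vector:
    return [float(len(word)), float(sum(c in "aeiou" for c in word.lower()))]


# Toy stand-in for a trained net: one fixed linear score per output label
WEIGHTS = [[1.0, 0.0],   # score for "long"
           [0.0, 2.0]]   # score for "vowel-heavy"


def toy_net(x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]


# Decoder: pick the label with the highest score
LABELS = ["long", "vowel-heavy"]


def decode_label(scores: Vector) -> str:
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best]


classify = make_pipeline(encode_word, toy_net, decode_label)
print(classify("strength"))  # scores [8.0, 2.0] -> long
print(classify("aeiou"))     # scores [5.0, 10.0] -> vowel-heavy
```

Keeping the encoder and decoder separate from the numeric core is what lets the same kind of net be reused with images, text, or audio simply by swapping the two ends.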

Encoders

NetEncoder convert images, categories, etc. to numerical arrays

Audio Encoders

"Audio" encode audio as a sequence of waveform amplitudes

"AudioMelSpectrogram" encode audio as a mel spectrogram

"AudioMFCC" encode audio as a sequence of MFCC vectors

"AudioSpectrogram" encode audio as a spectrogram

"AudioSTFT" encode audio as a sequence of Fourier transforms
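As a rough illustration of what the spectral audio encoders compute, the Python sketch below slices a waveform into overlapping frames, applies a periodic Hann window, and takes a discrete Fourier transform of each frame (a short-time Fourier transform); squaring the magnitudes gives a power spectrogram. The frame size, hop length, and window are assumptions chosen for the example, not the Wolfram encoders' defaults, and the mel and MFCC variants, which add further filtering steps, are not shown. A naive O(n²) DFT keeps the sketch dependency-free.

```python
import cmath
import math
from typing import List


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_half(frame: List[float]) -> List[complex]:
    """Naive DFT keeping bins 0..n/2 (the rest mirror them for real input)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def stft(signal: List[float], frame_size: int, hop: int) -> List[List[complex]]:
    """One windowed DFT per frame; frames that would run past the end are dropped."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = signal[start:start + frame_size]
        frames.append(dft_half([s * w for s, w in zip(chunk, window)]))
    return frames


def spectrogram(signal: List[float], frame_size: int, hop: int) -> List[List[float]]:
    """Power spectrogram: squared magnitude of each STFT coefficient."""
    return [[abs(c) ** 2 for c in frame] for frame in stft(signal, frame_size, hop)]


# A sine completing 4 cycles per 64-sample frame should peak in bin 4
wave = [math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
spec = spectrogram(wave, frame_size=64, hop=32)
print(len(spec), len(spec[0]))  # 7 frames, 33 frequency bins
```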

Text Encoders

"SubwordTokens" encode tokens in a string as a sequence of integer codes

"Characters" encode characters in a string as a sequence of integer codes or one-hot vectors

"Tokens" encode tokens in a string as a sequence of integer codes

"UTF8" encode a string as the sequence of its UTF-8 bytes
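The integer-code text encoders differ mainly in what counts as a unit. The Python sketch below illustrates each idea under simple assumptions that are not taken from the Wolfram encoders: whitespace tokenization, 1-based codes with one extra code reserved for unknown items, greedy longest-match subword segmentation (a WordPiece-style stand-in; real subword vocabularies are typically learned, for example with byte-pair encoding), and raw UTF-8 byte values. Function names are hypothetical.

```python
from typing import Dict, List, Sequence


def build_index(vocab: Sequence[str]) -> Dict[str, int]:
    """Map each vocabulary item to a 1-based integer code."""
    return {item: i + 1 for i, item in enumerate(vocab)}


def encode_units(units: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """Look up each unit; anything outside the vocabulary gets code len(vocab) + 1."""
    index = build_index(vocab)
    unknown = len(vocab) + 1
    return [index.get(u, unknown) for u in units]


def encode_tokens(text: str, vocab: Sequence[str]) -> List[int]:
    return encode_units(text.split(), vocab)          # whitespace-separated tokens


def encode_characters(text: str, alphabet: Sequence[str]) -> List[int]:
    return encode_units(list(text), alphabet)         # one code per character


def one_hot(codes: List[int], size: int) -> List[List[int]]:
    """Turn 1-based codes into one-hot vectors of the given length."""
    return [[1 if j == c - 1 else 0 for j in range(size)] for c in codes]


def split_subwords(word: str, pieces: Sequence[str]) -> List[str]:
    """Greedy longest-match segmentation; unmatched characters stand alone."""
    known = set(pieces)
    out: List[str] = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in known:
                out.append(word[i:j])
                i = j
                break
        else:
            out.append(word[i])
            i += 1
    return out


def encode_utf8(text: str) -> List[int]:
    return list(text.encode("utf-8"))                 # byte values 0..255


print(encode_tokens("the cat sat", ["the", "cat"]))            # [1, 2, 3]
print(encode_characters("cab", "abc"))                         # [3, 1, 2]
print(split_subwords("unhappiness", ["un", "happi", "ness"]))  # ['un', 'happi', 'ness']
print(encode_utf8("é"))                                        # [195, 169]
```

Character and token codes index into a vocabulary the net was trained with, whereas the UTF-8 scheme needs no vocabulary at all, because every string reduces to byte values.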

Image Encoders

"Image" encode a 2D image as a rank-3 array

"Image3D" encode a 3D image as a rank-4 array
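One common layout for such arrays is channels first: a 2D color image becomes a channels × height × width array, and a 3D image a channels × depth × height × width array. The Python sketch below rearranges nested pixel lists into that layout and rescales 8-bit values to the range 0 to 1. Both the layout and the scaling are assumptions for illustration; the actual encoders are configured with their own image size, color space, and normalization settings.

```python
from typing import List

Pixels2D = List[List[List[int]]]          # pixels[y][x][channel], values 0..255
Pixels3D = List[List[List[List[int]]]]    # voxels[z][y][x][channel]


def encode_image(pixels: Pixels2D) -> List[List[List[float]]]:
    """Height x width x channels ints -> channels x height x width floats in [0, 1]."""
    h, w, c = len(pixels), len(pixels[0]), len(pixels[0][0])
    return [[[pixels[y][x][k] / 255.0 for x in range(w)]
             for y in range(h)]
            for k in range(c)]


def encode_image3d(voxels: Pixels3D) -> List[List[List[List[float]]]]:
    """Depth x height x width x channels -> channels x depth x height x width."""
    per_slice = [encode_image(slice_) for slice_ in voxels]    # each: [c][h][w]
    channels = len(per_slice[0])
    return [[per_slice[z][k] for z in range(len(per_slice))] for k in range(channels)]


# A 2 x 2 RGB image: red, green / blue, white
img = [[[255, 0, 0], [0, 255, 0]],
       [[0, 0, 255], [255, 255, 255]]]
arr = encode_image(img)
print(len(arr), len(arr[0]), len(arr[0][0]))  # 3 2 2
print(arr[0])  # red channel: [[1.0, 0.0], [0.0, 1.0]]
```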

Video Encoders

"VideoFrames" encode frames of a video as a rank-4 array
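Conceptually, a video encoder has to choose a fixed number of frames and encode each one as an image. The Python sketch below assumes evenly spaced frame sampling and a channels-first layout for each frame, stacking the per-frame arrays along a new leading frame axis. The sampling rule and the function names are hypothetical choices for illustration, not the Wolfram encoder's documented behavior.

```python
from typing import List

Frame = List[List[List[int]]]  # frame[y][x][channel], values 0..255


def sample_indices(n_frames: int, n_samples: int) -> List[int]:
    """Pick n_samples frame indices spread evenly from the first to the last frame."""
    if n_samples == 1:
        return [0]
    return [round(i * (n_frames - 1) / (n_samples - 1)) for i in range(n_samples)]


def frame_to_array(frame: Frame) -> List[List[List[float]]]:
    """Height x width x channels -> channels x height x width, scaled to [0, 1]."""
    h, w, c = len(frame), len(frame[0]), len(frame[0][0])
    return [[[frame[y][x][k] / 255.0 for x in range(w)] for y in range(h)]
            for k in range(c)]


def encode_video(frames: List[Frame], n_samples: int) -> List[List[List[List[float]]]]:
    """The result is indexed [frame][channel][y][x]."""
    return [frame_to_array(frames[i]) for i in sample_indices(len(frames), n_samples)]


# Ten tiny 2 x 2 grayscale frames whose brightness encodes the frame number
video = [[[[10 * t], [10 * t]], [[10 * t], [10 * t]]] for t in range(10)]
clip = encode_video(video, n_samples=4)
print(sample_indices(10, 4))  # [0, 3, 6, 9]
print(len(clip), len(clip[0]), len(clip[0][0]), len(clip[0][0][0]))  # 4 1 2 2
```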

Other Encoders

"Boolean" encode True and False as 1 and 0

"Class" encode a class label as an integer code or a one-hot vector

"Function" use a custom function to encode an input

"FeatureExtractor" use a FeatureExtractorFunction to encode the input
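The simplest encoders are direct lookups. The Python sketch below mimics the "Boolean", "Class", and "Function" ideas: 1-based class codes (a choice made here for illustration), an optional one-hot form, and a wrapper that checks that a custom function always returns a vector of a declared length, since a net layer typically expects inputs of a fixed size. The function names are hypothetical; "FeatureExtractor" is not sketched because it depends on a trained FeatureExtractorFunction.

```python
from typing import Any, Callable, List, Sequence, Union


def encode_boolean(value: bool) -> int:
    return 1 if value else 0


def encode_class(label: Any, classes: Sequence[Any],
                 one_hot: bool = False) -> Union[int, List[int]]:
    """Integer code (1-based) or one-hot vector for a known class label."""
    code = list(classes).index(label) + 1      # raises ValueError for unknown labels
    if not one_hot:
        return code
    return [1 if i == code - 1 else 0 for i in range(len(classes))]


def function_encoder(f: Callable[[Any], Sequence[float]],
                     length: int) -> Callable[[Any], List[float]]:
    """Wrap a custom feature function and enforce a fixed output length."""
    def encode(x: Any) -> List[float]:
        out = [float(v) for v in f(x)]
        if len(out) != length:
            raise ValueError(f"expected {length} values, got {len(out)}")
        return out
    return encode


classes = ["cat", "dog", "bird"]
print(encode_boolean(True), encode_boolean(False))   # 1 0
print(encode_class("dog", classes))                  # 2
print(encode_class("dog", classes, one_hot=True))    # [0, 1, 0]

point = function_encoder(lambda d: (d["x"], d["y"]), length=2)
print(point({"x": 1, "y": 2}))                        # [1.0, 2.0]
```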

Decoders

NetDecoder interpret numerical arrays as images, probabilities, etc.

Text Decoders

"SubwordTokens" decode probability vectors as a string of subword tokens

"Characters" decode probability vectors as a string of characters

"Tokens" decode probability vectors as a string of tokens
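The simplest way to turn a sequence of probability vectors back into text is to take the most likely entry at each step and join the results. The Python sketch below shows that greedy argmax approach; the separators (nothing between characters, a space between tokens) are simplifying assumptions, and real decoders may handle these details differently. Names are hypothetical.

```python
from typing import Sequence


def argmax(v: Sequence[float]) -> int:
    return max(range(len(v)), key=v.__getitem__)


def decode_sequence(prob_vectors: Sequence[Sequence[float]],
                    vocab: Sequence[str], sep: str) -> str:
    """Pick the most probable vocabulary entry at each step, then join."""
    return sep.join(vocab[argmax(p)] for p in prob_vectors)


probs = [[0.1, 0.7, 0.2],
         [0.6, 0.3, 0.1],
         [0.2, 0.2, 0.6]]
print(decode_sequence(probs, ["a", "c", "t"], ""))          # cat
print(decode_sequence(probs, ["the", "big", "dog"], " "))   # big the dog
```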

Image Decoders

"Image" decode a rank-3 array as a 2D image

"Image3D" decode a rank-4 array as a 3D image
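Decoding reverses the image encoding step. The Python sketch below assumes a channels-first array whose values should lie between 0 and 1: each value is clipped, rescaled to an 8-bit integer, and rearranged into height × width × channels pixels, while the 3D case decodes each depth slice in turn. The layout, scaling, and function names are assumptions for illustration.

```python
from typing import List

Array3 = List[List[List[float]]]          # array[channel][y][x]
Array4 = List[List[List[List[float]]]]    # array[channel][z][y][x]


def to_byte(v: float) -> int:
    """Clip to [0, 1] and scale to an integer 0..255."""
    return int(round(min(max(v, 0.0), 1.0) * 255))


def decode_image(array: Array3) -> List[List[List[int]]]:
    """Channels x height x width -> pixels[y][x][channel]."""
    c, h, w = len(array), len(array[0]), len(array[0][0])
    return [[[to_byte(array[k][y][x]) for k in range(c)] for x in range(w)]
            for y in range(h)]


def decode_image3d(array: Array4) -> List[List[List[List[int]]]]:
    """Channels x depth x height x width -> voxels[z][y][x][channel]."""
    depth = len(array[0])
    return [decode_image([array[k][z] for k in range(len(array))]) for z in range(depth)]


# One row of two pixels; out-of-range values are clipped
arr = [[[0.0, 1.0]],     # red channel
       [[0.4, 2.0]],     # green channel
       [[-1.0, 0.2]]]    # blue channel
print(decode_image(arr))  # [[[0, 102, 0], [255, 255, 51]]]
```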

Other Decoders

"Boolean" decode 1 and 0 as True and False

"Class" decode probability arrays as class labels

"CTCBeamSearch" decode sequences of probability vectors produced by a net trained with CTCLossLayer

"Function" decode using a custom function
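The remaining decoders can be approximated by short functions. The Python sketch below thresholds a probability for "Boolean" (the 0.5 threshold is an assumption), takes the argmax for "Class", and, for CTC output, uses greedy best-path decoding: take the most likely symbol at each step, merge consecutive repeats, then drop blanks. Greedy decoding is a simpler stand-in for the beam search that "CTCBeamSearch" is named after, and the blank is assumed here to occupy the last index of each probability vector. Function names are hypothetical.

```python
from typing import Any, List, Sequence


def argmax(v: Sequence[float]) -> int:
    return max(range(len(v)), key=v.__getitem__)


def decode_boolean(p: float, threshold: float = 0.5) -> bool:
    return p >= threshold


def decode_class(probs: Sequence[float], classes: Sequence[Any]) -> Any:
    return classes[argmax(probs)]


def ctc_greedy_decode(prob_vectors: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding; index len(alphabet) is treated as the blank."""
    blank = len(alphabet)
    out: List[str] = []
    prev = None
    for idx in (argmax(p) for p in prob_vectors):
        if idx != prev and idx != blank:   # a new, non-blank symbol
            out.append(alphabet[idx])
        prev = idx                          # remember the last symbol, blanks included
    return "".join(out)


print(decode_boolean(0.83))                                   # True
print(decode_class([0.2, 0.7, 0.1], ["cat", "dog", "bird"]))  # dog
steps = [[0.8, 0.1, 0.1],   # a
         [0.7, 0.2, 0.1],   # a (repeat, merged)
         [0.1, 0.1, 0.8],   # blank
         [0.6, 0.3, 0.1],   # a (kept: the blank separated it)
         [0.1, 0.8, 0.1],   # b
         [0.2, 0.7, 0.1]]   # b (repeat, merged)
print(ctc_greedy_decode(steps, "ab"))                         # aab
```

The blank symbol is what lets CTC output genuine double letters: without a blank between them, two consecutive "a" steps collapse into one.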