Encoding and Decoding Data for Neural Networks
Real-world data comes in a wide variety of types, including numerical, categorical, textual, image, audio and video data. Neural networks are a particularly sophisticated way to manipulate numerical arrays, but they cannot natively handle non-numerical inputs. The Wolfram Language offers dedicated encoders and decoders that automatically and efficiently translate non-numerical data to and from net-compatible NumericArray objects. Custom encoders can be created from feature extractors or from arbitrary functions.
Encoders
NetEncoder — convert images, categories, etc. to numerical arrays
Audio Encoders
"Audio" — encode audio as a sequence of waveform amplitudes
"AudioMelSpectrogram" — encode audio as a mel spectrogram
"AudioMFCC" — encode audio as a sequence of MFCC vectors
"AudioSpectrogram" — encode audio as a spectrogram
"AudioSTFT" — encode audio as a sequence of Fourier transforms
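The spectral audio encoders share a common first step: the waveform is cut into short frames, and each frame is Fourier-transformed. Below is a minimal pure-Python sketch of that short-time Fourier transform, followed by the magnitude spectrogram derived from it. The frame size, hop, and Hann window are assumptions of this sketch, not the parameters or defaults of the Wolfram Language encoders. The function names are hypothetical. The mel filterbank and MFCC steps are omitted.

```python
import cmath
import math

def stft(samples, frame_size, hop):
    """Split a waveform into overlapping frames and take the DFT of each.

    Returns one list of complex coefficients per frame, covering bins
    0..frame_size//2 (the non-redundant half for a real-valued signal).
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Hann window to reduce spectral leakage (a choice made for this sketch)
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * n / frame_size))
                    for n, x in enumerate(frame)]
        # Direct O(N^2) DFT; a real implementation would use an FFT
        spectrum = [
            sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size))
            for k in range(frame_size // 2 + 1)
        ]
        frames.append(spectrum)
    return frames

def spectrogram(samples, frame_size, hop):
    """Magnitude of each STFT coefficient: a sequence of real-valued vectors."""
    return [[abs(c) for c in frame] for frame in stft(samples, frame_size, hop)]
```

For example, a 128-sample sine completing 4 cycles every 32 samples, analyzed with 32-sample frames and a hop of 16, yields 7 frames of 17 bins each. The peak of every frame is at bin 4.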
Text Encoders
"SubwordTokens" — encode subword tokens in a string as a sequence of integer codes
"Characters" — encode characters in a string as a sequence of integer codes or one-hot vectors
"Tokens" — encode tokens in a string as a sequence of integer codes
"UTF8" — encode strings as their UTF-8 bytes
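To illustrate what "integer codes", "one-hot vectors" and "UTF-8 bytes" mean, here is a minimal Python sketch of character-level and byte-level encoding. It makes several assumptions that do not come from the Wolfram encoders. Codes are 1-based positions in a user-supplied alphabet, and unknown characters share one extra code. The UTF-8 function returns raw byte values (0 to 255), and the exact integer convention of the Wolfram "UTF8" encoder is not reproduced. All function names are hypothetical.

```python
def encode_characters(text, alphabet):
    """Map each character to its 1-based position in `alphabet`.

    Characters missing from the alphabet get code len(alphabet) + 1,
    a catch-all "unknown" code (an assumption of this sketch).
    """
    index = {ch: i + 1 for i, ch in enumerate(alphabet)}
    unknown = len(alphabet) + 1
    return [index.get(ch, unknown) for ch in text]

def one_hot(codes, size):
    """Turn 1-based integer codes into one-hot vectors of length `size`."""
    return [[1.0 if i == code - 1 else 0.0 for i in range(size)] for code in codes]

def encode_utf8(text):
    """Return the UTF-8 bytes of `text` as integers in 0..255."""
    return list(text.encode("utf-8"))
```

With the alphabet "abc", the string "cab" encodes to [3, 1, 2], and "cat" encodes to [3, 1, 4] because "t" falls into the unknown code. One-hot vectors should then have length len(alphabet) + 1 to leave room for that code. A non-ASCII character such as "é" becomes two bytes, 195 and 169.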
Image Encoders
"Image" — encode a 2D image as a rank-3 array
"Image3D" — encode a 3D image as a rank-4 array
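A rank-3 image array is most naturally read as channels × height × width. The following minimal Python sketch converts a grid of RGB pixels into that layout and back. The channels-first ordering and the scaling of 0..255 values to 0..1 are assumptions of this sketch. Consult the "Image" encoder documentation for its actual layout and normalization options. The function names are hypothetical.

```python
def image_to_array(pixels):
    """Convert a height x width grid of RGB tuples into a channels-first
    rank-3 nested list of shape [channels][height][width], scaling
    values from 0..255 to 0..1."""
    height, width = len(pixels), len(pixels[0])
    channels = len(pixels[0][0])
    return [[[pixels[y][x][c] / 255 for x in range(width)]
             for y in range(height)]
            for c in range(channels)]

def array_to_image(array):
    """Inverse of image_to_array: back to a grid of 0..255 integer tuples."""
    channels, height, width = len(array), len(array[0]), len(array[0][0])
    return [[tuple(round(array[c][y][x] * 255) for c in range(channels))
             for x in range(width)]
            for y in range(height)]
```

The same reshaping in reverse is what an image decoder has to do. A 3D image adds a depth axis, which yields the rank-4 array described above.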
Video Encoders
"VideoFrames" — encode frames of a video as a rank-3 array
Other Encoders
"Boolean" — encode True and False as 1 and 0
"Class" — encode a class label as an integer code or a one-hot vector
"Function" — use a custom function to encode an input
"FeatureExtractor" — use a FeatureExtractorFunction to encode the input
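The "Boolean", "Class" and "Function" encoders follow simple rules that can be mimicked in a few lines. Here is a minimal Python sketch of those rules. Class codes are assumed to be 1-based positions in the class list. The wrapper standing in for "Function" only checks that a user function returns a list of numbers. It is not the Wolfram API, and all names are hypothetical.

```python
def encode_boolean(value):
    """Encode True/False as 1/0."""
    return 1 if value else 0

def encode_class(label, classes, one_hot=False):
    """Encode a class label as a 1-based integer code, or as a one-hot vector."""
    code = classes.index(label) + 1  # raises ValueError for unknown labels
    if not one_hot:
        return code
    return [1.0 if i == code - 1 else 0.0 for i in range(len(classes))]

def function_encoder(f):
    """Wrap an arbitrary function as an encoder (in the spirit of "Function");
    the wrapper checks that the result is a non-empty list of numbers."""
    def encode(x):
        out = list(f(x))
        if not out or not all(isinstance(v, (int, float)) for v in out):
            raise TypeError("encoder must return a non-empty list of numbers")
        return out
    return encode
```

For instance, `function_encoder(lambda s: [len(s), s.count("a")])` turns "banana" into [6, 3]. A feature extractor plays the same role, except that the mapping is learned from example data instead of being written by hand.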
Decoders
NetDecoder — interpret numerical arrays as images, probabilities, etc.
Text Decoders
"SubwordTokens" — decode probability vectors as a string of subword tokens
"Characters" — decode probability vectors as a string of characters
"Tokens" — decode probability vectors as a string of tokens
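Decoding a sequence of probability vectors into text means choosing one symbol per output position. The simplest strategy is greedy: take the most probable symbol at each step. The following minimal Python sketch does this for characters. Greedy argmax is an assumption of this sketch, not a description of the options the Wolfram decoders support. The function name is hypothetical.

```python
def decode_characters(prob_vectors, alphabet):
    """Pick the most probable character at each step and join them.

    `prob_vectors` holds one probability vector per output position;
    entry i of each vector is the probability of alphabet[i].
    """
    chars = []
    for probs in prob_vectors:
        best = max(range(len(probs)), key=probs.__getitem__)
        chars.append(alphabet[best])
    return "".join(chars)
```

A token-level decoder would work the same way, looking each argmax up in a token vocabulary instead of an alphabet.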
Image Decoders
"Image" — decode a rank-3 array as a 2D image
"Image3D" — decode a rank-4 array as a 3D image
Other Decoders
"Boolean" — decode 1 and 0 as True and False
"Class" — decode probability arrays as class labels
"CTCBeamSearch" — decode sequences of probability vectors produced by a net trained with a CTCLossLayer
"Function" — decode using a custom function
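The remaining decoders can also be illustrated in a few lines. The Python sketch below covers three cases. For "Class", it picks the label with the highest probability. For "Boolean", it thresholds a probability, and the 0.5 threshold is an assumption. For CTC output, it uses greedy best-path decoding, a simpler relative of beam search: take the argmax at each step, merge repeated symbols, then drop blanks. This is not the algorithm behind "CTCBeamSearch". It also assumes the blank occupies the last index, and all function names are hypothetical.

```python
def decode_class(probs, classes):
    """Return the class label with the highest probability."""
    return classes[max(range(len(probs)), key=probs.__getitem__)]

def decode_boolean(p, threshold=0.5):
    """Return True when the probability of True exceeds `threshold`."""
    return p > threshold

def ctc_greedy_decode(prob_vectors, alphabet):
    """Best-path CTC decoding: argmax per step, merge repeats, drop blanks.

    The blank symbol is assumed to be the last index, len(alphabet).
    """
    blank = len(alphabet)
    out, previous = [], None
    for probs in prob_vectors:
        best = max(range(len(probs)), key=probs.__getitem__)
        # Emit only when the symbol changes and is not the blank
        if best != previous and best != blank:
            out.append(alphabet[best])
        previous = best
    return "".join(out)
```

The blank is what lets CTC output a doubled letter. The per-step argmax sequence a, a, blank, a, b, b, blank decodes to "aab", because the blank separates the two runs of "a". Beam search improves on this by keeping several candidate prefixes per step instead of only the single best symbol.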