AnomalyDetection

AnomalyDetection[{example1,example2,…}]

generates an AnomalyDetectorFunction[…] based on the examples given.

AnomalyDetection[LearnedDistribution[…]]

generates an anomaly detector based on the given distribution.

AnomalyDetection[True→{example11,example12,…},False→{example21,…}]

can be used to indicate which examples should be considered anomalous.

Details and Options

  • AnomalyDetection can be used on many types of data, including numerical, nominal and image data.
  • Each examplei can be a single data element, a list of data elements or an association of data elements. Examples can also be given as a Dataset object.
  • AnomalyDetection[examples] yields an AnomalyDetectorFunction[…] that can detect anomalies, given new examples.
  • FindAnomalies[AnomalyDetectorFunction[…],data,…] can be used to find anomalies in data according to the given detector.
  • AnomalyDetection attempts to model the distribution of non-anomalous data in order to detect anomalies (i.e. "out-of-distribution" examples). Examples are considered anomalous when their RarerProbability value is below the value specified for AcceptanceThreshold.
  • When test data comes from the same distribution as training data, AcceptanceThreshold corresponds to the anomaly detection false-positive rate.
  • AnomalyDetection can be used with or without indicating which examples are anomalous (and which are not). Indicating which examples are anomalous helps with training the anomaly detector and allows it to determine a value for AcceptanceThreshold automatically.
  • In AnomalyDetection[True→{example11,example12,…},False→{example21,…}], True indicates that the corresponding examples are anomalous, and False that they are not. These labels can also be specified by AnomalyDetection[{example1,example2,…}→{True,False,…}] and AnomalyDetection[{example1→True,example2→False,…}].
  • AnomalyDetection[{example1,example2,…}→{i,j,…}] can be used to specify that examplei, examplej, etc. should be considered anomalous, and that the others should be considered non-anomalous.
  • AnomalyDetection[{example1,example2,…}→None] specifies that none of the examples are anomalous (see the sketch after this list).
  • The following options can be given:
  • AcceptanceThreshold   0.001   RarerProbability threshold to consider an example anomalous
    FeatureExtractor   Identity   how to extract features from which to learn
    FeatureNames   Automatic   feature names to assign for input data
    FeatureTypes   Automatic   feature types to assume for input data
    Method   Automatic   which modeling algorithm to use
    PerformanceGoal   Automatic   aspects of performance to optimize
    RandomSeeding   1234   what seeding of pseudorandom generators should be done internally
    TimeGoal   Automatic   how long to spend training the detector
    TrainingProgressReporting   Automatic   how to report progress during training
    ValidationSet   Automatic   the set of data on which to evaluate the model during training
  • Possible settings for PerformanceGoal include:
  • "Memory"   minimize storage requirements of the detector
    "Quality"   maximize the modeling quality of the detector
    "Speed"   maximize speed for detecting new anomalies
    "TrainingSpeed"   minimize time spent producing the detector
    Automatic   automatic tradeoff among speed, quality and memory
    {goal1,goal2,…}   automatically combine goal1, goal2, etc.
  • Possible settings for Method are as given in LearnDistribution.
  • The following settings for TrainingProgressReporting can be used:
  • "Panel"   show a dynamically updating graphical panel
    "Print"   periodically report information using Print
    "ProgressIndicator"   show a simple ProgressIndicator
    "SimplePanel"   dynamically updating panel without learning curves
    None   do not report any information
  • Possible settings for RandomSeeding include:
  • Automatic   automatically reseed every time the function is called
    Inherited   use externally seeded random numbers
    seed   use an explicit integer or string as a seed
  • AnomalyDetection[…,FeatureExtractor→"Minimal"] indicates that the internal preprocessing should be as simple as possible.
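
The labeling forms described above can be sketched as follows. This is a minimal illustration with made-up one-dimensional data, not an input from the reference page, and which values end up flagged as anomalous depends on the data and method:

    (* equivalent ways of labeling the anomalous training examples; data is illustrative *)
    normal = RandomVariate[NormalDistribution[0, 1], 50];
    detector1 = AnomalyDetection[False -> normal, True -> {10., -12.}];
    detector2 = AnomalyDetection[
       Join[normal, {10., -12.}] -> Join[ConstantArray[False, 50], {True, True}]];
    detector3 = AnomalyDetection[Join[normal, {10., -12.}] -> {51, 52}];  (* anomalies given by position *)
    (* declare that none of the training examples are anomalous *)
    detector4 = AnomalyDetection[normal -> None];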

Examples


Basic Examples  (2)

Train a detector function on a numeric dataset:

Use the trained detector to find examples that are considered anomalous:
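
The notebook inputs are not preserved in this text-only extract; a minimal sketch of the two steps above, with illustrative numbers, could look like this:

    a = AnomalyDetection[{1, 2, 3, 4, 5, 2, 3, 4, 1, 5, 2, 3}];
    FindAnomalies[a, {2, 100, 3, -40, 4}]
    (* the far-out values 100 and -40 would typically be returned as the anomalies *)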

Train an AnomalyDetectorFunction on a list of colors:

Attempt to find outliers in a list of colors using the trained anomaly detector:
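
Again a sketch with synthetic data, warm hues standing in for whatever color list the reference page uses:

    warm = Table[Hue[RandomReal[{0., 0.12}]], 40];             (* reddish-orange colors *)
    c = AnomalyDetection[warm];
    FindAnomalies[c, {Hue[0.05], Hue[0.6], Hue[0.08], Hue[0.55]}]
    (* the blue hues 0.6 and 0.55 would be expected to be flagged as outliers *)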

Scope  (5)

Train an AnomalyDetectorFunction by labeling the anomalous examples with True, and False for the others:

Use the trained AnomalyDetectorFunction to find anomalies:
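
A sketch of this supervised form, with illustrative data in place of the page's own:

    labeled = AnomalyDetection[
       False -> RandomVariate[NormalDistribution[0, 1], 100],
       True -> {9., -11.}];
    FindAnomalies[labeled, {0.3, -0.7, 14., 0.1}]
    (* 14. is the value one would expect to be returned *)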

Train an AnomalyDetectorFunction by specifying that none of the examples are anomalous:

Use the trained AnomalyDetectorFunction to find anomalies:
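
A sketch of the examples→None form, again with made-up data:

    d = AnomalyDetection[RandomVariate[NormalDistribution[0, 1], 200] -> None];
    FindAnomalies[d, {0.2, -0.4, 16.}]
    (* 16. is the value one would expect to be returned *)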

Train a distribution on colors:

Generate an AnomalyDetectorFunction based on the trained distribution:

Use the detector function to find out-of-distribution colors:
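
A sketch of the three steps above, using synthetic warm colors in place of the page's own color data:

    dist = LearnDistribution[Table[Hue[RandomReal[{0., 0.12}]], 50]];  (* warm colors *)
    det = AnomalyDetection[dist];
    FindAnomalies[det, {Hue[0.04], Hue[0.6], Hue[0.1]}]
    (* the blue Hue[0.6] is the out-of-distribution color one would expect to be returned *)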

Train an AnomalyDetectorFunction on a two-dimensional array of pseudorandom real numbers:

Use the trained AnomalyDetectorFunction to find anomalies in new examples with FindAnomalies:

Use the trained AnomalyDetectorFunction to find anomalies and their corresponding positions:
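
A sketch with pseudorandom points in the unit square; the last call assumes the "AnomalyPositions" property of FindAnomalies for recovering positions:

    data = RandomReal[1, {200, 2}];                       (* 200 points in the unit square *)
    ad = AnomalyDetection[data];
    test = Join[RandomReal[1, {10, 2}], {{5., 5.}, {-3., 2.}}];
    FindAnomalies[ad, test]
    FindAnomalies[ad, test, "AnomalyPositions"]           (* positions of the anomalies within test *)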

Obtain a random sample of training and test datasets of images:

Add anomalous examples to corrupt the datasets:

Train a "supervised" anomaly detector by specifying the position of the known anomalies in the training set:

Use the trained anomaly detector on the test set:
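
The reference page presumably samples a real image dataset; the sketch below substitutes small synthetic images so it stays self-contained, and marks the known anomaly by its position in the training list:

    darkImages = Table[RandomImage[{0., 0.3}, {16, 16}], 60];       (* "normal" dark images *)
    train = Append[darkImages, RandomImage[{0.8, 1.}, {16, 16}]];   (* plus one bright, anomalous image *)
    det = AnomalyDetection[train -> {61}];                          (* example 61 is the known anomaly *)
    test = Join[Table[RandomImage[{0., 0.3}, {16, 16}], 10],
                Table[RandomImage[{0.8, 1.}, {16, 16}], 2]];
    FindAnomalies[det, test]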

Options  (5)

AcceptanceThreshold  (1)

Create and visualize random 3D vectors with anomalies:

Train an anomaly detector function on the training set:

Use the anomaly detector function to find and visualize the anomalous examples in the test set:

Change the anomaly detection false-positive rate by specifying the AcceptanceThreshold:
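
A sketch with synthetic 3D data; here the threshold is changed at detection time, assuming FindAnomalies also accepts the AcceptanceThreshold option (it can equally be given to AnomalyDetection when training):

    train = RandomVariate[NormalDistribution[0, 1], {500, 3}];
    test = Join[RandomVariate[NormalDistribution[0, 1], {100, 3}],
                RandomReal[{5, 6}, {5, 3}]];              (* 5 obvious outliers *)
    ListPointPlot3D[test]
    ad = AnomalyDetection[train];
    FindAnomalies[ad, test]
    FindAnomalies[ad, test, AcceptanceThreshold -> 0.05]  (* looser threshold: more points flagged *)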

Method  (1)

Obtain training and test datasets of images:

Add "out-of-distribution" examples to the test set:

Train the anomaly detector using the "Multinormal" method:

Find anomalous examples in the test set:

Train the anomaly detector using the "KernelDensityEstimation" method and attempt to find anomalies:
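
A sketch with synthetic stand-in images; "Multinormal" and "KernelDensityEstimation" are among the Method settings inherited from LearnDistribution:

    data = Table[RandomImage[{0., 0.3}, {8, 8}], 100];        (* stand-in for the image training set *)
    test = Join[Table[RandomImage[{0., 0.3}, {8, 8}], 10],
                Table[RandomImage[{0.7, 1.}, {8, 8}], 2]];    (* with two out-of-distribution images *)
    adMN = AnomalyDetection[data, Method -> "Multinormal"];
    FindAnomalies[adMN, test]
    adKDE = AnomalyDetection[data, Method -> "KernelDensityEstimation"];
    FindAnomalies[adKDE, test]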

PerformanceGoal  (1)

Load the Fisher's Irises dataset with its numerical attributes:

Train an anomaly detector function by specifying the PerformanceGoal:

Compare the training time for anomaly detector functions with different performance goals:
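
A sketch assuming the Fisher iris data is available as ExampleData[{"MachineLearning","FisherIris"}]; only the numeric attributes are kept:

    iris = Keys[ExampleData[{"MachineLearning", "FisherIris"}, "Data"]];   (* 150 numeric 4-vectors *)
    ad = AnomalyDetection[iris, PerformanceGoal -> "Quality"];
    (* compare training times under different goals *)
    AssociationMap[
      First[AbsoluteTiming[AnomalyDetection[iris, PerformanceGoal -> #]]] &,
      {"TrainingSpeed", "Quality"}]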

TimeGoal  (1)

Obtain a dataset of images and train an anomaly detector function by specifying the time goal:

Obtain the training time of the anomaly detection:
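
A sketch with synthetic stand-in images; TimeGoal→5 asks for roughly five seconds of training, and "TrainingTime" is the Information property one would expect for reading the time back (an assumption here, not confirmed by this extract):

    data = Table[RandomImage[{0., 0.3}, {8, 8}], 100];   (* stand-in for the image dataset *)
    ad = AnomalyDetection[data, TimeGoal -> 5];          (* aim for about 5 seconds of training *)
    Information[ad, "TrainingTime"]                      (* "TrainingTime" property name is assumed *)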

TrainingProgressReporting  (1)

Obtain a dataset of images:

Show training progress interactively without the plots:

Print the training progress periodically during training:

Show a simple progress indicator:
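
A sketch with synthetic stand-in images showing the three reporting settings mentioned above:

    data = Table[RandomImage[{0., 0.3}, {8, 8}], 100];
    AnomalyDetection[data, TrainingProgressReporting -> "SimplePanel"];        (* interactive panel without plots *)
    AnomalyDetection[data, TrainingProgressReporting -> "Print"];              (* periodic Print output *)
    AnomalyDetection[data, TrainingProgressReporting -> "ProgressIndicator"];  (* simple progress bar *)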

Applications  (1)

Obtain training and test datasets of images:

Add anomalous examples to the test set:

Train an anomaly detector on the training set:

Find anomalous examples in the test set:
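
A compact end-to-end sketch of this workflow with synthetic stand-in images in place of the page's dataset:

    trainImages = Table[RandomImage[{0., 0.3}, {16, 16}], 80];
    testImages = Join[Table[RandomImage[{0., 0.3}, {16, 16}], 20],
                      Table[RandomImage[{0.7, 1.}, {16, 16}], 3]];   (* 3 anomalous bright images *)
    detector = AnomalyDetection[trainImages];
    FindAnomalies[detector, testImages]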

Text

Wolfram Research (2019), AnomalyDetection, Wolfram Language function, https://reference.wolfram.com/language/ref/AnomalyDetection.html (updated 2020).

CMS

Wolfram Language. 2019. "AnomalyDetection." Wolfram Language & System Documentation Center. Wolfram Research. Last Modified 2020. https://reference.wolfram.com/language/ref/AnomalyDetection.html.

APA

Wolfram Language. (2019). AnomalyDetection. Wolfram Language & System Documentation Center. Retrieved from https://reference.wolfram.com/language/ref/AnomalyDetection.html

BibTeX

@misc{reference.wolfram_2024_anomalydetection, author="Wolfram Research", title="{AnomalyDetection}", year="2020", howpublished="\url{https://reference.wolfram.com/language/ref/AnomalyDetection.html}", note="[Accessed: 25-April-2024]"}

BibLaTeX

@online{reference.wolfram_2024_anomalydetection, organization={Wolfram Research}, title={AnomalyDetection}, year={2020}, url={https://reference.wolfram.com/language/ref/AnomalyDetection.html}, note={[Accessed: 25-April-2024]}}