Wolfram Language & System Documentation Center

LearnDistribution

LearnDistribution[{example₁,example₂,…}]

generates a LearnedDistribution[…] that attempts to represent an underlying distribution for the examples given.

Details and Options

LearnDistribution trains machine learning models on data to automatically discover and fit probability distributions, returning a LearnedDistribution object.
LearnDistribution can be used on many types of data, including numerical, nominal and images.
Each example_i can be a single data element, a list of data elements or an association of data elements. Examples can also be given as a Dataset or a Tabular object.
LearnDistribution effectively assumes that each of the example_i is independently drawn from an underlying distribution, which LearnDistribution attempts to infer.
LearnDistribution[examples] yields a LearnedDistribution[…] on which the following functions can be used:

	PDF[dist,…]	probability or probability density for data
	RandomVariate[dist]	random samples generated from the distribution
	SynthesizeMissingValues[dist,…]	fill in missing values according to the distribution
	RarerProbability[dist,…]	compute the probability of generating a sample with a lower PDF than a given example
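
As a minimal sketch of how the functions in the table above fit together, the following trains on a small numeric sample and queries the result. The data values are illustrative only, and no output is shown because the fitted model depends on the automatically selected method.

```wolfram
(* Train on a small numeric sample; the values are illustrative only *)
ld = LearnDistribution[{1.2, 2.5, 3.2, 4.6}];

(* Probability density at a new point *)
PDF[ld, 3.4]

(* Three random samples drawn from the learned distribution *)
RandomVariate[ld, 3]

(* Probability of generating a sample with a lower PDF than each example;
   values near 0 indicate an example in a low-density region *)
RarerProbability[ld, {3.0, 100.}]
```

A small RarerProbability value means that few generated samples would be less likely than the given example, which is why it can serve as a rough anomaly score.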

The following options can be given:

FeatureExtractor	Identity	how to extract features from which to learn
FeatureNames	Automatic	feature names to assign for input data
FeatureTypes	Automatic	feature types to assume for input data
Method	Automatic	which modeling algorithm to use
PerformanceGoal	Automatic	aspects of performance to try to optimize
RandomSeeding	1234	what seeding of pseudorandom generators should be done internally
TimeGoal	Automatic	how long to spend training the distribution
TrainingProgressReporting	Automatic	how to report progress during training
ValidationSet	Automatic	the set of data on which to evaluate the model during training

Possible settings for PerformanceGoal include:

	"DirectTraining"	train directly on the full dataset, without model searching
	"Memory"	minimize storage requirements of the distribution
	"Quality"	maximize the modeling quality of the distribution
	"Speed"	maximize speed for PDF queries
	"SamplingSpeed"	maximize speed for generating random samples
	"TrainingSpeed"	minimize time spent producing the distribution
	Automatic	automatic tradeoff among speed, quality and memory
	{goal₁,goal₂,…}	automatically combine goal₁, goal₂, etc.
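
The settings above might be used as in the following sketch, which trains the same two-dimensional data once with a single goal and once with a list of goals. The variable names (data, fast, combined) are illustrative, and which model each setting selects is decided automatically.

```wolfram
(* Illustrative two-dimensional numeric data *)
data = {{1.1, 1.4}, {2.3, 3.1}, {4.4, 5.4}, {8.7, 7.5}};

(* Favor short training time *)
fast = LearnDistribution[data, PerformanceGoal -> "TrainingSpeed"];

(* Combine several goals by giving a list *)
combined = LearnDistribution[data, PerformanceGoal -> {"Quality", "Memory"}];

(* Compare the density each model assigns to the same point *)
PDF[#, {4.2, 5.3}] & /@ {fast, combined}
```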

Possible settings for Method include:

	"ContingencyTable"	discretize data and store each possible probability
	"DecisionTree"	use a decision tree to compute probabilities
	"GaussianMixture"	use a mixture of Gaussian (normal) distributions
	"KernelDensityEstimation"	use a kernel mixture distribution
	"Multinormal"	use a multivariate normal (Gaussian) distribution

The following settings for TrainingProgressReporting can be used:

	"Panel"	show a dynamically updating graphical panel
	"Print"	periodically report information using Print
	"ProgressIndicator"	show a simple ProgressIndicator
	"SimplePanel"	show a dynamically updating panel without learning curves
	None	do not report any information

Possible settings for RandomSeeding include:

	Automatic	automatically reseed every time the function is called
	Inherited	use externally seeded random numbers
	seed	use an explicit integer or string as a seed
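
A short sketch of the seeding settings above, assuming that two trainings with the same explicit seed on the same data give matching models (the seed value 42 and the variable names are arbitrary choices for illustration):

```wolfram
data = {1.2, 2.5, 3.2, 4.6};

(* Same explicit seed for both trainings *)
ld1 = LearnDistribution[data, RandomSeeding -> 42];
ld2 = LearnDistribution[data, RandomSeeding -> 42];

(* Check whether both models assign the same density to a point *)
PDF[ld1, 3.4] == PDF[ld2, 3.4]

(* Reseed on every call, so repeated trainings may differ *)
ld3 = LearnDistribution[data, RandomSeeding -> Automatic];
```

Because the default setting is a fixed seed (1234), repeated calls are already seeded identically unless RandomSeeding is changed.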

Only reversible feature extractors can be given in the option FeatureExtractor.
LearnDistribution[…,FeatureExtractor->"Minimal"] indicates that the internal preprocessing should be as simple as possible.
All images are first conformed using ConformImages.
Information[LearnedDistribution[…]] generates an information panel about the distribution and its estimated performance.
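
The last few points can be sketched as follows: train with minimal preprocessing, then inspect the result. The "TrainingTime" property is the one used in the examples on this page; other property names are not assumed here.

```wolfram
(* Keep internal preprocessing as simple as possible *)
ld = LearnDistribution[{1.2, 2.5, 3.2, 4.6}, FeatureExtractor -> "Minimal"];

(* Information panel about the distribution *)
Information[ld]

(* A single property extracted from the same object *)
Information[ld, "TrainingTime"]
```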

Examples

Basic Examples (3)

Train a distribution on a numeric dataset:

Wolfram Language code: ld = LearnDistribution[{1.2, 2.5, 3.2, 4.6}]

Generate a new example based on the learned distribution:

Wolfram Language code: RandomVariate[ld]

Compute the PDF of a new example:

Wolfram Language code: PDF[ld, 3.4]

Train a distribution on a nominal dataset:

Wolfram Language code: ld = LearnDistribution[{"A", "A", "B", "B", "B"}]

Generate a new example based on the learned distribution:

Wolfram Language code: RandomVariate[ld]

Compute the probability of the examples "A" and "B":

Wolfram Language code: PDF[ld, {"A", "B"}]

Train a distribution on a two-dimensional dataset:

Wolfram Language code: ld = LearnDistribution[{{1.1, 1.4}, {2.3, 3.1}, {4.4, 5.4}, {8.7, 7.5}}]

Generate a new example based on the learned distribution:

Wolfram Language code: RandomVariate[ld]

Compute the probability density of two examples:

Wolfram Language code: PDF[ld, {{4.2, 5.3}, {4.2, 8.4}}]

Impute the missing value of an example:

Wolfram Language code: SynthesizeMissingValues[ld, {4.2, Missing[]}]

Scope (3)

Train a distribution on a dataset containing numeric and nominal variables:

Wolfram Language code: ld = LearnDistribution[{{1.1, "A"}, {2.3, "A"}, {4.4, "B"}, {8.7, "B"}}]

Generate a new example based on the learned distribution:

Wolfram Language code: RandomVariate[ld]

Impute the missing value of an example:

Wolfram Language code: SynthesizeMissingValues[ld, {Missing[], "A"}]
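
The same kind of mixed data can also be given as associations, as described under Details and Options. The sketch below uses hypothetical feature names ("length" and "label") and assumes that an incomplete example can be given as an association with keys omitted, as in the Titanic example under Applications.

```wolfram
(* Each example is an association; the feature names are hypothetical *)
examples = {
   <|"length" -> 1.1, "label" -> "A"|>,
   <|"length" -> 2.3, "label" -> "A"|>,
   <|"length" -> 4.4, "label" -> "B"|>,
   <|"length" -> 8.7, "label" -> "B"|>};

ld = LearnDistribution[examples];

(* Generate a new example *)
RandomVariate[ld]

(* Fill in the omitted "length" value for a given "label" *)
SynthesizeMissingValues[ld, <|"label" -> "A"|>]
```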

Train a distribution on colors:

Wolfram Language code:

ld = LearnDistribution[{RGBColor[0.5172966964096541, 0.4435322033449375, 1.], RGBColor[0.3984626930847484, 0.5592892024442906, 1.], RGBColor[0.6149389612362844, 0.5648721294502163, 1.], RGBColor[0.4129156497559272, 0.9146065592632544, 1.], RGBColor[0.7907065846445507, 0.41054133291260947, 1.], RGBColor[0.4878854162550912, 0.9281119680196579, 1.], RGBColor[0.9884362181280959, 0.49025178842859785, 1.], RGBColor[0.633242503827218, 0.9880985331612835, 1.], RGBColor[0.9215182482568276, 0.8103084921468551, 1.], RGBColor[0.667469513641223, 0.46420827644204676, 1.]}]

Generate 100 new examples based on the learned distribution:

Wolfram Language code: RandomVariate[ld, 100]

Compute the probability density of some colors:

Wolfram Language code: PDF[ld, {RGBColor[1, 0, 0], RGBColor[0, 0, 1], RGBColor[0, 1, 0]}]

Train a distribution on dates:

Wolfram Language code:

ld = LearnDistribution[{DateObject[{2018, 7, 28, 12}, "Hour", "Gregorian", -7.], DateObject[{2018, 8, 16, 18, 19, 13.906976}, "Instant", "Gregorian", -7.], DateObject[{2018, 7, 30, 20, 15, 22.188818}, "Instant", "Gregorian", -7.], DateObject[{2018, 8, 13, 11, 22, 17.611770}, "Instant", "Gregorian", -7.], DateObject[{2018, 8, 10, 8, 37, 27.063134}, "Instant", "Gregorian", -7.], DateObject[{2018, 8, 5, 15, 17, 54.472131}, "Instant", "Gregorian", -7.]}]

Generate 10 new examples based on the learned distribution:

Wolfram Language code: RandomVariate[ld, 10]

Compute the probability density of some new dates:

Wolfram Language code:

PDF[ld, {DateObject[{2018, 9, 26, 18, 24, 47.352225}, "Instant", "Gregorian", -7.], DateObject[{2018, 8, 15}, "Day", "Gregorian", -7.], DateObject[{1989, 5, 30}, "Day", "Gregorian", -7.]}]

Options (6)

FeatureTypes (1)

Specify that the data is nominal:

Wolfram Language code: ld1 = LearnDistribution[{1, 1, 2, 2}, FeatureTypes -> "Nominal"]

Wolfram Language code: PDF[ld1, {1, 1.5, 2}]

Without specification, the data is considered numerical:

Wolfram Language code: ld2 = LearnDistribution[{1, 1, 2, 2}]

Wolfram Language code: PDF[ld2, {1, 1.5, 2}]

Method (2)

Train a "Multinormal" distribution on a numeric dataset:

Wolfram Language code: ld = LearnDistribution[{1.2, 2.1, 3.5, 4.3}, Method -> "Multinormal"]

Plot the PDF along with the training data:

Wolfram Language code:

Show[Plot[PDF[ld, x], {x, -2, 8}, Filling -> Bottom], NumberLinePlot[{1.2, 2.1, 3.5, 4.3}, Spacings -> 0, PlotStyle -> Red]]

Train a distribution on a two-dimensional dataset with all available methods ("Multinormal", "ContingencyTable", "KernelDensityEstimation", "DecisionTree" and "GaussianMixture"):

Wolfram Language code: iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"][[All, 1, {1, 3}]];

Wolfram Language code: methods = {"Multinormal", "GaussianMixture", "KernelDensityEstimation", "ContingencyTable", "DecisionTree"};

Wolfram Language code: distributions = LearnDistribution[iris, Method -> #]& /@ methods;

Visualize the probability density of these distributions:

Wolfram Language code:

MapThread[Show[ContourPlot[PDF[#2, {x, y}], {x, 4, 8}, {y, 1, 7}, PlotRange -> All, Contours -> 10], ListPlot[iris, PlotStyle -> Directive[Red, PointSize[.02]]], PlotLabel -> #1, ImageSize -> 150] &, {methods, distributions}]

TimeGoal (2)

Learn a distribution while specifying a total training time of 5 seconds:

Wolfram Language code: ld = LearnDistribution[{1, 2, 3, 4}, TimeGoal -> 5]

Wolfram Language code: Information[ld, "TrainingTime"]

Load 1000 images of the "MNIST" dataset:

Wolfram Language code: digits = ResourceData["MNIST"][[ ;; 1000, 1]];

Wolfram Language code: RandomSample[digits, 10]

Learn its distribution while specifying a target training time of 3 seconds:

Wolfram Language code: ld1 = LearnDistribution[digits, TimeGoal -> 3]

The loss value obtained (cross-entropy) is about -0.43:

Wolfram Language code: Information[ld1]

Learn its distribution while specifying a target training time of 30 seconds:

Wolfram Language code: ld2 = LearnDistribution[digits, TimeGoal -> 30]

The loss value obtained (cross-entropy) is about -0.978:

Wolfram Language code: Information[ld2]

Compare the learning curves for both trainings:

Wolfram Language code: Information[#, "LearningCurve"] & /@ {ld1, ld2}

TrainingProgressReporting (1)

Load the "UCILetter" dataset:

Wolfram Language code: dataset = ExampleData[{"MachineLearning", "UCILetter"}, "Data"][[All, 1]];

Show training progress interactively during training:

Wolfram Language code: LearnDistribution[dataset, TrainingProgressReporting -> "Panel"];

Show training progress interactively without plots:

Wolfram Language code: LearnDistribution[dataset, TrainingProgressReporting -> "SimplePanel"];

Print training progress periodically during training:

Wolfram Language code: LearnDistribution[dataset, TrainingProgressReporting -> "Print"];

Show a simple progress indicator:

Wolfram Language code: LearnDistribution[dataset, TrainingProgressReporting -> "ProgressIndicator"];

Do not report progress:

Wolfram Language code: LearnDistribution[dataset, TrainingProgressReporting -> None];

Applications (4)

Obtain a dataset of images:

Wolfram Language code: cifar = ResourceData["CIFAR-100"][[All, 1]];

Wolfram Language code: RandomSample[cifar, 10]

Train a distribution on the images:

Wolfram Language code: ld = LearnDistribution[cifar]

Generate 50 new examples based on the learned distribution:

Wolfram Language code: RandomVariate[ld, 50]

Compare the probability density for an image of the training set, an image of a test set, a sample from the learned distribution, an image of another dataset and a random image:

Wolfram Language code: testimages = {[image], [image], [image], [image], [image]};

Wolfram Language code: Grid[{#, PDF[ld, #]} & /@ testimages]

Obtain the probability of generating a sample with a lower PDF for each of these images:

Wolfram Language code: Grid[{#, RarerProbability[ld, #]} & /@ testimages]

Load a sample dataset:

Wolfram Language code: irises = ResourceData["Sample Tabular Data: Fisher Iris"]

Train a distribution directly from the Tabular object:

Wolfram Language code: ld = LearnDistribution[irises]

Generate a random sample:

Wolfram Language code: RandomVariate[ld]

Generate several random samples:

Wolfram Language code: RandomVariate[ld, 4]

Visualize random samples of the variables "PetalLength" and "SepalLength" from the distribution and compare them with the dataset:

Wolfram Language code:

Show[
	ListPlot[RandomVariate[ld, 100] -> {"PetalLength", "SepalLength"}, AxesLabel -> Automatic], 
	ListPlot[irises -> {"PetalLength", "SepalLength"}, PlotStyle -> Red]
	]

Load the Titanic survival dataset:

Wolfram Language code: titanic = ResourceData["Sample Data: Titanic Survival"]

Train a distribution on the dataset:

Wolfram Language code: ld = LearnDistribution[titanic]

Use the distribution and SynthesizeMissingValues to generate complete examples from incomplete ones:

Wolfram Language code: SynthesizeMissingValues[ld, <|"Class" -> "1st"|>]

Wolfram Language code: SynthesizeMissingValues[ld, <|"Class" -> "1st", "Sex" -> "female"|>]

Use the distribution to predict the survival probability of a given passenger:

Wolfram Language code:

p = PDF[ld, <|"Class" -> "1st", "Sex" -> "female", "Age" -> Quantity[20, "Years"], "SurvivalStatus" -> #|>] & /@ {"survived", "died"};
First[p / Total[p]]

Wolfram Language code:

p = PDF[ld, <|"Class" -> "3rd", "Sex" -> "male", "Age" -> Quantity[40, "Years"], "SurvivalStatus" -> #|>] & /@ {"survived", "died"};
First[p / Total[p]]
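
The two computations above divide the joint density for "survived" by the sum of the joint densities over both outcomes, which gives the conditional probability of survival for the stated features. This can be wrapped in a small helper. The name survivalProbability is hypothetical and not a built-in function; the sketch only repackages the steps already shown.

```wolfram
titanic = ResourceData["Sample Data: Titanic Survival"];
ld = LearnDistribution[titanic];

(* Hypothetical helper: conditional survival probability from the joint PDF *)
survivalProbability[dist_, features_Association] :=
 Module[{p},
  (* Joint density of the features with each possible outcome *)
  p = Map[
    PDF[dist, Append[features, "SurvivalStatus" -> #]] &,
    {"survived", "died"}];
  (* Normalize so the two outcomes sum to 1; keep the "survived" share *)
  First[p/Total[p]]]

survivalProbability[ld,
 <|"Class" -> "1st", "Sex" -> "female", "Age" -> Quantity[20, "Years"]|>]
```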

Train a distribution on a two-dimensional dataset:

Wolfram Language code: iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"][[All, 1, {1, 3}]];

Wolfram Language code: ld = LearnDistribution[iris]

Plot the PDF along with the training data:

Wolfram Language code:

Show[ContourPlot[PDF[ld, {x, y}], {x, 4, 8}, {y, 1, 7}, PlotRange -> All, Contours -> 10], ListPlot[iris, PlotStyle -> Red]]

Use SynthesizeMissingValues to impute missing values using the learned distribution:

Wolfram Language code: SynthesizeMissingValues[ld, {5.5, Missing[]}]

Obtain the histogram of possible imputed values:

Wolfram Language code: Histogram[Table[Last@SynthesizeMissingValues[ld, {5.5, Missing[]}], 2000], 50]
