# EmpiricalDistribution

EmpiricalDistribution[{x1,x2,…}]

represents an empirical distribution based on the data values xi.

EmpiricalDistribution[{{x1,y1,…},{x2,y2,…},…}]

represents a multivariate empirical distribution based on the data values {xi,yi,…}.

EmpiricalDistribution[{w1,w2,…}->{d1,d2,…}]

represents an empirical distribution where data values di occur with weights wi.
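To make the definition concrete, here is a minimal Python sketch (not Wolfram Language code) of what a univariate empirical distribution assigns to each data value. The helper names `empirical_pmf` and `empirical_cdf` are hypothetical, and integer weights are assumed so that exact fractions can be used. Unweighted data is treated as having weight 1 per observation.

```python
from collections import defaultdict
from fractions import Fraction

def empirical_pmf(data, weights=None):
    """Map each distinct data value to its normalized weight."""
    if weights is None:
        weights = [1] * len(data)          # every observation counts equally
    total = sum(weights)
    pmf = defaultdict(Fraction)            # Fraction() is exactly 0
    for x, w in zip(data, weights):
        pmf[x] += Fraction(w, total)       # repeated values accumulate mass
    return dict(pmf)

def empirical_cdf(data, x, weights=None):
    """P(X <= x): total mass at data values not exceeding x (univariate only)."""
    return sum(p for v, p in empirical_pmf(data, weights).items() if v <= x)

print(empirical_pmf([1, 2, 2, 5]))
print(empirical_cdf([1, 2, 2, 5], 4))                  # 3/4
print(empirical_cdf([1, 2, 5], 4, weights=[1, 2, 1]))  # 3/4
```

The last two calls agree because giving the value 2 a weight of 2 is the same as observing it twice.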

# Examples

## Basic Examples (2)

Create an empirical distribution of univariate data:

Visualize distribution functions:

Compute moments and quantiles:

Create an empirical distribution of bivariate data:

Visualize the estimated CDF:

Compute covariance and general moments:

## Scope (19)

### Basic Uses (10)

Create an empirical distribution of univariate data:

Larger datasets lead to better approximations of the underlying distribution:

Construct an empirical distribution for quantity data:

Calculate selected descriptive statistics:

Use exact numeric data:

Specify a list of weights corresponding to each data value:

Use symbolic weights:

A general moment of the distribution:

The CDF evaluated at 4:

Create an empirical distribution of bivariate data:

Larger datasets produce smoother estimates:

Specify a list of weights for bivariate data:

Create an empirical distribution of data in higher dimensions:

Plot the univariate marginal CDFs:

Plot the bivariate marginal CDFs:

When the input is a TimeSeries, EmpiricalDistribution uses only the values and ignores the time stamps:

Compare to using the values only:

When the input is a TemporalData object, EmpiricalDistribution pools the values from all paths:

The same as:

### Distribution Properties (9)

Obtain empirical estimates of distribution functions:

PDF and HazardFunction are discrete:

CDF and SurvivalFunction are piecewise constant:

Compute moments:

Special moments:

General moments:

Estimate the quantile function:

Special quantile values:

Generate a set of random numbers:

Compare the histogram to the PDF of the underlying distribution:

Compute probabilities and expectations:

Generating functions:

Estimate bivariate distribution functions:

CDF and SurvivalFunction are piecewise constant:

Compute bivariate moments:

Special moments:

General moments:

Generate a set of random numbers:

## Applications (8)

Compare the distribution of data to a theoretical distribution:

Compare multivariate data to a theoretical distribution:

The difference:

Produce a smoothed representation with SmoothKernelDistribution:

Using HistogramDistribution with bin delimiters set to the data creates a linear interpolation of EmpiricalDistribution:

Ten letters published in 1861 under the name Quintus Curtius Snodgrass are claimed to have been authored by Mark Twain. Compare the word length distribution for the letters to some works by Mark Twain:

Comparison to the English language in general emphasizes the similarity:

A test for goodness of fit suggests, however, that Twain did not write the QCS letters:

Compare the distributions of winning times in Scottish hill races for those who take the high road and those who take the low road:

Plot record times vs. elevation gain:

Find the median elevation gain:

Split the races at the median elevation gain into high-road and low-road groups:

It appears that it is faster to take the low road:

The record times of high road races vary more than those of low road races:

The National Institutes of Health estimates that 2% of the population has a certain disease. A test for the disease is proposed that detects its presence 95% of the time with a false positive rate of 5%. Given that a patient tests positive, find the probability that he or she actually has the disease:

Equations for the unknown probabilities based on the information given:

Solve the equations, assuming the probabilities sum to unity:

The probability a patient has the disease given a positive test result:
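The arithmetic behind this example can be checked with Bayes' rule. The following Python sketch is an independent calculation from the figures stated above (2% prevalence, 95% detection rate, 5% false positive rate). It is not the EmpiricalDistribution-based approach used on this page, and the variable names are illustrative.

```python
from fractions import Fraction

prevalence = Fraction(2, 100)       # P(disease)
sensitivity = Fraction(95, 100)     # P(positive | disease)
false_positive = Fraction(5, 100)   # P(positive | no disease)

# Joint probabilities of a positive test with and without the disease
p_pos_and_disease = prevalence * sensitivity
p_pos_and_healthy = (1 - prevalence) * false_positive

# Bayes' rule: P(disease | positive)
p_disease_given_pos = p_pos_and_disease / (p_pos_and_disease + p_pos_and_healthy)
print(p_disease_given_pos, float(p_disease_given_pos))  # 19/68, about 0.279
```

Even with a fairly accurate test, fewer than 3 in 10 positive results correspond to actual disease, because healthy patients vastly outnumber sick ones.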

A group of 21 students was selected at random to participate in a new directed reading program. A control group of 23 students was educated with traditional methods. Reading test scores for students in the two groups were recorded following their programs. Perform a permutation-based test on the scores to determine if the directed reading program was successful:

The mean difference in test scores across the groups can be used as a test statistic:

Simulate the null distribution of the test statistic by randomly permuting the groups:

At the 5% level, there is evidence that the new program made a difference:

LocationTest could have been used to test the hypothesis directly:

## Properties & Relations (8)

Random number generation from an empirical distribution returns a bootstrapped sample:
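As a language-neutral illustration of why this holds, sampling from an empirical distribution amounts to drawing from the original data with replacement. A minimal Python sketch, with made-up data values:

```python
import random

data = [2.1, 3.4, 3.4, 5.0, 7.2]   # illustrative values only
rng = random.Random(42)

# Draw with replacement: each observation is equally likely on every draw,
# so repeated values such as 3.4 are proportionally more likely to appear.
sample = rng.choices(data, k=10)
print(sample)

# A weighted empirical distribution corresponds to weighted resampling.
weighted_sample = rng.choices([1, 2, 5], weights=[1, 2, 1], k=10)
print(weighted_sample)
```

Every generated value is one of the original data values; no new values are ever produced.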

EmpiricalDistribution is a consistent estimator of the underlying distribution:
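One way to see consistency numerically is to measure the largest gap between the empirical CDF and the true CDF as the sample size grows. The Python sketch below assumes samples from the uniform distribution on [0, 1], whose CDF is F(x) = x. The function name `sup_distance_to_uniform` is hypothetical.

```python
import random

def sup_distance_to_uniform(sample):
    """Largest gap between the empirical CDF and F(x) = x on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    # The empirical CDF jumps from i/n to (i + 1)/n at the i-th sorted point
    # (0-based), so the extreme gaps occur just before and at each jump.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(1)
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance_to_uniform(sample), 4))
```

The printed distances typically shrink as n increases, though not necessarily at every step.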

Moments of the distribution are equal to those of the data:

The population variance, rather than the sample variance, is used for empirical distributions:
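The distinction is the divisor: the population variance divides the sum of squared deviations by n, while the sample variance divides by n - 1. A short Python sketch using the standard `statistics` module, with illustrative data:

```python
from statistics import pvariance, variance

data = [1, 2, 3, 4]
n = len(data)

pop_var = pvariance(data)    # divides by n
samp_var = variance(data)    # divides by n - 1

print(pop_var)               # 1.25
print(samp_var)
print(samp_var * (n - 1) / n)  # rescaling recovers the population variance
```

The two agree closely for large n, but for small datasets the population variance is noticeably smaller.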

Quantiles are equivalent to Quantile applied directly to the data:
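The quantile of an empirical distribution is the smallest data value at which the CDF reaches q. The Python sketch below assumes the default definition of Quantile on a list, namely the element at position ⌈q n⌉ of the sorted data. The helper name `empirical_quantile` is hypothetical.

```python
import math

def empirical_quantile(data, q):
    """Smallest data value x whose empirical CDF is at least q (0 <= q <= 1)."""
    xs = sorted(data)
    # 1-based rank ceil(q * n); q = 0 is mapped to the minimum
    k = max(1, math.ceil(q * len(xs)))
    return xs[k - 1]

data = [7, 1, 4, 10, 3]
print([empirical_quantile(data, q) for q in (0.25, 0.5, 0.75, 1)])  # [3, 4, 7, 10]
```

Because the result is always one of the data values, no interpolation between neighboring observations takes place. With floating-point q, products such as q * n can round unexpectedly near integers, so exact fractions are safer when the rank falls on a boundary.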

EmpiricalDistribution is equivalent to SurvivalDistribution with no censoring:

Use the union of data values as bin delimiters for HistogramDistribution:

The resulting PDF is a zero-order interpolation of the PDF for EmpiricalDistribution:

Applying N to exact data can reduce memory consumption:

The CDFs are equivalent:

EmpiricalDistribution on integers can be specified using ProbabilityDistribution: