PredictorMeasurements
PredictorMeasurements[predictor,testset,prop]
gives measurements associated with the property prop when predictor is evaluated on testset.
PredictorMeasurements[predictor,testset]
yields a measurement report that can be applied to any property.
PredictorMeasurements[data,…]
use predictions data instead of a predictor.
PredictorMeasurements[…,{prop_{1},prop_{2},…}]
gives properties prop_{1}, prop_{2}, etc.
Details and Options
 Measurements are used to determine the performance of a classifier on data that was not used for training purposes (the test set).
 Possible measurements include classification metrics (accuracy, likelihood, etc.), visualizations (confusion matrix, ROC curve, etc.) or specific examples (such as the worst classified examples).
 The predictor is typically a PredictorFunction object as generated by Predict.
 In PredictorMeasurements[data,…], the predictions data can have the following forms:

{y_{1},y_{2},…} predictions from a predictor (human, algorithm, etc.) {dist_{1},dist_{2},…} predictive distributions obtained by a predictor  PredictorMeasurements[…,opts] specifies that the predictor should use the options opts when applied to the test set. Possible options are as given in PredictorFunction.
 PredictorMeasurements[classifier,testset] returns a PredictorMeasurementsObject[…] that displays as a report panel, such as:

 PredictorMeasurementsObject[…][prop] can be used to look up prop from a PredictorMeasurementsObject. When repeated property lookups are required, this is typically more efficient than using PredictorMeasurements every time.
 PredictorMeasurementsObject[…][prop,opts] specifies that the predictor should use the options opts when applied to the test set. These options supersede original options given to PredictorMeasurements.
 PredictorMeasurements has the same options as PredictorFunction[…], with the following additions:

Weights Automatic weights to be associated with test set examples ComputeUncertainty False whether measures should be given with their statistical uncertainty  When ComputeUncertaintyTrue, numerical measures will be returned as Around[result,err], where err represents the standard error (corresponding to a 68% confidence interval) associated with measure result.
 Possible settings for Weights include:

Automatic associates weight 1 with all test examples {w_{1},w_{2},…} associates weight w_{i} with the i test examples  Changing the weight of a test example from 1 to 2 is equivalent to duplicating the example.
 Weights affect measures as well as their uncertainties.
 Properties returning a single numeric value related to prediction abilities on the test set include:

"StandardDeviation" root mean square of the residuals "StandardDeviationBaseline" standard deviation of test set values "LogLikelihood" loglikelihood of the model given the test data "MeanCrossEntropy" mean cross entropy over test examples "MeanDeviation" mean absolute value of the residuals "MeanSquare" mean square of the residuals "RSquared" coefficient of determination "FractionVarianceUnexplained" fraction of variance unexplained "Perplexity" exponential of the mean cross entropy "RejectionRate" fraction of examples predicted as Indeterminate "GeometricMeanProbabilityDensity" geometric mean of the actualclass probability densities  Test examples classified as Indeterminate will be discarded when measuring properties related to prediction abilities on the test set, such as "StandardDeviation" or "MeanCrossEntropy".
 Properties returning graphics include:

"ComparisonPlot" plot of predicted values versus test values "ICEPlots" Individual Conditional Expectation (ICE) plots "ProbabilityDensityHistogram" histogram of actualclass probability densities "Report" panel reporting main measurements "ResidualHistogram" histogram of residuals "ResidualPlot" plot of the residuals "SHAPPlots" Shapley additive feature explanations plot for each class  Timingrelated properties include:

"EvaluationTime" time needed to predict one example of the test set "BatchEvaluationTime" marginal time to predict one example in a batch  Properties returning one value for each testset example include:

"Residuals" list of differences between predicted and test values "ProbabilityDensities" actualclass prediction probability densities "SHAPValues" Shapley additive feature explanations for each example  "SHAPValues" assesses the contribution of features by comparing predictions with different sets of features removed and then synthesized. The option MissingValueSynthesis can be used to specify how the missing features are synthesized. SHAP explanations are given as deviation from the training output mean. "SHAPValues"n can be used to control the the number of samples used for the numeric estimations of SHAP explanations.
 Properties returning examples from the test set include:

"BestPredictedExamples" examples having the highest actualclass probability density "Examples" all test examples "Examples"{i_{1},i_{2}} all examples in the interval i_{1} predicted in the interval i_{2} "LeastCertainExamples" examples having the highest distribution entropy "MostCertainExamples" examples having the lowest distribution entropy "WorstPredictedExamples" examples having the lowest actualclass probability density  Examples are given in the form input_{i}value_{i}, where value_{i} is the actual value.
 Properties such as "WorstPredictedExamples" or "MostCertainExamples" output up to 10 examples. PredictorMeasurementsObject[…][propn] can be used to output n examples.
 Other properties include:

"PredictorFunction" PredictorFunction[…] being measured "Properties" list of measurement properties available
Examples
open allclose allBasic Examples (3)
Train a predictor on a training set:
Measure the standard deviation of the predictor on the test set:
Visualize a scatter plot of the actual and predicted values:
Measure several properties at once:
Train a predictor on a training set:
Generate a measurement object of the predictor on the test set:
Obtain the list of measurement properties available:
Measure the standard deviation of the predictor on the test set:
Obtain the standard deviation along with its statistical uncertainty due to finite testset size:
Measure the standard deviation directly from classified examples:
Scope (3)
ResidualBased Metrics (1)
Visualize the residuals of predicted examples against their true values:
Measure the standard deviation:
This is equivalent to computing the root mean square of the residuals:
Obtain the statistical uncertainty on this measure:
Compare the standard deviation to a baseline (always predicting the mean of the testset values):
This is equivalent to the proportion of variance that is explained:
Comparison Plot and Example Extraction (1)
Create a training set and a test set on the Boston homes data:
Train a model on the training set:
Create a classifier measurements object for this classifier on the test set:
Find the two test examples that have the worst predictions:
Find the two test examples that have the best predictions:
Compare the predicted and correct values:
Extract the examples whose true values are between 30 and 40 and whose predictions are between 40 and 50:
Probabilistic Metrics (1)
Create and visualize an artificial dataset from the expression Cos[x*y]:
Split the dataset into a training set and a test set:
Train a predictor on the training set:
Measure the loglikelihood of the test set (total logPDF of actual value):
Measure the mean crossentropy:
The mean crossentropy is the average negative loglikelihood:
Options (5)
IndeterminateThreshold (1)
Create an artificial dataset and visualize it:
Split the dataset into a training set and a test set:
Train a predictor on the training set:
Plot the predicted distribution for a few feature values:
Compute the root mean square of the residuals:
Perform the same computation with a different threshold value for the predictor:
This operation can also be done on the PredictorMeasurementsObject:
Plot the standard deviation and the rejection rate as a function of the threshold:
TargetDevice (1)
Train a predictor using a neural network:
Measure the standard deviation of the predictor on a test set for different setting of TargetDevice:
UtilityFunction (1)
Define a training and a test set:
Train a predictor on the training set:
Define and visualize a utility function that penalizes the predicted value's being smaller than the actual value:
Compute the residuals of the predictor on the test set with this utility function:
The residuals with the default utility function are higher:
The utility function can also be specified when using the PredictorMeasurementsObject:
"Uncertainty" (1)
Train a predictor on the "WineQuality" dataset:
Generate a PredictorMeasurements[…] object using a test set:
Obtain a measure of the standard deviation along with its uncertainty:
Obtain a measure of other properties along with their uncertainties:
Applications (2)
Load a dataset of the average monthly temperature as a function of the city, the year and the month:
Split the dataset into a training set and a test set:
Train a predictor on the training set:
Generate a PredictorMeasurementsObject from the predictor and the test set:
Compute the mean cross entropy of the classifier on the test set:
Visualize the scatter plot of the test values as a function of the predicted values:
Extract the test examples that are in a given region of the comparison plot:
Extract the 20 worst predicted examples:
Train a predictor that predicts the median value of properties in a neighborhood of Boston, given some features of the neighborhood:
Generate a predictor measurements object to analyze the performance of the predictor on a test set:
Plot a histogram of the residuals:
Compute the standard deviation of the predicted values from the actual values (root mean square of the residuals):
Text
Wolfram Research (2014), PredictorMeasurements, Wolfram Language function, https://reference.wolfram.com/language/ref/PredictorMeasurements.html (updated 2021).
CMS
Wolfram Language. 2014. "PredictorMeasurements." Wolfram Language & System Documentation Center. Wolfram Research. Last Modified 2021. https://reference.wolfram.com/language/ref/PredictorMeasurements.html.
APA
Wolfram Language. (2014). PredictorMeasurements. Wolfram Language & System Documentation Center. Retrieved from https://reference.wolfram.com/language/ref/PredictorMeasurements.html