"RandomForest" (Machine Learning Method)

Details & Suboptions

  • Random forest is an ensemble learning method for classification and regression that operates by constructing a multitude of decision trees. The forest prediction is obtained by taking the most common class (for classification) or the mean of the individual tree predictions (for regression). Each decision tree is trained on a random subset of the training set and uses only a random subset of the features (bootstrap aggregating, or bagging).
  • The following options can be given:
  • "DistributionSmoothing" 0.5regularization parameter
    "FeatureFraction" Automaticthe fraction of features to be randomly selected to train each tree
    "LeafSize" Automaticthe maximum number of examples in each leaf
    "TreeNumber" Automaticthe number of trees in the forest
  • "FeatureFraction", "LeafSize" and "DistributionSmoothing" can be used to control overfitting.

Examples


Basic Examples  (3)

Train a predictor on labeled examples:
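One possible input, assuming a small synthetic dataset in place of the original (not shown in this extract):

    p = Predict[{1 -> 1.3, 2 -> 2.4, 3 -> 3.3, 4 -> 4.3, 5 -> 5.2},
      Method -> "RandomForest"]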

Obtain information about the predictor:
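Continuing the sketch above; in recent versions this can be done with Information (older versions use PredictorInformation):

    Information[p]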

Predict a new example:
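For instance, applying the sketched predictor to an unseen value:

    p[3.5]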

Train a classifier function on labeled examples:
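A possible input along the same lines, with illustrative one-dimensional data:

    c = Classify[{1.1 -> "A", 1.8 -> "A", 3.2 -> "B", 4.5 -> "B"},
      Method -> "RandomForest"]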

Plot the probability that the class of an example is "A" or "B" as a function of the feature and compare them:
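One way to sketch this, assuming the classifier c above and a feature range of 0 to 5:

    Plot[{c[x, {"Probability", "A"}], c[x, {"Probability", "B"}]}, {x, 0, 5},
      PlotLegends -> {"A", "B"}]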

Train a predictor function on labeled data:
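A sketch using noisy synthetic data (the noise level and range are arbitrary):

    data = Table[x -> Sin[x] + RandomVariate[NormalDistribution[0, 0.2]], {x, 0, 6, 0.1}];
    p = Predict[data, Method -> "RandomForest"]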

Compare the data with the predicted values and look at the standard deviation:
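One way this comparison might be set up, assuming the data and predictor above; the standard deviation is taken from the predicted distribution:

    sd[x_?NumericQ] := StandardDeviation[p[x, "Distribution"]]
    Show[
      Plot[{p[x] - sd[x], p[x], p[x] + sd[x]}, {x, 0, 6}],
      ListPlot[List @@@ data]
    ]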

Options  (6)

"DistributionSmoothing"  (2)

Train a classifier using the "DistributionSmoothing" suboption:
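A minimal sketch, with an illustrative dataset and an arbitrary suboption value:

    c = Classify[{1 -> "A", 2 -> "A", 3 -> "B", 4 -> "B"},
      Method -> {"RandomForest", "DistributionSmoothing" -> 1}]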

Use the "Titanic" training set to train a classifier with the default value of "DistributionSmoothing":

Train a second classifier using a large "DistributionSmoothing":
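For example, with an arbitrarily large smoothing value:

    c2 = Classify[trainingData,
      Method -> {"RandomForest", "DistributionSmoothing" -> 3}]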

Compare the probabilities for examples from a test set:
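A sketch of such a comparison, assuming the two classifiers above and looking at the first few test examples:

    testData = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
    c1[Keys[testData[[;; 3]]], "Probabilities"]
    c2[Keys[testData[[;; 3]]], "Probabilities"]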

"FeatureFraction"  (2)

Train a predictor on high-dimensional data using the "FeatureFraction" suboption:
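One way this might look, assuming synthetic 100-dimensional data in which only the first 10 features matter:

    data = Table[With[{x = RandomReal[1, 100]}, x -> Total[x[[;; 10]]]], {200}];
    p = Predict[data, Method -> {"RandomForest", "FeatureFraction" -> 0.3}]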

In the "RandomForest" method, a balanced "FeatureFraction" prevents overfitting.

Use the "Titanic" training set to train two classifiers with different values of "FeatureFraction":

Compare the accuracy of these classifiers on both the test set and the training set:
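One way to measure this, assuming the classifiers above:

    testData = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
    ClassifierMeasurements[#, testData, "Accuracy"] & /@ {c1, c2}
    ClassifierMeasurements[#, trainingData, "Accuracy"] & /@ {c1, c2}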

"LeafSize"  (1)

Use the "Titanic" training set to train two classifiers with different values of "LeafSize":

Compare the size of the corresponding forests:
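For instance, comparing the in-memory size of the classifier functions (a rough proxy for forest size):

    ByteCount /@ {c1, c2}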

"TreeNumber"  (1)

Use the "Mushroom" training set to train two classifiers with different values of "TreeNumber":

Look at the training time of these classifiers:
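One way to inspect this, assuming the "TrainingTime" property is available (older versions expose it through ClassifierInformation):

    Information[c1, "TrainingTime"]
    Information[c2, "TrainingTime"]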