---
title: "Markov"
language: "en"
type: "Method"
summary: "Markov (Machine Learning Method) Method for Classify. Model class probabilities using the n-gram frequencies of the given sequence. In a Markov model, at training time, an n-gram language model is computed for each class. At test time, the probability for each class is computed according to Bayes's theorem, P(class|sequence)\\[Proportional]P(class) P(sequence|class), where P(sequence|class) is given by the language model of the given class and P(class) is the class prior. The following options can be given: When Order->n, the method partitions sequences into (n+1)-grams. When Order->0, the method uses unigrams (single tokens). The model can then be called a unigram model or naive Bayes model. The value of AdditiveSmoothing is added to all n-gram counts. It is used to regularize the language model."
canonical_url: "https://reference.wolfram.com/language/ref/method/Markov.html"
source: "Wolfram Language Documentation"
---
# "Markov" (Machine Learning Method)

* Method for ``Classify``.

* Model class probabilities using the ``n``-gram frequencies of the given sequence.

---

## Details & Suboptions

* In a Markov model, at training time, an ``n``-gram language model is computed for each class. At test time, the probability for each class is computed according to Bayes's theorem, $P(\text{class}|\text{sequence})\propto P(\text{class}) P(\text{sequence}|\text{class})$, where $P(\text{sequence}|\text{class})$ is given by the language model of the given class and $P(\text{class})$ is the class prior.

* The following options can be given:

| option               | default   | description                                  |
| -------------------- | --------- | -------------------------------------------- |
| "AdditiveSmoothing"  | 0.1       | the smoothing parameter to use               |
| "MinimumTokenCount"  | Automatic | minimum count for an n-gram to be considered |
| "Order"              | Automatic | n-gram length                                |

* When ``"Order" -> n``, the method partitions sequences into (``n``+1)-grams.

* When ``"Order" -> 0``, the method uses unigrams (single tokens). The model can then be called a unigram model or naive Bayes model.

* The value of ``"AdditiveSmoothing"`` is added to all ``n``-gram counts. It is used to regularize the language model.
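The mechanics described above can be sketched in Python for the unigram (``"Order" -> 0``) case. This is an illustrative toy, not Wolfram's implementation; all function names here are made up for the sketch. Per-class token counts form the language model, ``smoothing`` is added to every count, and classes are scored with Bayes's theorem:

```python
from collections import Counter
from math import exp, log


def train(examples, smoothing=0.1):
    """examples: list of (token_sequence, label) pairs.
    Returns per-class token counts, class priors, vocabulary and alpha."""
    counts = {}          # class -> Counter of token frequencies
    priors = Counter()   # class -> number of training examples
    vocab = set()
    for tokens, label in examples:
        counts.setdefault(label, Counter()).update(tokens)
        priors[label] += 1
        vocab.update(tokens)
    return counts, priors, vocab, smoothing


def probabilities(model, tokens):
    """P(class | tokens) via log P(class) + sum_t log P(t | class)."""
    counts, priors, vocab, alpha = model
    total = sum(priors.values())
    scores = {}
    for label, c in counts.items():
        denom = sum(c.values()) + alpha * len(vocab)  # smoothed normalizer
        lp = log(priors[label] / total)               # class prior
        for t in tokens:
            lp += log((c[t] + alpha) / denom)         # smoothed likelihood
        scores[label] = lp
    # normalize the log scores into probabilities
    m = max(scores.values())
    unnorm = {k: exp(v - m) for k, v in scores.items()}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}


data = [(("flour", "butter"), "dessert"),
        (("pasta", "tomato"), "main course"),
        (("apple", "ice cream"), "dessert"),
        (("salt", "meat"), "main course"),
        (("honey", "sugar", "butter"), "dessert")]
model = train(data)
probs = probabilities(model, ("tomato", "meat", "salt"))
```

For ``"Order" -> n`` with ``n > 0``, the same scheme applies with (``n``+1)-grams of the sequence as tokens instead of single elements.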

---

## Examples (5)

### Basic Examples (1)

Train a classifier function on labeled examples:

```wl
In[1]:= c = Classify[{{"flour", "butter"}, {"pasta", "tomato"}, {"apple", "ice cream"}, {"salt", "meat"}, {"honey", "sugar", "butter"}} -> {"dessert", "main course", "dessert", "main course", "dessert"}, Method -> "Markov"]

Out[1]= ClassifierFunction[…]
```

Obtain information about the classifier:

```wl
In[2]:= Information[c]

Out[2]= MachineLearning`MLInformationObject[…]
```

Classify a new example:

```wl
In[3]:= c[{"tomato", "meat", "salt"}]

Out[3]= "main course"
```

### Options (4)

#### "AdditiveSmoothing" (2)

Train a classifier using the ``"AdditiveSmoothing"`` suboption:

```wl
In[1]:= Classify[{{[image], "this is dark red"} -> Red, {[image], "this is blue"} -> Blue, {[image], "this is dark blue"} -> Blue, {[image], "this red is almost orange"} -> Red}, Method -> {"Markov", "AdditiveSmoothing" -> 2}]

Out[1]= ClassifierFunction[…]
```

---

Train two classifiers on an imbalanced dataset by varying the value of ``"AdditiveSmoothing"``:

```wl
In[1]:= data = {"aaabb" -> True, "cdfff" -> True, "cdffvvv" -> True, "aaa" -> True, "vvvtul" -> False, "dqedewf" -> True};

In[2]:=
c1 = Classify[data, Method -> {"Markov", "AdditiveSmoothing" -> .1}];
c2 = Classify[data, Method -> {"Markov", "AdditiveSmoothing" -> 7}];
```

Look at the corresponding probabilities for the imbalanced element:

```wl
In[3]:= c1["vvvtul", "Probabilities"]

Out[3]= <|False -> 0.999999, True -> 6.24372*^-7|>

In[4]:= c2["vvvtul", "Probabilities"]

Out[4]= <|False -> 0.636523, True -> 0.363477|>
```
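The shift toward uniform probabilities follows from the smoothed estimate $(\text{count} + \alpha)/(\text{total} + \alpha V)$, where $V$ is the vocabulary size: as $\alpha$ grows, the counts matter less. A quick numeric check with illustrative numbers (not taken from the classifiers above), assuming a token unseen in a class with 10 observed tokens over a 20-token vocabulary:

```python
def smoothed(count, total, vocab_size, alpha):
    """Additively smoothed n-gram probability estimate."""
    return (count + alpha) / (total + alpha * vocab_size)

p_small = smoothed(0, 10, 20, 0.1)  # alpha = 0.1: near zero
p_large = smoothed(0, 10, 20, 7)    # alpha = 7: pulled toward 1/V
```

The unseen token's probability rises from about 0.008 to about 0.047, which is why the heavily smoothed classifier assigns less extreme class probabilities.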

#### "Order" (2)

Train a classifier by specifying the ``"Order"``:

```wl
In[1]:= Classify[{{[image], "this is dark red"} -> Red, {[image], "this is blue"} -> Blue, {[image], "this is dark blue"} -> Blue, {[image], "this red is almost orange"} -> Red}, Method -> {"Markov", "Order" -> 2}]

Out[1]= ClassifierFunction[…]
```

---

Generate a dataset of real words and random strings:

```wl
In[1]:=
alphabet = Alphabet[];
randomstring := StringJoin @@ RandomChoice[alphabet, RandomInteger[{2, 5}]];
realwords = DictionaryLookup["a" ~~ ___];
trainingset = <|"RealWord" -> RandomSample[realwords, 200],
	"RandomString" -> Table[randomstring, 200]|>;
```

Generate classifiers using different values for the ``"Order"``:

```wl
In[2]:= c0 = Classify[trainingset, Method -> {"Markov" , "Order" -> 0}]

Out[2]= ClassifierFunction[…]

In[3]:= c2 = Classify[trainingset, Method -> {"Markov", "Order" -> 2}]

Out[3]= ClassifierFunction[…]
```

Compare the probabilities of these classifiers on a new real word:

```wl
In[4]:=
SeedRandom[4];
notRandom = RandomSample[DictionaryLookup["a" ~~ ___], 1];
Dataset@<|"Classifier1" -> c0[notRandom, "Probabilities"], "Classifier2" -> c2[notRandom, "Probabilities"]|>

Out[4]= Dataset[…]
```

## See Also

* [`Classify`](https://reference.wolfram.com/language/ref/Classify.en.md)
* [`ClassifierFunction`](https://reference.wolfram.com/language/ref/ClassifierFunction.en.md)
* [`ClassifierMeasurements`](https://reference.wolfram.com/language/ref/ClassifierMeasurements.en.md)
* [`Predict`](https://reference.wolfram.com/language/ref/Predict.en.md)
* [`SequencePredict`](https://reference.wolfram.com/language/ref/SequencePredict.en.md)
* [`ClusterClassify`](https://reference.wolfram.com/language/ref/ClusterClassify.en.md)
* [`DecisionTree`](https://reference.wolfram.com/language/ref/method/DecisionTree.en.md)
* [`LogisticRegression`](https://reference.wolfram.com/language/ref/method/LogisticRegression.en.md)
* [`NearestNeighbors`](https://reference.wolfram.com/language/ref/method/NearestNeighbors.en.md)
* [`NeuralNetwork`](https://reference.wolfram.com/language/ref/method/NeuralNetwork.en.md)
* [`RandomForest`](https://reference.wolfram.com/language/ref/method/RandomForest.en.md)
* [`SupportVectorMachine`](https://reference.wolfram.com/language/ref/method/SupportVectorMachine.en.md)

## Related Links

* [An Elementary Introduction to the Wolfram Language: Machine Learning](https://www.wolfram.com/language/elementary-introduction/22-machine-learning.html)

## History

* [Introduced in 2014 (10.0)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn100.en.md)