The sequences seq_i can be lists of either tokens or strings.
Sequences seq_i are assumed to be unordered subsequences of an underlying infinite sequence.
In SequencePredict[…,seq,prop], properties are as given in SequencePredictorFunction[…]; they include:

	"NextElement"	most likely next element
	"NextElement"->n	individually most likely next n elements
	"NextSequence"->n	most likely next length-n sequence of elements
	"RandomNextElement"	random sample from the next-element distribution
	"RandomNextElement"->n	random sample from the next-sequence distribution
	"Probabilities"	association of probabilities for all possible next elements
	"SequenceProbability"	probability for the predictor to generate the given sequence
	"SequenceLogProbability"	log probability for the predictor to generate the sequence
	"Properties"	list of all properties available
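
The difference between "NextElement"->n and "NextSequence"->n can be illustrated without a trained predictor. The following is a minimal sketch, assuming a hand-written first-order transition table. The names trans, greedy and best2 are hypothetical helpers introduced here; they are not part of SequencePredict, and the code does not claim to reproduce its internal search.

```wolfram
(* Hypothetical first-order transition probabilities, chosen for illustration only *)
trans = <|
   "s" -> <|"a" -> 0.6, "b" -> 0.4|>,
   "a" -> <|"x" -> 0.55, "y" -> 0.45|>,
   "b" -> <|"z" -> 1.0|>|>;

(* Greedy: take the most likely element at each step and condition the next step on it *)
greedy[start_, n_] :=
  Rest[NestList[First[Keys[TakeLargest[trans[#], 1]]] &, start, n]];

(* Exhaustive: score every length-2 continuation by its joint probability *)
best2[start_] :=
  First[Keys[TakeLargest[
     Association[Flatten[Table[
        {e1, e2} -> trans[start][e1]*trans[e1][e2],
        {e1, Keys[trans[start]]}, {e2, Keys[trans[e1]]}]]], 1]]];

greedy["s", 2]  (* {"a", "x"}, joint probability about 0.33 *)
best2["s"]      (* {"b", "z"}, joint probability 0.4 *)
```

In this toy table the step-by-step choice and the jointly most likely continuation disagree, which is why the two properties can return different results for the same n.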

Examples of built-in sequence predictors include:

	"Chinese"	character-based Chinese-language text
	"English"	character-based English-language text
	"French"	character-based French-language text
	"German"	character-based German-language text
	"Portuguese"	character-based Portuguese-language text
	"Russian"	character-based Russian-language text
	"Spanish"	character-based Spanish-language text

The following options can be given:

FeatureExtractor	Automatic	how to preprocess sequences
Method	Automatic	which prediction algorithm to use
PerformanceGoal	Automatic	aspects of performance to try to optimize
RandomSeeding	1234	what seeding of pseudorandom generators should be done internally

Typical settings for FeatureExtractor for strings include:

	"SegmentedCharacters"	string interpreted as a sequence of characters (default)
	"SegmentedWords"	string interpreted as a sequence of words

Possible settings for PerformanceGoal include:

	"Memory"	minimize storage requirements of the predictor
	"Quality"	maximize accuracy of the predictor
	"Speed"	maximize speed of the predictor
	"TrainingSpeed"	minimize time spent producing the predictor
	Automatic	automatic tradeoff among speed, accuracy and memory

PerformanceGoal{goal₁,goal₂,…} will automatically combine goal₁, goal₂, etc.
Possible settings for RandomSeeding include:

	Automatic	automatically reseed every time the function is called
	Inherited	use externally seeded random numbers
	seed	use an explicit integer or string as a seed

Possible settings for Method include:

	"Markov"	Markov model

In SequencePredict[…,Method->{"Markov","Order"->order}], order corresponds to the memory size of the Markov process.
In SequencePredict[…,"SequenceProbability"], some probability mass is kept for unknown elements.
In SequencePredict[training,{},prop], {} is interpreted as an empty list of sequences rather than an empty sequence.
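
To make the role of "Order" concrete, here is a minimal sketch of a count-based Markov predictor whose context is the last k elements. The helpers markovCounts and predictNext are hypothetical names introduced for illustration; this is not the algorithm used by SequencePredict, and it applies no smoothing and reserves no probability mass for unknown elements.

```wolfram
(* Hypothetical helper: for every length-k context, count the elements that follow it *)
markovCounts[seqs_List, k_Integer?Positive] :=
  GroupBy[
   Catenate[Partition[#, k + 1, 1] & /@ seqs],  (* all windows of k+1 elements *)
   Most -> Last,                                 (* context -> following element *)
   Counts];

(* Hypothetical helper: most frequent follower of a context, or Missing[...] if unseen *)
predictNext[counts_Association, context_List] :=
  With[{c = counts[context]},
   If[MissingQ[c], c, First[Keys[TakeLargest[c, 1]]]]];

(* Order 2: the prediction depends only on the last two characters *)
counts = markovCounts[Characters /@ {"abab", "abba", "abab"}, 2];
predictNext[counts, Take[Characters["aba"], -2]]  (* "b" *)
```

A larger k lets the context distinguish more situations but leaves fewer training windows per context, which is the usual tradeoff when choosing the order.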

Examples


Basic Examples (1)

Train a sequence predictor on a set of sequences:

Wolfram Language code: sp = SequencePredict[{{[image], [image]}, {[image], [image], [image]}, {[image], [image], [image]}, {[image], [image]}}]

Predict the next element of a new sequence:

Wolfram Language code: sp[{[image], [image]}]

Obtain the probabilities of the next element given the sequence:

Wolfram Language code: sp[{[image], [image]}, "Probabilities"]

Obtain a random next element according to the preceding distribution:

Wolfram Language code: sp[{[image], [image]}, "RandomNextElement"]

Obtain multiple predictions at a time:

Wolfram Language code: sp[{{[image], [image]}, {[image], [image]}}]

Predict the most likely next element and reuse this intermediate guess to predict the following element:

Wolfram Language code: sp[{[image], [image]}, "NextElement" -> 2]

Predict the most likely following sequence:

Wolfram Language code: sp[{[image], [image]}, "NextSequence" -> 2]

Compare the probabilities for the preceding sequences:

Wolfram Language code: sp[{{[image], [image], [image], [image]}, {[image], [image], [image], [image]}}, "SequenceProbability"]

Scope (4)

Custom Sequence Predictors (3)

Train a sequence predictor on a list of strings:

Wolfram Language code: sp = SequencePredict[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog", "what a lovely cat", "this is not a dog"}]

Predict the next character following a given string:

Wolfram Language code: sp["the ca"]

Predict the next four characters:

Wolfram Language code: sp["the ca", "NextElement" -> 4]

Obtain the probabilities for each character to follow the given string:

Wolfram Language code: sp["the ca", "Probabilities"]

Train a sequence predictor on the list of common English words, each word treated as a sequence of characters:

Wolfram Language code: sp = SequencePredict[WordList[]]

Predict the most likely next character from a given sequence:

Wolfram Language code: sp["ab"]

In the previous example, each word is considered as a subsequence of an infinite sequence. Use the character | to mark boundaries between words:

Wolfram Language code: markedWords = "|" <> # <> "|"& /@ WordList[];

Build a new sequence predictor aware of word boundaries:

Wolfram Language code: sp = SequencePredict[markedWords]

Generate the beginning of an English-like word:

Wolfram Language code: sp["|", "RandomNextElement" -> 4]

Load a book from ExampleData:

Wolfram Language code: dq = ExampleData[{"Text", "DonQuixoteIEnglish"}];

Train a sequence predictor on this book:

Wolfram Language code: spchar = SequencePredict[{dq}]

Sample a random string in the style of the book:

Wolfram Language code: spchar[{"thou "}, "RandomNextElement" -> 20]

Train another sequence predictor, interpreting strings as word sequences rather than character sequences:

Wolfram Language code: spword = SequencePredict[{dq}, FeatureExtractor -> "SegmentedWords"]

Complete a string with 10 consecutive words (spaces and punctuation marks are considered as words):

Wolfram Language code: spword["I", "RandomNextElement" -> 10]

Built-in Sequence Predictors (1)

Download the "English" built-in sequence predictor:

Wolfram Language code: sp = SequencePredict["English"]

Obtain the log-probability of the given string:

Wolfram Language code: sp["the cat sat on the hat", "SequenceLogProbability"]

Options (5)

FeatureExtractor (2)

Preprocess the training text to predict on words rather than at the character level:

Wolfram Language code: sp = SequencePredict[{WikipediaData["computer"]}, FeatureExtractor -> "SegmentedWords"]

Complete a string with 10 consecutive words (spaces and punctuation marks are considered as words):

Wolfram Language code: sp["computer ", "RandomNextElement" -> 10]

Convert the training text to lowercase to obtain better statistics from higher letter counts:

Wolfram Language code: sp = SequencePredict[{WikipediaData["computer"]}, FeatureExtractor -> "LowerCasedText"]

PerformanceGoal (2)

Train a sequence predictor with an emphasis on the memory footprint of the resulting model:

Wolfram Language code: sp = SequencePredict[{WikipediaData["usa"]}, PerformanceGoal -> "Memory"]

Wolfram Language code: ByteCount[sp]

Compare with the automatically generated model size:

Wolfram Language code: ByteCount@SequencePredict[{WikipediaData["usa"]}]

Tune the computation time and precision when exploring the full sequence probability space:

Wolfram Language code: sp = SequencePredict[{ExampleData[{"Text", "DonQuixoteIEnglish"}]}]

Favor fast, approximate exploration:

Wolfram Language code: {shortertime, quickprediction} = AbsoluteTiming[sp["This is", "NextSequence" -> 50, PerformanceGoal -> "Speed"]]

Favor a more in-depth exploration at the cost of longer computation time:

Wolfram Language code: {longertime, betterprediction} = AbsoluteTiming[sp["This is", "NextSequence" -> 50, PerformanceGoal -> "Quality"]]

Compare the results:

Wolfram Language code: sp[#, "SequenceProbability"]& /@ {quickprediction, betterprediction}

Method (1)

Specify a memory size of 3 for the Markov process trained on the training subsequences:

Wolfram Language code: sp = SequencePredict[{WikipediaData["computer"]}, Method -> {"Markov", "Order" -> 3}]

Possible Issues (1)

An empty list is parsed as a list containing no sequences, so an empty list is returned:

Wolfram Language code: sp = SequencePredict[{{[image], [image]}, {[image], [image], [image]}, {[image], [image], [image]}, {[image], [image]}}, {}]

To obtain the most likely next element completing an empty sequence, nest it in a second list for disambiguation:

Wolfram Language code: sp = SequencePredict[{{[image], [image]}, {[image], [image], [image]}, {[image], [image], [image]}, {[image], [image]}}, {{}}]
