Natural Language Processing

Natural language processing is concerned with interpreting text and speech the way a human would. It is a fundamental component of many human-machine interactions (voice assistants, dictation software, voice-operated systems, ...), of text processing and analysis (suggestions, keyword spotting, translation, ...) and much more. The Wolfram Language's natural language processing functionality combines rule-based and machine learning language models. It builds on advanced text mining and string manipulation capabilities and is integrated with a large visualization suite and extensive built-in linguistic data.

Text Acquisition

Import import text from files or the web

"Text", "PDF", "HTML", "CSV", pick out plaintext, table data, etc.

TextRecognize  ▪  ExampleData  ▪  WikipediaData

Text Mining

TextSearch search an index or directory, returning a list of documents

Find, FindList search files for records containing particular strings

StringTake  ▪  StringReplace  ▪  StringCases  ▪  RegularExpression  ▪  ...

Text Normalization

RemoveDiacritics remove diacritics such as accents, umlauts, etc.

CharacterNormalize reduce or decompose characters to normal forms

TextTranslation  ▪  Transliterate  ▪  DeleteStopwords  ▪  WordStem  ▪  ToLowerCase  ▪  ...
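
As a rough illustration of how these normalization functions can be chained, the sketch below removes diacritics, lowercases, drops stopwords and stems the remaining words. The sample sentence and the variable names are made up for this example, and the order of the steps is only one reasonable choice.

```wolfram
(* Hypothetical sample sentence, chosen to contain accented characters *)
raw = "The naïve café owners were reviewing the résumés.";

(* Strip accents and other diacritics, e.g. "café" becomes "cafe" *)
ascii = RemoveDiacritics[raw];

(* Lowercase, then remove common stopwords such as "the" and "were" *)
content = DeleteStopwords[ToLowerCase[ascii]];

(* Split into words and reduce each word to its stem *)
stems = WordStem[TextWords[content]]
```

Whether to stem, lowercase or remove stopwords at all depends on the downstream task; for instance, language identification usually works better on unmodified text.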

Tokenization

StringSplit split a string at spaces or other delimiters

StringCases find cases of string patterns

TextCases  ▪  TextSentences  ▪  TextWords  ▪  TextStructure
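
The functions above split text at different levels of granularity. The following minimal sketch compares them on a made-up two-sentence string; the string and the regular expression are assumptions for illustration only.

```wolfram
(* Hypothetical sample text *)
text = "Tokenization splits text into units. Words and sentences are the usual choices.";

(* Split at whitespace; punctuation stays attached to the neighboring word *)
StringSplit[text]

(* Linguistic word tokens, without punctuation *)
TextWords[text]

(* Sentence tokens *)
TextSentences[text]

(* Pattern-based extraction: capitalized words matched by a regular expression *)
StringCases[text, RegularExpression["[A-Z][a-z]+"]]
```

StringSplit and StringCases are purely string-based, while TextWords and TextSentences apply language-aware segmentation.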

Feature Extraction

FeatureExtraction extract numerical features from text

NetModel pre-trained networks for text feature extraction

"GloVe"  ▪  "BERT"  ▪  "ELMo"  ▪  "GPT2"  ▪  ...

Content Extraction

FindTextualAnswer attempt to find answers to questions from text

TextContents, TextCases, TextPosition extract semantic elements in text
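
A short sketch of content extraction on an invented passage follows. The passage and the question are assumptions for illustration, and the content types "Person", "Date" and "City" are used here as examples of built-in text content types. These functions may need to download resources on first use, and their results depend on the underlying models.

```wolfram
(* Hypothetical passage used only for this example *)
passage = "Ada Lovelace was born in London in 1815. She worked with Charles Babbage on the Analytical Engine.";

(* Ask a question whose answer is a substring of the passage *)
FindTextualAnswer[passage, "Where was Ada Lovelace born?"]

(* Extract substrings identified as people and as dates *)
TextCases[passage, "Person"]
TextCases[passage, "Date"]

(* Character positions of substrings identified as cities *)
TextPosition[passage, "City"]
```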

Text Classification

Classify classify strings based on training data or built-in classifiers

"Language"  ▪  "Profanity"  ▪  "Sentiment"  ▪  ...

LanguageIdentify identify what language a text is in
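
The built-in classifiers listed above can be called by name, as in this minimal sketch. The input strings are made up, and the exact form of each result (for example, an entity for a language or a class label for sentiment) depends on the classifier.

```wolfram
(* Built-in language classifier and the dedicated language function *)
Classify["Language", "Ceci n'est pas une pipe."]
LanguageIdentify["Ceci n'est pas une pipe."]

(* Built-in sentiment and profanity classifiers *)
Classify["Sentiment", "I really enjoyed this book."]
Classify["Profanity", "Have a nice day."]
```

Classify can also be trained on labeled strings when none of the built-in classifiers fits the task.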

Text Clustering

FindClusters find clusters in string data

ClusteringTree  ▪  ClusteringComponents  ▪  ClusterClassify

Text Analysis

WordCounts count of words or n-grams

LetterCounts  ▪  CharacterCounts  ▪  WordFrequency

DictionaryLookup  ▪  WordData  ▪  PartOfSpeech
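
As a small sketch of word and n-gram counting, the example below uses an invented sentence in which some words repeat. The sentence and variable names are assumptions for illustration.

```wolfram
(* Hypothetical sample text with repeated words *)
text = "the cat sat on the mat and the cat slept";

(* Association from each word to its number of occurrences *)
counts = WordCounts[text]

(* Look up a single word; "the" occurs three times in this sentence *)
counts["the"]

(* Counts of 2-grams, i.e. pairs of consecutive words *)
WordCounts[text, 2]

(* Counts of individual letters *)
LetterCounts[text]
```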

Text Visualization

WordCloud generate a word cloud from word frequencies or weights

Snippet extract a snippet of text

Style, Highlighted style text with color, font, size, background, etc.
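
The following sketch combines text acquisition with visualization. It assumes the "AliceInWonderland" sample text from ExampleData is available; any other string would work the same way.

```wolfram
(* Load a sample text; assumes this ExampleData entry is available *)
text = ExampleData[{"Text", "AliceInWonderland"}];

(* Show the first two lines of the text *)
Snippet[text, 2]

(* Word cloud of the text after removing stopwords,
   so that frequent function words do not dominate *)
WordCloud[DeleteStopwords[text]]

(* Highlight a string with a background color *)
Highlighted["Alice", Background -> LightYellow]
```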