WordCounts[string] gives an association whose keys are the distinct words identified in string, and whose values give the number of times those words appear in string.
WordCounts[string,n] gives counts of the distinct n-grams consisting of runs of n words in string.
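For readers more at home in a general-purpose language, the two forms above can be approximated roughly as follows. This is a minimal Python sketch, not the Wolfram Language implementation. The function name `word_counts` is hypothetical, and the regular expression `WORD_PATTERN` is an assumption (runs of word characters with internal apostrophes) that does not reproduce the word-identification rules of TextWords. It also ignores the punctuation rule for n-grams given in the Details.

```python
import re
from collections import Counter

# Assumed word pattern: runs of letters/digits/underscores, allowing internal
# apostrophes such as "don't". This is NOT the rule TextWords uses.
WORD_PATTERN = re.compile(r"\w+(?:'\w+)*")


def word_counts(text: str, n: int = 1) -> Counter:
    """Count distinct words (n == 1) or distinct runs of n consecutive words."""
    words = WORD_PATTERN.findall(text)
    if n == 1:
        return Counter(words)
    # Slide a window of length n across the word list; each window is an
    # n-gram, stored as a tuple so it can serve as a dictionary key.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
```

For example, `word_counts("the cat saw the dog")["the"]` is 2, and `word_counts("the cat saw the cat", 2)` has three distinct 2-grams, with `("the", "cat")` occurring twice.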
Details and Options
- WordCounts[string,…] identifies words in string in the same way as TextWords.
- In WordCounts[string,n], words that are considered part of an n-gram must appear consecutively in string, not separated by nonword characters other than whitespace.
- WordCounts has the option IgnoreCase. With the setting IgnoreCase->True, letters are in effect all converted to lowercase before words are counted.
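The n-gram rule and the IgnoreCase option can be illustrated with a similar Python sketch. Here the text is first split into segments at any run of characters that is neither a word character, whitespace, nor an apostrophe (an assumed reading of "nonword characters other than whitespace"), and n-grams are formed only within a segment. The parameter `ignore_case=True` lowercases the text first, mirroring IgnoreCase->True. All names and patterns are hypothetical.

```python
import re
from collections import Counter

# Assumed word pattern (not the TextWords rule).
WORD_PATTERN = re.compile(r"\w+(?:'\w+)*")
# Assumed n-gram breakers: any run of characters that is not a word character,
# not whitespace, and not an apostrophe (e.g. ".", ",", ";", "!", "-").
BREAK_PATTERN = re.compile(r"[^\w\s']+")


def word_counts(text: str, n: int = 1, ignore_case: bool = False) -> Counter:
    """Count words or n-grams; n-grams never span punctuation."""
    if ignore_case:
        # Analogue of IgnoreCase -> True: lowercase before counting.
        text = text.lower()
    counts = Counter()
    # Split at punctuation so that only words separated by whitespace alone
    # can end up in the same n-gram.
    for segment in BREAK_PATTERN.split(text):
        words = WORD_PATTERN.findall(segment)
        if n == 1:
            counts.update(words)
        else:
            counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts
```

With this rule, "Holmes said. Watson nodded." yields the 2-grams `("Holmes", "said")` and `("Watson", "nodded")` but not `("said", "Watson")`, because the period separates those two words.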
Examples
Find the number of times the main characters Sherlock Holmes and John Watson are mentioned in several of Arthur Conan Doyle's novels:
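As a rough Python analogue, one can count the 2-grams `("Sherlock", "Holmes")` and `("John", "Watson")` in each text and add up the per-novel counts. The two short strings in `NOVELS` are made-up stand-ins, not excerpts from the books, and counting only full two-word names (so a bare "Holmes" is not counted) is a simplifying assumption. The helper names are hypothetical.

```python
import re
from collections import Counter

# Assumed word pattern (not the TextWords rule).
WORD_PATTERN = re.compile(r"\w+(?:'\w+)*")

# Made-up stand-ins for the novels; they are NOT quotations from the books.
NOVELS = {
    "Novel A": "Sherlock Holmes smiled. John Watson frowned at Sherlock Holmes.",
    "Novel B": "John Watson wrote while Sherlock Holmes played the violin.",
}
NAMES = [("Sherlock", "Holmes"), ("John", "Watson")]


def bigram_counts(text: str) -> Counter:
    """Count pairs of consecutive words (punctuation is not treated as a break)."""
    words = WORD_PATTERN.findall(text)
    return Counter(zip(words, words[1:]))


def mention_table(novels, names):
    """Return per-novel mention counts and their totals across all novels."""
    per_novel = {}
    total = Counter()
    for title, text in novels.items():
        counts = bigram_counts(text)
        # A Counter returns 0 for missing keys, so unmentioned names show as 0.
        row = {" ".join(name): counts[name] for name in names}
        per_novel[title] = row
        total.update(row)
    return per_novel, total
```

For these two sample strings, "Sherlock Holmes" is counted three times in total and "John Watson" twice.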
Retrieve Miguel de Cervantes's novel Don Quixote from ExampleData to test Zipf's empirical law:
Generate the frequency table of all words in this text:
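A frequency table of this kind ranks words by decreasing count. In the hypothetical Python sketch below, `Counter.most_common()` does the sorting (words with equal counts keep the order in which they were first encountered). The text is lowercased by default so that "The" and "the" are counted together; that default is an assumption made for the sketch, as is the word pattern.

```python
import re
from collections import Counter

# Assumed word pattern (not the TextWords rule).
WORD_PATTERN = re.compile(r"\w+(?:'\w+)*")


def frequency_table(text: str, ignore_case: bool = True):
    """Return (rank, word, count) triples, most frequent word first."""
    if ignore_case:
        text = text.lower()
    counts = Counter(WORD_PATTERN.findall(text))
    # most_common() sorts by decreasing count; ties keep first-seen order.
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]
```

For instance, `frequency_table("b a B c b a")` ranks "b" first (3 occurrences), then "a" (2), then "c" (1).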
Zipf's law asserts that a word's frequency, plotted against its rank in the frequency table, follows an approximately linear relation on a log-log scale. Test this statement on the 1,000 most frequent words:
The result is close to . Visualize the fit together with the actual data:
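In its simplest form, Zipf's law says that a word's frequency is roughly proportional to 1/rank, so log(frequency) plotted against log(rank) should lie near a straight line with slope close to -1. The sketch below fits that line by ordinary least squares in Python. The function name `zipf_fit` and the choice of plain least squares on logarithms are assumptions, and the fitting method used in the original example may differ.

```python
import math


def zipf_fit(counts, top=1000):
    """Fit log(count) = a + b*log(rank) by ordinary least squares.

    `counts` must be a list sorted in decreasing order (rank 1 first) with at
    least two entries; only the first `top` entries are used.
    """
    ys_raw = counts[:top]
    if len(ys_raw) < 2:
        raise ValueError("need at least two counts to fit a line")
    xs = [math.log(rank) for rank in range(1, len(ys_raw) + 1)]
    ys = [math.log(c) for c in ys_raw]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

On perfectly Zipfian counts such as `[1000 / r for r in range(1, 1001)]`, the fit returns a slope of -1 up to rounding; real texts only approximate this. The counts from a ranked frequency table (its count column, in rank order) can be passed in directly.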
Wolfram Research (2015), WordCounts, Wolfram Language function, https://reference.wolfram.com/language/ref/WordCounts.html.