WOLFRAM

Import text from a webpage

Get the text from a webpage as a string:
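
A minimal sketch of this step. The URL is a placeholder chosen for illustration, not the page used in the original example, and `url` and `text` are illustrative variable names:

```wolfram
(* Placeholder URL (an assumption); substitute the page you want to analyze *)
url = "https://www.example.com";

(* The "Plaintext" element returns the page's readable text as a single string *)
text = Import[url, "Plaintext"];
```

The trailing semicolon suppresses the output, which is useful because the imported string can be long.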

This is the beginning of the imported text:
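
One way to preview the start of the string, sketched with a placeholder URL and illustrative variable names. The 200-character cutoff is an arbitrary choice:

```wolfram
(* Placeholder URL (an assumption), imported as plain text *)
text = Import["https://www.example.com", "Plaintext"];

(* UpTo avoids an error if the page has fewer than 200 characters *)
StringTake[text, UpTo[200]]
```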


Find common words

Find the 10 most common nontrivial words on the webpage and the number of times they occur:
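
A sketch of one possible approach, assuming that "nontrivial" means the words left after `DeleteStopwords` removes common stopwords such as "the" and "of". The URL is a placeholder and the variable names are illustrative:

```wolfram
(* Placeholder URL (an assumption), imported as plain text *)
text = Import["https://www.example.com", "Plaintext"];

(* Lowercase first so that "The" and "the" are counted as the same word,
   split into words, then drop stopwords *)
words = DeleteStopwords[TextWords[ToLowerCase[text]]];

(* Association of word -> number of occurrences *)
counts = Counts[words];

(* Keep the 10 highest counts, or fewer if the page has fewer distinct words *)
TakeLargest[counts, UpTo[10]]
```

The result is an association whose keys are words and whose values are their counts.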


Make a word cloud

Make a word cloud of the text:
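
A minimal sketch, again with a placeholder URL. Removing stopwords first is a design choice made here so that the cloud emphasizes content words:

```wolfram
(* Placeholder URL (an assumption), imported as plain text *)
text = Import["https://www.example.com", "Plaintext"];

(* Words are sized according to how often they occur in the text *)
WordCloud[DeleteStopwords[text]]
```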


Notes

The text of Wikipedia pages can be easily extracted using WikipediaData, which automatically strips page contents that are not text:
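
A sketch of the same analysis applied to a Wikipedia article. The article title "Moon" and the variable name `wikiText` are assumptions chosen for illustration, and the call needs an internet connection:

```wolfram
(* WikipediaData with an article title returns the article's plain text *)
wikiText = WikipediaData["Moon"];

(* Same word-cloud step as for an imported webpage *)
WordCloud[DeleteStopwords[wikiText]]
```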
