SpeechRecognize
SpeechRecognize[audio]
recognizes speech in audio and returns it as a string.
Details and Options

- Speech recognition aims to convert a spoken audio signal to text. It is also known as speech-to-text and is typically used in voice-enabled human-machine interactions and digital personal assistants.
- SpeechRecognize[audio] returns all recognized speech in audio as a single string.
- The following options can be given:
Masking         All      interval of interest
TargetDevice    "CPU"    the device on which to perform recognition

- By default, speech in the whole signal is recognized. Use Masking->{int1,int2,…} to limit the recognition to the intervals int1, int2, ….
- SpeechRecognize only works for English speech.
- SpeechRecognize uses machine learning. Its methods, training sets and biases included therein may change and yield varied results in different versions of the Wolfram Language.
- SpeechRecognize may download resources that will be stored in your local object store at $LocalBase, and can be listed using LocalObjects[] and removed using ResourceRemove.
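
The basic call described in the notes above can be sketched as follows. This is a minimal illustration, not an example from the original page: the input signal is generated with SpeechSynthesize purely as a stand-in for a real recording, and the sentence is arbitrary. The recognized text depends on the model version, so no output is shown.

```wolfram
(* Assumption: synthesized speech stands in for a real recording *)
audio = SpeechSynthesize["The quick brown fox jumps over the lazy dog."];

(* Recognize all speech in the signal; the result is a single string *)
SpeechRecognize[audio]

(* The same call with the documented default device made explicit *)
SpeechRecognize[audio, TargetDevice -> "CPU"]
```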

Examples
Options (1)
Masking (1)
Use the Masking option to recognize parts of a signal:
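
A possible form of this example is sketched below. It assumes that each interval is given as a {start, end} pair in seconds, and it uses synthesized speech as a placeholder signal; neither detail comes from the original page.

```wolfram
(* Assumption: synthesized speech stands in for a real recording *)
audio = SpeechSynthesize["One two three four five six seven eight nine ten."];

(* Midpoint of the signal in seconds *)
half = QuantityMagnitude[Duration[audio], "Seconds"]/2;

(* Recognize only the first half of the signal;
   the {start, end}-in-seconds interval format is an assumption *)
SpeechRecognize[audio, Masking -> {{0, half}}]
```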
Applications (4)
Use AudioIntervals to select which parts of the signal to recognize:
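
One way this might look is sketched below. The criterion name "Audible" and the construction of the test signal (two synthesized phrases separated by one second of silence) are assumptions made for illustration, not the page's original input.

```wolfram
(* Assumption: a test signal with a silent gap between two phrases *)
audio = AudioJoin[{
    SpeechSynthesize["Hello there."],
    AudioGenerator["Silence", 1],
    SpeechSynthesize["How are you?"]}];

(* Find the intervals that contain audible signal;
   the "Audible" criterion is assumed here *)
intervals = AudioIntervals[audio, "Audible"];

(* Restrict recognition to those intervals *)
SpeechRecognize[audio, Masking -> intervals]
```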
Show the recognized city on the map:
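
A sketch of such a pipeline follows. The city name, the use of Interpreter["City"] to turn the recognized text into an entity, and the plotting options are illustrative choices rather than the page's original code. Recognition may not return a usable city name, so the failure case is passed through instead of being plotted.

```wolfram
(* Assumption: a synthesized city name stands in for a spoken query *)
audio = SpeechSynthesize["Chicago"];

(* Convert the speech to text *)
text = SpeechRecognize[audio];

(* Interpret the text as a city entity; this gives a Failure object if it cannot *)
city = Interpreter["City"][text];

(* Mark the city on a map, or return the failure unchanged *)
If[FailureQ[city], city, GeoGraphics[GeoMarker[city], GeoRange -> "Country"]]
```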
Wolfram Research (2019), SpeechRecognize, Wolfram Language function, https://reference.wolfram.com/language/ref/SpeechRecognize.html.