"GoogleSpeech" (Service Connection)
Connecting & Authenticating
Requests
Synthesize Audio from Text
"ListVoices" — returns a list of available voice styles
Language | All | restrict the query to voices able to synthesize a given language
"Synthesize" — returns speech synthesized from text
"Input" | (required) | text to synthesize
"Voice" | Automatic | name of the synthesis voice
Language | Automatic | language of the synthesis voice
"Pitch" | Automatic | semitone deviation from the native voice pitch
"Rate" | Automatic | factor by which to change the native voice speed
AudioEncoding | Automatic | output audio encoding
GeneratedAssetLocation | $GeneratedAssetLocation | storage location of the synthesized audio
GeneratedAssetFormat | Automatic | output format of the synthesized audio
"EffectsProfileID" | Automatic | post-processing effect name applied to speech
Recognize Text from Audio
"Recognize" — returns text transcribed from audio
"Input" | (required) | audio to transcribe
Language | "English" | language(s) of the contained speech
"ChannelRecognition" | False | whether to transcribe each channel separately
MaxItems | 1 | maximum number of hypotheses to return
"ProfanityFilter" | False | whether to attempt to replace profanities
"SpeechContexts" | {} | phrase hints to assist transcription
"WordTimeOffsets" | True | return word time offsets with the result
"WordConfidence" | False | return word confidence values with the result
"Punctuation" | True | include punctuation in the transcription
"SpokenPunctuation" | False | replace spoken punctuation with the corresponding ASCII characters
"SpokenEmojis" | False | replace spoken emojis with the corresponding Unicode characters
"SpeakerDiarization" | False | tag distinct speakers in the result
"Model" | Automatic | specify a model to use for the request
MetaInformation | None | metadata describing the input audio
Parameter Details
Phrase hints for "SpeechContexts" can be given as:
str -> w | give weight w to the string str
{str1 -> w1, str2 -> w2, …} | give weight wi to the string stri
Possible settings for "EffectsProfileID" include:
"large-automotive-class-device" | optimized for car speakers
"small-bluetooth-speaker-class-device" | optimized for small home speakers
Possible settings for "Model" include:
"latest_long" | optimized for long-form content
"latest_short" | optimized for short-form content
"command_and_search" | optimized for short queries
Examples
Basic Examples (1)
Connect to Google speech service:

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-ejke52


https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-viji89


https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-r2i83i


https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-jmntee
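
A minimal sketch of this workflow, assuming the usual ServiceConnect and ServiceExecute pattern and the request names listed under Requests. The sample sentence and variable names are illustrative, and an authenticated Google Cloud account is required for the calls to succeed:

```wolfram
(* Connect to the service; credentials are typically requested on first use *)
googleSpeech = ServiceConnect["GoogleSpeech"]

(* Synthesize speech from text (the sample sentence is illustrative) *)
speech = ServiceExecute[googleSpeech, "Synthesize",
  {"Input" -> "The quick brown fox jumps over the lazy dog."}]

(* Transcribe the synthesized speech back into text *)
ServiceExecute[googleSpeech, "Recognize", {"Input" -> speech}]
```

Feeding the synthesized result straight into "Recognize" is a convenient round-trip check, on the assumption that "Synthesize" returns audio that "Recognize" accepts as "Input".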

Scope (2)
Speech Synthesis (1)

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-jklk2v


https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-bm0mrx

Synthesize text in a different language. Setting "Language" to Automatic will infer the language from the input text, or a particular language can be specified. The service will attempt to select a voice style with the requested language:

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-4g1a3c


https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-ollypw


https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-ea7k2b
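
As a hedged sketch of the two modes described above, the following compares automatic language inference with an explicit request. The French sample text is illustrative, and giving the language as the string "French" is an assumption modeled on the "English" default shown for "Recognize":

```wolfram
googleSpeech = ServiceConnect["GoogleSpeech"];

(* Let the service infer the language from the input text *)
ServiceExecute[googleSpeech, "Synthesize",
  {"Input" -> "Bonjour tout le monde", Language -> Automatic}]

(* Request French explicitly; the string form of the language is an assumption *)
ServiceExecute[googleSpeech, "Synthesize",
  {"Input" -> "Bonjour tout le monde", Language -> "French"}]
```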

Synthesize speech using a particular voice:

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-2al53k


https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-wuptyi
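
One plausible way to choose a voice is to query "ListVoices" first and pass one of the returned names as "Voice". In this sketch the voice name "en-US-Wavenet-D" is an assumed example; the names actually available should be taken from the "ListVoices" result:

```wolfram
googleSpeech = ServiceConnect["GoogleSpeech"];

(* List the voices able to synthesize English *)
voices = ServiceExecute[googleSpeech, "ListVoices", {Language -> "English"}]

(* Synthesize with a specific voice; the name below is an assumed example *)
ServiceExecute[googleSpeech, "Synthesize",
  {"Input" -> "This sentence uses a specific voice.",
   "Voice" -> "en-US-Wavenet-D"}]
```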

Make the speech faster and lower in pitch:

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-redyvf
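
A sketch of such a request, using the "Rate" and "Pitch" parameters from the table above. The particular values (1.5 and -4) and the sample sentence are illustrative choices, not recommended settings:

```wolfram
googleSpeech = ServiceConnect["GoogleSpeech"];

(* Speak 1.5 times faster and 4 semitones lower than the native voice *)
ServiceExecute[googleSpeech, "Synthesize",
  {"Input" -> "This sentence is spoken faster and lower.",
   "Rate" -> 1.5, "Pitch" -> -4}]

(* Frequency ratio of a -4 semitone shift in equal temperament *)
N[2^(-4/12)]
```

Because "Pitch" is measured in semitones, a shift of n semitones corresponds to a frequency ratio of 2^(n/12) in equal temperament; the last line evaluates to about 0.794, so the voice sounds at roughly 79% of its native frequency.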

Speech Recognition (1)
Transcribe text from audio containing speech:

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-mza3tx


https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-jepo0b

By default, everything from the API response is returned, including information about recognized words:

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-5fkeym

Return multiple guesses of the transcription:

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-oi3np0
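
A sketch of requesting several hypotheses with MaxItems. The recording here is a stand-in produced locally with SpeechSynthesize so that the example is self-contained; any audio containing speech could be used instead:

```wolfram
googleSpeech = ServiceConnect["GoogleSpeech"];

(* Stand-in recording generated locally; the sentence is illustrative *)
recording = SpeechSynthesize["Recognize speech or wreck a nice beach."];

(* Ask for up to three alternative transcriptions *)
ServiceExecute[googleSpeech, "Recognize",
  {"Input" -> recording, MaxItems -> 3}]
```

MaxItems is an upper bound, so the service may return fewer hypotheses than requested.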

Separate different speakers from a recording:

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-c1ozyl

Specify the minimum and maximum number of speakers:

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-8v7dod

Display labeled words in a Dataset. The API currently returns speaker labels in the second result:

https://wolfram.com/xid/0e0r9g83x4afkq8p07klymetii-81efhk
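
The phrase hints and model names listed under Parameter Details can be combined in a single "Recognize" request. The following is a sketch only: it assumes hints are given as rules from a string to a weight, and the weights 10 and 5 are illustrative, since the valid range is not stated here:

```wolfram
googleSpeech = ServiceConnect["GoogleSpeech"];

(* Stand-in recording generated locally; the sentence is illustrative *)
recording = SpeechSynthesize["Open the Wolfram documentation."];

(* Bias recognition toward domain terms and use the short-query model *)
ServiceExecute[googleSpeech, "Recognize",
  {"Input" -> recording,
   "SpeechContexts" -> {"Wolfram" -> 10, "documentation" -> 5},
   "Model" -> "command_and_search"}]
```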
