"GoogleSpeech" (Service Connection)

Connecting & Authenticating
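The connection follows the standard service-connection workflow. A minimal sketch (the first evaluation prompts for Google Cloud credentials):

    (* create, or reuse, an authenticated connection to the service *)
    speech = ServiceConnect["GoogleSpeech"]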
Requests
Synthesize Audio from Text
"ListVoices" — returns a list of available voice styles
| Language | All | restrict the query to voices able to synthesize a given language |
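For example, a sketch of a query restricted to one language (passing the language as a name string is an assumption; the default is All):

    (* list voice styles able to synthesize French *)
    ServiceExecute[speech, "ListVoices", {Language -> "French"}]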
"Synthesize" — returns speech synthesized from text
| "Input" | (required) | text to synthesize | |
| "Voice" | Automatic | name of the synthesis voice | |
| Language | Automatic | language of the synthesis voice | |
| "Pitch" | Automatic | semitone deviation from the native voice pitch | |
| "Rate" | Automatic | factor by which to change the native voice speed | |
| AudioEncoding | Automatic | output audio encoding | |
| GeneratedAssetLocation | $GeneratedAssetLocation | storage location of the synthesized audio | |
| GeneratedAssetFormat | Automatic | output format of the synthesized audio | |
| "EffectsProfileID" | Automatic | post-processing effect name applied to speech | 
Recognize Text from Audio
"Recognize" — returns text transcribed from audio
| "Input" | (required) | audio to transcribe | |
| Language | "English" | language(s) of the contained speech | |
| "ChannelRecognition" | False | whether to transcribe each channel separately | |
| MaxItems | 1 | maximum number of hypotheses to return | |
| "ProfanityFilter" | False | whether to attempt to replace profanities | |
| "SpeechContexts" | {} | phrase hints to assist transcription | |
| "WordTimeOffsets" | True | return word time offsets with the result | |
| "WordConfidence" | False | return word confidence values with the result | |
| "Punctuation" | True | include punctuation in the transcription | |
| "SpokenPunctuation" | False | replace spoken punctuation with ASCII character | |
| "SpokenEmojis" | False | replace spoken emojis with Unicode character | |
| "SpeakerDiarization" | False | tag distinct speakers in the result | |
| "Model" | Automatic | specify a model to use for the request | |
| MetaInformation | None | metadata describing the input audio | 
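A minimal sketch, assuming audio is an Audio object containing speech:

    (* transcribe the recording with default settings *)
    ServiceExecute[speech, "Recognize", {"Input" -> audio}]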
Parameter Details

Possible values for "SpeechContexts" include:
| str → w | give weight w to the string str |
| {str1 → w1, str2 → w2, …} | give weight wi to the string stri |
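For instance, a sketch that biases recognition toward domain vocabulary (the weight value is illustrative):

    (* hint that "Wolfram" is likely to occur in the speech *)
    ServiceExecute[speech, "Recognize",
      {"Input" -> audio, "SpeechContexts" -> {"Wolfram" -> 20}}]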
| "large-automotive-class-device" | optimized for car speakers | |
| "small-bluetooth-speaker-class-device" | optimized for small home speakers | 
| "latest_long" | optimized for long-form content | |
| "latest_short" | optimized for short-form content | |
| "command_and_search" | optimized for short queries | 
Examples
Basic Examples (1)
Scope (5)
Speech Synthesis (3)
Synthesize text in a different language. With "Language" set to Automatic, the language is inferred from the input text; alternatively, a particular language can be specified. The service will attempt to select a voice style with the requested language:
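A sketch of both forms (the Spanish sample text is illustrative):

    (* infer the language from the text itself *)
    ServiceExecute[speech, "Synthesize", {"Input" -> "¿Dónde está la biblioteca?"}]

    (* request a Spanish voice explicitly *)
    ServiceExecute[speech, "Synthesize",
      {"Input" -> "¿Dónde está la biblioteca?", Language -> "Spanish"}]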
Speech Recognition (2)
Transcribe text from audio containing speech:
By default, everything from the API response is returned, including information about recognized words:
Return multiple guesses of the transcription:
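A sketch using MaxItems to request alternative hypotheses:

    (* return up to three candidate transcriptions *)
    ServiceExecute[speech, "Recognize", {"Input" -> audio, MaxItems -> 3}]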
Separate different speakers from a recording:
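A minimal sketch of the basic form:

    (* tag each recognized word with a speaker label *)
    ServiceExecute[speech, "Recognize",
      {"Input" -> audio, "SpeakerDiarization" -> True}]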
Specify the minimum and maximum number of speakers:
Display labeled words in a Dataset. The API currently returns speaker labels in the second result:
See Also
ServiceExecute ▪ ServiceConnect ▪ SpeechRecognize ▪ SpeechSynthesize ▪ VoiceStyleData
Service Connections: OpenAI