<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>
<h1><a href="speech_v1.html">Cloud Speech-to-Text API</a> . <a href="speech_v1.speech.html">speech</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="#longrunningrecognize">longrunningrecognize(body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Performs asynchronous speech recognition: receive results via the google.longrunning.Operations interface.</p>
<p class="toc_element">
  <code><a href="#recognize">recognize(body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Performs synchronous speech recognition: receive results after all audio has been sent and processed.</p>
<h3>Method Details</h3>
<div class="method">
<code class="details" id="longrunningrecognize">longrunningrecognize(body=None, x__xgafv=None)</code>
<pre>Performs asynchronous speech recognition: receive results via the
google.longrunning.Operations interface. Returns either an
`Operation.error` or an `Operation.response` which contains
a `LongRunningRecognizeResponse` message.
For more information on asynchronous speech recognition, see the
[how-to](https://cloud.google.com/speech-to-text/docs/async-recognize).

Args:
  body: object, The request body.
    The object takes the form of:

{ # The top-level message sent by the client for the `LongRunningRecognize`
    # method.
  "audio": { # Contains audio data in the encoding specified in the `RecognitionConfig`. # Required. The audio data to be recognized.
      # Either `content` or `uri` must be supplied. Supplying both or neither
      # returns google.rpc.Code.INVALID_ARGUMENT. See
      # [content limits](https://cloud.google.com/speech-to-text/quotas#content).
    "uri": "A String", # URI that points to a file that contains audio data bytes as specified in
        # `RecognitionConfig`. The file must not be compressed (for example, gzip).
        # Currently, only Google Cloud Storage URIs are
        # supported, which must be specified in the following format:
        # `gs://bucket_name/object_name` (other URI formats return
        # google.rpc.Code.INVALID_ARGUMENT). For more information, see
        # [Request URIs](https://cloud.google.com/storage/docs/reference-uris).
    "content": "A String", # The audio data bytes encoded as specified in
        # `RecognitionConfig`. Note: as with all bytes fields, proto buffers use a
        # pure binary representation, whereas JSON representations use base64.
  },
  "config": { # Required. Provides information to the recognizer that specifies how to
      # process the request.
    "speechContexts": [ # Array of SpeechContext.
        # A means to provide context to assist the speech recognition. For more
        # information, see
        # [speech
        # adaptation](https://cloud.google.com/speech-to-text/docs/context-strength).
      { # Provides "hints" to the speech recognizer to favor specific words and phrases
          # in the results.
        "phrases": [ # A list of strings containing words and phrases "hints" so that
            # the speech recognition is more likely to recognize them. This can be used
            # to improve the accuracy for specific words and phrases, for example, if
            # specific commands are typically spoken by the user. This can also be used
            # to add additional words to the vocabulary of the recognizer. See
            # [usage limits](https://cloud.google.com/speech-to-text/quotas#content).
            #
            # List items can also be set to classes for groups of words that represent
            # common concepts that occur in natural language. For example, rather than
            # providing phrase hints for every month of the year, using the $MONTH class
            # improves the likelihood of correctly transcribing audio that includes
            # months.
          "A String",
        ],
      },
    ],
    "languageCode": "A String", # Required. The language of the supplied audio as a
        # [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag.
        # Example: "en-US".
        # See [Language
        # Support](https://cloud.google.com/speech-to-text/docs/languages) for a list
        # of the currently supported language codes.
    "useEnhanced": True or False, # Set to true to use an enhanced model for speech recognition.
        # If `use_enhanced` is set to true and the `model` field is not set, then
        # an appropriate enhanced model is chosen if an enhanced model exists for
        # the audio.
        #
        # If `use_enhanced` is true and an enhanced version of the specified model
        # does not exist, then the speech is recognized using the standard version
        # of the specified model.
    "enableWordTimeOffsets": True or False, # If `true`, the top result includes a list of words and
        # the start and end time offsets (timestamps) for those words. If
        # `false`, no word-level time offset information is returned. The default is
        # `false`.
    "diarizationConfig": { # Config to enable speaker diarization. # Config to enable speaker diarization and set additional
        # parameters to make diarization better suited for your application.
        # Note: When this is enabled, we send all the words from the beginning of the
        # audio for the top alternative in every consecutive STREAMING response.
        # This is done in order to improve our speaker tags as our models learn to
        # identify the speakers in the conversation over time.
        # For non-streaming requests, the diarization results will be provided only
        # in the top alternative of the FINAL SpeechRecognitionResult.
      "speakerTag": 42, # Output only. Unused.
      "minSpeakerCount": 42, # Minimum number of speakers in the conversation. This range gives you more
          # flexibility by allowing the system to automatically determine the correct
          # number of speakers. If not set, the default value is 2.
      "enableSpeakerDiarization": True or False, # If 'true', enables speaker detection for each recognized word in
          # the top alternative of the recognition result using a speaker_tag provided
          # in the WordInfo.
      "maxSpeakerCount": 42, # Maximum number of speakers in the conversation. This range gives you more
          # flexibility by allowing the system to automatically determine the correct
          # number of speakers. If not set, the default value is 6.
    },
    "profanityFilter": True or False, # If set to `true`, the server will attempt to filter out
        # profanities, replacing all but the initial character in each filtered word
        # with asterisks, e.g. "f***". If set to `false` or omitted, profanities
        # won't be filtered out.
    "enableAutomaticPunctuation": True or False, # If 'true', adds punctuation to recognition result hypotheses.
        # This feature is only available in select languages. Setting this for
        # requests in other languages has no effect at all.
        # The default 'false' value does not add punctuation to result hypotheses.
    "enableSeparateRecognitionPerChannel": True or False, # This needs to be set to `true` explicitly and `audio_channel_count` > 1
        # to get each channel recognized separately. The recognition result will
        # contain a `channel_tag` field to state which channel that result belongs
        # to. If this is not true, we will only recognize the first channel. The
        # request is billed cumulatively for all channels recognized:
        # `audio_channel_count` multiplied by the length of the audio.
    "sampleRateHertz": 42, # Sample rate in Hertz of the audio data sent in all
        # `RecognitionAudio` messages. Valid values are: 8000-48000.
        # 16000 is optimal. For best results, set the sampling rate of the audio
        # source to 16000 Hz. If that's not possible, use the native sample rate of
        # the audio source (instead of re-sampling).
        # This field is optional for FLAC and WAV audio files, but is
        # required for all other audio formats. For details, see AudioEncoding.
    "metadata": { # Description of audio data to be recognized. # Metadata regarding this request.
      "originalMediaType": "A String", # The original media the speech was recorded on.
      "recordingDeviceName": "A String", # The device used to make the recording. Examples: 'Nexus 5X' or
          # 'Polycom SoundStation IP 6000' or 'POTS' or 'VoIP' or
          # 'Cardioid Microphone'.
      "industryNaicsCodeOfAudio": 42, # The industry vertical to which this speech recognition request most
          # closely applies. This is most indicative of the topics contained
          # in the audio. Use the 6-digit NAICS code to identify the industry
          # vertical - see https://www.naics.com/search/.
      "originalMimeType": "A String", # Mime type of the original audio file. For example `audio/m4a`,
          # `audio/x-alaw-basic`, `audio/mp3`, `audio/3gpp`.
          # A list of possible audio mime types is maintained at
          # http://www.iana.org/assignments/media-types/media-types.xhtml#audio
      "audioTopic": "A String", # Description of the content. E.g. "Recordings of federal supreme court
          # hearings from 2012".
      "recordingDeviceType": "A String", # The type of device the speech was recorded with.
      "microphoneDistance": "A String", # The audio type that most closely describes the audio being recognized.
      "interactionType": "A String", # The use case most closely describing the audio content to be recognized.
    },
    "maxAlternatives": 42, # Maximum number of recognition hypotheses to be returned.
        # Specifically, the maximum number of `SpeechRecognitionAlternative` messages
        # within each `SpeechRecognitionResult`.
        # The server may return fewer than `max_alternatives`.
        # Valid values are `0`-`30`. A value of `0` or `1` will return a maximum of
        # one. If omitted, will return a maximum of one.
    "audioChannelCount": 42, # The number of channels in the input audio data.
        # ONLY set this for MULTI-CHANNEL recognition.
        # Valid values for LINEAR16 and FLAC are `1`-`8`.
        # Valid values for OGG_OPUS are '1'-'254'.
        # Valid value for MULAW, AMR, AMR_WB and SPEEX_WITH_HEADER_BYTE is only `1`.
        # If `0` or omitted, defaults to one channel (mono).
        # Note: We only recognize the first channel by default.
        # To perform independent recognition on each channel set
        # `enable_separate_recognition_per_channel` to 'true'.
    "encoding": "A String", # Encoding of audio data sent in all `RecognitionAudio` messages.
        # This field is optional for `FLAC` and `WAV` audio files and required
        # for all other audio formats. For details, see AudioEncoding.
    "model": "A String", # Which model to select for the given request. Select the model
        # best suited to your domain to get best results. If a model is not
        # explicitly specified, then we auto-select a model based on the parameters
        # in the RecognitionConfig.
        # <table>
        # <tr>
        #   <td><b>Model</b></td>
        #   <td><b>Description</b></td>
        # </tr>
        # <tr>
        #   <td><code>command_and_search</code></td>
        #   <td>Best for short queries such as voice commands or voice search.</td>
        # </tr>
        # <tr>
        #   <td><code>phone_call</code></td>
        #   <td>Best for audio that originated from a phone call (typically
        #   recorded at an 8khz sampling rate).</td>
        # </tr>
        # <tr>
        #   <td><code>video</code></td>
        #   <td>Best for audio that originated from video or includes multiple
        #   speakers. Ideally the audio is recorded at a 16khz or greater
        #   sampling rate. This is a premium model that costs more than the
        #   standard rate.</td>
        # </tr>
        # <tr>
        #   <td><code>default</code></td>
        #   <td>Best for audio that is not one of the specific audio models.
        #   For example, long-form audio. Ideally the audio is high-fidelity,
        #   recorded at a 16khz or greater sampling rate.</td>
        # </tr>
        # </table>
  },
}

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # This resource represents a long-running operation that is the result of a
        # network API call.
      "response": { # The normal response of the operation in case of success. If the original
          # method returns no data on success, such as `Delete`, the response is
          # `google.protobuf.Empty`. If the original method is standard
          # `Get`/`Create`/`Update`, the response should be the resource. For other
          # methods, the response should have the type `XxxResponse`, where `Xxx`
          # is the original method name. For example, if the original method name
          # is `TakeSnapshot()`, the inferred response type is
          # `TakeSnapshotResponse`.
        "a_key": "", # Properties of the object. Contains field @type with type URL.
      },
      "metadata": { # Service-specific metadata associated with the operation. It typically
          # contains progress information and common metadata such as create time.
          # Some services might not provide such metadata. Any method that returns a
          # long-running operation should document the metadata type, if any.
        "a_key": "", # Properties of the object. Contains field @type with type URL.
      },
      "name": "A String", # The server-assigned name, which is only unique within the same service that
          # originally returns it. If you use the default HTTP mapping, the
          # `name` should be a resource name ending with `operations/{unique_id}`.
      "done": True or False, # If the value is `false`, it means the operation is still in progress.
          # If `true`, the operation is completed, and either `error` or `response` is
          # available.
      "error": { # The `Status` type defines a logical error model that is suitable for # The error result of the operation in case of failure or cancellation.
          # different programming environments, including REST APIs and RPC APIs. It is
          # used by [gRPC](https://github.com/grpc). Each `Status` message contains
          # three pieces of data: error code, error message, and error details.
          #
          # You can find out more about this error model and how to work with it in the
          # [API Design Guide](https://cloud.google.com/apis/design/errors).
        "message": "A String", # A developer-facing error message, which should be in English. Any
            # user-facing error message should be localized and sent in the
            # google.rpc.Status.details field, or localized by the client.
        "code": 42, # The status code, which should be an enum value of google.rpc.Code.
        "details": [ # A list of messages that carry the error details. There is a common set of
            # message types for APIs to use.
          {
            "a_key": "", # Properties of the object. Contains field @type with type URL.
          },
        ],
      },
    }</pre>
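<p>For illustration only, a minimal Python sketch of starting an asynchronous recognition and polling the returned operation. It is not part of the generated reference: the service object, the <code>gs://my-bucket/audio.flac</code> URI, the configuration values, and the polling interval are placeholder assumptions, and polling is assumed to go through the companion operations resource (see speech_v1.operations.html).</p>
<pre>
# Hedged sketch: assumes google-api-python-client is installed and that
# Application Default Credentials are configured in the environment.
import time

from googleapiclient.discovery import build

service = build('speech', 'v1')

request_body = {
    'config': {
        'encoding': 'FLAC',          # must match the audio file's encoding
        'sampleRateHertz': 16000,
        'languageCode': 'en-US',
        'enableWordTimeOffsets': True,
    },
    'audio': {
        # Placeholder Cloud Storage URI; `content` could be supplied instead.
        'uri': 'gs://my-bucket/audio.flac',
    },
}

# Start the asynchronous recognition; an Operation resource is returned.
operation = service.speech().longrunningrecognize(body=request_body).execute()

# Poll until `done` is true, then read either `response` or `error`.
while not operation.get('done'):
    time.sleep(5)
    operation = service.operations().get(name=operation['name']).execute()

if 'error' in operation:
    print('Recognition failed:', operation['error'].get('message'))
else:
    for result in operation['response'].get('results', []):
        print(result['alternatives'][0]['transcript'])
</pre>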
</div>

<div class="method">
<code class="details" id="recognize">recognize(body=None, x__xgafv=None)</code>
<pre>Performs synchronous speech recognition: receive results after all audio
has been sent and processed.

Args:
  body: object, The request body.
    The object takes the form of:

{ # The top-level message sent by the client for the `Recognize` method.
  "config": { # Required. Provides information to the recognizer that specifies how to
      # process the request.
    "speechContexts": [ # Array of SpeechContext.
        # A means to provide context to assist the speech recognition. For more
        # information, see
        # [speech
        # adaptation](https://cloud.google.com/speech-to-text/docs/context-strength).
      { # Provides "hints" to the speech recognizer to favor specific words and phrases
          # in the results.
        "phrases": [ # A list of strings containing words and phrases "hints" so that
            # the speech recognition is more likely to recognize them. This can be used
            # to improve the accuracy for specific words and phrases, for example, if
            # specific commands are typically spoken by the user. This can also be used
            # to add additional words to the vocabulary of the recognizer. See
            # [usage limits](https://cloud.google.com/speech-to-text/quotas#content).
            #
            # List items can also be set to classes for groups of words that represent
            # common concepts that occur in natural language. For example, rather than
            # providing phrase hints for every month of the year, using the $MONTH class
            # improves the likelihood of correctly transcribing audio that includes
            # months.
          "A String",
        ],
      },
    ],
    "languageCode": "A String", # Required. The language of the supplied audio as a
        # [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag.
        # Example: "en-US".
        # See [Language
        # Support](https://cloud.google.com/speech-to-text/docs/languages) for a list
        # of the currently supported language codes.
    "useEnhanced": True or False, # Set to true to use an enhanced model for speech recognition.
        # If `use_enhanced` is set to true and the `model` field is not set, then
        # an appropriate enhanced model is chosen if an enhanced model exists for
        # the audio.
        #
        # If `use_enhanced` is true and an enhanced version of the specified model
        # does not exist, then the speech is recognized using the standard version
        # of the specified model.
    "enableWordTimeOffsets": True or False, # If `true`, the top result includes a list of words and
        # the start and end time offsets (timestamps) for those words. If
        # `false`, no word-level time offset information is returned. The default is
        # `false`.
    "diarizationConfig": { # Config to enable speaker diarization. # Config to enable speaker diarization and set additional
        # parameters to make diarization better suited for your application.
        # Note: When this is enabled, we send all the words from the beginning of the
        # audio for the top alternative in every consecutive STREAMING response.
        # This is done in order to improve our speaker tags as our models learn to
        # identify the speakers in the conversation over time.
        # For non-streaming requests, the diarization results will be provided only
        # in the top alternative of the FINAL SpeechRecognitionResult.
      "speakerTag": 42, # Output only. Unused.
      "minSpeakerCount": 42, # Minimum number of speakers in the conversation. This range gives you more
          # flexibility by allowing the system to automatically determine the correct
          # number of speakers. If not set, the default value is 2.
      "enableSpeakerDiarization": True or False, # If 'true', enables speaker detection for each recognized word in
          # the top alternative of the recognition result using a speaker_tag provided
          # in the WordInfo.
      "maxSpeakerCount": 42, # Maximum number of speakers in the conversation. This range gives you more
          # flexibility by allowing the system to automatically determine the correct
          # number of speakers. If not set, the default value is 6.
    },
    "profanityFilter": True or False, # If set to `true`, the server will attempt to filter out
        # profanities, replacing all but the initial character in each filtered word
        # with asterisks, e.g. "f***". If set to `false` or omitted, profanities
        # won't be filtered out.
    "enableAutomaticPunctuation": True or False, # If 'true', adds punctuation to recognition result hypotheses.
        # This feature is only available in select languages. Setting this for
        # requests in other languages has no effect at all.
        # The default 'false' value does not add punctuation to result hypotheses.
    "enableSeparateRecognitionPerChannel": True or False, # This needs to be set to `true` explicitly and `audio_channel_count` > 1
        # to get each channel recognized separately. The recognition result will
        # contain a `channel_tag` field to state which channel that result belongs
        # to. If this is not true, we will only recognize the first channel. The
        # request is billed cumulatively for all channels recognized:
        # `audio_channel_count` multiplied by the length of the audio.
    "sampleRateHertz": 42, # Sample rate in Hertz of the audio data sent in all
        # `RecognitionAudio` messages. Valid values are: 8000-48000.
        # 16000 is optimal. For best results, set the sampling rate of the audio
        # source to 16000 Hz. If that's not possible, use the native sample rate of
        # the audio source (instead of re-sampling).
        # This field is optional for FLAC and WAV audio files, but is
        # required for all other audio formats. For details, see AudioEncoding.
    "metadata": { # Description of audio data to be recognized. # Metadata regarding this request.
      "originalMediaType": "A String", # The original media the speech was recorded on.
      "recordingDeviceName": "A String", # The device used to make the recording. Examples: 'Nexus 5X' or
          # 'Polycom SoundStation IP 6000' or 'POTS' or 'VoIP' or
          # 'Cardioid Microphone'.
      "industryNaicsCodeOfAudio": 42, # The industry vertical to which this speech recognition request most
          # closely applies. This is most indicative of the topics contained
          # in the audio. Use the 6-digit NAICS code to identify the industry
          # vertical - see https://www.naics.com/search/.
      "originalMimeType": "A String", # Mime type of the original audio file. For example `audio/m4a`,
          # `audio/x-alaw-basic`, `audio/mp3`, `audio/3gpp`.
          # A list of possible audio mime types is maintained at
          # http://www.iana.org/assignments/media-types/media-types.xhtml#audio
      "audioTopic": "A String", # Description of the content. E.g. "Recordings of federal supreme court
          # hearings from 2012".
      "recordingDeviceType": "A String", # The type of device the speech was recorded with.
      "microphoneDistance": "A String", # The audio type that most closely describes the audio being recognized.
      "interactionType": "A String", # The use case most closely describing the audio content to be recognized.
    },
    "maxAlternatives": 42, # Maximum number of recognition hypotheses to be returned.
        # Specifically, the maximum number of `SpeechRecognitionAlternative` messages
        # within each `SpeechRecognitionResult`.
        # The server may return fewer than `max_alternatives`.
        # Valid values are `0`-`30`. A value of `0` or `1` will return a maximum of
        # one. If omitted, will return a maximum of one.
    "audioChannelCount": 42, # The number of channels in the input audio data.
        # ONLY set this for MULTI-CHANNEL recognition.
        # Valid values for LINEAR16 and FLAC are `1`-`8`.
        # Valid values for OGG_OPUS are '1'-'254'.
        # Valid value for MULAW, AMR, AMR_WB and SPEEX_WITH_HEADER_BYTE is only `1`.
        # If `0` or omitted, defaults to one channel (mono).
        # Note: We only recognize the first channel by default.
        # To perform independent recognition on each channel set
        # `enable_separate_recognition_per_channel` to 'true'.
    "encoding": "A String", # Encoding of audio data sent in all `RecognitionAudio` messages.
        # This field is optional for `FLAC` and `WAV` audio files and required
        # for all other audio formats. For details, see AudioEncoding.
    "model": "A String", # Which model to select for the given request. Select the model
        # best suited to your domain to get best results. If a model is not
        # explicitly specified, then we auto-select a model based on the parameters
        # in the RecognitionConfig.
        # <table>
        # <tr>
        #   <td><b>Model</b></td>
        #   <td><b>Description</b></td>
        # </tr>
        # <tr>
        #   <td><code>command_and_search</code></td>
        #   <td>Best for short queries such as voice commands or voice search.</td>
        # </tr>
        # <tr>
        #   <td><code>phone_call</code></td>
        #   <td>Best for audio that originated from a phone call (typically
        #   recorded at an 8khz sampling rate).</td>
        # </tr>
        # <tr>
        #   <td><code>video</code></td>
        #   <td>Best for audio that originated from video or includes multiple
        #   speakers. Ideally the audio is recorded at a 16khz or greater
        #   sampling rate. This is a premium model that costs more than the
        #   standard rate.</td>
        # </tr>
        # <tr>
        #   <td><code>default</code></td>
        #   <td>Best for audio that is not one of the specific audio models.
        #   For example, long-form audio. Ideally the audio is high-fidelity,
        #   recorded at a 16khz or greater sampling rate.</td>
        # </tr>
        # </table>
  },
  "audio": { # Contains audio data in the encoding specified in the `RecognitionConfig`. # Required. The audio data to be recognized.
      # Either `content` or `uri` must be supplied. Supplying both or neither
      # returns google.rpc.Code.INVALID_ARGUMENT. See
      # [content limits](https://cloud.google.com/speech-to-text/quotas#content).
    "uri": "A String", # URI that points to a file that contains audio data bytes as specified in
        # `RecognitionConfig`. The file must not be compressed (for example, gzip).
        # Currently, only Google Cloud Storage URIs are
        # supported, which must be specified in the following format:
        # `gs://bucket_name/object_name` (other URI formats return
        # google.rpc.Code.INVALID_ARGUMENT). For more information, see
        # [Request URIs](https://cloud.google.com/storage/docs/reference-uris).
    "content": "A String", # The audio data bytes encoded as specified in
        # `RecognitionConfig`. Note: as with all bytes fields, proto buffers use a
        # pure binary representation, whereas JSON representations use base64.
  },
}

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # The only message returned to the client by the `Recognize` method. It
        # contains the result as zero or more sequential `SpeechRecognitionResult`
        # messages.
      "results": [ # Sequential list of transcription results corresponding to
          # sequential portions of audio.
        { # A speech recognition result corresponding to a portion of the audio.
          "channelTag": 42, # For multi-channel audio, this is the channel number corresponding to the
              # recognized result for the audio from that channel.
              # For audio_channel_count = N, its output values can range from '1' to 'N'.
          "alternatives": [ # May contain one or more recognition hypotheses (up to the
              # maximum specified in `max_alternatives`).
              # These alternatives are ordered in terms of accuracy, with the top (first)
              # alternative being the most probable, as ranked by the recognizer.
            { # Alternative hypotheses (a.k.a. n-best list).
              "transcript": "A String", # Transcript text representing the words that the user spoke.
              "confidence": 3.14, # The confidence estimate between 0.0 and 1.0. A higher number
                  # indicates an estimated greater likelihood that the recognized words are
                  # correct. This field is set only for the top alternative of a non-streaming
                  # result or of a streaming result where `is_final=true`.
                  # This field is not guaranteed to be accurate and users should not rely on it
                  # to be always provided.
                  # The default of 0.0 is a sentinel value indicating `confidence` was not set.
              "words": [ # A list of word-specific information for each recognized word.
                  # Note: When `enable_speaker_diarization` is true, you will see all the words
                  # from the beginning of the audio.
                { # Word-specific information for recognized words.
                  "speakerTag": 42, # Output only. A distinct integer value is assigned for every speaker within
                      # the audio. This field specifies which one of those speakers was detected to
                      # have spoken this word. Value ranges from '1' to diarization_speaker_count.
                      # speaker_tag is set if enable_speaker_diarization = 'true' and only in the
                      # top alternative.
                  "endTime": "A String", # Time offset relative to the beginning of the audio,
                      # and corresponding to the end of the spoken word.
                      # This field is only set if `enable_word_time_offsets=true` and only
                      # in the top hypothesis.
                      # This is an experimental feature and the accuracy of the time offset can
                      # vary.
                  "startTime": "A String", # Time offset relative to the beginning of the audio,
                      # and corresponding to the start of the spoken word.
                      # This field is only set if `enable_word_time_offsets=true` and only
                      # in the top hypothesis.
                      # This is an experimental feature and the accuracy of the time offset can
                      # vary.
                  "word": "A String", # The word corresponding to this set of information.
                },
              ],
            },
          ],
        },
      ],
    }</pre>
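<p>For illustration only, a minimal Python sketch of a synchronous request with inline audio content. It is a hedged example rather than generated reference material: the local file name, encoding, and other configuration values are placeholder assumptions.</p>
<pre>
# Hedged sketch: assumes google-api-python-client is installed and that
# Application Default Credentials are configured in the environment.
import base64

from googleapiclient.discovery import build

service = build('speech', 'v1')

# Inline audio must be base64-encoded, since JSON bodies cannot carry raw bytes.
with open('audio.raw', 'rb') as f:   # placeholder local file (LINEAR16 PCM)
    audio_content = base64.b64encode(f.read()).decode('utf-8')

request_body = {
    'config': {
        'encoding': 'LINEAR16',
        'sampleRateHertz': 16000,
        'languageCode': 'en-US',
        'maxAlternatives': 1,
    },
    'audio': {
        'content': audio_content,
    },
}

response = service.speech().recognize(body=request_body).execute()

# Each result covers a sequential portion of the audio; the first alternative
# is the most probable hypothesis.
for result in response.get('results', []):
    alternative = result['alternatives'][0]
    print(alternative['transcript'], alternative.get('confidence'))
</pre>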
</div>

</body></html>