Whisper-Large-v3
- Model: Whisper-Large-v3
- Description: State-of-the-art automatic speech recognition (ASR) and translation model. Developed by OpenAI and trained on 5M+ hours of labeled audio. Excels in multilingual and zero-shot speech tasks across diverse domains.
- Model ID: Whisper-Large-v3
- Supported languages: Multilingual
Core capabilities
- Transcribes and translates extended audio inputs (up to 25 MB).
- Demonstrates high accuracy in speech recognition and translation tasks.
- Provides OpenAI-compatible endpoints for transcriptions and translations.
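Because uploads are capped at 25 MB, it can help to check a file locally before sending it. The following is a minimal sketch, not part of the service: the helper name `check_audio_file` is hypothetical, format detection by file extension is a heuristic, and the interpretation of 25 MB as 25 × 1024 × 1024 bytes is an assumption.

```python
import os

# Formats listed for the "file" parameter, matched here by extension only.
# This is a heuristic; the service may inspect the actual file content.
SUPPORTED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}

# Assumption: "25 MB" means 25 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_FILE_BYTES}")
    return problems
```

A caller might skip the upload, or split or re-encode the audio, whenever the returned list is non-empty.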
Request parameters
Parameter | Type | Description | Default | Endpoints |
---|---|---|---|---|
model | String | The ID of the model to use. | Required | transcriptions, translations |
file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. File size limit: 25 MB. | Required | transcriptions, translations |
prompt | String | Prompt to influence transcription style or vocabulary. Example: “Please transcribe carefully, including pauses and hesitations.” | Optional | transcriptions, translations |
response_format | String | Output format: either json or text. | json | transcriptions, translations |
language | String | The language of the input audio. Using ISO 639-1 format (e.g., en) improves accuracy and latency. | Optional | transcriptions, translations |
stream | Boolean | Enables streaming responses. | false | transcriptions, translations |
stream_options | Object | Additional streaming configuration (e.g., {"include_usage": true}). | Optional | transcriptions, translations |
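To illustrate how the parameters above fit together, here is a hedged sketch that assembles a multipart/form-data request using only the Python standard library. The base URL, the `API_KEY` environment variable, the Bearer authorization scheme, and the `/audio/<endpoint>` path are assumptions modeled on OpenAI's conventions, not values confirmed by this page; `build_request` is a hypothetical helper. Streaming options are left out because their response shape is not described here.

```python
import os
import urllib.request
import uuid

# Placeholders (assumptions): replace with your provider's real base URL and key.
BASE_URL = "https://api.example.com/v1"
API_KEY = os.environ.get("API_KEY", "YOUR_API_KEY")


def build_request(endpoint, audio_path, *, language=None, prompt=None,
                  response_format="json"):
    """Build, without sending, a POST for 'transcriptions' or 'translations'."""
    if endpoint not in ("transcriptions", "translations"):
        raise ValueError("endpoint must be 'transcriptions' or 'translations'")
    if response_format not in ("json", "text"):
        raise ValueError("response_format must be 'json' or 'text'")

    # Required and optional text fields from the parameter table.
    fields = {"model": "Whisper-Large-v3", "response_format": response_format}
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt

    # Encode the multipart body by hand: one part per field, then the file.
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    with open(audio_path, "rb") as f:
        audio = f.read()
    filename = os.path.basename(audio_path)
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{BASE_URL}/audio/{endpoint}",  # path is an assumption
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` would send it; with `response_format="json"` the body should then be parsed as JSON, and with `"text"` read as plain text. An OpenAI-compatible client library could replace the hand-built multipart body if one is available.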