Enables advanced audio analysis with optional text instructions.
Endpoint
POST https://api.sambanova.ai/v1/audio/reasoning
Request parameters
The following table lists the parameters for an audio reasoning request, along with each parameter's type, description, and default value.
Parameter | Type | Description | Default |
---|---|---|---|
model | String | The ID of the model to use. Only Qwen2-Audio-7B-Instruct is currently available. | Required |
messages | Message | A list of messages containing role (user/system/assistant), type (text/audio_content), and audio_content (base64 audio content). | Required |
response_format | String | The output format, either “json” or “text”. | json |
temperature | Float | Sampling temperature between 0 and 1. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused. | 0 |
max_tokens | Integer | The maximum number of tokens to generate. | 1000 |
file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. Each file must not exceed 30 seconds in duration. | Required |
stream | Boolean | Enables streaming responses. | false |
stream_options | Object | Additional streaming configuration (e.g., {"include_usage": true}). | Optional |
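As a rough illustration of how the parameters in the table fit together, the sketch below assembles a request body in Python. The helper name `build_audio_payload` and its arguments are hypothetical (they are not part of the API); the defaults mirror the table, and the message layout follows the request examples in this document.

```python
import base64


def build_audio_payload(audio_bytes, prompt=None, mime_type="audio/mp3",
                        model="Qwen2-Audio-7B-Instruct", temperature=0.0,
                        max_tokens=1000, stream=False, include_usage=False):
    """Hypothetical helper: build a JSON-serializable request body."""
    # The table documents temperature as a value between 0 and 1.
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    # Audio is sent inline as a base64 data URI inside an audio_content part.
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    messages = [{
        "role": "user",
        "content": [{
            "type": "audio_content",
            "audio_content": {"content": f"data:{mime_type};base64,{encoded}"},
        }],
    }]
    # The text instruction is optional and goes in its own user message.
    if prompt:
        messages.append({"role": "user", "content": prompt})

    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    # stream_options is only meaningful when streaming is enabled.
    if stream and include_usage:
        payload["stream_options"] = {"include_usage": True}
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`. Validating `temperature` locally is a design choice of this sketch, not documented server behavior.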
This section provides examples of how to send a request using different methods.
CURL
curl --location 'https://api.sambanova.ai/v1/audio/reasoning' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
    "messages": [
        {"role": "assistant", "content": "you are a helpful assistant"},
        {"role": "user", "content": [
            {
                "type": "audio_content",
                "audio_content": {
                    "content": "data:audio/mp3;base64,<base64_audio>"
                }
            }
        ]},
        {"role": "user", "content": "what is the audio about"}
    ],
    "max_tokens": 1024,
    "model": "Qwen2-Audio-7B-Instruct",
    "temperature": 0.01,
    "stream": true
}'
Python
import base64

import requests


def analyze_audio(audio_file_path, api_key):
    # Read the audio file and encode it as base64 for the JSON payload.
    with open(audio_file_path, "rb") as audio_file:
        base64_audio = base64.b64encode(audio_file.read()).decode("utf-8")

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    data = {
        "messages": [
            {"role": "assistant", "content": "you are a helpful assistant"},
            {"role": "user", "content": [
                {
                    "type": "audio_content",
                    "audio_content": {
                        "content": f"data:audio/mp3;base64,{base64_audio}"
                    }
                }
            ]},
            {"role": "user", "content": "what is the audio about"}
        ],
        "model": "Qwen2-Audio-7B-Instruct",
        "max_tokens": 1024,
        "temperature": 0.01,
        # Streaming is left off here because response.json() below expects
        # a single JSON body rather than a series of "data:" chunks.
        "stream": False
    }
    response = requests.post(
        "https://api.sambanova.ai/v1/audio/reasoning",
        headers=headers,
        json=data
    )
    # Fail early on HTTP errors instead of parsing an error body.
    response.raise_for_status()
    return response.json()
The API returns a response in the selected format.
{
    "choices": [{
        "delta": {
            "content": "The sound is that of ",
            "role": "assistant"
        },
        "finish_reason": null,
        "index": 0,
        "logprobs": null
    }],
    "created": 1732317298,
    "id": "211b9a22-58cf-4b90-94e9-1fed8d0d9d0a",
    "model": "Qwen2-Audio-7B-Instruct"
}
Streaming responses
When streaming is enabled, the API returns a series of data chunks in the following format:
data: {"choices":[{"delta":{"content":"","role":"assistant"},"finish_reason":null,"index":0,"logprobs":null}],"created":1732317298,"id":"211b9a22-58cf-4b90-94e9-1fed8d0d9d0a","model":"Qwen2-Audio-7B-Instruct","object":"chat.completion.chunk","system_fingerprint":"fastcoe"}
data: {"choices":[{"delta":{"content":"The sound is that of ","role":"assistant"},"finish_reason":null,"index":0,"logprobs":null}],"created":1732317298,"id":"211b9a22-58cf-4b90-94e9-1fed8d0d9d0a","model":"Qwen2-Audio-7B-Instruct","object":"chat.completion.chunk","system_fingerprint":"fastcoe"}
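One way to consume these chunks is to strip the `data:` prefix from each line, parse the remainder as JSON, and concatenate the `delta.content` fragments. The sketch below does this for any iterable of lines. The function name `collect_stream_text` is hypothetical, and the handling of a `[DONE]` terminator line is an assumption borrowed from other chat-completion style APIs; it is not shown in the examples above.

```python
import json


def collect_stream_text(lines):
    """Hypothetical helper: join delta.content fragments from "data:" lines."""
    parts = []
    for raw in lines:
        # requests' iter_lines() yields bytes, so accept both bytes and str.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        # Skip blank separator lines and anything that is not a data chunk.
        if not line.startswith("data:"):
            continue
        body = line[len("data:"):].strip()
        # Assumption: the stream may end with a "[DONE]" sentinel.
        if body == "[DONE]":
            break
        chunk = json.loads(body)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```

With the `requests` library, the lines could come from `requests.post(..., stream=True)` followed by `response.iter_lines()`, passed directly to this function.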