Rate limits

Rate limits help manage SambaNova API usage so that the service stays stable and reliable. They cap how many times each user can call the SambaNova API within a given interval. Rate limits are measured in:

RPM: Requests per minute
RPD: Requests per day

Basics

A request is a single call to the API.
Both limit types (RPM and RPD) apply at once; you are rate limited as soon as you reach either one.
Every response reports the current status of your rate limits (see Rate limit response headers).
If you hit a rate limit, the response contains an error message (see API error codes).
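
As an illustration of handling the error described above, here is a minimal Python sketch of retrying with exponential backoff. It assumes that a rate-limit hit is signaled with HTTP status 429, the conventional "Too Many Requests" code; the API error codes page is the authority on the actual value. The `send` callable and the `call_with_retry` helper are hypothetical names introduced for this example, not part of any SambaNova SDK.

```python
import time
from typing import Callable, Tuple

# Assumption: a rate-limit hit is reported as HTTP 429 ("Too Many Requests").
# Confirm the real status code on the API error codes page.
RATE_LIMIT_STATUS = 429


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it is not rate limited or attempts run out.

    send is a hypothetical caller-supplied function that performs one API
    request and returns (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != RATE_LIMIT_STATUS:
            break
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

Backoff of this kind suits the per-minute limit. Once the per-day limit is exhausted, short retries are unlikely to help until the daily quota resets, so a caller may prefer to stop and surface the error.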

SambaStack rate limits

For SambaStack deployments, rate limits are optional and applied to user groups by the administrator.

SambaCloud rate limit tiers

SambaCloud offers three rate limit tiers:

Free Tier: Applied when there is no payment method linked with your account
Developer Tier: Applied when a payment method is linked with your account
Enterprise Tier: Please contact our sales team for our enterprise tier rate limit plans

Please see the Billing page to link a payment method to your account.

Below are our Developer Tier and Free Tier rate limits.

Production model rate limits

Production models are intended for use in production environments and meet our high standards for speed and quality.

Developer	Model ID	Requests per minute (RPM)	Requests per day (RPD)
DeepSeek	`DeepSeek-R1`	60	12000
DeepSeek	`DeepSeek-R1-Distill-Llama-70B`	240	48000
DeepSeek	`DeepSeek-V3-0324`	60	12000
DeepSeek	`Deepseek-V3.1`	60	12000
Meta	`Meta-Llama-3.3-70B-Instruct`	240	48000
Meta	`Meta-Llama-3.1-8B-Instruct`	1440	288000
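
To relate the two limit columns, the following Python sketch works through the arithmetic for one row of the table. It shows the smallest even spacing between requests that stays under the RPM limit, and how long a client sending at the full RPM rate would take to use up the RPD limit. The function names are hypothetical and exist only for this example.

```python
def min_interval_seconds(rpm: int) -> float:
    """Smallest even spacing between requests that stays within rpm."""
    return 60.0 / rpm


def minutes_until_daily_cap(rpm: int, rpd: int) -> float:
    """Minutes of sending at the full per-minute rate before rpd is used up."""
    return rpd / rpm


# Limits for `Meta-Llama-3.3-70B-Instruct`, taken from the table above.
rpm, rpd = 240, 48000
print(min_interval_seconds(rpm))          # 0.25
print(minutes_until_daily_cap(rpm, rpd))  # 200.0
```

For this model, a client that paces requests 0.25 seconds apart never exceeds the per-minute limit, but reaches the per-day limit after 200 minutes of continuous use. Sustained workloads are therefore bounded by RPD, while short bursts are bounded by RPM.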

Preview model rate limits

Preview models are intended for evaluation purposes and developer experimentation only, and should not be used in production environments. These models have limited capacity and may be removed at short notice.

Developer	Model ID	Requests per minute (RPM)	Requests per day (RPD)
Meta	`Llama-4-Maverick-17B-128E-Instruct`	60	12000
OpenAI	`Whisper-Large-v3`	450	90000
Qwen	`Qwen3-32B`	30	6000
Tokyotech-llm	`Llama-3.3-Swallow-70B-Instruct-v0.4`	60	12000
Other	`E5-Mistral-7B-Instruct`	60	12000

Rate limit response headers

These headers are found in every request response and give information about the current status of rate limit usage.

RPM (Requests per minute):

x-ratelimit-limit-requests
- The maximum number of requests allowed per minute.
x-ratelimit-remaining-requests
- The number of requests remaining in the current minute before hitting the rate limit.
x-ratelimit-reset-requests
- The time at which the per-minute request quota resets, in epoch time.

RPD (Requests per day):

x-ratelimit-limit-requests-day
- The maximum number of requests allowed per day.
x-ratelimit-remaining-requests-day
- The number of requests remaining in the current day before hitting the rate limit.
x-ratelimit-reset-requests-day
- The time at which the per-day request quota resets, in epoch time.
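
The six headers above can be collected into one structure for logging or client-side throttling. The Python sketch below is one way to do that. It assumes every header value is a numeric string, which the source does not state explicitly. `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, and the header values in the usage example are made up for illustration.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RateLimitStatus:
    limit_minute: int
    remaining_minute: int
    reset_minute: float
    limit_day: int
    remaining_day: int
    reset_day: float


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Build a RateLimitStatus from response headers.

    Assumption: all six values are numeric strings. Raises KeyError if a
    header is missing and ValueError if a value is not numeric.
    """
    # HTTP header names are case-insensitive, so normalize the keys first.
    h = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit_minute=int(h["x-ratelimit-limit-requests"]),
        remaining_minute=int(h["x-ratelimit-remaining-requests"]),
        reset_minute=float(h["x-ratelimit-reset-requests"]),
        limit_day=int(h["x-ratelimit-limit-requests-day"]),
        remaining_day=int(h["x-ratelimit-remaining-requests-day"]),
        reset_day=float(h["x-ratelimit-reset-requests-day"]),
    )


# Illustrative values only, not captured from a real response.
example_headers = {
    "X-RateLimit-Limit-Requests": "60",
    "X-RateLimit-Remaining-Requests": "59",
    "X-RateLimit-Reset-Requests": "1700000060",
    "X-RateLimit-Limit-Requests-Day": "12000",
    "X-RateLimit-Remaining-Requests-Day": "11999",
    "X-RateLimit-Reset-Requests-Day": "1700086400",
}
status = parse_rate_limit_headers(example_headers)
print(status.remaining_minute, status.remaining_day)
```

A client could check `remaining_minute` and `remaining_day` after each response and pause before sending again when either reaches zero.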
