How does SambaNova's pricing work?

SambaNova is hybrid. SambaNova Cloud (SambaCloud) is a public, usage-based inference API billed per 1M tokens, with a Free tier ($5 of credits, no card), a pay-as-you-go Developer tier, and a subscription-based Enterprise tier. Separately, SambaNova sells RDU-based hardware systems (SambaStack, SambaManaged) that are sales-quoted with no public rate card.

How much does SambaNova Cloud cost per million tokens?

As of July 2026, SambaCloud token rates range from $0.22 input / $0.59 output (gemma-4-31B-it and gpt-oss-120b) to $3.00 input / $4.50 output for DeepSeek-V3.1 and V3.2. Meta-Llama-3.3-70B-Instruct is $0.60 input / $1.20 output, and MiniMax-M2.7 is $0.60 / $2.40 with a $0.06/1M cached-input rate — the first model on the card to carry cached-input pricing.

Does SambaNova have a free tier?

Yes, for the cloud API. The SambaNova Cloud Free plan gives you $5 in API credits with no credit card required, access to Production models, and community support; the credits expire in 30 days. The hardware/systems business has no free tier and is sold through sales.

Is SambaNova usage-based or subscription pricing?

Both, depending on the product. The cloud inference API is purely usage-based (per-token, pay-as-you-go) on the Free and Developer tiers and shifts to subscription-based pricing on Enterprise for larger usage. The RDU hardware systems are sold as sales-quoted enterprise contracts.

SambaNova Pricing

AI Summary

SambaNova sells two things: SambaNova Cloud (SambaCloud), a public per-1M-token inference API, and RDU-based hardware systems (SambaStack/SambaManaged) that are sales-quoted.
SambaCloud token rates (July 2026) span $0.22 input / $0.59 output for gemma-4-31B-it and gpt-oss-120b up to $3.00 input / $4.50 output for DeepSeek-V3.1 and V3.2; Meta-Llama-3.3-70B is $0.60 in / $1.20 out and MiniMax-M2.7 is $0.60 in / $2.40 out with a $0.06/1M cached-input rate.
Three account tiers: Free ($0, $5 of credits that expire in 30 days, no credit card), Developer (pay-as-you-go per token), and Enterprise (subscription/custom with production rate limits, BYOC and custom limits).
Custom RDU silicon (SN40L, and the agentic SN50 launched Feb 2026) is SambaNova's wedge — it sells speed-per-token rather than the cheapest token, competing on fastest inference.
Hardware/DataScale systems carry no public rate card and are sold via enterprise contracts; the company announced the first close of a $1B round at an $11B valuation on July 8, 2026 (General Atlantic-led), on top of a $350M Series E in February 2026 and a 2021 Series D that valued it above $5B.

Pricing summary

SambaNova Cloud 2026 — Inference tiers

SambaCloud is a public per-1M-token inference API (input, output & cached-input rates). Free credits to start, pay-as-you-go after, subscription for scale.

Tier	Price	Best for
Free	$0	Developers evaluating the API
Developer	$0.22–$3.00 /1M in	Apps in production, pay-as-you-go
Enterprise	Contact sales	High-volume & regulated workloads

SambaNova Cloud rates as of July 2026 (cloud.sambanova.ai/pricing & /plans). RDU hardware systems are sales-quoted separately.

About

SambaNova Systems is a Palo Alto AI company founded in 2017 by Stanford professors Kunle Olukotun and Christopher Ré together with former Oracle executive Rodrigo Liang. Rather than building on NVIDIA GPUs, SambaNova designs its own AI silicon, the Reconfigurable Dataflow Unit (RDU), most recently the SN40L and the agentic-inference SN50, and packages it into full “chips-to-model” systems. The company raised a $676M Series D led by SoftBank Vision Fund 2 in 2021 at a valuation above $5B, pushing total funding past $1B, and announced a further $350M Series E in February 2026 alongside the SN50 and an Intel collaboration. On July 8, 2026 it announced the first close of a $1B financing round at an $11B valuation, led by General Atlantic with Seligman Ventures and T. Rowe Price Associates, and named JPMorganChase as a new customer selecting SambaNova RDUs for on-prem inference.

For pricing purposes, SambaNova is really two businesses. SambaNova Cloud (branded SambaCloud) is a developer-facing, OpenAI-compatible inference API that rents access to open models — Llama, DeepSeek, Qwen-class, gpt-oss, Gemma, MiniMax — billed per million tokens, with a published rate card and a free tier. SambaStack / SambaManaged is the enterprise hardware side: RDU systems and racks sold as sales-quoted contracts with no public price. The throughline is the chip: SambaNova competes less on the cheapest token and more on the fastest token, routinely claiming record tokens-per-second on its own hardware.

For current rates, see SambaNova Cloud pricing. Note the rate card lives on the cloud.sambanova.ai subdomain — the marketing site’s /pricing path returns a 404 because the systems business is sales-only.

Pricing summary : How SambaNova’s pricing model works

SambaNova’s pricing is hybrid, split cleanly by product:

SambaNova Cloud (inference API): usage-based on the self-serve tiers, billed per 1M tokens with separate input and output rates per model, plus a cached-input rate for models that support prompt caching (MiniMax-M2.7 is the first on the card, at $0.06/1M cached input vs. $0.60 uncached). It has three account tiers: a Free plan ($0, $5 of credits, no credit card, 30-day expiry), a pay-as-you-go Developer plan, and a subscription-based Enterprise plan with production rate limits and add-ons like BYOC and custom rate limits. This rate card is fully public.
RDU systems (SambaStack / SambaManaged / DataScale): sales-quoted. There is no public price for the hardware, racks, or managed deployments; these are enterprise contracts sold by SambaNova’s go-to-market team.

So the buyer journey is genuinely self-serve at the bottom (sign up, get $5, call the API) and sales-led at the top (buy or rent RDU capacity), with the per-token API serving as both a product and a demand-generation funnel into the silicon.

What makes this different: Most inference APIs are reselling NVIDIA GPU time and compete on price-per-token. SambaNova runs the same open models on its own RDU silicon and competes on speed-per-token — the rate card is the wrapper, but the pitch is “fastest inference,” not “cheapest.” That makes its per-token prices closer to mid-pack while its differentiation lives in throughput and latency.

Pricing by product

SambaNova Cloud per-1M-token rates, as of July 2026 (USD). The rate card now carries a separate Cached Input Tokens column for models that support prompt caching:

Model	Cached input /1M	Input /1M	Output /1M	Notes
gemma-4-31B-it	N/A	$0.22	$0.59	Cheapest on the card
gpt-oss-120b	N/A	$0.22	$0.59	Open-weight reasoning
MiniMax-M2.7	$0.06	$0.60	$2.40	Only model with a cached-input rate; high output cost
Meta-Llama-3.3-70B-Instruct	N/A	$0.60	$1.20	Mainstream workhorse
DeepSeek-V3.1	N/A	$3.00	$4.50	Frontier-class, 671B params
DeepSeek-V3.2	N/A	$3.00	$4.50	Frontier-class, priciest
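
As a rough illustration of how the rate card turns into a bill, the sketch below multiplies token volumes by the July 2026 rates from the table above. The `token_cost` helper and the 50M-input / 10M-output monthly workload are hypothetical, and cached-input discounts are ignored.

```python
# USD per 1M tokens as (input, output), copied from the July 2026 rate table.
RATES = {
    "gemma-4-31B-it": (0.22, 0.59),
    "gpt-oss-120b": (0.22, 0.59),
    "MiniMax-M2.7": (0.60, 2.40),
    "Meta-Llama-3.3-70B-Instruct": (0.60, 1.20),
    "DeepSeek-V3.1": (3.00, 4.50),
    "DeepSeek-V3.2": (3.00, 4.50),
}


def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload, ignoring cached-input pricing."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical monthly workload: 50M input tokens and 10M output tokens.
for model in RATES:
    print(f"{model}: ${token_cost(model, 50_000_000, 10_000_000):,.2f}")
```

On these assumed volumes the same workload costs more than ten times as much on the DeepSeek models as on the two cheapest ones, which is why model choice matters more than tier.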

Account tiers (SambaNova Cloud):

Tier	Price	Included	Key mechanics
Free	$0	$5 credits, Production models	No card; credits expire in 30 days
Developer	Pay-as-you-go	All Production & Preview models	Standard rate limits, per-token billing
Enterprise	Subscription / custom	Production rate limits, BYOC	Sales-quoted for larger usage
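
To put the Free tier's $5 credit in perspective, the following sketch converts a dollar budget into output tokens at a given rate. The `tokens_for_budget` helper is hypothetical; the output rates are copied from the July 2026 rate card, and the result ignores input-token spend, so real usage would exhaust the credit sooner.

```python
FREE_CREDIT_USD = 5.00

# USD per 1M output tokens, from the July 2026 rate card.
OUTPUT_RATES = {
    "gemma-4-31B-it": 0.59,
    "Meta-Llama-3.3-70B-Instruct": 1.20,
    "MiniMax-M2.7": 2.40,
    "DeepSeek-V3.2": 4.50,
}


def tokens_for_budget(budget_usd: float, rate_per_million: float) -> int:
    """Return how many whole tokens a budget buys at a per-1M-token rate."""
    return int(budget_usd / rate_per_million * 1_000_000)


for model, rate in OUTPUT_RATES.items():
    tokens = tokens_for_budget(FREE_CREDIT_USD, rate)
    print(f"{model}: about {tokens:,} output tokens for ${FREE_CREDIT_USD:.2f}")
```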

Sales motions across products: the cloud API Free and Developer tiers are fully self-serve (PLG); Enterprise and all RDU hardware (SambaStack, SambaManaged, DataScale) are sales-led and quoted. There is no public price for the systems business.

Hidden costs : What SambaNova users actually pay

On the cloud side the rate card is clean, but real bills depend on a few things beyond the headline per-token number:

Line item	Cost
Input tokens (e.g. Llama-3.3-70B)	$0.60 per 1M
Output tokens (e.g. Llama-3.3-70B)	$1.20 per 1M
Reasoning / “thinking” tokens	Billed as output — DeepSeek-V3.2 reasoning at $4.50/1M output adds up fast
Free credits	$5, then they expire in 30 days
Enterprise rate limits / dedicated capacity	Sales-quoted (subscription)
RDU systems / SambaStack	Sales-quoted; no public price

The real cost traps are structural, not line-item. First, output and reasoning tokens dominate: on the July 2026 card output rates run roughly 1.5–4x input (DeepSeek-V3.1/V3.2 at $3.00 in and $4.50 out; MiniMax-M2.7 at $0.60 in and $2.40 out), so chatty or chain-of-thought workloads cost far more than the input-side rate suggests. Second, the DeepSeek-V3.1/V3.2 frontier tier at $3.00/$4.50 is roughly 14x the cheapest models on input (gemma-4-31B-it and gpt-oss-120b at $0.22), so model choice swings the bill enormously. Third, the $5 free credit expires in 30 days, so the trial doesn’t bridge a slow procurement cycle. Finally, on the systems side the entire cost is opaque until you talk to sales.
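
The output-to-input multiples and the model-to-model spread can be checked directly against the July 2026 card. This is a minimal sketch; the variable names are illustrative and the rates are copied from the rate table in this profile.

```python
# USD per 1M tokens as (input, output), July 2026 rate card.
RATES = {
    "gemma-4-31B-it": (0.22, 0.59),
    "gpt-oss-120b": (0.22, 0.59),
    "MiniMax-M2.7": (0.60, 2.40),
    "Meta-Llama-3.3-70B-Instruct": (0.60, 1.20),
    "DeepSeek-V3.1": (3.00, 4.50),
    "DeepSeek-V3.2": (3.00, 4.50),
}

# How many times more an output token costs than an input token, per model.
output_multiples = {
    model: output_rate / input_rate
    for model, (input_rate, output_rate) in RATES.items()
}

# Ratio between the most and least expensive input rates on the card.
input_rates = [input_rate for input_rate, _ in RATES.values()]
input_spread = max(input_rates) / min(input_rates)

for model, multiple in output_multiples.items():
    print(f"{model}: output is {multiple:.2f}x input")
print(f"Input-rate spread across the card: {input_spread:.1f}x")
```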

Want to estimate your own SambaNova Cloud bill? Use the SambaNova pricing calculator to model your costs based on model and token volume.

Pricing evolution : SambaNova pricing history and changes

Cadence

Period	Price changes	Product / SKU additions	Notes
2024 H2	Public token rate card launched	SambaNova Cloud (free + pay-as-you-go)	OpenAI-compatible API on RDU
2025 H2	Per-model rates tracked	Sovereign-AI regional clouds	Argyll, Infercom, OVHcloud, SouthernCrossAI
2026 Q1–Q2	Rate card spans $0.15–$3.00 input	SN50 RDU; $350M Series E	Newer DeepSeek/gpt-oss/Gemma/MiniMax models added
2026 Q3	gemma-4-31B-it cut to $0.22/$0.59; cached-input rate added	$1B financing at $11B valuation	Card trimmed to 6 models; MiniMax-M2.7 gets $0.06 cached input

Tracked range: 2024–present. The systems/hardware business has never published a public price, so only the cloud rate card is trackable.

Notable changes

2024 H2 — SambaNova Cloud launches as a public, OpenAI-compatible inference API with a free developer tier and per-token pay-as-you-go billing, positioned on fastest-token throughput for open models rather than per-GPU-hour rental.
Late 2025 — Sovereign-AI inference partnerships (UK, Germany, EU, Australia) extend the token-based cloud regionally while keeping the published rate card.
June 2026 — Rate card spans $0.15/$0.75 (DeepSeek-V3.1-cb) to $3.00/$4.50 (DeepSeek-V3.1/V3.2), with Meta-Llama-3.3-70B at $0.60/$1.20 and gpt-oss-120b at $0.22/$0.59. SN50 RDU and a $350M Series E announced in February 2026.
July 2026 — Card trimmed to six models and gemma-4-31B-it cut from $0.38/$1.15 to $0.22/$0.59; DeepSeek-V3.1-cb ($0.15/$0.75) and DeepSeek-R1-Distill-Llama-70B ($0.70/$1.40) dropped from the public card. A new Cached Input Tokens column appears, with MiniMax-M2.7 priced at $0.06/1M cached input. On July 8, 2026 SambaNova announced the first close of a $1B round at an $11B valuation (General Atlantic-led), with JPMorganChase named as a new RDU customer.

The direction of travel is model proliferation and rate-card cleanup rather than across-the-board price changes: SambaNova keeps adding newer open models at per-model rates, trimming older ones, and layering in prompt-caching (cached-input) pricing, so the effective cost depends almost entirely on which model you pick and whether it supports caching.

What’s unique : SambaNova’s distinctive pricing mechanics

1. Speed as the value metric, not price. SambaNova prices per token like everyone else, but the product it’s actually selling is throughput on custom RDU silicon. Its marketing leads with record tokens-per-second, so buyers pay mid-pack token rates for top-tier latency rather than the cheapest possible token.

2. A true free tier on inference, sales-only on hardware. The same company offers a no-credit-card $5 free tier on the cloud API and a fully gated, contact-sales motion on its RDU systems — a clean split between PLG funnel and enterprise sale within one brand.

3. Per-model price spread, not per-tier. Instead of bundling tokens into plan tiers, SambaNova lets the model choice set the price: from $0.22 input for gemma-4-31B-it and gpt-oss-120b to $3.00 input for the frontier DeepSeek-V3.1/V3.2, roughly a 14x spread on the same rate card. The July 2026 refresh narrowed the range by dropping the $0.15-input DeepSeek-V3.1-cb, so the price floor is now set by two mainstream models rather than a single low-priced outlier.

4. Cached-input pricing enters as a third price axis. As of July 2026 the rate card carries a separate Cached Input Tokens column, with MiniMax-M2.7 billed at $0.06/1M cached versus $0.60/1M uncached — a ~10x discount that rewards repeated-prompt and long-context workloads. It moves SambaNova from a two-number (input/output) meter toward a three-number one, and signals more models will likely gain cached rates.
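
A minimal sketch of how cached-input pricing changes the effective input rate, assuming the bill is a simple weighted average of cached and uncached tokens. The `effective_input_rate` helper and the 80% cache-hit figure are hypothetical; only the $0.06 and $0.60 rates come from the rate card.

```python
def effective_input_rate(
    cache_hit_fraction: float,
    cached_rate: float = 0.06,    # MiniMax-M2.7 cached input, USD per 1M tokens
    uncached_rate: float = 0.60,  # MiniMax-M2.7 uncached input, USD per 1M tokens
) -> float:
    """Blend cached and uncached input rates by the share of cached tokens."""
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    return (
        cache_hit_fraction * cached_rate
        + (1.0 - cache_hit_fraction) * uncached_rate
    )


# Hypothetical workload where 80% of input tokens are served from cache.
rate = effective_input_rate(0.8)
print(f"Effective input rate: ${rate:.3f} per 1M tokens")
```

Under this assumption the input side only approaches the full discount when nearly all prompt tokens are repeated, and the $2.40 output rate is unaffected either way.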

Strengths & weaknesses

Strengths	Weaknesses
Public, transparent per-token rate card	Marketing-site `/pricing` 404s; rate card hidden on cloud subdomain
Genuine free tier ($5, no card) on the API	Free credits expire in 30 days
Differentiated on inference speed (custom RDU)	Token rates are mid-pack, not cheapest
OpenAI-compatible API, easy migration	Hardware/systems pricing fully opaque (sales-only)
Newer open models added quickly	Output/reasoning tokens make bills hard to predict

Billing UX : SambaNova billing controls and transparency

Billing controls — Self-serve console issues an API key; the Free tier draws down $5 of credits, after which you add a card and pay-as-you-go on the Developer tier. Enterprise moves to subscription-based pricing with production rate limits.
Usage visibility — Per-token billing with separate input, output and (where supported) cached-input rates is shown on the public pricing page; consumption is metered against credits, then the card. The console carries dedicated Pricing, Billing, Usage, and Commits-and-Credits views.
Payment options — Self-serve credit-card checkout for Free/Developer; sales-led contracts, invoicing, and BYOC/custom-rate-limit arrangements for Enterprise and all RDU hardware.

Strategic wins : Why SambaNova’s pricing decisions worked

1. Using a free token tier as a funnel into custom silicon

The $5-no-card cloud tier lets any developer try RDU-backed inference in minutes, turning a hardware company’s API into a top-of-funnel acquisition channel. See how AI companies structure pricing.

2. Competing on speed instead of racing token prices to zero

By anchoring on fastest-inference rather than cheapest-token, SambaNova avoids the deflationary token price war and justifies mid-pack rates with throughput — a value-metric choice. Related: outcome-based pricing trends.

3. Letting model choice carry the price spread

Rather than rigid plan tiers, SambaNova prices each model independently across a ~14x range (July 2026, after the sub-$0.22 DeepSeek-V3.1-cb was retired), so customers self-select cost/quality without a packaging negotiation. See choosing the right usage metric.

Areas to improve : Gaps in SambaNova’s pricing approach

1. Discoverability of the rate card

The marketing site’s /pricing path 404s and the real rate card lives on a separate cloud subdomain, so prospective buyers hit a dead end on the obvious URL. See bill shock and cost unpredictability.

2. Output-token predictability

With output rates roughly 1.5–4x input and reasoning tokens billed as output, bills are hard to forecast. A token estimator or per-request cost preview would reduce surprise charges for chain-of-thought workloads.
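
One way such a per-request cost preview could work is sketched below. It assumes a rough four-characters-per-token heuristic (not SambaNova's tokenizer) and assumes the request caps output length, so that reasoning and answer tokens together cannot exceed `max_output_tokens`. The function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4-chars-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def worst_case_cost(
    prompt: str,
    max_output_tokens: int,
    input_rate: float,   # USD per 1M input tokens
    output_rate: float,  # USD per 1M output tokens
) -> float:
    """Upper-bound USD cost of one request if the output cap is fully used."""
    input_tokens = estimate_tokens(prompt)
    total = input_tokens * input_rate + max_output_tokens * output_rate
    return total / 1_000_000


# Hypothetical request: a 4,000-character prompt with a 2,000-token output cap,
# priced at the DeepSeek-V3.2 rates ($3.00 in / $4.50 out).
preview = worst_case_cost("x" * 4000, 2000, 3.00, 4.50)
print(f"Worst-case cost for this request: ${preview:.4f}")
```

Because the output cap is the larger term here, tightening it is the most direct lever on the worst-case figure.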

3. Opaque systems pricing

The entire RDU hardware business is sales-quoted with no indicative public number, which slows evaluation for buyers comparing against GPU-cloud alternatives that publish at least banded rates.

Monetization stack & signals : how SambaNova builds & buys its revenue engine

Buys 2 · Builds 1

The read — where the monetization investment is going

SambaNova buys the self-serve money-movement layer (Stripe portal for cards/invoices, AWS Marketplace as the enterprise procurement rail) while its own per-token usage meter sits in front, feeding both. There is no sourced sign of a CRM/CPQ/rev-rec stack behind the sales-quoted RDU hardware, so that quote-to-cash spine stays unconfirmed.

Stack — build vs buy

Builds in-house · 1

Metering · Metering · inferred · Docs · Jun 2026

“Per-1M-token input/output rates metered against $5 of credits, then drawn down per token on the Developer tier — a usage meter feeding the Stripe payment layer.”

Buys (vendor) · 2

Stripe · Payments · Docs · Jun 2026

“The 'Manage Billing' link opens the Stripe customer portal; the Stripe billing portal shows 'No invoice history' — meaning none of these invoices are ever pushed to Stripe for payment.”
AWS Marketplace · Billing · Docs · Jun 2026

“SambaNova is available through the AWS Marketplace, enabling enterprises to streamline procurement and consolidate billing through their existing AWS account.”

Signals reviewed Jun 2026 · derived from product docs

Key takeaways

SambaNova is a hybrid model — public per-token usage pricing on the cloud API, sales-quoted contracts on RDU hardware. For the underlying model, see the introduction to usage-based pricing.
Token rates span ~14x by model — from $0.22 input (gemma-4-31B-it, gpt-oss-120b) to $3.00 input (DeepSeek-V3.1/V3.2) — so model selection, not tier, drives the bill; MiniMax-M2.7 also now carries a $0.06/1M cached-input rate for repeated prompts.
There’s a real free tier on inference — $5 of credits, no credit card — but it expires in 30 days.
The differentiation is speed, not price — SambaNova runs open models on its own RDU silicon and sells fastest-inference at mid-pack token rates.
The hardware business stays opaque — no public price for SambaStack/SambaManaged/DataScale; everything above the API is a sales conversation.

UBP implications

A usage-based API can be a funnel for a non-usage product. SambaNova uses a metered, free-tier inference API to generate demand for sales-quoted silicon — usage pricing as acquisition, not just monetization.
The value metric need not be the cheapest unit. Pricing per token while competing on tokens-per-second shows a usage-based vendor can hold mid-pack unit prices if it differentiates on a quality dimension buyers can feel.
Per-item pricing can replace tiered packaging. Letting each model set its own rate across a wide spread lets customers self-select cost vs. quality without bundles — a clean pattern for catalogs of fungible units.

Sources

SambaNova Cloud pricing (per-token rate card) (accessed 2026-07-23)
SambaNova Cloud plans (Free / Developer / Enterprise) (accessed 2026-07-23)
SambaNova docs — SambaCloud supported models (independent confirmation of the current model lineup) (accessed 2026-07-23)
SambaNova systems marketing site (systems sales-quoted; $1B round at an $11B valuation announced July 8, 2026) (accessed 2026-07-23)
SambaNova blog — SN50 RDU & Series E (accessed 2026-06-15)
Built In SF — SambaNova raises $676M at $5B valuation (accessed 2026-06-15)

Bottom line

SambaNova is a hybrid pricing story: a transparent, usage-based inference API (SambaNova Cloud) with a free tier and per-1M-token rates from $0.22 to $3.00 input (plus a new $0.06/1M cached-input rate on MiniMax-M2.7), bolted onto a sales-only RDU hardware business with no public price at all. The cloud rate card competes on speed rather than the cheapest token — SambaNova runs open models on its own silicon and sells fastest-inference — while the free $5-no-card tier funnels developers toward both pay-as-you-go usage and, eventually, enterprise systems deals. The things to watch are output-token costs and the opaque hardware pricing above the API. Browse the pricing blueprint for more fully-researched company profiles, or compare SambaNova against other Infrastructure, Compute & MLOps companies.

Pricing timeline : Major events on a vertical axis

Each milestone below corresponds to a public pricing change, product launch, or material adjustment, listed newest first.

Cached-input pricing added; gemma cut to $0.22; $1B raise at $11B valuation

Jul 2026

July 2026 rate card trimmed to six models: gemma-4-31B-it cut from $0.38/$1.15 to $0.22/$0.59; DeepSeek-V3.1-cb and DeepSeek-R1-Distill-Llama-70B dropped; a new Cached Input Tokens column debuts with MiniMax-M2.7 at $0.06/1M cached. DeepSeek-V3.1/V3.2 stay $3.00/$4.50, Llama-3.3-70B $0.60/$1.20. SambaNova announced the first close of a $1B round at an $11B valuation (General Atlantic-led) on July 8, 2026.

captured 2026-07-23

Rate card spans $0.15 to $3.00 input across newer models

Jun 2026

June 2026 SambaCloud rate card: DeepSeek-V3.1-cb $0.15/$0.75, gpt-oss-120b $0.22/$0.59, gemma-4-31B-it $0.38/$1.15, Meta-Llama-3.3-70B $0.60/$1.20, MiniMax-M2.7 $0.60/$2.40, DeepSeek-R1-Distill-Llama-70B $0.70/$1.40, DeepSeek-V3.1/V3.2 $3.00/$4.50. SN50 RDU and $350M Series E announced Feb 2026.

Sovereign-AI inference partnerships expand the footprint

Dec 2025

SambaNova signs sovereign-AI inference deals (Argyll UK, Infercom Germany, OVHcloud EU, SouthernCrossAI Australia), extending the token-based cloud into region-specific clouds while keeping the published rate card.

SambaNova Cloud launches with a free developer tier

Sep 2024

SambaNova opens a public, OpenAI-compatible inference API positioned on fastest-token-throughput for open models (Llama family), with a free tier and pay-as-you-go per-token billing rather than per-GPU-hour.

Trivia

· SambaNova was founded in 2017 by Stanford professors Kunle Olukotun and Christopher Ré with ex-Oracle exec Rodrigo Liang; its 2021 Series D ($676M, SoftBank-led) valued it above $5B.
· Its pricing pitch isn't the cheapest token — it's the fastest. SambaNova runs open models on its own RDU silicon and routinely claims record tokens-per-second for Llama, DeepSeek, gpt-oss and Gemma.
· The public rate card lives on cloud.sambanova.ai, not sambanova.ai/pricing — the marketing domain's /pricing path 404s, because the hardware business has no public price at all.
