AI Summary
About
SambaNova Systems is a Palo Alto AI company founded in 2017 by Stanford professors Kunle Olukotun and Christopher Ré together with former Oracle executive Rodrigo Liang. Rather than building on NVIDIA GPUs, SambaNova designs its own AI silicon — the Reconfigurable Dataflow Unit (RDU), most recently the SN40L and the agentic-inference SN50 — and packages it into full “chips-to-model” systems. The company raised a $676M Series D led by SoftBank Vision Fund 2 in 2021 at a valuation above 5B, pushing total funding past 1B, and announced a further $350M Series E in February 2026 alongside the SN50 and an Intel collaboration.
For pricing purposes, SambaNova is really two businesses. SambaNova Cloud (branded SambaCloud) is a developer-facing, OpenAI-compatible inference API that rents access to open models — Llama, DeepSeek, Qwen-class, gpt-oss, Gemma, MiniMax — billed per million tokens, with a published rate card and a free tier. SambaStack / SambaManaged is the enterprise hardware side: RDU systems and racks sold as sales-quoted contracts with no public price. The throughline is the chip: SambaNova competes less on the cheapest token and more on the fastest token, routinely claiming record tokens-per-second on its own hardware.
For current rates, see SambaNova Cloud pricing. Note the rate card lives on the cloud.sambanova.ai subdomain — the marketing site’s /pricing path returns a 404 because the systems business is sales-only.
Pricing summary : How SambaNova’s pricing model works
SambaNova’s pricing is hybrid, split cleanly by product:
- SambaNova Cloud (inference API) — pure usage-based, billed per 1M tokens with separate input and output rates per model. It has three account tiers: a Free plan ($0, $5 of credits, no credit card, 30-day expiry), a pay-as-you-go Developer plan, and a subscription-based Enterprise plan with production rate limits and add-ons like BYOC and custom limits. This rate card is fully public.
- RDU systems (SambaStack / SambaManaged / DataScale) — sales-quoted. There is no public price for the hardware, racks, or managed deployments; these are enterprise contracts sold by SambaNova’s go-to-market team.
So the buyer journey is genuinely self-serve at the bottom (sign up, get $5, call the API) and sales-led at the top (buy or rent RDU capacity), with the per-token API serving as both a product and a demand-generation funnel into the silicon.
What makes this different: Most inference APIs are reselling NVIDIA GPU time and compete on price-per-token. SambaNova runs the same open models on its own RDU silicon and competes on speed-per-token — the rate card is the wrapper, but the pitch is “fastest inference,” not “cheapest.” That makes its per-token prices closer to mid-pack while its differentiation lives in throughput and latency.
Pricing by product
SambaNova Cloud per-1M-token rates, as of June 2026 (input / output, USD):
| Model | Input /1M | Output /1M | Notes |
|---|---|---|---|
| DeepSeek-V3.1-cb | $0.15 | $0.75 | Cheapest on the card |
| gpt-oss-120b | $0.22 | $0.59 | Open-weight reasoning |
| gemma-4-31B-it | $0.38 | $1.15 | Mid-size instruct |
| Meta-Llama-3.3-70B-Instruct | $0.60 | $1.20 | Mainstream workhorse |
| MiniMax-M2.7 | $0.60 | $2.40 | High output cost |
| DeepSeek-R1-Distill-Llama-70B | $0.70 | $1.40 | Distilled reasoning |
| DeepSeek-V3.1 / V3.2 | $3.00 | $4.50 | Frontier-class, priciest |
Account tiers (SambaNova Cloud):
| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Free | $0 | $5 credits, Production models | No card; credits expire in 30 days |
| Developer | Pay-as-you-go | All Production & Preview models | Standard rate limits, per-token billing |
| Enterprise | Subscription / custom | Production rate limits, BYOC | Sales-quoted for larger usage |
Sales motions across products: the cloud API Free and Developer tiers are fully self-serve (PLG); Enterprise and all RDU hardware (SambaStack, SambaManaged, DataScale) are sales-led and quoted. There is no public price for the systems business.
Hidden costs : What SambaNova users actually pay
On the cloud side the rate card is clean, but real bills depend on a few things beyond the headline per-token number:
| Line item | Cost |
|---|---|
| Input tokens (e.g. Llama-3.3-70B) | $0.60 per 1M |
| Output tokens (e.g. Llama-3.3-70B) | $1.20 per 1M |
| Reasoning / “thinking” tokens | Billed as output — DeepSeek-R1-distill at $1.40/1M output adds up fast |
| Free credits | $5, then they expire in 30 days |
| Enterprise rate limits / dedicated capacity | Sales-quoted (subscription) |
| RDU systems / SambaStack | Sales-quoted; no public price |
The real cost traps are structural, not line-item. First, output and reasoning tokens dominate — output rates run 2–5x input (MiniMax-M2.7 is $0.60 in but $2.40 out), so chatty or chain-of-thought workloads cost far more than the input-side rate suggests. Second, the DeepSeek-V3.1/V3.2 frontier tier at $3.00/$4.50 is roughly 20x the cheapest model, so model choice swings the bill enormously. Third, the $5 free credit expires in 30 days, so the trial doesn’t bridge a slow procurement cycle. And on the systems side, the entire cost is opaque until you talk to sales.
Want to estimate your own SambaNova Cloud bill? Use the SambaNova pricing calculator to model your costs based on model and token volume.
Pricing evolution : SambaNova pricing history and changes
Cadence
| Period | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2024 H2 | Public token rate card launched | SambaNova Cloud (free + pay-as-you-go) | OpenAI-compatible API on RDU |
| 2025 H2 | Per-model rates tracked | Sovereign-AI regional clouds | Argyll, Infercom, OVHcloud, SouthernCrossAI |
| 2026 Q1–Q2 | Rate card spans $0.15–$3.00 input | SN50 RDU; $350M Series E | Newer DeepSeek/gpt-oss/Gemma/MiniMax models added |
Tracked range: 2024–present. The systems/hardware business has never published a public price, so only the cloud rate card is trackable.
Notable changes
- 2024 H2 — SambaNova Cloud launches as a public, OpenAI-compatible inference API with a free developer tier and per-token pay-as-you-go billing, positioned on fastest-token throughput for open models rather than per-GPU-hour rental.
- Late 2025 — Sovereign-AI inference partnerships (UK, Germany, EU, Australia) extend the token-based cloud regionally while keeping the published rate card.
- June 2026 — Rate card spans $0.15/$0.75 (DeepSeek-V3.1-cb) to $3.00/$4.50 (DeepSeek-V3.1/V3.2), with Meta-Llama-3.3-70B at $0.60/$1.20 and gpt-oss-120b at $0.22/$0.59. SN50 RDU and a $350M Series E announced in February 2026.
The direction of travel is model proliferation, not headline price moves: SambaNova keeps adding newer open models at tiered rates rather than re-cutting a flat per-token price, so the effective cost depends almost entirely on which model you pick.
What’s unique : SambaNova’s distinctive pricing mechanics
1. Speed as the value metric, not price. SambaNova prices per token like everyone else, but the product it’s actually selling is throughput on custom RDU silicon. Its marketing leads with record tokens-per-second, so buyers pay mid-pack token rates for top-tier latency rather than the cheapest possible token.
2. A true free tier on inference, sales-only on hardware. The same company offers a no-credit-card $5 free tier on the cloud API and a fully gated, contact-sales motion on its RDU systems — a clean split between PLG funnel and enterprise sale within one brand.
3. Per-model price spread, not per-tier. Instead of bundling tokens into plan tiers, SambaNova lets the model choice set the price: from $0.15 input for a distilled DeepSeek to $3.00 input for the frontier model — roughly a 20x spread on the same rate card.
Strengths & weaknesses
| Strengths | Weaknesses |
|---|---|
| Public, transparent per-token rate card | Marketing-site /pricing 404s; rate card hidden on cloud subdomain |
| Genuine free tier ($5, no card) on the API | Free credits expire in 30 days |
| Differentiated on inference speed (custom RDU) | Token rates are mid-pack, not cheapest |
| OpenAI-compatible API, easy migration | Hardware/systems pricing fully opaque (sales-only) |
| Newer open models added quickly | Output/reasoning tokens make bills hard to predict |
Billing UX : SambaNova billing controls and transparency
- Billing controls — Self-serve console issues an API key; the Free tier draws down $5 of credits, after which you add a card and pay-as-you-go on the Developer tier. Enterprise moves to subscription-based pricing with production rate limits.
- Usage visibility — Per-token billing with separate input/output rates is shown on the public pricing page; consumption is metered against credits, then the card.
- Payment options — Self-serve credit-card checkout for Free/Developer; sales-led contracts, invoicing, and BYOC/custom-rate-limit arrangements for Enterprise and all RDU hardware.
Strategic wins : Why SambaNova’s pricing decisions worked
1. Using a free token tier as a funnel into custom silicon
The $5-no-card cloud tier lets any developer try RDU-backed inference in minutes, turning a hardware company’s API into a top-of-funnel acquisition channel. See how AI companies structure pricing.
2. Competing on speed instead of racing token prices to zero
By anchoring on fastest-inference rather than cheapest-token, SambaNova avoids the deflationary token price war and justifies mid-pack rates with throughput — a value-metric choice. Related: outcome-based pricing trends.
3. Letting model choice carry the price spread
Rather than rigid plan tiers, SambaNova prices each model independently across a 20x range, so customers self-select cost/quality without a packaging negotiation. See choosing the right usage metric.
Areas to improve : Gaps in SambaNova’s pricing approach
1. Discoverability of the rate card
The marketing site’s /pricing path 404s and the real rate card lives on a separate cloud subdomain, so prospective buyers hit a dead end on the obvious URL. See bill shock and cost unpredictability.
2. Output-token predictability
With output rates 2–5x input and reasoning tokens billed as output, bills are hard to forecast. A token-estimator or per-request cost preview would reduce surprise charges for chain-of-thought workloads.
3. Opaque systems pricing
The entire RDU hardware business is sales-quoted with no indicative public number, which slows evaluation for buyers comparing against GPU-cloud alternatives that publish at least banded rates.
Key takeaways
- SambaNova is a hybrid model — public per-token usage pricing on the cloud API, sales-quoted contracts on RDU hardware. For the underlying model, see the introduction to usage-based pricing.
- Token rates span ~20x by model — from $0.15 input (DeepSeek-V3.1-cb) to $3.00 input (DeepSeek-V3.1/V3.2) — so model selection, not tier, drives the bill.
- There’s a real free tier on inference — $5 of credits, no credit card — but it expires in 30 days.
- The differentiation is speed, not price — SambaNova runs open models on its own RDU silicon and sells fastest-inference at mid-pack token rates.
- The hardware business stays opaque — no public price for SambaStack/SambaManaged/DataScale; everything above the API is a sales conversation.
UBP implications
- A usage-based API can be a funnel for a non-usage product. SambaNova uses a metered, free-tier inference API to generate demand for sales-quoted silicon — usage pricing as acquisition, not just monetization.
- The value metric need not be the cheapest unit. Pricing per token while competing on tokens-per-second shows a usage-based vendor can hold mid-pack unit prices if it differentiates on a quality dimension buyers can feel.
- Per-item pricing can replace tiered packaging. Letting each model set its own rate across a wide spread lets customers self-select cost vs. quality without bundles — a clean pattern for catalogs of fungible units.
Sources
- SambaNova Cloud pricing (per-token rate card) (accessed 2026-06-15)
- SambaNova Cloud plans (Free / Developer / Enterprise) (accessed 2026-06-15)
- SambaNova systems marketing site (systems sales-quoted) (accessed 2026-06-15)
- SambaNova blog — SN50 RDU & Series E (accessed 2026-06-15)
- Built In SF — SambaNova raises $676M at $5B valuation (accessed 2026-06-15)
Bottom line
SambaNova is a hybrid pricing story: a transparent, usage-based inference API (SambaNova Cloud) with a free tier and per-1M-token rates from $0.15 to $3.00 input, bolted onto a sales-only RDU hardware business with no public price at all. The cloud rate card competes on speed rather than the cheapest token — SambaNova runs open models on its own silicon and sells fastest-inference — while the free $5-no-card tier funnels developers toward both pay-as-you-go usage and, eventually, enterprise systems deals. The things to watch are output-token costs and the opaque hardware pricing above the API. Browse the pricing blueprint for more fully-researched company profiles, or compare SambaNova against other Infrastructure, Compute & MLOps companies.
Pricing timeline : Major events on a vertical axis
Each milestone below corresponds to a public pricing change, product launch, or material adjustment. Major events use a filled marker; minor adjustments use a faded one.
Rate card spans $0.15 to $3.00 input across newer models
June 2026 SambaCloud rate card: DeepSeek-V3.1-cb $0.15/$0.75, gpt-oss-120b $0.22/$0.59, gemma-4-31B-it $0.38/$1.15, Meta-Llama-3.3-70B $0.60/$1.20, MiniMax-M2.7 $0.60/$2.40, DeepSeek-R1-Distill-Llama-70B $0.70/$1.40, DeepSeek-V3.1/V3.2 $3.00/$4.50. SN50 RDU and $350M Series E announced Feb 2026.
Sovereign-AI inference partnerships expand the footprint
SambaNova signs sovereign-AI inference deals (Argyll UK, Infercom Germany, OVHcloud EU, SouthernCrossAI Australia), extending the token-based cloud into region-specific clouds while keeping the published rate card.
SambaNova Cloud launches with a free developer tier
SambaNova opens a public, OpenAI-compatible inference API positioned on fastest-token-throughput for open models (Llama family), with a free tier and pay-as-you-go per-token billing rather than per-GPU-hour.
- · SambaNova was founded in 2017 by Stanford professors Kunle Olukotun and Christopher Ré with ex-Oracle exec Rodrigo Liang; its 2021 Series D ($676M, SoftBank-led) valued it above 5B.
- · Its pricing pitch isn't the cheapest token — it's the fastest. SambaNova runs open models on its own RDU silicon and routinely claims record tokens-per-second for Llama, DeepSeek, gpt-oss and Gemma.
- · The public rate card lives on cloud.sambanova.ai, not sambanova.ai/pricing — the marketing domain's /pricing path 404s, because the hardware business has no public price at all.
Questions & answers
- How does SambaNova's pricing work?
- SambaNova is hybrid. SambaNova Cloud (SambaCloud) is a public, usage-based inference API billed per 1M tokens, with a Free tier ($5 of credits, no card), a pay-as-you-go Developer tier, and a subscription-based Enterprise tier. Separately, SambaNova sells RDU-based hardware systems (SambaStack, SambaManaged) that are sales-quoted with no public rate card.
- How much does SambaNova Cloud cost per million tokens?
- As of June 2026, SambaCloud token rates range from $0.15 input / $0.75 output for DeepSeek-V3.1-cb to $3.00 input / $4.50 output for full DeepSeek-V3.1 and V3.2. Meta-Llama-3.3-70B-Instruct is $0.60 input / $1.20 output, gpt-oss-120b is $0.22 / $0.59, gemma-4-31B-it is $0.38 / $1.15, and MiniMax-M2.7 is $0.60 / $2.40.
- Does SambaNova have a free tier?
- Yes, for the cloud API. The SambaNova Cloud Free plan gives you $5 in API credits with no credit card required, access to Production models, and community support; the credits expire in 30 days. The hardware/systems business has no free tier and is sold through sales.
- Is SambaNova usage-based or subscription pricing?
- Both, depending on the product. The cloud inference API is pure usage-based (per-token, pay-as-you-go) on the Free and Developer tiers, shifting to subscription-based pricing on Enterprise for larger usage. The RDU hardware systems are sold as sales-quoted enterprise contracts.