How does Lambda's GPU cloud pricing work?

Lambda rents NVIDIA GPUs by the GPU-hour, billed per minute. On-demand instances are self-serve and first-come; 1-Click Clusters and Superclusters run on 2-week-to-1-year commitments at lower per-GPU rates; multi-year reserved capacity is sales-quoted. There are no egress fees, and persistent storage is billed separately.

How much does an H100 cost on Lambda?

As of June 2026, an on-demand NVIDIA H100 SXM (80GB) lists at $3.99/GPU/hr and an H100 PCIe at $3.29/GPU/hr. In a committed 1-Click Cluster, H100 systems run roughly $5.54-$6.16/GPU/hr depending on cluster size. H100 SXM on-demand was $2.99/GPU/hr through 2025 before rising in 2026.

Does Lambda have a free tier?

No. Lambda has no free GPU tier. You pay per minute for any instance you launch, and persistent storage continues billing (about $0.20/GiB/month) even when an instance is stopped, so orphaned volumes are a real hidden cost.

Is Lambda cheaper than AWS, CoreWeave, or RunPod?

Lambda has historically been one of the cheapest H100/A100 clouds, undercutting hyperscalers (which charge roughly $2-11/hr for an H100) and often beating CoreWeave. RunPod and peer-to-peer marketplaces like Vast.ai can be cheaper per hour, but Lambda is favored for its research-friendly, pre-configured stack. The main trade-off is capacity: popular GPUs sell out during peak demand.

What happened to the Lambda Inference API?

Lambda is winding down its hosted Inference API, which previously charged per million tokens (roughly $0.02 for small 3B models up to about $0.90 for 405B models). Lambda now steers inference workloads onto its raw GPU instances and clusters instead of the managed token endpoint.

Lambda (Lambda Labs) Pricing

AI Summary

Lambda (formerly Lambda Labs) is a GPU cloud that rents NVIDIA GPUs by the GPU-hour, billed per minute with no egress fees.
On-demand list prices (June 2026): B200 SXM $6.69, H100 SXM $3.99, H100 PCIe $3.29, GH200 $2.29, A100 SXM 80GB $2.79, A100 40GB $1.99/GPU/hr.
1-Click Clusters of 16-2,000+ B200 or H100 GPUs run on 2-week-to-1-year commitments (B200 $8.87-$9.86/GPU/hr; H100 $5.54-$6.16/GPU/hr); longer reserved terms are sales-quoted.
Lambda historically undercut hyperscalers on H100/A100, but raised on-demand rates through 2025-2026 (H100 SXM $2.99 to $3.99) as demand from Microsoft and other frontier-scale buyers outstripped capacity.
Lambda is winding down its per-token Inference API to focus on raw GPU instances and clusters.

Pricing summary

Lambda 2026 — On-demand GPU pricing

Per-GPU-hour list rates for self-serve instances, billed per minute with no egress fees.

Tier	On-demand price	Best for
A100 / V100	$0.79–$2.79 /GPU/hr	Cost-sensitive training & inference
H100	$3.99 /GPU/hr	Mainstream LLM training & fine-tuning
B200 / Clusters	$6.69+ /GPU/hr	Frontier-scale training & reserved capacity

List prices as of June 2026 (lambda.ai/pricing); excludes applicable sales tax. Verify current rates before committing.

About

Lambda, long known as Lambda Labs, is a GPU cloud and AI compute infrastructure provider founded in 2012 by brothers Stephen and Michael Balaban. The company started life with an unlikely product: a facial-recognition API for Google Glass. By 2017 it had pivoted into hardware, selling GPU laptops and deep-learning workstations to AI researchers, and from there into renting that same NVIDIA compute over the cloud. Today Lambda rents NVIDIA GPUs by the GPU-hour to more than 200,000 AI developers and teams, with its instances pre-loaded with “Lambda Stack” (CUDA, PyTorch, TensorFlow) so customers can start training in minutes.

Lambda has become one of the higher-profile “neoclouds” — specialized GPU-cloud providers competing with hyperscalers and rivals like CoreWeave and RunPod. Its growth has been steep: third-party research puts revenue at roughly $250M in 2023, around $425M by end of 2024, and on the order of $760M in 2025. In November 2025 Lambda announced a multibillion-dollar deal to supply Microsoft with AI infrastructure built on tens of thousands of NVIDIA GPUs, then raised over $1.5B in a round led by TWG Global at a reported valuation in the $4-5B range. Lambda has also been reported in talks for pre-IPO financing with an IPO targeted for the second half of 2026.

For current pricing, see Lambda’s pricing page. Note the company rebranded its primary domain from lambdalabs.com to lambda.ai.

Pricing summary : How Lambda’s pricing model works

Lambda is pure usage-based: you pay per GPU-hour for the instances you launch, billed in per-minute increments, with no egress fees. There is no free tier and no monthly subscription — the meter runs whenever a GPU is allocated to you. Pricing splits across three surfaces:

On-Demand Instances — self-serve, first-come 1/2/4/8-GPU configs of B200, H100, A100, GH200, and older cards. List rates are published openly per GPU-hour.
1-Click Clusters & Superclusters — production clusters of 16 to 2,000+ interconnected B200 or H100 GPUs on 2-week-to-1-year commitments, at lower per-GPU rates than on-demand.
Reserved / Private capacity — multi-year reserved capacity at Lambda’s lowest prices, quoted by sales (“talk to our team”).
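The per-minute metering described above can be sketched in a few lines. This is a rough illustration, not Lambda's billing code: the rates are the June 2026 on-demand list prices quoted in this profile, and the function name `on_demand_cost` and the choice to round a partial minute up are assumptions.

```python
import math

# On-demand list rates in USD per GPU-hour, copied from this profile (June 2026).
ON_DEMAND_RATES = {
    "B200 SXM6": 6.69,
    "H100 SXM": 3.99,
    "A100 SXM 80GB": 2.79,
    "A100 SXM 40GB": 1.99,
    "V100": 0.79,
}


def on_demand_cost(gpu: str, gpu_count: int, runtime_seconds: float) -> float:
    """Estimate the charge for one instance run, billed in one-minute increments.

    Assumption: a partial minute is rounded up to a whole billed minute.
    Sales tax and storage are not included.
    """
    billed_minutes = math.ceil(runtime_seconds / 60)
    hourly_rate = ON_DEMAND_RATES[gpu] * gpu_count
    return round(hourly_rate * billed_minutes / 60, 2)


if __name__ == "__main__":
    # One hour on an 8x H100 SXM node.
    print(on_demand_cost("H100 SXM", 8, 3600))
    # A 90-second smoke test on a single H100 SXM still bills two minutes here.
    print(on_demand_cost("H100 SXM", 1, 90))
```

Because there is no minimum hourly charge in this model, short experiments cost only the minutes they run; the meter matters most for instances left allocated overnight.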

What makes this different: Lambda publishes hard per-GPU-hour numbers for both on-demand and committed clusters — unusual transparency for a category where rivals often gate cluster pricing behind sales calls. The catch is the inverse of most AI infra: instead of prices deflating, Lambda has been raising on-demand rates as frontier-model demand outpaces GPU supply.

Pricing by product

On-demand list prices, per GPU-hour, as of June 2026 (excludes sales tax):

GPU (on-demand)	VRAM	Price/GPU/hr	Best for
NVIDIA B200 SXM6	180 GB	$6.69	Frontier training
NVIDIA H100 SXM	80 GB	$3.99	Mainstream LLM training
NVIDIA A100 SXM	80 GB	$2.79	Cost-efficient training
NVIDIA A100 SXM	40 GB	$1.99	Budget training
NVIDIA Tesla V100	16 GB	$0.79	Light / legacy workloads

1-Click Clusters (committed, 2 weeks – 1 year), per GPU-hour:

Cluster	16 GPUs	64 GPUs	256+ GPUs
NVIDIA HGX B200	$9.86	$9.36	$8.87
NVIDIA H100	sales-quoted	sales-quoted	sales-quoted
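To put the cluster table in dollar terms, the sketch below multiplies a B200 tier rate by cluster size and term length. It rests on two assumptions that this profile does not confirm: every GPU is billed for every hour of the commitment, and a cluster size between published tiers (say, 32 GPUs) pays the rate of the largest tier it reaches. The names `cluster_rate` and `commitment_cost` are hypothetical.

```python
# 1-Click Cluster rates for NVIDIA HGX B200 in USD per GPU-hour, from the table above.
# Keys are the minimum GPU count for each published tier.
B200_CLUSTER_TIERS = {16: 9.86, 64: 9.36, 256: 8.87}


def cluster_rate(gpu_count: int) -> float:
    """Return the per-GPU-hour rate of the largest tier the cluster reaches.

    Assumption: sizes between published tiers use the next-lower tier's rate.
    """
    eligible = [size for size in B200_CLUSTER_TIERS if size <= gpu_count]
    if not eligible:
        raise ValueError("1-Click Clusters start at 16 GPUs")
    return B200_CLUSTER_TIERS[max(eligible)]


def commitment_cost(gpu_count: int, days: int) -> float:
    """Total for a commitment, assuming all GPUs are billed for all hours of the term."""
    return round(cluster_rate(gpu_count) * gpu_count * days * 24, 2)


if __name__ == "__main__":
    # Shortest published term (2 weeks) at the two smaller tiers.
    for size in (16, 64):
        print(size, "GPUs, 14 days:", commitment_cost(size, 14))
```

Even the minimum 2-week term is a five- to six-figure purchase at these rates, which is why the commitment length matters as much as the per-GPU number.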

Sales motions across products: on-demand instances are fully self-serve (product-led); 1-Click Clusters are self-serve to launch, but larger and multi-year commitments are routed to sales (“talk to our team”). Lambda is winding down its managed per-token Inference API in favor of raw GPU instances and clusters.

Hidden costs : What Lambda users actually pay

Lambda’s headline rates are clean (no egress, per-minute billing), but real bills include a few items beyond the GPU-hour:

Line item	Cost
GPU-hour (e.g. 8x H100 SXM)	$3.99/GPU/hr → $31.92/hr for the node
Persistent storage	~$0.20/GiB/month, billed hourly — continues even when unmounted
Egress / data transfer	$0 (no egress fees)
Idle/stopped storage	Storage keeps billing while instances are stopped
Sales tax	Added on top where applicable

The single biggest real-world cost driver isn’t a fee — it’s capacity. Reviewers consistently report popular GPUs (especially H100 configs) selling out during peak demand; one user described a 26-hour “temporarily unavailable” wall when scaling from 2 to 4 GPUs. The second is orphaned storage: persistent volumes keep billing at roughly $0.20/GiB/month even after you stop the instance, so cleanup matters.
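A quick way to see how storage sits next to compute on a monthly bill is to model the two line items separately. The sketch below uses the rates quoted in this section; the function name `monthly_bill`, the flat per-month storage charge, and the example volume size are illustrative assumptions, and sales tax is left out.

```python
def monthly_bill(gpu_rate: float, gpu_count: int, gpu_hours: float,
                 storage_gib: float, storage_rate: float = 0.20) -> dict:
    """Split an estimated monthly bill into compute and storage, in USD.

    gpu_rate     -- list price per GPU-hour
    gpu_hours    -- hours the instance was actually running this month
    storage_gib  -- persistent storage kept for the whole month
    storage_rate -- approximate USD per GiB per month quoted in this profile

    Storage is charged for the full month whether or not an instance is running.
    """
    compute = gpu_rate * gpu_count * gpu_hours
    storage = storage_gib * storage_rate
    return {
        "compute": round(compute, 2),
        "storage": round(storage, 2),
        "total": round(compute + storage, 2),
    }


if __name__ == "__main__":
    # 8x H100 SXM used for 100 hours, with a 2,000 GiB volume kept all month.
    print(monthly_bill(3.99, 8, 100, 2000))
    # Same volume left behind with no instance running: storage is the whole bill.
    print(monthly_bill(3.99, 8, 0, 2000))
```

The second call is the orphaned-volume case: with zero GPU-hours the bill does not drop to zero, it drops to the storage line.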

The Inference API, now being wound down, previously billed per million tokens (roughly $0.02 for a small 3B model up to about $0.90 for a 405B model, with a unified input/output rate); it is relevant now only as a historical reference.

Want to estimate your own Lambda bill? Use the Lambda pricing calculator to model your costs based on GPU type and hours.

Pricing evolution : Lambda pricing history and changes

Cadence

Period	Price changes	Product / SKU additions	Notes
2025 H1	—	B200 launched on Lambda Cloud	H100 SXM $2.99, A100 80GB $1.79, V100 $0.55
2025 H2 → 2026 Q1	B200 listed $4.99	GH200 instances added	H100 SXM held at $2.99
2026 Q2	Across-the-board increase	—	H100 SXM → $3.99, B200 → $6.69, A100 80GB → $2.79

Tracked range: 2025–present (Wayback snapshots from June 2025 onward; see tools/wayback-index/lambda-labs.json).

Notable changes

Mid-2025: On-demand 8x H100 SXM at $2.99/GPU/hr, A100 SXM 80GB at $1.79, A100 40GB at $1.29, V100 at $0.55. NVIDIA HGX B200 launching, advertised as low as $2.99/GPU/hr with a multi-year commitment.
Early 2026: B200 SXM6 on-demand listed at $4.99/GPU/hr; H100 SXM unchanged at $2.99.
June 2026: Broad increase: B200 SXM $6.69 (up from $4.99), H100 SXM $3.99 (up from $2.99), A100 SXM 80GB $2.79 (up from $1.79), A100 40GB $1.99 (up from $1.29), V100 $0.79 (up from $0.55). 1-Click Cluster pricing published openly: B200 $8.87–$9.86, H100 $5.54–$6.16/GPU/hr.

The direction of travel is the headline: Lambda raised on-demand rates roughly 33-56% across cards over a year, bucking the usual GPU-cost-deflation story as Microsoft-scale demand and a Series E war chest met scarce supply.
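The size of the increase on each card can be checked directly from the list prices quoted in this section. The sketch below is plain arithmetic on those figures; the dictionary layout and the helper name `pct_increase` are illustrative, and the B200 baseline is its early-2026 listing rather than a mid-2025 price.

```python
# (previous list price, June 2026 list price) in USD per GPU-hour, from this profile.
PRICE_CHANGES = {
    "B200 SXM": (4.99, 6.69),
    "H100 SXM": (2.99, 3.99),
    "A100 SXM 80GB": (1.79, 2.79),
    "A100 40GB": (1.29, 1.99),
    "V100": (0.55, 0.79),
}


def pct_increase(old: float, new: float) -> float:
    """Percentage change from old to new, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)


if __name__ == "__main__":
    for gpu, (old, new) in PRICE_CHANGES.items():
        print(f"{gpu}: ${old:.2f} -> ${new:.2f} (+{pct_increase(old, new)}%)")
```

The flat $1.00 increase on both the H100 SXM and the A100 80GB lands very differently in percentage terms, so the older, cheaper cards saw the steepest relative jump.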

What’s unique : Lambda’s distinctive pricing mechanics

1. Published cluster pricing. Lambda lists hard per-GPU-hour numbers for 1-Click Clusters at 16, 64, and 256+ GPUs, whereas most neoclouds gate multi-GPU cluster pricing behind a sales call. Only multi-year reserved capacity is quoted.

2. No egress, per-minute billing. Unlike hyperscalers, Lambda charges zero data-egress fees and bills GPUs by the minute, which removes two of the most unpredictable line items in cloud GPU bills.

3. Prices that rise, not fall. Lambda’s on-demand rates went up through 2025-2026 — a deliberate signal that scarce frontier GPUs are a sellers’ market, the opposite of the token-deflation seen on the inference side.

Strengths & weaknesses

Strengths	Weaknesses
Transparent per-GPU-hour pricing for both on-demand and clusters	On-demand prices rising, not falling
No egress fees; clean per-minute billing	Frequent capacity sell-outs for popular GPUs
Historically among the cheapest H100/A100 clouds	Limited regions (mostly US; no EU/Asia zones)
Pre-configured Lambda Stack, research-friendly	No free tier; orphaned storage keeps billing
Pre-installed ML software, fast launch	Managed Inference API being wound down

Billing UX : Lambda billing controls and transparency

Billing controls — Pay-per-minute on-demand; commit-term clusters (2 weeks to 1 year) lock a lower per-GPU rate. Multi-year reserved capacity is quoted via sales.
Usage visibility — Self-serve console shows running instances and attached storage; the recurring user complaint is GPU availability rather than billing opacity.
Payment options — Self-serve checkout for on-demand and 1-Click Clusters; sales-led contracts and invoicing for reserved/private capacity and enterprise.

Strategic wins : Why Lambda’s pricing decisions worked

1. Undercutting hyperscalers on a transparent rate card

By publishing per-GPU-hour rates that undercut much of the roughly $2-11/hr range hyperscalers charge for an H100, Lambda turned price transparency into a wedge for the entire AI-research segment. See how AI companies structure pricing.

2. Pricing up into scarcity instead of racing to the bottom

Rather than chase peer-to-peer marketplaces down to sub-$2/hr, Lambda raised on-demand rates as demand surged — capturing margin from buyers who value reliability and a pre-built stack over the absolute lowest price. Related: outcome-based pricing trends.

3. Layering committed clusters on top of on-demand

The 1-Click Cluster tier converts spiky on-demand demand into predictable 2-week-to-1-year commitments at a discount, smoothing capacity planning while feeding the Microsoft-scale buildout. See choosing the right usage metric.

Areas to improve : Gaps in Lambda’s pricing approach

1. Capacity, not price, is the real bottleneck

The most common complaint isn’t cost — it’s that the GPU you want is sold out when you want it. A self-serve rate card means little if launches fail; transparent real-time availability would help. See bill shock and cost unpredictability.

2. Storage billing traps

Persistent storage at ~$0.20/GiB/month keeps billing while instances are stopped, and orphaned volumes silently accrue cost. Clearer in-console warnings and auto-cleanup defaults would reduce surprise charges.

3. Rising rates erode the cheapest-cloud reputation

As on-demand H100 climbs toward $4/GPU/hr, RunPod and marketplaces like Vast.ai look more attractive on raw price. Lambda needs to keep justifying the premium with reliability and tooling rather than price alone.

Monetization stack & signals : how Lambda builds & buys its revenue engine

Buys: 1 · Builds: 1 · Open roles: 3

The read — where the monetization investment is going

Lambda builds its own meter — docs describe bespoke per-minute GPU billing, weekly usage invoices, and a proprietary service-credits balance — but buys the back office, with NetSuite its stated ERP/rev-rec system. Revenue-org hiring is account- and accounting-facing, pointing investment at enterprise coverage rather than a billing-platform team.

Stack — build vs buy

Builds in-house · 1

In-house usage metering & billing · Metering (inferred) · Docs · Jun 2026

“ODC prices instances by hourly usage and bills in one-minute increments. Billing begins the moment you launch an instance and the instance passes health checks, and ends the moment you terminate the instance. You receive weekly invoices for the previous week's usage. Lambda issues refunds in the form of service credits toward future Lambda Cloud usage.” No third-party billing vendor (Stripe, Metronome, Orb) is named.
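The billing window in the quoted docs (start when health checks pass, stop at termination, invoice weekly) can be sketched as follows. This is only an illustration of the described behavior: the function names are hypothetical, rounding a partial minute up is an assumption, and so is the Monday-to-Sunday invoice week, since the docs quoted here do not say which day a billing week starts.

```python
import math
from datetime import datetime, timedelta


def billable_minutes(healthy_at: datetime, terminated_at: datetime) -> int:
    """Minutes billed between passing health checks and termination.

    Assumption: a partial minute is rounded up to a whole minute.
    """
    seconds = (terminated_at - healthy_at).total_seconds()
    return max(0, math.ceil(seconds / 60))


def invoice_week_start(moment: datetime) -> datetime:
    """Midnight on the Monday of the week containing `moment`.

    Assumption: weekly invoices cover Monday-to-Sunday periods.
    """
    monday = moment - timedelta(days=moment.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    start = datetime(2026, 6, 3, 10, 0, 0)
    end = datetime(2026, 6, 3, 11, 30, 30)
    print("billed minutes:", billable_minutes(start, end))
    print("invoice week starting:", invoice_week_start(start).date())
```

Time spent booting before health checks pass falls outside the window in this model, which matches the quoted wording that billing begins once the instance is launched and healthy.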

Buys (vendor) · 1

NetSuite · Revenue recognition · Job posts 1, 2, 3 · Jun 2026

“Lead the deployment of Workday Payroll... with full SOX-compliant controls over pay calculation, tax filing, GL posting to NetSuite, and off-cycle processing.”

Unconfirmed · 2

Stripe · Payments (inferred) · Job post · Jun 2026

“Nice to Have: Netsuite, Salesforce, Coupa, Stripe, AI experience.”

Salesforce · CRM (inferred) · Job post · Jun 2026

“Nice to Have: Netsuite, Salesforce, Coupa, Stripe, AI experience.”

Open roles in the revenue & lifecycle org — 3


Role	Function	Date
Senior Manager, Revenue Technical Accounting	RevOps	Jun 17, 2026
Account CTO	Retention	Jun 3, 2026
Account Director - Superintelligence	Customer success	seen Apr 1, 2026

Signals reviewed Jun 2026 · derived from public job posts, product docs

Job postings fill and close over time; once a posting is filled we keep it as a dated citation (the quoted evidence remains).

Key takeaways

Lambda is pure per-GPU-hour usage pricing — per-minute billing, no egress, no free tier — across on-demand, committed clusters, and reserved capacity. For the underlying model, see the introduction to usage-based pricing.
It bucked GPU price deflation by raising rates (H100 SXM $2.99 → $3.99 in a year) as frontier demand outran supply.
Cluster pricing is published, not gated — a transparency edge over most neoclouds, with only multi-year reserved deals sales-quoted.
Capacity and orphaned storage are the real hidden costs, not headline fees.
The category’s monetization is bifurcating: raw compute prices up (scarcity), managed inference prices down (token deflation) — Lambda is leaning into the former and retiring its Inference API.

UBP implications

When supply is the constraint, usage pricing can move up. Lambda shows a pure-usage vendor can raise unit rates without churning customers when the underlying resource is genuinely scarce.
Transparency is a differentiator even in commodity infra. Publishing cluster rates others hide behind sales calls lowers buyer friction and builds trust.
Commit tiers convert volatility into predictability. Layering 2-week-to-1-year cluster commitments over on-demand gives both sides forecastability — a reusable pattern for any usage-based business with capacity to plan.

Sources

Lambda AI cloud pricing (accessed 2026-06-09)
Lambda GPU Cloud (accessed 2026-06-09)
Lambda Inference (accessed 2026-06-09; Inference API winding down)
TechCrunch — Lambda raises $1.5B after Microsoft deal (accessed 2026-06-09)
Sacra — Lambda Labs revenue, valuation & funding (accessed 2026-06-09)
Wayback Machine snapshots, June 2025–Jan 2026 (tools/wayback-index/lambda-labs.json) (accessed 2026-06-09)

Bottom line

Lambda (Lambda Labs) is one of the clearest examples of a pure usage-based GPU cloud: published per-GPU-hour rates, per-minute billing, no egress, and committed-cluster discounts layered on top of self-serve on-demand. What makes it unusual is the direction — Lambda raised on-demand prices through 2025-2026 as Microsoft-scale demand and a multibillion-dollar war chest met scarce frontier GPUs, the opposite of the token-cost deflation playing out in inference. The real costs to watch are capacity sell-outs and orphaned storage, not the rate card. Browse the pricing blueprint for more fully-researched company profiles, or compare Lambda against other Infrastructure, Compute & MLOps companies.

Pricing timeline : Major events on a vertical axis

Each milestone below corresponds to a public pricing change, product launch, or material adjustment.

On-demand rates raised across the board

Jun 2026

June 2026 list pricing: B200 SXM $6.69 (from $4.99), H100 SXM $3.99 (from $2.99), H100 PCIe $3.29, GH200 $2.29, A100 SXM 80GB $2.79 (from $1.79), A100 40GB $1.99, V100 $0.79. 1-Click Clusters: B200 $8.87-$9.86, H100 $5.54-$6.16/GPU/hr.

B200 SXM listed at $4.99; H100 SXM steady at $2.99

Jan 2026

Early-2026 list pricing: B200 SXM6 on-demand $4.99/GPU/hr, H100 SXM $2.99, A100 80GB $1.79, A100 40GB $1.29, V100 $0.55.

H100 SXM on-demand at $2.99/GPU/hr; A100 80GB $1.79

Jun 2025

Mid-2025 list pricing: on-demand 8x H100 SXM at $2.99/GPU/hr, A100 SXM 80GB at $1.79, A100 40GB $1.29, V100 $0.55. B200 launching as low as $2.99 with multi-year commitment.

Trivia

· Lambda started in 2012 building a facial-recognition API for Google Glass, then pivoted to selling deep-learning workstations before becoming a GPU cloud.
· Despite the GPU price-deflation narrative, Lambda raised its on-demand H100 SXM rate from $2.99 to $3.99/GPU/hr between 2025 and 2026 as demand outstripped capacity.
· Lambda signed a multibillion-dollar deal to supply Microsoft with AI infrastructure, then raised over $1.5B in late 2025 at a reported $4-5B valuation, with an IPO rumored for H2 2026.
