About
Lambda (long known as Lambda Labs) is a GPU cloud and AI compute infrastructure provider founded in 2012 by brothers Stephen and Michael Balaban. The company started life with an unlikely product: a facial-recognition API for Google Glass. By 2017 it had pivoted into hardware, selling GPU laptops and deep-learning workstations to AI researchers, and from there into renting that same NVIDIA compute over the cloud. Today Lambda rents NVIDIA GPUs by the GPU-hour to more than 200,000 AI developers and teams. Its instances come pre-loaded with “Lambda Stack” (CUDA, PyTorch, TensorFlow) so customers can start training within minutes.
Lambda has become one of the higher-profile “neoclouds”: specialized GPU-cloud providers competing with hyperscalers and with rivals such as CoreWeave and RunPod. Its growth has been steep. Third-party research puts revenue at roughly $250M in 2023, around $425M by the end of 2024, and on the order of $760M in 2025. In November 2025 Lambda announced a multibillion-dollar deal to supply Microsoft with AI infrastructure built on tens of thousands of NVIDIA GPUs, then raised over $1.5B in a round led by TWG Global at a reported valuation in the $4-5B range. Lambda has also reportedly been in talks for pre-IPO financing, with an IPO targeted for the second half of 2026.
For current pricing, see Lambda’s pricing page. Note that the company has moved its primary domain from lambdalabs.com to lambda.ai.
Pricing summary : How Lambda’s pricing model works
Lambda’s pricing is purely usage-based: you pay per GPU-hour for the instances you launch, billed in per-minute increments, with no egress fees. There is no free tier and no monthly subscription; the meter runs whenever a GPU is allocated to you. Pricing splits across three surfaces:
- On-Demand Instances: self-serve, first-come, first-served instances in 1-, 2-, 4-, or 8-GPU configurations of B200, H100, A100, GH200, and older cards. List rates are published openly per GPU-hour.
- 1-Click Clusters & Superclusters: production clusters of 16 to 2,000+ interconnected B200 or H100 GPUs on 2-week-to-1-year commitments, with published per-GPU rates that step down as cluster size grows.
- Reserved / Private capacity: multi-year reserved capacity at Lambda’s lowest prices, quoted by sales (“talk to our team”).
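To make per-minute billing concrete, here is a minimal Python sketch of how a single on-demand run might be charged. The helper name `on_demand_charge` is hypothetical, and the sketch assumes that any started minute is billed as a full minute; Lambda’s exact rounding rule is not stated here.

```python
import math


def on_demand_charge(rate_per_gpu_hour: float, gpus: int, seconds_running: int) -> float:
    """Estimate the GPU charge for one on-demand run under per-minute billing.

    Assumption (not confirmed by Lambda): any started minute is billed in full.
    """
    billed_minutes = math.ceil(seconds_running / 60)
    # Convert the hourly list rate into a per-minute charge across all GPUs.
    return round(rate_per_gpu_hour * gpus * billed_minutes / 60, 2)


# A 1x H100 SXM run ($3.99/GPU/hr) lasting 90 minutes and 20 seconds
# is billed as 91 minutes under the rounding assumption above.
print(on_demand_charge(3.99, 1, 90 * 60 + 20))  # -> 6.05
```

Per-minute granularity matters most for short, bursty jobs: if the same run were rounded up to whole hours instead, it would be billed as two full hours ($7.98).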
What makes this different: Lambda publishes hard per-GPU-hour numbers for both on-demand instances and committed clusters. That is unusual transparency in a category where rivals often gate cluster pricing behind sales calls. The catch runs opposite to most AI infrastructure: instead of deflating, Lambda’s on-demand rates have been rising as frontier-model demand outpaces GPU supply.
Pricing by product
On-demand list prices, per GPU-hour, as of June 2026 (excludes sales tax):
| GPU (on-demand) | VRAM | Price/GPU/hr | Best for |
|---|---|---|---|
| NVIDIA B200 SXM6 | 180 GB | $6.69 | Frontier training |
| NVIDIA H100 SXM | 80 GB | $3.99 | Mainstream LLM training |
| NVIDIA A100 SXM | 80 GB | $2.79 | Cost-efficient training |
| NVIDIA A100 SXM | 40 GB | $1.99 | Budget training |
| NVIDIA Tesla V100 | 16 GB | $0.79 | Light / legacy workloads |
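The table works as a lookup for sizing a node. The Python sketch below multiplies the per-GPU list rate by the GPU count. The dictionary keys and the `node_hourly_cost` helper are illustrative names, and the 1/2/4/8 check only reflects the instance shapes described above, not which shapes are actually offered or in stock for each card.

```python
# June 2026 on-demand list prices from the table above, USD per GPU-hour, pre-tax.
ON_DEMAND_RATES = {
    "B200 SXM6 180GB": 6.69,
    "H100 SXM 80GB": 3.99,
    "A100 SXM 80GB": 2.79,
    "A100 SXM 40GB": 1.99,
    "V100 16GB": 0.79,
}

# On-demand instance shapes described in the text. Whether every GPU type is
# offered in every shape is not stated, so this is only a coarse guard.
ALLOWED_GPU_COUNTS = {1, 2, 4, 8}


def node_hourly_cost(gpu: str, count: int) -> float:
    """Hourly list cost of one on-demand instance with `count` GPUs of type `gpu`."""
    if count not in ALLOWED_GPU_COUNTS:
        raise ValueError(f"on-demand shapes are 1, 2, 4 or 8 GPUs, got {count}")
    return round(ON_DEMAND_RATES[gpu] * count, 2)


print(node_hourly_cost("H100 SXM 80GB", 8))  # 8x H100 SXM node -> 31.92
print(node_hourly_cost("B200 SXM6 180GB", 1))  # single B200 -> 6.69
```

The 8x H100 result matches the $31.92/hr node figure in the hidden-costs table further down.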
1-Click Clusters (committed, 2 weeks – 1 year), per GPU-hour:
| Cluster | 16 GPUs | 64 GPUs | 256+ GPUs |
|---|---|---|---|
| NVIDIA HGX B200 | $9.86 | $9.36 | $8.87 |
NVIDIA H100 1-Click Clusters are also published, at $5.54-$6.16/GPU/hr depending on cluster size.
Sales motion by product: on-demand instances are fully self-serve (product-led); 1-Click Clusters are self-serve to launch, while larger and multi-year commitments are steered to “talk to our team” (sales-led reserved). Lambda is winding down its managed per-token Inference API in favor of raw GPU instances and clusters.
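A minimal Python sketch of how the B200 1-Click Cluster tiers might be applied to a commitment. The table publishes only three sizes, so the sketch assumes a cluster between tiers pays the rate of the largest tier it reaches (for example, 128 GPUs at the 64-GPU rate), and it treats 2 weeks as 14 days and 1 year as 365 days. Both are assumptions, and the function names are hypothetical.

```python
# B200 1-Click Cluster list rates (USD per GPU-hour), keyed by minimum cluster size,
# ordered from the largest tier down so the first match is the best applicable rate.
B200_CLUSTER_TIERS = [(256, 8.87), (64, 9.36), (16, 9.86)]

HOURS_PER_DAY = 24
MIN_TERM_DAYS = 14   # assumption: "2 weeks"
MAX_TERM_DAYS = 365  # assumption: "1 year"


def b200_cluster_rate(gpus: int) -> float:
    """Per-GPU-hour rate for a B200 cluster, using the largest tier the size reaches."""
    for min_gpus, rate in B200_CLUSTER_TIERS:
        if gpus >= min_gpus:
            return rate
    raise ValueError("1-Click Clusters start at 16 GPUs")


def commitment_cost(gpus: int, days: int) -> float:
    """Total list cost of running `gpus` B200 GPUs around the clock for `days` days."""
    if not MIN_TERM_DAYS <= days <= MAX_TERM_DAYS:
        raise ValueError("1-Click Cluster terms run from 2 weeks to 1 year")
    return round(gpus * b200_cluster_rate(gpus) * HOURS_PER_DAY * days, 2)


print(commitment_cost(64, 14))  # -> 201277.44
```

That figure is 64 GPUs x $9.36 x 24 hours x 14 days, the shortest commitment at the 64-GPU tier.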
Hidden costs : What Lambda users actually pay
Lambda’s headline rates are clean (no egress, per-minute billing), but real bills include a few items beyond the GPU-hour:
| Line item | Cost |
|---|---|
| GPU-hour (e.g. 8x H100 SXM) | $3.99/GPU/hr → ~$31.92/hr for the node |
| Persistent storage | ~$0.20/GiB/month, billed hourly — continues even when unmounted |
| Egress / data transfer | $0 (no egress fees) |
| Idle/stopped storage | Storage keeps billing while instances are stopped |
| Sales tax | Added on top where applicable |
The single biggest real-world cost driver isn’t a fee; it’s capacity. Reviewers consistently report popular GPUs (especially H100 configurations) selling out during peak demand; one user described a 26-hour “temporarily unavailable” wait when trying to scale from 2 to 4 GPUs. The second-biggest is orphaned storage: persistent volumes keep billing at roughly $0.20/GiB/month even after you stop the instance, so cleanup matters.
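The line items above combine into a simple back-of-envelope estimate. In this Python sketch, the 730-hour month used to prorate hourly storage billing is an assumption (365 days x 24 hours / 12), the tax rate is a caller-supplied placeholder, and `estimate_bill` is a hypothetical helper, not something Lambda provides.

```python
STORAGE_RATE_PER_GIB_MONTH = 0.20  # approximate persistent-storage rate from the table
HOURS_PER_MONTH = 730  # assumption: 365 * 24 / 12, used to prorate hourly storage billing


def estimate_bill(
    gpu_rate: float,
    gpus: int,
    gpu_hours: float,
    storage_gib: float,
    storage_hours: float,
    tax_rate: float = 0.0,
) -> dict:
    """Rough bill: GPU time plus storage, with no egress charge and optional sales tax."""
    gpu_cost = gpu_rate * gpus * gpu_hours
    # Storage accrues for every hour the volume exists, whether or not an instance runs.
    storage_cost = storage_gib * STORAGE_RATE_PER_GIB_MONTH * storage_hours / HOURS_PER_MONTH
    egress_cost = 0.0  # Lambda charges no egress fees
    subtotal = gpu_cost + storage_cost + egress_cost
    return {
        "gpu": round(gpu_cost, 2),
        "storage": round(storage_cost, 2),
        "egress": egress_cost,
        "tax": round(subtotal * tax_rate, 2),
        "total": round(subtotal * (1 + tax_rate), 2),
    }


# 8x H100 SXM used for 100 hours, with a 1,000 GiB volume left in place all month.
print(estimate_bill(3.99, 8, 100, 1000, HOURS_PER_MONTH))
```

Setting `gpu_hours` to zero while keeping `storage_hours` at a full month models the orphaned-storage case: about $200 for 1,000 GiB with no GPU running at all.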
The retired Inference API previously billed per million tokens (roughly $0.02 for a small 3B model up to about $0.90 for a 405B model, unified input/output) — relevant only as a historical reference now.
Want to estimate your own Lambda bill? Use the Lambda pricing calculator to model your costs based on GPU type and hours.
Pricing evolution : Lambda pricing history and changes
Cadence
| Period | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2025 H1 | — | B200 launched on Lambda Cloud | H100 SXM $2.99, A100 80GB $1.79, V100 $0.55 |
| 2025 H2 → 2026 Q1 | B200 listed $4.99 | GH200 instances added | H100 SXM held at $2.99 |
| 2026 Q2 | Across-the-board increase | — | H100 SXM → $3.99, B200 → $6.69, A100 80GB → $2.79 |
Tracked range: 2025–present (Wayback snapshots from June 2025 onward; see tools/wayback-index/lambda-labs.json).
Notable changes
- Mid-2025 — On-demand 8x H100 SXM at $2.99/GPU/hr, A100 SXM 80GB at $1.79, A100 40GB $1.29, V100 $0.55. NVIDIA HGX B200 launching, advertised as low as $2.99/GPU/hr with multi-year commitment.
- Early 2026 — B200 SXM6 on-demand listed at $4.99/GPU/hr; H100 SXM unchanged at $2.99.
- June 2026 — Broad increase: B200 SXM $6.69 (up from $4.99), H100 SXM $3.99 (up from $2.99), A100 SXM 80GB $2.79 (up from $1.79), V100 $0.79. 1-Click Cluster pricing published openly: B200 $8.87–$9.86, H100 $5.54–$6.16/GPU/hr.
The direction of travel is the headline: over roughly a year, Lambda raised on-demand rates by about a third to more than half (roughly 33-56%) across cards, bucking the usual GPU-cost-deflation story as Microsoft-scale demand and a Series-E war chest tightened supply.
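The percentages follow directly from the list prices above. A short Python sketch recomputing them; note that the B200 comparison starts from its early-2026 on-demand listing, because its mid-2025 figure was a multi-year commitment price rather than an on-demand rate.

```python
# (before, after) on-demand list prices in USD per GPU-hour, from the notable changes above.
PRICE_CHANGES = {
    "H100 SXM": (2.99, 3.99),       # mid-2025 -> June 2026
    "A100 SXM 80GB": (1.79, 2.79),  # mid-2025 -> June 2026
    "A100 40GB": (1.29, 1.99),      # early 2026 -> June 2026
    "V100": (0.55, 0.79),           # early 2026 -> June 2026
    "B200 SXM6": (4.99, 6.69),      # early 2026 -> June 2026
}


def pct_increase(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


for gpu, (before, after) in PRICE_CHANGES.items():
    print(f"{gpu:<14} ${before:.2f} -> ${after:.2f}  (+{pct_increase(before, after):.1f}%)")
```

The increases range from about 33% (H100 SXM) to about 56% (A100 SXM 80GB).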
What’s unique : Lambda’s distinctive pricing mechanics
1. Published cluster pricing. Lambda lists hard per-GPU-hour numbers for 1-Click Clusters from 16 to 256+ GPUs, whereas most neoclouds gate multi-GPU cluster pricing behind a sales call. Only multi-year reserved capacity is sales-quoted.
2. No egress, per-minute billing. Unlike hyperscalers, Lambda charges zero data-egress fees and bills GPUs by the minute, which removes two of the most unpredictable line items in cloud GPU bills.
3. Prices that rise, not fall. Lambda’s on-demand rates went up through 2025-2026 — a deliberate signal that scarce frontier GPUs are a sellers’ market, the opposite of the token-deflation seen on the inference side.
Strengths & weaknesses
| Strengths | Weaknesses |
|---|---|
| Transparent per-GPU-hour pricing for both on-demand and clusters | On-demand prices rising, not falling |
| No egress fees; clean per-minute billing | Frequent capacity sell-outs for popular GPUs |
| Historically among the cheapest H100/A100 clouds | Limited regions (mostly US; no EU/Asia zones) |
| Pre-configured Lambda Stack (pre-installed ML software): research-friendly, fast launch | No free tier; orphaned storage keeps billing |
| Self-serve launch from single GPUs up to committed clusters | Managed Inference API being wound down |
Billing UX : Lambda billing controls and transparency
- Billing controls: Pay-per-minute on-demand; commit-term clusters (2 weeks to 1 year) lock in a published per-GPU rate that steps down with cluster size. Multi-year reserved capacity is quoted via sales.
- Usage visibility: The self-serve console shows running instances and attached storage; the recurring user complaint is GPU availability rather than billing opacity.
- Payment options: Self-serve checkout for on-demand and 1-Click Clusters; sales-led contracts and invoicing for reserved/private capacity and enterprise.
Strategic wins : Why Lambda’s pricing decisions worked
1. Undercutting hyperscalers on a transparent rate card
By publishing per-GPU-hour rates toward the low end of the roughly $2-11/hr that hyperscalers charge for an H100 (Lambda’s H100 SXM listed at $2.99 through 2025), Lambda turned price transparency into a wedge for the entire AI-research segment. See how AI companies structure pricing.
2. Pricing up into scarcity instead of racing to the bottom
Rather than chase peer-to-peer marketplaces down to sub-$2/hr, Lambda raised on-demand rates as demand surged — capturing margin from buyers who value reliability and a pre-built stack over the absolute lowest price. Related: outcome-based pricing trends.
3. Layering committed clusters on top of on-demand
The 1-Click Cluster tier converts spiky on-demand usage into predictable 2-week-to-1-year commitments at published, size-tiered rates, smoothing capacity planning alongside the Microsoft-scale buildout. See choosing the right usage metric.
Areas to improve : Gaps in Lambda’s pricing approach
1. Capacity, not price, is the real bottleneck
The most common complaint isn’t cost — it’s that the GPU you want is sold out when you want it. A self-serve rate card means little if launches fail; transparent real-time availability would help. See bill shock and cost unpredictability.
2. Storage billing traps
Persistent storage at ~$0.20/GiB/month keeps billing while instances are stopped, and orphaned volumes silently accrue cost. Clearer in-console warnings and auto-cleanup defaults would reduce surprise charges.
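The kind of check an in-console warning could run is easy to sketch. The Python below uses a hypothetical `Volume` record and flags any volume that is not attached to a currently running instance; it does not call any real Lambda API, and the data model is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

STORAGE_RATE_PER_GIB_MONTH = 0.20  # approximate rate quoted in the text


@dataclass
class Volume:
    name: str
    size_gib: int
    attached_instance: Optional[str]  # None when the volume is not attached anywhere


def idle_volumes(volumes: List[Volume], running_instances: Set[str]) -> List[Volume]:
    """Volumes that keep billing although no running instance is using them."""
    return [v for v in volumes if v.attached_instance not in running_instances]


def idle_monthly_cost(volumes: List[Volume], running_instances: Set[str]) -> float:
    """Approximate monthly storage spend on idle volumes."""
    idle_gib = sum(v.size_gib for v in idle_volumes(volumes, running_instances))
    return round(idle_gib * STORAGE_RATE_PER_GIB_MONTH, 2)


volumes = [
    Volume("datasets", 2000, "train-1"),     # attached, but train-1 is stopped
    Volume("scratch", 500, None),            # orphaned: attached to nothing
    Volume("checkpoints", 1000, "train-2"),  # in use by a running instance
]
running = {"train-2"}
print([v.name for v in idle_volumes(volumes, running)])
print(idle_monthly_cost(volumes, running))
```

Here 2,500 GiB sits idle, which works out to about $500 a month at the quoted rate.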
3. Rising rates erode the cheapest-cloud reputation
As on-demand H100 climbs toward $4/GPU/hr, RunPod and marketplaces like Vast.ai look more attractive on raw price. Lambda needs to keep justifying the premium with reliability and tooling rather than price alone.
Key takeaways
- Lambda is pure per-GPU-hour usage pricing — per-minute billing, no egress, no free tier — across on-demand, committed clusters, and reserved capacity. For the underlying model, see the introduction to usage-based pricing.
- It bucked GPU price deflation by raising rates (H100 SXM $2.99 → $3.99 in a year) as frontier demand outran supply.
- Cluster pricing is published, not gated — a transparency edge over most neoclouds, with only multi-year reserved deals sales-quoted.
- Capacity and orphaned storage are the real hidden costs, not headline fees.
- The category’s monetization is bifurcating: raw compute prices up (scarcity), managed inference prices down (token deflation) — Lambda is leaning into the former and retiring its Inference API.
UBP implications
- When supply is the constraint, usage pricing can move up. Lambda suggests that a pure-usage vendor can raise unit rates when the underlying resource is genuinely scarce: demand for its popular GPUs has kept outrunning supply despite the increases.
- Transparency is a differentiator even in commodity infra. Publishing cluster rates others hide behind sales calls lowers buyer friction and builds trust.
- Commit tiers convert volatility into predictability. Layering 2-week-to-1-year cluster commitments over on-demand gives both sides forecastability — a reusable pattern for any usage-based business with capacity to plan.
Sources
- Lambda AI cloud pricing (accessed 2026-06-09)
- Lambda GPU Cloud (accessed 2026-06-09)
- Lambda Inference (accessed 2026-06-09; Inference API winding down)
- TechCrunch — Lambda raises $1.5B after Microsoft deal (accessed 2026-06-09)
- Sacra — Lambda Labs revenue, valuation & funding (accessed 2026-06-09)
- Wayback Machine snapshots, June 2025–Jan 2026 (tools/wayback-index/lambda-labs.json) (accessed 2026-06-09)
Bottom line
Lambda (Lambda Labs) is one of the clearest examples of a pure usage-based GPU cloud: published per-GPU-hour rates, per-minute billing, no egress, and size-tiered committed clusters layered on top of self-serve on-demand. What makes it unusual is the direction: Lambda raised on-demand prices through 2025-2026 as Microsoft-scale demand and a multibillion-dollar war chest met scarce frontier GPUs, the opposite of the token-cost deflation playing out in inference. The real costs to watch are capacity sell-outs and orphaned storage, not the rate card. Browse the pricing blueprint for more fully-researched company profiles, or compare Lambda against other Infrastructure, Compute & MLOps companies.
Pricing timeline : Major events
Each milestone below corresponds to a public pricing change, product launch, or material adjustment, listed newest first.
On-demand rates raised across the board
June 2026 list pricing: B200 SXM $6.69 (from $4.99), H100 SXM $3.99 (from $2.99), H100 PCIe $3.29, GH200 $2.29, A100 SXM 80GB $2.79 (from $1.79), A100 40GB $1.99, V100 $0.79. 1-Click Clusters: B200 $8.87-$9.86, H100 $5.54-$6.16/GPU/hr.
B200 SXM listed at $4.99; H100 SXM steady at $2.99
Early-2026 list pricing: B200 SXM6 on-demand $4.99/GPU/hr, H100 SXM $2.99, A100 80GB $1.79, A100 40GB $1.29, V100 $0.55.
H100 SXM on-demand at $2.99/GPU/hr; A100 80GB $1.79
Mid-2025 list pricing: on-demand 8x H100 SXM at $2.99/GPU/hr, A100 SXM 80GB at $1.79, A100 40GB $1.29, V100 $0.55. B200 launching as low as $2.99 with multi-year commitment.
- Lambda started in 2012 building a facial-recognition API for Google Glass, then pivoted to selling deep-learning workstations before becoming a GPU cloud.
- Despite the GPU price-deflation narrative, Lambda raised its on-demand H100 SXM rate from $2.99 to $3.99/GPU/hr between 2025 and 2026 as demand outstripped capacity.
- Lambda signed a multibillion-dollar deal to supply Microsoft with AI infrastructure, then raised over $1.5B in late 2025 at a reported $4-5B valuation, with an IPO rumored for H2 2026.
Questions & answers
- How does Lambda's GPU cloud pricing work?
- Lambda rents NVIDIA GPUs by the GPU-hour, billed per minute. On-demand instances are self-serve and first-come, first-served; 1-Click Clusters and Superclusters run on 2-week-to-1-year commitments at published per-GPU rates that step down as cluster size grows; multi-year reserved capacity is sales-quoted. There are no egress fees, and persistent storage is billed separately.
- How much does an H100 cost on Lambda?
- As of June 2026, an on-demand NVIDIA H100 SXM (80GB) lists at $3.99/GPU/hr and an H100 PCIe at $3.29/GPU/hr. In a committed 1-Click Cluster, H100 systems run roughly $5.54-$6.16/GPU/hr depending on cluster size. H100 SXM on-demand was $2.99/GPU/hr through 2025 before rising in 2026.
- Does Lambda have a free tier?
- No. Lambda has no free GPU tier. You pay per-minute for any instance you launch, and persistent storage continues billing (about $0.20/GiB/month) even when an instance is stopped, so orphaned volumes are a real hidden cost.
- Is Lambda cheaper than AWS, CoreWeave, or RunPod?
- Lambda has historically been one of the cheapest H100/A100 clouds, undercutting hyperscalers (which charge roughly $2-11/hr for an H100) and often beating CoreWeave. RunPod and peer-to-peer marketplaces like Vast.ai can be cheaper per hour, but Lambda is favored for its research-friendly, pre-configured stack. The main trade-off is capacity: popular GPUs sell out during peak demand.
- What happened to the Lambda Inference API?
- Lambda is winding down its hosted Inference API, which previously charged per million tokens (roughly $0.02 for small 3B models up to about $0.90 for 405B models). Lambda now steers inference workloads onto its raw GPU instances and clusters instead of the managed token endpoint.