
Hyperbolic pricing

hyperbolic.ai | Facts checked | Analysis reviewed

Quick summary
Region:
Product: GPU cloud marketplace & serverless AI inference
Industry: technology
Commits: Available (annual)
AI Summary
  • Hyperbolic is an open-access AI cloud with two pure usage-based surfaces and no subscription tiers: a GPU marketplace and a serverless inference API.
  • GPU Marketplace (June 2026, on-demand starting rates): H100 SXM $1.50, H200 $2.40, B200 $3.50, RTX 4090 $0.30, RTX 3070 $0.16 per GPU-hour — refreshed weekly from supplier rates.
  • Serverless Inference is billed per million tokens: Llama-3.1-8B and Llama-3.2-3B at $0.10, Qwen2.5-Coder-32B $0.20, 70B-class models $0.40, DeepSeek-V2.5 $2.00, Llama-3.1-405B $4.00.
  • New users get $5 in free credits; GPU rental requires depositing at least $5. Payment is by card/Stripe, wire/ACH, or crypto (USDC).
  • Inference tiers gate request rate (Basic 60 RPM, Pro 600 RPM, Enterprise unlimited) but not token price; dedicated single-tenant hosting is sales-quoted hourly by GPU type.
Pricing summary
Hyperbolic 2026 — two pay-as-you-go surfaces
Pure usage-based: rent GPUs by the hour, or call open models priced per million tokens. No subscription tiers.
Serverless Inference: from $0.10 /1M tokens. For developers calling open models (Llama, Qwen, DeepSeek) via an OpenAI-compatible API.
On-demand starting rates as of June 2026 (hyperbolic.ai/marketplace and /inference). Marketplace GPU rates are refreshed weekly from supplier rates and may vary; verify current rates before committing.

About

Hyperbolic (Hyperbolic Labs) is a San Francisco-based “open-access AI cloud” founded in 2023 by Jasper Zhang (CEO) and Yuchen Jin (CTO). It sells two distinct products on a pure pay-as-you-go basis: a GPU Marketplace for renting raw GPU capacity by the hour (NVIDIA H100 SXM, H200, B200, RTX 4090, RTX 3070 and others), and a Serverless Inference API that runs open-weight models (Llama, Qwen, DeepSeek and more) billed per million tokens. Both surfaces are publicly priced with no required subscription, seat minimum, or contract — buyers add credit and draw it down by usage.

Hyperbolic’s distinguishing idea is a DePIN-style (decentralized physical infrastructure network) supply model: rather than build out its own data centers, it aggregates underused GPU capacity from third-party data centers and operators and resells it, which lets it list aggressive per-hour rates and refresh them weekly as supply shifts. The company raised roughly $20M in total: a seed round of around $7M in July 2024 and a $12M Series A in December 2024 led by Variant and Polychain Capital. It reports 200,000+ engineers on the platform and 25+ open-source models served via API, and it is one of Hugging Face’s third-party inference providers. Customers and references include Hugging Face, Quora, Cornell, UC Berkeley, the LMSYS Chatbot Arena, and Reve AI.

For current pricing, see the GPU marketplace and serverless inference pages. Hyperbolic sits in the AI infrastructure and compute category alongside GPU-cloud rivals and open-model inference hosts.


Pricing summary : GPU-hours and per-million-token usage billing

Hyperbolic is pure usage-based across two surfaces with two different value metrics. There are no subscription tiers, seats, or platform fees — you pay only for the compute you consume, drawing down prepaid credit.

  1. GPU Marketplace — billed per GPU-hour. On-demand starting rates (June 2026): H100 SXM $1.50, H200 $2.40, B200 $3.50, RTX 4090 $0.30, RTX 3070 $0.16 per GPU-hour. Hyperbolic states marketplace rates are refreshed weekly based on the best available supplier rates, so per-hour figures are dynamic rather than a fixed list price. Reserved clusters for long-running jobs are arranged via sales.
  2. Serverless Inference — billed per million tokens against open-weight models. Each model carries its own rate, from $0.10/M for small Llama models up to $4.00/M for Llama-3.1-405B.

New users start with $5 in free credits to explore inference; GPU rental requires depositing at least $5. Payment is by credit card/Stripe, wire/ACH, or crypto (USDC).

What makes this different: the two value metrics serve two buyers from one account. Infrastructure teams get raw GPU-hours for training and custom serving, and developers get a per-token serverless API without managing GPUs. The marketplace’s weekly-refreshed, supplier-driven rates behave like a spot/marketplace price, not the fixed on-demand list price of a traditional cloud.
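To make the two meters concrete, here is a minimal sketch of a combined bill estimate. The rate tables copy the June 2026 starting rates quoted on this page; the function name `estimate_bill` and the workload figures are hypothetical, and live marketplace rates may differ because they refresh weekly.

```python
# Illustrative only: June 2026 starting rates as quoted on this page (USD).
GPU_RATE_PER_HOUR = {
    "H100 SXM": 1.50,
    "H200": 2.40,
    "B200": 3.50,
    "RTX 4090": 0.30,
    "RTX 3070": 0.16,
}
TOKEN_RATE_PER_MILLION = {
    "Llama-3.1-8B": 0.10,
    "Qwen2.5-Coder-32B": 0.20,
    "Llama-3.1-70B": 0.40,
    "DeepSeek-V2.5": 2.00,
    "Llama-3.1-405B": 4.00,
}


def estimate_bill(gpu_usage, token_usage):
    """Return (gpu_cost, token_cost, total) in USD.

    gpu_usage:   list of (gpu_type, gpu_count, hours)
    token_usage: list of (model, tokens)
    """
    gpu_cost = sum(GPU_RATE_PER_HOUR[g] * n * h for g, n, h in gpu_usage)
    token_cost = sum(
        TOKEN_RATE_PER_MILLION[m] * t / 1_000_000 for m, t in token_usage
    )
    return round(gpu_cost, 2), round(token_cost, 2), round(gpu_cost + token_cost, 2)


# Hypothetical workload: an 8x H100 SXM node for 10 hours,
# plus 50M tokens on a 70B-class model.
gpu_cost, token_cost, total = estimate_bill(
    gpu_usage=[("H100 SXM", 8, 10)],
    token_usage=[("Llama-3.1-70B", 50_000_000)],
)
print(f"GPU ${gpu_cost:.2f} + tokens ${token_cost:.2f} = ${total:.2f}")
```

The two meters are summed only at the end, which mirrors the idea that they are tracked separately but drawn from one balance.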


Pricing by product

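Hyperbolic has two product surfaces, each with its own value metric: GPU-hours on the marketplace and millions of tokens on serverless inference. The figures below are on-demand starting rates as of June 2026; marketplace GPU rates are refreshed weekly.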

GPU Marketplace (per GPU-hour)

GPU (on-demand) | Starting price | Best for
NVIDIA B200 | $3.50 /GPU-hr | Newest-generation Blackwell training
NVIDIA H200 | $2.40 /GPU-hr | Large-memory LLM training/serving
NVIDIA H100 SXM | $1.50 /GPU-hr | Mainstream LLM training & fine-tuning
NVIDIA RTX 4090 | $0.30 /GPU-hr | Cost-sensitive inference / dev
NVIDIA RTX 3070 | $0.16 /GPU-hr | Light / budget workloads

Rates refreshed weekly from the best available supplier rates, so the per-hour price is dynamic. The catalog also lists RTX 3080-class cards, and supports clusters of 1–2,048 GPUs with InfiniBand and high-performance storage. Reserved clusters for guaranteed capacity are sales-quoted.

Serverless Inference (per million tokens)

Model | Price | Key mechanics
Llama-3.2-3B / Llama-3.1-8B | $0.10 /1M tokens | Smallest, cheapest open models
Qwen2.5-Coder-32B | $0.20 /1M tokens | Code-specialized mid-size model
Llama-3.1-70B / Qwen2.5-72B / Hermes-3-70B | $0.40 /1M tokens | 70B-class general models
DeepSeek-V2.5 | $2.00 /1M tokens | Larger reasoning/MoE model
Llama-3.1-405B | $4.00 /1M tokens | Frontier-scale open model

Each model has its own per-million-token rate; larger models cost more per token. Image (SDXL, FLUX), VLM, and audio modalities are also served, priced per-generation rather than per-token (the per-image rate is not published on the pricing page). Inference tiers gate request rate — Basic 60 RPM, Pro 600 RPM, Enterprise unlimited — but not the token price.
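Because the tiers cap requests per minute rather than changing the token price, a client may want to pace itself to stay under the cap. The sketch below is one way to do that, assuming the cap is enforced over a 60-second sliding window (the page does not say how the server measures it). The class name `RpmLimiter` and the tier keys are hypothetical.

```python
import time
from collections import deque

# RPM caps per tier as listed above; None means no cap (Enterprise).
TIER_RPM = {"basic": 60, "pro": 600, "enterprise": None}


class RpmLimiter:
    """Sliding-window limiter: at most `rpm` requests in any 60-second window."""

    def __init__(self, tier, clock=time.monotonic):
        self.rpm = TIER_RPM[tier]
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds to wait before the next request (0.0 if allowed now)."""
        if self.rpm is None:
            return 0.0
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def record(self):
        """Call once for each request actually sent."""
        if self.rpm is not None:
            self.sent.append(self.clock())


# Usage: check before each API call, sleep if needed, then record the call.
limiter = RpmLimiter("basic")
delay = limiter.wait_time()
if delay > 0:
    time.sleep(delay)
limiter.record()
```

The clock is injectable so the pacing logic can be exercised without real waiting; in production the default `time.monotonic` is used.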

Sales motions across products: self-serve / PLG for both the marketplace and serverless inference (add credit, consume by usage); sales-led for reserved clusters and dedicated single-tenant hosting.


Hidden costs : What Hyperbolic users actually pay

Hyperbolic’s headline rates are clean pay-as-you-go, but a few items shape the real bill:

Line item | Cost
GPU-hour (e.g. 8x H100 SXM) | $1.50/GPU/hr → ~$12.00/hr for the node
Inference tokens | Per-model, $0.10–$4.00 /1M tokens
Minimum GPU deposit | Must deposit $5 before renting GPUs
Weekly rate drift | Marketplace per-hour price can move week to week
Reserved clusters / dedicated hosting | Sales-quoted hourly by GPU type

Two real-world cost drivers stand out. First, the marketplace price is a moving target: because rates refresh weekly off supplier availability, the per-hour rate you budgeted can shift before your next long job — predictable for short bursts, less so for multi-week training. Second, because supply is aggregated from third parties, capacity and reliability can vary by GPU type and region; teams that need guaranteed uptime are steered to reserved clusters or dedicated single-tenant hosting, which are quoted by sales rather than self-serve. Hyperbolic does offset some risk with “no charge for failed instances” billing — you only pay for GPUs that come online.
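One way to budget around weekly rate drift is to bracket a multi-week job between a low and a high scenario. The sketch below assumes the rate moves at most once per week by a fixed percentage that compounds; that percentage (`max_weekly_drift`) is a hypothetical planning input, not a figure Hyperbolic publishes, and the function name is illustrative.

```python
def job_cost_range(rate_per_gpu_hour, gpus, weeks, max_weekly_drift,
                   hours_per_week=168):
    """Return (low, base, high) USD cost for a job spanning weekly refreshes.

    Week 0 is billed at today's rate; each later week may drift by at most
    +/- max_weekly_drift (e.g. 0.10 for 10%), compounding week over week.
    """
    low = base = high = 0.0
    gpu_hours_per_week = gpus * hours_per_week
    for week in range(weeks):
        base += rate_per_gpu_hour * gpu_hours_per_week
        low += rate_per_gpu_hour * (1 - max_weekly_drift) ** week * gpu_hours_per_week
        high += rate_per_gpu_hour * (1 + max_weekly_drift) ** week * gpu_hours_per_week
    return round(low, 2), round(base, 2), round(high, 2)


# Hypothetical job: 8x H100 SXM at $1.50/GPU-hr, running 4 weeks around the
# clock, with an assumed drift of up to 10% per weekly refresh.
low, base, high = job_cost_range(1.50, gpus=8, weeks=4, max_weekly_drift=0.10)
print(f"low ${low:,.2f} | base ${base:,.2f} | high ${high:,.2f}")
```

With these inputs the flat-rate budget is $8,064, and the gap between the low and high scenarios shows how much headroom a long job might need.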

Want to estimate your own Hyperbolic bill? Use the Hyperbolic pricing calculator to model your costs based on GPU type, hours, and token volume.


Pricing evolution : Hyperbolic pricing history and changes

Cadence

Period | Price changes | Product / SKU additions | Notes
2024 H2 |  | GPU marketplace + serverless inference live | H100 advertised from ~$0.99/hr post-Series A
2025 | Per-model inference rate card stabilized | Image (SDXL, FLUX), VLM, audio modalities added | 70B-class at $0.40/M; 405B at $4.00/M
2026 Q2 | Marketplace starting rates published | Reserved clusters, dedicated hosting, crypto pay | H100 from $1.50/hr; weekly supplier-rate refresh

Tracked range: 2024–present. Marketplace rates are dynamic (weekly supplier-rate refresh), so point-in-time figures reflect the capture date.

Notable changes

  • Late 2024 — After a $12M Series A (Dec 2024, led by Variant and Polychain), both surfaces were live; early marketing advertised H100 rental from roughly $0.99/hr and per-million-token open-model inference.
  • 2025 — Inference settled into a per-model rate card (3B–8B Llama at $0.10/M, 70B-class at $0.40/M, DeepSeek-V2.5 at $2.00/M, Llama-3.1-405B at $4.00/M), with image, VLM, and audio modalities added.
  • June 2026 — Marketplace published on-demand starting rates (H100 SXM $1.50, H200 $2.40, B200 $3.50, RTX 4090 $0.30, RTX 3070 $0.16), refreshed weekly from supplier rates; crypto (USDC) payment and a new-user $5 free credit are in place.

The direction of travel is a maturing two-sided model: a spot-style GPU marketplace whose per-hour rate floats with supplier supply, layered alongside a stable per-token inference rate card.
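For a rough sense of scale, the two H100 figures cited above can be compared directly. This is only indicative: both are advertised starting rates captured at different times, the earlier one is approximate, and the page does not confirm that they refer to the same H100 variant.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


h100_late_2024 = 0.99  # approx. advertised starting rate, USD per GPU-hour
h100_june_2026 = 1.50  # marketplace starting rate, USD per GPU-hour

change = pct_change(h100_late_2024, h100_june_2026)
print(f"H100 starting rate change: {change:+.1f}%")
```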


What’s unique : Hyperbolic’s distinctive pricing mechanics

1. Two value metrics, one account. Hyperbolic prices GPU-hours on the marketplace and per-million-tokens on serverless inference from a single prepaid balance — serving infra teams and API developers without forcing either into the other’s billing model.

2. Weekly-refreshed, supplier-driven marketplace rates. Because it aggregates third-party GPU supply (a DePIN model), Hyperbolic refreshes per-hour rates weekly off the best available supplier prices — a spot/marketplace price rather than a fixed list price, which is unusual for raw-GPU rental.

3. Published rates plus crypto payment. Hyperbolic publishes both per-GPU-hour and per-model token rates openly (no sales call to see numbers) and accepts crypto (USDC) alongside card and wire — rare in the GPU-cloud category and aligned with its open-access positioning.


Strengths & weaknesses

Strengths | Weaknesses
Transparent per-GPU-hour and per-token rates, published openly | Marketplace rate drifts weekly; harder to budget long jobs
Aggressive starting prices (H100 from $1.50/hr) | Aggregated third-party supply can vary in capacity/reliability
Two value metrics from one prepaid account | Reserved/dedicated capacity is sales-quoted, not self-serve
$5 free credit + flexible payment (card, wire, crypto) | $5 minimum deposit required before GPU rental
OpenAI-compatible API, zero data retention on inference | Image/audio per-generation rates not published on pricing page

Billing UX : pay-as-you-go credit, dual metering, public list pricing

  • Pay-as-you-go credit, no subscription: both surfaces draw down a prepaid balance by usage; there is no monthly platform fee, seat count, or required commitment to start. New users get a $5 credit; GPU rental needs a $5 minimum deposit.
  • Two metered dimensions: GPU-hours on the marketplace and tokens (per million) on serverless inference are metered and billed separately, so a single account can run both meters concurrently.
  • Public list pricing: per-GPU-hour and per-model token rates are published rather than gated behind a sales call; only reserved clusters and dedicated hosting are quoted.
  • Failure-aware billing: Hyperbolic does not charge for failed instances and notifies within a few minutes if an instance fails, so you pay only for GPUs that come online.
  • Flexible payment: credit card via Stripe on a pay-as-you-go basis, wire/ACH paid upfront or monthly, and crypto (USDC).
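The billing mechanics listed above can be pictured as a small ledger: one prepaid balance, two meters, and a deposit gate on GPU rental. The sketch below is a toy model under stated assumptions, not Hyperbolic's implementation. In particular, it assumes free credit and deposits share one balance, and the class and method names are hypothetical.

```python
class PrepaidAccount:
    """Toy ledger: one balance, two meters, GPU rental gated on a deposit."""

    MIN_GPU_DEPOSIT = 5.00  # USD, per the deposit rule described above

    def __init__(self, free_credit=5.00):
        self.balance = free_credit  # new-user credit
        self.deposited = 0.0        # money the user has actually added
        self.usage = {"gpu_hours": 0.0, "tokens": 0}

    def deposit(self, amount):
        self.deposited += amount
        self.balance += amount

    def _charge(self, cost):
        if cost > self.balance:
            raise ValueError("insufficient credit")
        self.balance = round(self.balance - cost, 6)

    def bill_tokens(self, tokens, rate_per_million):
        """Meter 1: serverless inference, priced per million tokens."""
        self._charge(tokens / 1_000_000 * rate_per_million)
        self.usage["tokens"] += tokens

    def bill_gpu(self, gpu_hours, rate_per_gpu_hour):
        """Meter 2: marketplace rental, priced per GPU-hour."""
        if self.deposited < self.MIN_GPU_DEPOSIT:
            raise PermissionError("deposit at least $5 before renting GPUs")
        self._charge(gpu_hours * rate_per_gpu_hour)
        self.usage["gpu_hours"] += gpu_hours


acct = PrepaidAccount()
acct.bill_tokens(2_000_000, 0.40)  # 2M tokens on a 70B-class model: $0.80
acct.deposit(20.00)                # unlocks GPU rental
acct.bill_gpu(4, 1.50)             # 4 GPU-hours on H100 SXM: $6.00
print(f"remaining balance: ${acct.balance:.2f}")
```

Keeping the usage counters separate from the balance reflects the dual metering: each meter accumulates on its own while both draw down the same credit.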

Strategic wins : Why Hyperbolic’s pricing decisions worked

1. Aggregating idle supply into a transparent rate card

By reselling underused third-party GPU capacity at openly published, weekly-refreshed rates, Hyperbolic undercuts hyperscalers on raw price while keeping pricing visible — a wedge into the price-sensitive AI-research and indie-developer segment. See how AI companies structure pricing.

2. Two value metrics that capture two buyers

Pricing GPU-hours for infra teams and per-million-tokens for API developers from one account lets Hyperbolic monetize both the “I want raw compute” and the “I just want a model endpoint” buyer without making either adopt the other’s mental model. Related: outcome-based pricing trends.

3. A stable token rate card on top of a spot GPU market

Layering a fixed per-model inference rate card over a fluctuating spot GPU marketplace gives developers predictability where they want it (token price) while letting raw compute float with supply. See choosing the right usage metric.
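A buyer choosing between the two meters can compare them with a simple break-even: the token volume per hour at which serverless spend equals the cost of renting a GPU. The sketch below uses rates quoted on this page. It deliberately ignores whether the model fits on the rented hardware, achievable throughput, utilization, and engineering time, none of which are stated here, and the function name is hypothetical.

```python
def breakeven_tokens_per_hour(gpu_rate_per_hour, token_rate_per_million, gpus=1):
    """Tokens per hour at which serverless spend equals renting `gpus` GPUs."""
    return gpu_rate_per_hour * gpus / token_rate_per_million * 1_000_000


# One H100 SXM at $1.50/GPU-hr versus a 70B-class model at $0.40 per 1M tokens.
tokens = breakeven_tokens_per_hour(1.50, 0.40)
print(f"break-even: {tokens:,.0f} tokens/hour")
```

Below that volume the per-token API costs less than an hour of rental; above it, rented capacity starts to look cheaper on price alone, provided the workload can actually sustain that throughput.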


Areas to improve : Gaps in Hyperbolic’s pricing approach

1. Weekly rate drift hurts long-job budgeting

A per-hour rate that refreshes weekly is fine for short bursts but awkward for multi-week training runs. Clearer rate-lock options (beyond sales-quoted reserved clusters) would make the marketplace easier to budget. See bill shock and cost unpredictability.

2. Unpublished image/audio rates

Inference token rates are published, but per-generation image (SDXL, FLUX) and audio rates are not shown on the pricing page, forcing buyers into docs or a console to learn cost. Publishing them would extend the transparency that benefits the token products.

3. Capacity and reliability transparency

Because supply is aggregated from third parties, capacity and reliability can vary by GPU and region. Real-time availability and SLA clarity (without a sales call) would reduce the gap between a self-serve rate card and a self-serve experience.


Key takeaways

  1. Hyperbolic is pure usage-based across two surfaces — per-GPU-hour on the marketplace and per-million-tokens on serverless inference, both pay-as-you-go with no subscription. For the underlying model, see the introduction to usage-based pricing.
  2. It aggregates third-party GPU supply (DePIN) and refreshes marketplace rates weekly, so the per-hour price is a spot/marketplace rate, not a fixed list price.
  3. Both rate cards are published openly — H100 from $1.50/hr and per-model token rates from $0.10/M — a transparency edge over GPU clouds that gate numbers behind sales.
  4. The real frictions are rate drift and supply variability, not headline fees; reliability-sensitive teams move to sales-quoted reserved or dedicated capacity.
  5. Two value metrics serve two buyers from one account, with crypto payment and a $5 free credit reinforcing its open-access positioning.

UBP implications

  1. A spot meter and a fixed meter can coexist. Hyperbolic floats GPU-hour prices weekly while holding token prices steady — pricing each metric the way its supply behaves, a reusable pattern for any business with both volatile and stable cost inputs.
  2. Two value metrics widen the addressable buyer set. Offering raw GPU-hours and a per-token API from one account lets a vendor monetize both the infrastructure buyer and the application developer without forcing a single billing model.
  3. Transparency is a wedge even in commodity infra. Publishing per-hour and per-token rates (plus crypto payment) lowers buyer friction and differentiates against rivals that hide pricing behind sales calls.


Bottom line

Hyperbolic is a clean two-sided example of pure usage-based pricing for AI compute: a GPU marketplace billed per GPU-hour (H100 from $1.50/hr, refreshed weekly off aggregated third-party supply) alongside a serverless inference API billed per million tokens ($0.10–$4.00/M across open models). Both rate cards are published openly, payment includes crypto, and new users get a $5 credit — all reinforcing an open-access positioning. The trade-offs are a marketplace price that drifts weekly and supply that varies by GPU and region, which steers reliability-sensitive teams toward sales-quoted reserved or dedicated capacity. Browse the pricing blueprint for more fully-researched company profiles, or compare Hyperbolic against other AI infrastructure and compute companies.

Pricing timeline : Major events on a vertical axis

Each milestone below corresponds to a public pricing change, product launch, or material adjustment.

June 2026: Marketplace starting rates published; H100 from $1.50/hr

GPU marketplace shows on-demand starting rates: H100 SXM $1.50, H200 $2.40, B200 $3.50, RTX 4090 $0.30, RTX 3070 $0.16 per GPU-hour, refreshed weekly from supplier rates. Inference rate card unchanged. New-user $5 free credit; crypto (USDC) payment supported.

2025: Per-million-token inference rate card stabilizes

Inference settled into a per-model rate card: $0.10/M for 3B-8B Llama, $0.40/M for 70B-class models, $2.00/M for DeepSeek-V2.5, and $4.00/M for Llama-3.1-405B, with image (SDXL, FLUX), VLM, and audio modalities added.

Late 2024: Two usage surfaces live; H100 advertised from ~$0.99/hr

After its $12M Series A (Dec 2024), Hyperbolic offered both a GPU marketplace and a serverless inference API. Early marketing cited H100 rental from roughly $0.99/hr, with open-model inference billed per million tokens.

Trivia
  • Hyperbolic prices two different value metrics from one account: GPU-hours on its marketplace and per-million-tokens on serverless inference.
  • It runs a DePIN-style model, aggregating underused GPU capacity from data centers and operators, and accepts payment in crypto (USDC) alongside card and wire.
  • Marketplace GPU rates are refreshed weekly based on the best available supplier rates, so the per-hour price is a spot/marketplace rate rather than a fixed list price.

Questions & answers

What is Hyperbolic's pricing model?
Pure usage-based, pay-as-you-go. Hyperbolic bills per GPU-hour on its GPU marketplace and per million tokens on its serverless inference API. There are no subscription tiers or seat fees — you add credit and draw it down by consumption. Reserved clusters and dedicated hosting are sales-quoted.
How much does an H100 cost on Hyperbolic?
As of June 2026, an NVIDIA H100 SXM on the marketplace starts at $1.50/GPU/hr. H200 starts at $2.40, B200 at $3.50, RTX 4090 at $0.30, and RTX 3070 at $0.16 per GPU-hour. Rates are refreshed weekly based on the best available supplier rates, so the per-hour price is dynamic rather than a fixed list price. In late 2024, Hyperbolic advertised H100 from around $0.99/hr.
How is Hyperbolic's inference pricing charged?
Serverless inference is billed per million tokens, with each open-weight model carrying its own rate: $0.10/M for Llama-3.1-8B and Llama-3.2-3B, $0.20/M for Qwen2.5-Coder-32B, $0.40/M for 70B-class models like Llama-3.1-70B and Qwen2.5-72B, $2.00/M for DeepSeek-V2.5, and $4.00/M for Llama-3.1-405B. You pay only for the tokens you consume.
Does Hyperbolic have a free tier?
New users get $5 in free credits to explore the inference models. A separate $1 credit cannot be used to rent GPUs — you must deposit at least $5 before launching a GPU instance. There is also a referral program: refer a friend who tops up $5 within 14 days and you get $5 in credit while they get $6.