OpenPipe pricing

Quick summary
Product: OpenPipe fine-tuning and hosted inference platform (small specialized models / RL for agents)
Industry: technology
Commits: None
AI Summary
  • OpenPipe prices fine-tuning and hosted inference on a pure usage-based model with no seats and no minimum commitment.
  • Training is billed per 1M dataset tokens scaled by model size ($0.48 for 8B-and-smaller up to $2.90 for 70B-and-larger), rates that were cut roughly 6x in early 2025 from $3.00–$16.00 — the most material change in its pricing history.
  • Hosted inference offers per-token rates (Llama 3.1 8B Instruct at $0.30 input / $0.45 output per 1M tokens; Llama 3.1 70B Instruct at $1.80 / $2.00) for high-volume models.
  • Lower-volume models are billed by hourly Compute Units from $1.50 to $12.00 per CU-hour, and third-party fine-tunes such as OpenAI GPT and Google Gemini pass through at provider rates with no OpenPipe markup.
  • CoreWeave agreed to acquire OpenPipe on September 3, 2025 to integrate its reinforcement-learning agent-training stack, and the published rate card stayed unchanged through the post-acquisition October 2025 docs snapshot.
  • Enterprise plans are contact-sales, adding volume discounts, on-premises deployment, custom SLAs, and reinforcement-learning engagements for production agents.
Pricing summary
OpenPipe 2026 — usage-based fine-tuning + hosted inference
Pure usage: pay per 1M training tokens (priced by model size) and per-token or per-CU-hour for inference — no seats, no minimum commitment.
  • Training: $0.48–$2.90 / 1M tokens (fine-tuning small specialized models on your data)
  • Inference, Compute Units: $1.50–$12.00 / CU-hour (experimental / lower-volume models)
  • Enterprise: custom (on-prem / VPC, RL for agents, compliance)
Third-party fine-tunes (OpenAI GPT, Google Gemini) pass through at the provider's standard rates with no OpenPipe markup. Rates from docs.openpipe.ai/pricing/pricing.

About

OpenPipe is a fine-tuning and hosted-inference platform that helps engineering teams replace expensive general-purpose LLM prompts with small, specialized models trained on their own request data. Its supervised fine-tuning (SFT) workflow captures production requests and responses, turns that traffic into a training dataset, and, by OpenPipe's account, lets teams launch a fine-tune in two clicks. The pitch is concrete: OpenPipe says production teams see error rates drop after swapping prompts for fine-tuned models, and it positions fine-tuned Llama 3.1 models as roughly 8x cheaper than GPT-4o at comparable quality.

The company serves developers and AI product teams who want to own their model weights and deploy anywhere — cloud, edge, or on-prem — without the per-call cost of frontier APIs. Beyond SFT, OpenPipe has moved up-market into reinforcement learning for enterprise agents, offering consulting-style engagements that translate corporate KPIs into reward schemas and govern autonomous agents in production. Security and compliance (SOC 2 Type 2, HIPAA, GDPR) are first-class, reflecting an enterprise buyer who needs to fine-tune on sensitive data.

Founded in 2022 and Y Combinator-backed, the Seattle-area startup raised a $6.7M seed round in March 2024 (Costanoa Ventures, Y Combinator, and angels including GitHub co-founder Tom Preston-Werner). It is best known in the open-source community for ART (Agent Reinforcement Trainer), a GRPO-based RL toolkit with roughly 9k GitHub stars. On September 3, 2025, CoreWeave (NASDAQ: CRWV) announced a definitive agreement to acquire OpenPipe (terms undisclosed) to fold its agent-training and RL stack into CoreWeave’s AI cloud — a move that, as of the latest captures, left OpenPipe’s self-serve rate card intact.

OpenPipe sits in a competitive band alongside other model-customization and inference platforms, differentiating on the “log your traffic, click train” simplicity of its data-capture loop and on transparent, fully usage-based pricing rather than seat licenses or platform subscriptions.

Pricing summary : How OpenPipe’s per-token fine-tuning and inference rates work

OpenPipe runs a pure usage-based model with no seats and no minimum commitment. You pay across three independent meters: per-million-token training fees that scale with model size, per-token hosted inference for high-volume models, and per-CU-hour Compute Units for lower-volume models — plus zero-markup pass-through for third-party fine-tunes.

The model has these dimensions:

  • Training — billed per 1M dataset tokens, scaled by the size of the model being fine-tuned: $0.48 (8B and smaller), $1.50 (14B), $1.90 (32B), $2.90 (70B+).
  • Hosted inference, per-token — for popular high-volume models, billed per 1M input and output tokens with no minimum (e.g., Llama 3.1 8B Instruct at $0.30 in / $0.45 out).
  • Hosted inference, hourly Compute Units — for experimental/low-volume models, billed per CU-hour to the second, $1.50–$12.00 depending on model size.
  • Third-party models — OpenAI GPT and Google Gemini fine-tunes are passed through at the provider’s standard rates with no OpenPipe markup; you are billed directly by the provider.
  • Enterprise — contact-sales for volume discounts, on-prem/VPC deployment, custom SLAs, and the RL-for-agents engagement.

What makes this different: many fine-tuning platforms bill raw GPU time, whereas OpenPipe abstracts training cost into a single per-token rate indexed to model size, so a customer can estimate a fine-tune’s cost from dataset size alone. See how this compares across the corpus on token-based billing and pure-usage pricing.
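
As a rough illustration of the "dataset size alone" estimate, the sketch below multiplies a dataset's token count by the per-1M training rate for a model-size tier. The rates are copied from the rate card in this profile. The function name, the tier labels used as dictionary keys, and the assumption that billing is a simple linear product of dataset tokens and rate are illustrative only and are not taken from OpenPipe's API or billing code.

```python
from decimal import Decimal

# Training rates in USD per 1M dataset tokens, copied from the rate card above.
TRAINING_RATE_PER_1M = {
    "8B and smaller": Decimal("0.48"),
    "14B": Decimal("1.50"),
    "32B": Decimal("1.90"),
    "70B+": Decimal("2.90"),
}


def estimate_training_cost(dataset_tokens: int, tier: str) -> Decimal:
    """Hypothetical helper: dataset tokens x per-1M rate, rounded to cents.

    Assumes a simple linear charge on dataset tokens (an assumption of this
    sketch, not a documented billing formula).
    """
    rate = TRAINING_RATE_PER_1M[tier]
    cost = Decimal(dataset_tokens) * rate / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Same 50M-token dataset priced against every tier.
    for tier in TRAINING_RATE_PER_1M:
        print(f"{tier:>15}: ${estimate_training_cost(50_000_000, tier)}")
```

Under these assumptions, a 50M-token dataset comes to $24.00 on the 8B-and-smaller tier and $145.00 on the 70B+ tier.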

Pricing by product

Fine-tuning (training)

| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| 8B and smaller | $0.48 / 1M tokens | Training on datasets for models 8B parameters and under | Cheapest tier; cost indexed to dataset token count |
| 14B models | $1.50 / 1M tokens | Training for 14B-parameter models | ~3x the 8B rate |
| 32B models | $1.90 / 1M tokens | Training for 32B-parameter models | Mid-size band |
| 70B+ models | $2.90 / 1M tokens | Training for 70B-and-larger models | Top training rate; ~6x the 8B rate |

Hosted inference — per-token (high-volume models)

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Key mechanics |
|---|---|---|---|
| Llama 3.1 8B Instruct | $0.30 | $0.45 | No minimum commitment; automatic scaling |
| Llama 3.1 70B Instruct | $1.80 | $2.00 | Pay only for tokens processed |

Hosted inference — hourly Compute Units (experimental / low-volume models)

| Model | Rate per CU-hour | Key mechanics |
|---|---|---|
| Llama 3.1 8B | $1.50 | 1 CU handles up to 24 req/sec; billed to the second |
| Mistral Nemo 12B | $1.50 | Auto-scales when traffic exceeds capacity |
| Qwen 2.5 32B Coder | $6.00 | CU stays active 60s after a traffic spike |
| Qwen 2.5 14B | $6.00 | Designed for lower-volume / experimental workloads |
| Qwen 2.5 72B | $12.00 | Largest-model band |
| Llama 3.1 70B | $12.00 | Largest-model band |
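
To make the Compute Unit mechanics concrete, here is a minimal sizing sketch. It assumes that auto-scaling adds whole CUs, that each CU covers 24 requests per second as stated in the table, and that every provisioned CU is billed for the full active period (an upper-bound simplification). The function names and the flat scaling model are hypothetical; only the rates and the 24 req/sec figure come from the rate card.

```python
import math
from decimal import Decimal

# USD per CU-hour, copied from the Compute Units table above.
CU_RATE_PER_HOUR = {
    "Llama 3.1 8B": Decimal("1.50"),
    "Mistral Nemo 12B": Decimal("1.50"),
    "Qwen 2.5 32B Coder": Decimal("6.00"),
    "Qwen 2.5 14B": Decimal("6.00"),
    "Qwen 2.5 72B": Decimal("12.00"),
    "Llama 3.1 70B": Decimal("12.00"),
}
REQUESTS_PER_SECOND_PER_CU = 24  # capacity of one CU per the rate card


def compute_units_needed(peak_requests_per_second: float) -> int:
    """Whole CUs required to cover a peak request rate (assumed linear scaling)."""
    return math.ceil(peak_requests_per_second / REQUESTS_PER_SECOND_PER_CU)


def compute_unit_cost(model: str, peak_requests_per_second: float,
                      active_seconds: int) -> Decimal:
    """Upper-bound cost: all needed CUs billed per second for the whole period."""
    units = compute_units_needed(peak_requests_per_second)
    cost = (Decimal(units) * CU_RATE_PER_HOUR[model]
            * Decimal(active_seconds) / Decimal(3600))
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(compute_units_needed(60))                      # CUs for a 60 req/sec peak
    print(compute_unit_cost("Llama 3.1 8B", 60, 3600))   # one hour at that peak
```

In this model a 60 req/sec peak on Llama 3.1 8B needs three CUs, or $4.50 for an hour in which all three stay active. Real bills would likely be lower if extra units scale down between spikes.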

Third-party models (OpenAI, Gemini, etc.)

| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Third-party fine-tunes | Provider’s standard rate | OpenAI GPT, Google Gemini models via OpenPipe | No OpenPipe markup; billed by provider |

Enterprise

| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Enterprise | Custom | Volume discounts, on-prem deployment, dedicated support, custom SLAs, advanced security, more storage | Sales-led, quoted |

Sales motions across products: PLG / self-serve for training, per-token and Compute-Unit inference; sales-led for the enterprise / RL-for-agents engagement.

Hidden costs : what a fine-tune-plus-serving workload actually bills

The per-token headline rates are easy to underestimate because a real deployment pays on two or three meters at once — a training run, then ongoing inference. Consider a team fine-tuning a Llama 3.1 8B model on a 50M-token dataset and then serving it at high volume.

Fine-tune-and-serve archetype (Llama 3.1 8B)

| Line item | Cost (first month) |
|---|---|
| Training: 50M dataset tokens × $0.48 / 1M (8B rate), one-time | $24 |
| Inference input: 200M tokens × $0.30 / 1M | $60 |
| Inference output: 100M tokens × $0.45 / 1M | $45 |
| Total (first month, training + serving) | $129 |

The lesson: training is a small one-time line item ($24 here) next to recurring inference ($105 per month), so the meter that dominates the bill is per-token serving, exactly where OpenPipe pitches an ~8x cost advantage over GPT-4o. Teams choosing the hourly Compute Unit path instead should model peak concurrency, since a single CU caps at 24 requests/second before auto-scaling adds more billable units. For background on building these estimates, see our guides on usage-based pricing fundamentals and tracking and metering usage events.
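
The archetype can be reproduced and extended over time with a short calculation. The sketch below uses the Llama 3.1 8B rates from this profile and assumes the monthly token volumes stay constant and the model is trained once with no retraining. Both are assumptions made for illustration, and the function names are hypothetical.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
TRAINING_RATE = Decimal("0.48")  # USD per 1M dataset tokens, 8B-and-smaller tier
INPUT_RATE = Decimal("0.30")     # USD per 1M input tokens, Llama 3.1 8B Instruct
OUTPUT_RATE = Decimal("0.45")    # USD per 1M output tokens, Llama 3.1 8B Instruct


def training_cost(dataset_tokens: int) -> Decimal:
    """One-time training charge for a single fine-tune."""
    return Decimal(dataset_tokens) * TRAINING_RATE / MILLION


def monthly_inference_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Recurring per-token serving charge for one month."""
    return (Decimal(input_tokens) * INPUT_RATE
            + Decimal(output_tokens) * OUTPUT_RATE) / MILLION


def cumulative_cost(months: int, dataset_tokens: int,
                    input_tokens: int, output_tokens: int) -> Decimal:
    """Training once plus `months` of constant-volume inference (assumed)."""
    return (training_cost(dataset_tokens)
            + months * monthly_inference_cost(input_tokens, output_tokens))


if __name__ == "__main__":
    args = (50_000_000, 200_000_000, 100_000_000)
    first_month = cumulative_cost(1, *args)
    first_year = cumulative_cost(12, *args)
    share = (training_cost(args[0]) / first_year * 100).quantize(Decimal("0.1"))
    print(f"first month: ${first_month}, first year: ${first_year}")
    print(f"training share of first-year spend: {share}%")
```

Held at these volumes for a year, the total is $1,284, of which the $24 training run is under 2 percent.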

Pricing evolution : a 6x training price cut, then the CoreWeave acquisition

Cadence

| Quarter | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2024 Q3 | 0 | 0 | Marketing page advertised a per-token Developer plan: $100 free credits, training from $4 / 1M tokens, inference $1.20 in / $1.60 out. |
| 2025 Q1 | 0 | 1 | Pricing migrated to docs with model-size training tiers at $3.00 / $8.00 / $16.00 per 1M tokens; per-token + Compute-Unit inference in place. |
| 2025 Q2 | 1 | 1 | ~6x training cut (8B $3.00→$0.48; 70B+ $16.00→$2.90) and a new 14B tier ($1.50) appear by the 2025-04-20 snapshot. |
| 2025 Q3 | 0 | 0 | CoreWeave announced acquisition of OpenPipe on 2025-09-03 (terms undisclosed); rate card unchanged in the post-acquisition Oct 2025 snapshot. |

Tracked range: 2024-08 to 2026-06. Quarters not listed were verified stable (0 price changes, 0 SKU additions). The marketing page openpipe.ai/pricing collapsed to a near-empty 404 by the 2025-07-17 snapshot, with pricing thereafter living only in docs.

Notable changes

  • 2024-08 — Marketing page offered $100 free credits, training from $4 / 1M tokens, inference $1.20 in / $1.60 out (Wayback 2024-08-04 of openpipe.ai/pricing).
  • 2025-01 — Pricing moved to docs with model-size training at $3.00 (8B) / $8.00 (32B) / $16.00 (70B+) per 1M tokens (Wayback 2025-01-17).
  • 2025-04 — Training rates cut ~6x to $0.48 / $1.90 / $2.90 and a 14B tier ($1.50) added; inference unchanged (Wayback 2025-04-20, corroborated against the 2025-01 snapshot).
  • 2025-09-03 — CoreWeave announced a definitive agreement to acquire OpenPipe, terms undisclosed (TechCrunch).
  • 2025-10 — Post-acquisition docs snapshot showed the same rate card, confirming list pricing and self-serve access were not disrupted (Wayback 2025-10-08).
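
The "~6x" headline can be checked tier by tier from the two snapshots. The sketch below divides the 2025-01 rate by the 2025-04 rate for each tier that existed in both. The prices are those quoted in this profile; the helper name is illustrative.

```python
from decimal import Decimal

# USD per 1M training tokens at each docs snapshot, as quoted in this profile.
RATES_2025_01 = {
    "8B and smaller": Decimal("3.00"),
    "32B": Decimal("8.00"),
    "70B+": Decimal("16.00"),
}
RATES_2025_04 = {
    "8B and smaller": Decimal("0.48"),
    "32B": Decimal("1.90"),
    "70B+": Decimal("2.90"),
}


def cut_factor(tier: str) -> Decimal:
    """Old rate divided by new rate, rounded to two decimal places."""
    return (RATES_2025_01[tier] / RATES_2025_04[tier]).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for tier in RATES_2025_01:
        print(f"{tier:>15}: {cut_factor(tier)}x cheaper")
```

By this arithmetic the cut is about 6.25x for 8B-and-smaller, 5.52x for 70B+, and 4.21x for 32B, so "roughly 6x" describes the smallest and largest tiers better than the middle one.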

The CoreWeave acquisition in detail

On September 3, 2025, CoreWeave (NASDAQ: CRWV) announced it had agreed to acquire OpenPipe for undisclosed terms. CoreWeave framed the deal around OpenPipe’s open-source ART (Agent Reinforcement Trainer) toolkit and its reinforcement-learning approach to building production agents, positioning it as a vertical-integration play on top of CoreWeave’s earlier Weights & Biases acquisition. For buyers, the key pricing question was continuity — and the evidence says continuity held: the docs rate card captured on 2025-10-08, roughly a month after the announcement, was identical to the pre-announcement card ($0.48–$2.90 training, unchanged per-token and Compute-Unit inference). The most material structural change in OpenPipe’s pricing history therefore predates the acquisition: the ~6x training cut in early 2025, not the change of ownership.

What’s unique : model-size-indexed training and zero-markup pass-through

1. Training priced by model size, not GPU time. Instead of exposing raw GPU-hour billing, OpenPipe collapses training cost into a single per-1M-token rate indexed to the model’s parameter count ($0.48 to $2.90). A customer can forecast a fine-tune’s cost from dataset size alone — a notably more legible meter than the GPU-time billing common to other model-inference platforms.

2. Two inference meters for two workload shapes. High-volume models get pay-as-you-go per-token pricing with no minimum; experimental and low-volume models get serverless-style hourly Compute Units billed to the second. This lets buyers pick the meter that matches their traffic profile rather than forcing all workloads into one billing model.

3. Zero-markup third-party pass-through. OpenAI GPT and Gemini fine-tunes route through OpenPipe but bill directly from the provider at standard rates, so OpenPipe takes no margin on third-party tokens — a transparency choice that keeps it credible as a neutral fine-tuning layer.

4. Training prices that fell, not rose. Unlike many AI platforms that creep prices up, OpenPipe cut training rates roughly 6x in early 2025 (8B from $3.00 to $0.48 per 1M tokens; 70B+ from $16.00 to $2.90) and added a finer 14B tier — passing improving GPU economics straight to self-serve users rather than capturing the margin. That deflation, captured across the 2025-01 and 2025-04 docs snapshots, is the defining event in its pricing history, and it survived the CoreWeave acquisition intact.
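
The choice between the two inference meters described in point 2 can be framed as a break-even volume. The sketch below compares the Llama 3.1 8B per-token rates with the $1.50 CU-hour rate and finds the hourly token volume at which one always-on CU costs the same as per-token billing. It assumes the same workload could run on either meter, that a single CU has enough capacity for it, and that the CU is billed for the full hour. None of these is confirmed by the rate card, and the function names are hypothetical.

```python
import math
from fractions import Fraction

INPUT_RATE = Fraction("0.30")   # USD per 1M input tokens
OUTPUT_RATE = Fraction("0.45")  # USD per 1M output tokens
CU_RATE = Fraction("1.50")      # USD per CU-hour


def blended_rate_per_1m(input_share: Fraction) -> Fraction:
    """Average USD per 1M tokens for a given share of input tokens (0 to 1)."""
    return input_share * INPUT_RATE + (1 - input_share) * OUTPUT_RATE


def break_even_tokens_per_hour(input_share: Fraction) -> int:
    """Tokens per hour at which per-token spend equals one CU-hour (rounded down)."""
    return math.floor(CU_RATE / blended_rate_per_1m(input_share) * 1_000_000)


if __name__ == "__main__":
    # A 2:1 input-to-output mix, matching the archetype's 200M in / 100M out.
    print(break_even_tokens_per_hour(Fraction(2, 3)))
```

With a 2:1 input-to-output mix the blended rate is $0.35 per 1M tokens, so the break-even sits near 4.29M tokens per hour. Below that volume per-token billing is cheaper in this simplified model, and above it a fully utilized CU would be.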

Strengths & weaknesses

| Strengths | Weaknesses |
|---|---|
| Fully transparent, public usage rates with no seats or minimums | Marketing pricing page (openpipe.ai/pricing) returns 404; rates live only in docs |
| Training cost forecastable from dataset size (model-size-indexed) | No published volume tiers or committed-use discounts (enterprise only) |
| Training got ~6x cheaper in early 2025, passing GPU economics to users | Limited per-token catalog (only Llama 3.1 8B/70B published) |
| Zero markup on third-party (OpenAI/Gemini) fine-tunes | Enterprise / RL-for-agents pricing entirely opaque (contact-sales) |
| Strong open-source pull (ART toolkit ~9k stars; 955-point HN launch) | CoreWeave acquisition (Sept 2025) adds long-term roadmap uncertainty |

Billing UX : metered controls for tokens, compute units, and pass-through

OpenPipe’s billing surface is built around granular, self-serve usage meters with no seat management:

  • Per-1M-token training meter — training cost is computed from the dataset token count multiplied by a model-size rate ($0.48–$2.90), so spend is predictable before a run starts.
  • Per-token inference meter — high-volume models bill separately on input and output tokens (e.g., $0.30 in / $0.45 out for Llama 3.1 8B Instruct) with no minimum commitment and automatic infrastructure scaling.
  • Compute Unit (CU) metering — each CU absorbs up to 24 simultaneous requests per second, is billed to the second, auto-scales when traffic exceeds capacity, and remains active for 60 seconds after a traffic spike.
  • Third-party pass-through billing — for OpenAI/Gemini fine-tunes, OpenPipe forwards API calls and the provider bills you directly at standard rates, with no OpenPipe line item or markup.
  • Free entry + enterprise contact channel — teams can start fine-tuning for free, while enterprise pricing, volume discounts, and on-prem/VPC deployment are negotiated via hello@openpipe.ai.
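
The per-second billing and 60-second keep-alive in the Compute Unit bullet can be modelled as an interval-merging problem. The sketch below extends each traffic window by 60 seconds, merges overlapping windows, and sums the result. Whether the keep-alive period is actually billed is an assumption of this sketch, as are the single-CU scope and the function name; the source states only that the CU remains active for 60 seconds after a spike.

```python
from decimal import Decimal

KEEP_ALIVE_SECONDS = 60  # CU stays active this long after traffic stops


def billed_seconds(traffic_windows: list[tuple[int, int]]) -> int:
    """Total active seconds for one CU, given (start, end) traffic windows.

    Each window is extended by the keep-alive period, then overlapping
    windows are merged so no second is counted twice.
    """
    extended = sorted((start, end + KEEP_ALIVE_SECONDS)
                      for start, end in traffic_windows)
    total = 0
    current_start = current_end = None
    for start, end in extended:
        if current_end is None or start > current_end:
            # Gap before this window: close out the previous merged window.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the active window: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


if __name__ == "__main__":
    windows = [(0, 30), (50, 80), (300, 310)]
    seconds = billed_seconds(windows)
    cost = Decimal(seconds) * Decimal("1.50") / Decimal(3600)  # Llama 3.1 8B rate
    print(seconds, cost)
```

In the example, 70 seconds of actual traffic spread over three bursts yields 210 active seconds under this model, which shows how bursty low-volume traffic can bill for noticeably more time than it uses.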

Strategic wins : pricing choices that lower buyer friction

1. Model-size-indexed training makes cost predictable

By indexing training to model size rather than GPU time, OpenPipe gives buyers a number they can compute before committing — dataset tokens times a known rate. This removes the biggest source of fine-tuning sticker shock and aligns with the transparency principle that lowers buyer friction.

2. Dual inference meters capture both ends of the volume curve

Offering per-token billing for popular models and hourly Compute Units for experimental ones lets OpenPipe monetize a wider workload range without forcing a single pricing model. It mirrors the usage-based pricing models that successful infra platforms adopt.

3. Zero-markup pass-through builds trust as a neutral layer

Refusing margin on OpenAI and Gemini fine-tunes positions OpenPipe as a fine-tuning layer rather than a token reseller, which matters to buyers wary of opaque markups. This aligns price with the value metric that matters — OpenPipe charges for the fine-tuning it actually performs, not for brokering someone else’s tokens.

Areas to improve : gaps in transparency and self-serve depth

1. Fix the dead marketing pricing page

The public-facing openpipe.ai/pricing URL returns a 404, leaving rates discoverable only inside the docs. Restoring a marketing pricing page that mirrors the docs rate card would improve SEO discoverability and reduce the chance prospects assume pricing is gated — a pattern worth comparing against peers in the pricing blueprint.

2. Publish volume tiers and committed-use discounts

Today any discount lives behind an enterprise sales conversation. Publishing transparent volume breaks on training and per-token inference — as covered in our guide to thresholding and alerting — would let larger self-serve teams scale without a sales call.

3. Broaden the per-token inference catalog

Only Llama 3.1 8B and 70B have published per-token rates; every other model defaults to the coarser hourly Compute Unit meter. Expanding the per-token catalog to popular Qwen and Mistral models would give cost-sensitive, high-volume buyers a pay-as-you-go option instead of pushing them onto per-hour billing.

Key takeaways

  1. Index your most-feared meter to something buyers can pre-compute. OpenPipe converts GPU-time anxiety into a dataset-size multiplication, making training cost knowable before commitment.
  2. Match the meter to the workload shape. Offering both per-token and per-CU-hour inference lets one platform serve high-volume and experimental users without compromise.
  3. Take no margin on pass-through to stay credible. Zero-markup third-party billing positions OpenPipe as a neutral layer rather than a reseller.
  4. A free entry point plus no inference minimum lowers activation friction. Developers can reach first value before any spend commitment.
  5. Keep public rate cards where buyers expect them. Rates buried only in docs (with a 404 marketing page) cost discoverability — visibility is part of the pricing.

UBP implications

  1. Model-size as a value metric is a clean abstraction for AI training. Pricing the size of the artifact being produced — rather than the compute consumed — is a portable pattern other fine-tuning and model-customization vendors can adopt.
  2. Multi-meter inference reflects the maturity of usage-based AI billing. Splitting inference into per-token and per-time meters acknowledges that one usage unit rarely fits every traffic profile.
  3. Transparent zero-markup pass-through raises the bar for neutral platforms. As more tools broker third-party model access, refusing to mark up pass-through tokens becomes a competitive trust signal.

Bottom line

OpenPipe runs one of the cleaner usage-based rate cards in AI infrastructure: training priced by model size ($0.48–$2.90 per 1M tokens), inference split into per-token and per-CU-hour meters, and zero markup on third-party fine-tunes — all with no seats or minimums. The main gap is discoverability, with rates living only in docs behind a 404 marketing page, and discounts hidden behind enterprise sales.

Pricing timeline : Major events on a vertical axis

Each milestone below corresponds to a public pricing change, product launch, or material adjustment, listed from most recent to oldest.

Current usage-based rate card

Training priced per 1M dataset tokens by model size ($0.48 for 8B-and-smaller up to $2.90 for 70B+). Hosted inference offers per-token rates (Llama 3.1 8B Instruct $0.30 in / $0.45 out; 70B Instruct $1.80 in / $2.00 out) or hourly Compute Units ($1.50–$12.00 per CU-hour). Third-party fine-tunes pass through at provider rates; enterprise is contact-sales. Marketing openpipe.ai/pricing returns 404 — rates live only in docs.

CoreWeave acquisition; rate card held

CoreWeave (NASDAQ: CRWV) announced a definitive agreement to acquire OpenPipe on September 3, 2025 (terms undisclosed), citing OpenPipe's open-source ART reinforcement-learning toolkit and agent-training tech. The October 2025 post-acquisition docs snapshot showed the same rate card (training $0.48–$2.90; inference unchanged), so list pricing and self-serve availability were not visibly disrupted. Sources: CoreWeave press release and TechCrunch, 2025-09-03; Wayback snapshot 2025-10-08 of docs.openpipe.ai/pricing/pricing.

~6x training price cut and new 14B tier

Training rates dropped sharply and a 14B tier was added: 8B-and-smaller fell from $3.00 to $0.48, 32B from $8.00 to $1.90, 70B+ from $16.00 to $2.90 per 1M tokens, with 14B set at $1.50. Inference per-token and Compute-Unit rates were unchanged. Confirmed across two independent snapshots (2025-01 at the old prices, 2025-04 at the new). Source: Wayback snapshots 2025-01-17 and 2025-04-20 of docs.openpipe.ai/pricing/pricing.

Pricing moves to docs; model-size training tiers at $3–$16

Pricing migrated to docs.openpipe.ai/pricing. Training was indexed to model size at $3.00 (8B and smaller), $8.00 (32B), and $16.00 (70B+) per 1M tokens — no 14B tier yet. Per-token inference (Llama 3.1 8B $0.30/$0.45; 70B $1.80/$2.00) and hourly Compute Units ($1.50–$12.00/CU-hour) were already in place. Source: Wayback snapshot 2025-01-17 of docs.openpipe.ai/pricing/pricing.

Per-token Developer plan with $100 free credits

Marketing pricing page (openpipe.ai/pricing) advertised a Developer per-token plan: $100 free credits, training from $4 per 1M tokens, and inference at $1.20 input / $1.60 output per 1M tokens, with limits of 50k training rows per dataset and up to 50 fine-tuned models. Business and Enterprise tiers were contact-sales. Source: Wayback snapshot 2024-08-04 of openpipe.ai/pricing.

Trivia
  • OpenPipe cut its training rates roughly 6x in early 2025: the 8B-and-smaller tier fell from $3.00 to $0.48 per 1M tokens and the 70B+ tier from $16.00 to $2.90, captured between the January and April 2025 docs snapshots.
  • CoreWeave agreed to acquire OpenPipe on September 3, 2025 (terms undisclosed) to fold its reinforcement-learning agent-training stack into CoreWeave's AI cloud, yet the published rate card was unchanged a month later in the October 2025 docs snapshot.
  • OpenPipe's launch on Hacker News in September 2023 ('Fine-tune your own Llama 2 to replace GPT-3.5/4') drew 955 points and 181 comments, one of the higher-scoring fine-tuning threads of that year.

Questions & answers

How much does it cost to fine-tune a model on OpenPipe?
Training is billed per 1M dataset tokens, scaled by model size: $0.48 for 8B-and-smaller models, $1.50 for 14B, $1.90 for 32B, and $2.90 for 70B-and-larger models.
What does OpenPipe charge for hosted inference?
High-volume models use per-token pricing (e.g., Llama 3.1 8B Instruct at $0.30 input / $0.45 output per 1M tokens), while lower-volume models use hourly Compute Units priced $1.50–$12.00 per CU-hour.
Is there a free tier or minimum commitment?
You can start fine-tuning for free, and per-token inference has no minimum commitment — you only pay for the tokens you process, with automatic infrastructure scaling.
How are OpenAI or Gemini fine-tunes billed through OpenPipe?
Third-party fine-tunes pass through at the provider's standard rates with no OpenPipe markup; you are billed directly by OpenAI or Google.
Does OpenPipe offer enterprise or on-prem pricing?
Yes — enterprise plans are contact-sales (hello@openpipe.ai) and add volume discounts, on-premises or VPC deployment, dedicated support, custom SLAs, and reinforcement-learning engagements for production agents.