AI Summary
About
OpenPipe is a fine-tuning and hosted-inference platform that helps engineering teams replace expensive general-purpose LLM prompts with small, specialized models trained on their own request data. Its supervised fine-tuning (SFT) workflow captures every production request/response, turns that traffic into a training dataset in minutes, and lets teams launch a state-of-the-art fine-tune in two clicks. The pitch is concrete: production teams see error rates drop sharply after swapping prompts for fine-tuned models, and fine-tuned Llama 3.1 models are positioned as roughly 8x cheaper than GPT-4o at comparable quality.
The company serves developers and AI product teams who want to own their model weights and deploy anywhere — cloud, edge, or on-prem — without the per-call cost of frontier APIs. Beyond SFT, OpenPipe has moved up-market into reinforcement learning for enterprise agents, offering consulting-style engagements that translate corporate KPIs into reward schemas and govern autonomous agents in production. Security and compliance (SOC 2 Type 2, HIPAA, GDPR) are first-class, reflecting an enterprise buyer who needs to fine-tune on sensitive data.
Founded in 2022 and Y Combinator-backed, the Seattle-area startup raised a $6.7M seed round in March 2024 (Costanoa Ventures, Y Combinator, and angels including GitHub co-founder Tom Preston-Werner). It is best known in the open-source community for ART (Agent Reinforcement Trainer), a GRPO-based RL toolkit with roughly 9k GitHub stars. On September 3, 2025, CoreWeave (NASDAQ: CRWV) announced a definitive agreement to acquire OpenPipe (terms undisclosed) to fold its agent-training and RL stack into CoreWeave’s AI cloud — a move that, as of the latest captures, left OpenPipe’s self-serve rate card intact.
OpenPipe sits in a competitive band alongside other model-customization and inference platforms, differentiating on the “log your traffic, click train” simplicity of its data-capture loop and on transparent, fully usage-based pricing rather than seat licenses or platform subscriptions.
Pricing summary : How OpenPipe’s per-token fine-tuning and inference rates work
OpenPipe runs a pure usage-based model with no seats and no minimum commitment. You pay across three independent meters: per-million-token training fees that scale with model size, per-token hosted inference for high-volume models, and per-CU-hour Compute Units for lower-volume models — plus zero-markup pass-through for third-party fine-tunes.
The model has these dimensions:
- Training — billed per 1M dataset tokens, scaled by the size of the model being fine-tuned: $0.48 (8B and smaller), $1.50 (14B), $1.90 (32B), $2.90 (70B+).
- Hosted inference, per-token — for popular high-volume models, billed per 1M input and output tokens with no minimum (e.g., Llama 3.1 8B Instruct at $0.30 in / $0.45 out).
- Hosted inference, hourly Compute Units — for experimental/low-volume models, billed per CU-hour to the second, $1.50–$12.00 depending on model size.
- Third-party models — OpenAI GPT and Google Gemini fine-tunes are passed through at the provider’s standard rates with no OpenPipe markup; you are billed directly by the provider.
- Enterprise — contact-sales for volume discounts, on-prem/VPC deployment, custom SLAs, and the RL-for-agents engagement.
What makes this different: most fine-tuning platforms bill raw GPU time, but OpenPipe abstracts training cost into a single per-token rate indexed to model size, so a customer can estimate a fine-tune’s cost from dataset size alone — a usage-based approach well-suited to AI products. See how this compares across the corpus on token-based billing and pure-usage pricing.
Pricing by product
Fine-tuning (training)
| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| 8B and smaller | $0.48 / 1M tokens | Training on datasets for models 8B parameters and under | Cheapest tier; cost indexed to dataset token count |
| 14B models | $1.50 / 1M tokens | Training for 14B-parameter models | ~3x the 8B rate |
| 32B models | $1.90 / 1M tokens | Training for 32B-parameter models | Mid-size band |
| 70B+ models | $2.90 / 1M tokens | Training for 70B-and-larger models | Top training rate; ~6x the 8B rate |
Hosted inference — per-token (high-volume models)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Key mechanics |
|---|---|---|---|
| Llama 3.1 8B Instruct | $0.30 | $0.45 | No minimum commitment; automatic scaling |
| Llama 3.1 70B Instruct | $1.80 | $2.00 | Pay only for tokens processed |
Hosted inference — hourly Compute Units (experimental / low-volume models)
| Model | Rate per CU-hour | Key mechanics |
|---|---|---|
| Llama 3.1 8B | $1.50 | 1 CU handles up to 24 req/sec; billed to the second |
| Mistral Nemo 12B | $1.50 | Auto-scales when traffic exceeds capacity |
| Qwen 2.5 32B Coder | $6.00 | CU stays active 60s after a traffic spike |
| Qwen 2.5 14B | $6.00 | Designed for lower-volume / experimental workloads |
| Qwen 2.5 72B | $12.00 | Largest-model band |
| Llama 3.1 70B | $12.00 | Largest-model band |
Third-party models (OpenAI, Gemini, etc.)
| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Third-party fine-tunes | Provider’s standard rate | OpenAI GPT, Google Gemini models via OpenPipe | No OpenPipe markup; billed by provider |
Enterprise
| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Enterprise | Custom | Volume discounts, on-prem deployment, dedicated support, custom SLAs, advanced security, more storage | Sales-led, quoted |
Sales motions across products: PLG / self-serve for training, per-token and Compute-Unit inference; sales-led for the enterprise / RL-for-agents engagement.
Hidden costs : what a fine-tune-plus-serving workload actually bills
The per-token headline rates are easy to underestimate because a real deployment pays on two or three meters at once — a training run, then ongoing inference. Consider a team fine-tuning a Llama 3.1 8B model on a 50M-token dataset and then serving it at high volume.
Fine-tune-and-serve archetype (Llama 3.1 8B)
| Line item | Monthly cost |
|---|---|
| Training: 50M dataset tokens × $0.48 / 1M (8B rate) | $24 |
| Inference input: 200M tokens × $0.30 / 1M | $60 |
| Inference output: 100M tokens × $0.45 / 1M | $45 |
| Total (first month, training + serving) | $129 |
The lesson: training is a near-trivial one-time line item relative to ongoing inference, so the meter that dominates the bill is per-token serving — exactly where OpenPipe pitches an ~8x cost advantage over GPT-4o. Teams choosing the hourly Compute Unit path instead should model peak concurrency, since a single CU caps at 24 requests/second before auto-scaling adds more billable units. For background on building these estimates, see our guides on usage-based pricing fundamentals and tracking and metering usage events.
Want to estimate your own OpenPipe bill? Use the OpenPipe pricing calculator to model your monthly cost based on training tokens, per-token inference volume, and Compute-Unit concurrency.
Pricing evolution : a 6x training price cut, then the CoreWeave acquisition
Cadence
| Quarter | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2024 Q3 | 0 | 0 | Marketing page advertised a per-token Developer plan: $100 free credits, training from $4 / 1M tokens, inference $1.20 in / $1.60 out. |
| 2025 Q1 | 0 | 1 | Pricing migrated to docs with model-size training tiers at $3.00 / $8.00 / $16.00 per 1M tokens; per-token + Compute-Unit inference in place. |
| 2025 Q2 | 1 | 1 | ~6x training cut (8B $3.00→$0.48; 70B+ $16.00→$2.90) and a new 14B tier ($1.50) appear by the 2025-04-20 snapshot. |
| 2025 Q3 | 0 | 0 | CoreWeave announced acquisition of OpenPipe on 2025-09-03 (terms undisclosed); rate card unchanged in the post-acquisition Oct 2025 snapshot. |
Tracked range: 2024-08 to 2026-06. Quarters not listed were verified stable (0 price changes, 0 SKU additions). The marketing page openpipe.ai/pricing collapsed to a near-empty 404 by the 2025-07-17 snapshot, with pricing thereafter living only in docs.
Notable changes
- 2024-08 — Marketing page offered $100 free credits, training from $4 / 1M tokens, inference $1.20 in / $1.60 out (Wayback 2024-08-04 of openpipe.ai/pricing).
- 2025-01 — Pricing moved to docs with model-size training at $3.00 (8B) / $8.00 (32B) / $16.00 (70B+) per 1M tokens (Wayback 2025-01-17).
- 2025-04 — Training rates cut ~6x to $0.48 / $1.90 / $2.90 and a 14B tier ($1.50) added; inference unchanged (Wayback 2025-04-20, corroborated against the 2025-01 snapshot).
- 2025-09-03 — CoreWeave announced a definitive agreement to acquire OpenPipe, terms undisclosed (TechCrunch).
- 2025-10 — Post-acquisition docs snapshot showed the same rate card, confirming list pricing and self-serve access were not disrupted (Wayback 2025-10-08).
The CoreWeave acquisition in detail
On September 3, 2025, CoreWeave (NASDAQ: CRWV) announced it had agreed to acquire OpenPipe for undisclosed terms. CoreWeave framed the deal around OpenPipe’s open-source ART (Agent Reinforcement Trainer) toolkit and its reinforcement-learning approach to building production agents, positioning it as a vertical-integration play on top of CoreWeave’s earlier Weights & Biases acquisition. For buyers, the key pricing question was continuity — and the evidence says continuity held: the docs rate card captured on 2025-10-08, roughly a month after the announcement, was identical to the pre-announcement card ($0.48–$2.90 training, unchanged per-token and Compute-Unit inference). The most material structural change in OpenPipe’s pricing history therefore predates the acquisition: the ~6x training cut in early 2025, not the change of ownership.
What’s unique : model-size-indexed training and zero-markup pass-through
1. Training priced by model size, not GPU time. Instead of exposing raw GPU-hour billing, OpenPipe collapses training cost into a single per-1M-token rate indexed to the model’s parameter count ($0.48 to $2.90). A customer can forecast a fine-tune’s cost from dataset size alone — a notably more legible meter than the GPU-time billing common to other model-inference platforms.
2. Two inference meters for two workload shapes. High-volume models get pay-as-you-go per-token pricing with no minimum; experimental and low-volume models get serverless-style hourly Compute Units billed to the second. This lets buyers pick the meter that matches their traffic profile rather than forcing all workloads into one billing model.
3. Zero-markup third-party pass-through. OpenAI GPT and Gemini fine-tunes route through OpenPipe but bill directly from the provider at standard rates, so OpenPipe takes no margin on third-party tokens — a transparency choice that keeps it credible as a neutral fine-tuning layer.
4. Training prices that fell, not rose. Unlike many AI platforms that creep prices up, OpenPipe cut training rates roughly 6x in early 2025 (8B from $3.00 to $0.48 per 1M tokens; 70B+ from $16.00 to $2.90) and added a finer 14B tier — passing improving GPU economics straight to self-serve users rather than capturing the margin. That deflation, captured across the 2025-01 and 2025-04 docs snapshots, is the defining event in its pricing history, and it survived the CoreWeave acquisition intact.
Strengths & weaknesses
| Strengths | Weaknesses |
|---|---|
| Fully transparent, public usage rates with no seats or minimums | Marketing pricing page (openpipe.ai/pricing) returns 404 — rates live only in docs |
| Training cost forecastable from dataset size (model-size-indexed) | No published volume tiers or committed-use discounts (enterprise only) |
| Training got ~6x cheaper in early 2025, passing GPU economics to users | Limited per-token catalog (only Llama 3.1 8B/70B published) |
| Zero markup on third-party (OpenAI/Gemini) fine-tunes | Enterprise / RL-for-agents pricing entirely opaque (contact-sales) |
| Strong open-source pull (ART toolkit ~9k stars; 955-point HN launch) | CoreWeave acquisition (Sept 2025) adds long-term roadmap uncertainty |
Billing UX : metered controls for tokens, compute units, and pass-through
OpenPipe’s billing surface is built around granular, self-serve usage meters with no seat management:
- Per-1M-token training meter — training cost is computed from the dataset token count multiplied by a model-size rate ($0.48–$2.90), so spend is predictable before a run starts.
- Per-token inference meter — high-volume models bill separately on input and output tokens (e.g., $0.30 in / $0.45 out for Llama 3.1 8B Instruct) with no minimum commitment and automatic infrastructure scaling.
- Compute Unit (CU) metering — each CU absorbs up to 24 simultaneous requests per second, is billed to the second, auto-scales when traffic exceeds capacity, and remains active for 60 seconds after a traffic spike.
- Third-party pass-through billing — for OpenAI/Gemini fine-tunes, OpenPipe forwards API calls and the provider bills you directly at standard rates, with no OpenPipe line item or markup.
- Free entry + enterprise contact channel — teams can start fine-tuning for free, while enterprise pricing, volume discounts, and on-prem/VPC deployment are negotiated via hello@openpipe.ai.
Strategic wins : pricing choices that lower buyer friction
1. Model-size-indexed training makes cost predictable
By indexing training to model size rather than GPU time, OpenPipe gives buyers a number they can compute before committing — dataset tokens times a known rate. This removes the biggest source of fine-tuning sticker shock and aligns with the transparency principle that lowers buyer friction.
2. Dual inference meters capture both ends of the volume curve
Offering per-token billing for popular models and hourly Compute Units for experimental ones lets OpenPipe monetize a wider workload range without forcing a single pricing model. It mirrors the usage-based pricing models that successful infra platforms adopt.
3. Zero-markup pass-through builds trust as a neutral layer
Refusing margin on OpenAI and Gemini fine-tunes positions OpenPipe as a fine-tuning layer rather than a token reseller, which matters to buyers wary of opaque markups. This aligns price with the value metric that matters — OpenPipe charges for the fine-tuning it actually performs, not for brokering someone else’s tokens.
Areas to improve : gaps in transparency and self-serve depth
1. Fix the dead marketing pricing page
The public-facing openpipe.ai/pricing URL returns a 404, leaving rates discoverable only inside the docs. Restoring a marketing pricing page that mirrors the docs rate card would improve SEO discoverability and reduce the chance prospects assume pricing is gated — a pattern worth comparing against peers in the pricing blueprint.
2. Publish volume tiers and committed-use discounts
Today any discount lives behind an enterprise sales conversation. Publishing transparent volume breaks on training and per-token inference — as covered in our guide to thresholding and alerting — would let larger self-serve teams scale without a sales call.
3. Broaden the per-token inference catalog
Only Llama 3.1 8B and 70B have published per-token rates; every other model defaults to the coarser hourly Compute Unit meter. Expanding the per-token catalog to popular Qwen and Mistral models would give cost-sensitive, high-volume buyers a pay-as-you-go option instead of pushing them onto per-hour billing.
Key takeaways
- Index your most-feared meter to something buyers can pre-compute. OpenPipe converts GPU-time anxiety into a dataset-size multiplication, making training cost knowable before commitment.
- Match the meter to the workload shape. Offering both per-token and per-CU-hour inference lets one platform serve high-volume and experimental users without compromise.
- Take no margin on pass-through to stay credible. Zero-markup third-party billing positions OpenPipe as a neutral layer rather than a reseller.
- A free entry point plus no inference minimum lowers activation friction. Developers can reach first value before any spend commitment.
- Keep public rate cards where buyers expect them. Rates buried only in docs (with a 404 marketing page) cost discoverability — visibility is part of the pricing.
UBP implications
- Model-size as a value metric is a clean abstraction for AI training. Pricing the size of the artifact being produced — rather than the compute consumed — is a portable pattern other fine-tuning and model-customization vendors can adopt.
- Multi-meter inference reflects the maturity of usage-based AI billing. Splitting inference into per-token and per-time meters acknowledges that one usage unit rarely fits every traffic profile.
- Transparent zero-markup pass-through raises the bar for neutral platforms. As more tools broker third-party model access, refusing to mark up pass-through tokens becomes a competitive trust signal.
Sources
- OpenPipe pricing overview (docs) (accessed 2026-06-04)
- OpenPipe Enterprise RL Solutions (accessed 2026-06-04)
- OpenPipe Fine-Tuning product page (accessed 2026-06-04)
- Browse the full pricing blueprint corpus for peers in usage-based AI infrastructure.
Bottom line
OpenPipe runs one of the cleaner usage-based rate cards in AI infrastructure: training priced by model size ($0.48–$2.90 per 1M tokens), inference split into per-token and per-CU-hour meters, and zero markup on third-party fine-tunes — all with no seats or minimums. The main gap is discoverability, with rates living only in docs behind a 404 marketing page, and discounts hidden behind enterprise sales.
Want to compare OpenPipe against other AI infrastructure pricing? Browse the pricing blueprint.
Pricing timeline : Major events on a vertical axis
Each milestone below corresponds to a public pricing change, product launch, or material adjustment. Major events use a filled marker; minor adjustments use a faded one.
Current usage-based rate card
Training priced per 1M dataset tokens by model size ($0.48 for 8B-and-smaller up to $2.90 for 70B+). Hosted inference offers per-token rates (Llama 3.1 8B Instruct $0.30 in / $0.45 out; 70B Instruct $1.80 in / $2.00 out) or hourly Compute Units ($1.50–$12.00 per CU-hour). Third-party fine-tunes pass through at provider rates; enterprise is contact-sales. Marketing openpipe.ai/pricing returns 404 — rates live only in docs.
CoreWeave acquisition; rate card held
CoreWeave (NASDAQ: CRWV) announced a definitive agreement to acquire OpenPipe on September 3, 2025 (terms undisclosed), citing OpenPipe's open-source ART reinforcement-learning toolkit and agent-training tech. The October 2025 post-acquisition docs snapshot showed the same rate card (training $0.48–$2.90; inference unchanged), so list pricing and self-serve availability were not visibly disrupted. Sources: CoreWeave press release and TechCrunch, 2025-09-03; Wayback snapshot 2025-10-08 of docs.openpipe.ai/pricing/pricing.
~6x training price cut and new 14B tier
Training rates dropped sharply and a 14B tier was added: 8B-and-smaller fell from $3.00 to $0.48, 32B from $8.00 to $1.90, 70B+ from $16.00 to $2.90 per 1M tokens, with 14B set at $1.50. Inference per-token and Compute-Unit rates were unchanged. Confirmed across two independent snapshots (2025-01 at the old prices, 2025-04 at the new). Source: Wayback snapshots 2025-01-17 and 2025-04-20 of docs.openpipe.ai/pricing/pricing.
Pricing moves to docs; model-size training tiers at $3–$16
Pricing migrated to docs.openpipe.ai/pricing. Training was indexed to model size at $3.00 (8B and smaller), $8.00 (32B), and $16.00 (70B+) per 1M tokens — no 14B tier yet. Per-token inference (Llama 3.1 8B $0.30/$0.45; 70B $1.80/$2.00) and hourly Compute Units ($1.50–$12.00/CU-hour) were already in place. Source: Wayback snapshot 2025-01-17 of docs.openpipe.ai/pricing/pricing.
Per-token Developer plan with $100 free credits
Marketing pricing page (openpipe.ai/pricing) advertised a Developer per-token plan: $100 free credits, training from $4 per 1M tokens, and inference at $1.20 input / $1.60 output per 1M tokens, with limits of 50k training rows per dataset and up to 50 fine-tuned models. Business and Enterprise tiers were contact-sales. Source: Wayback snapshot 2024-08-04 of openpipe.ai/pricing.
- · OpenPipe cut its training rates roughly 6x in early 2025: the 8B-and-smaller tier fell from $3.00 to $0.48 per 1M tokens and the 70B+ tier from $16.00 to $2.90, captured between the January and April 2025 docs snapshots.
- · CoreWeave agreed to acquire OpenPipe on September 3, 2025 (terms undisclosed) to fold its reinforcement-learning agent-training stack into CoreWeave's AI cloud — yet the published rate card was unchanged a month later in the October 2025 docs snapshot.
- · OpenPipe's launch on Hacker News in September 2023 — 'Fine-tune your own Llama 2 to replace GPT-3.5/4' — drew 955 points and 181 comments, one of the higher-scoring fine-tuning threads of that year.
Questions & answers
- How much does it cost to fine-tune a model on OpenPipe?
- Training is billed per 1M dataset tokens, scaled by model size: $0.48 for 8B-and-smaller models, $1.50 for 14B, $1.90 for 32B, and $2.90 for 70B-and-larger models.
- What does OpenPipe charge for hosted inference?
- High-volume models use per-token pricing (e.g., Llama 3.1 8B Instruct at $0.30 input / $0.45 output per 1M tokens), while lower-volume models use hourly Compute Units priced $1.50–$12.00 per CU-hour.
- Is there a free tier or minimum commitment?
- You can start fine-tuning for free, and per-token inference has no minimum commitment — you only pay for the tokens you process, with automatic infrastructure scaling.
- How are OpenAI or Gemini fine-tunes billed through OpenPipe?
- Third-party fine-tunes pass through at the provider's standard rates with no OpenPipe markup; you are billed directly by OpenAI or Google.
- Does OpenPipe offer enterprise or on-prem pricing?
- Yes — enterprise plans are contact-sales (hello@openpipe.ai) and add volume discounts, on-premises or VPC deployment, dedicated support, custom SLAs, and reinforcement-learning engagements for production agents.