Token prices fall as access tiers rise
AI API token prices fall with almost every model generation — frontier rates dropped roughly 10× from the GPT-4 era to the GPT-5 era — while the price of human-facing access climbs toward a $200/mo ceiling. Compute deflates; access inflates.
What's happening — and why
What's happening: the cost of calling an AI model through its API keeps dropping. Each new model generation tends to launch cheaper than the one it replaces, and vendors add further discounts — prompt caching, batch processing — in between releases.
Why: raw inference is becoming a commodity. Open-weight models set aggressive price floors, serving efficiency improves, and competition is intense, so vendors pass the savings to developers to win volume. They reserve their pricing power for human-facing subscriptions, where willingness to pay is far higher than for raw tokens.
How it works
[Chart: Evidence over time. 6 supporting data points, 2 counter.]
Evidence
| Company | Date | What happened |
|---|---|---|
| OpenAI | Nov 2023 | GPT-4 Turbo launched at roughly one-third of the original GPT-4's input price and half its output price. |
| OpenAI | May 2024 | GPT-4o launched ~50% cheaper than GPT-4 Turbo; GPT-4o mini (Jul 2024) pushed frontier-ish quality to commodity prices. |
| Google | Nov 2024 | Gemini 1.5 token pricing cut ~50–65%. |
| DeepSeek | May 2024 | V2 at $0.14/1M input — a market-shock price that reset expectations. |
| DeepSeek | Dec 2024 | V3 at $0.27/1M input: frontier-class performance at commodity rates. |
| Anthropic | Aug 2024 | Prompt caching cut input cost by up to 90% without a model change. |
Counterexamples
- You.com · Sep 2025 — Top consumer tier repriced ~6.7× — $30 Team replaced by $200 Max.
- Character.ai · Feb 2026 — Introduced mid-chat ads for free users — monetisation rising, not falling.
For buyers
Re-baseline token-cost assumptions at least twice a year — a model you priced six months ago is usually cheaper now, and a newer model may be cheaper and better. Don't extrapolate the deflation to seats: model the workload as raw metered tokens (falling) separately from per-seat access (rising), and pick the cheaper envelope deliberately.
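The two envelopes can be compared with a few lines of arithmetic. The sketch below is a minimal illustration, not a pricing tool: the per-million-token rates and the usage volumes are placeholder assumptions, and only the $200/mo seat price comes from the text. It also assumes a seat can actually cover the workload, which usage caps may prevent in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Metered API rates in USD per 1M tokens (hypothetical values)."""
    input_per_million: float
    output_per_million: float


def metered_monthly_cost(rates: TokenRates, input_tokens: int, output_tokens: int) -> float:
    """Cost of one user's monthly workload if billed as raw metered tokens."""
    return (
        input_tokens / 1_000_000 * rates.input_per_million
        + output_tokens / 1_000_000 * rates.output_per_million
    )


def cheaper_envelope(metered_cost: float, seat_price: float) -> str:
    """Name the cheaper of the two envelopes for one user-month."""
    return "metered" if metered_cost < seat_price else "seat"


# Assumed rates and usage, for illustration only.
rates = TokenRates(input_per_million=2.50, output_per_million=10.00)
SEAT_PRICE = 200.00  # the ~$200/mo premium tier mentioned in the text

light_user = metered_monthly_cost(rates, input_tokens=4_000_000, output_tokens=1_000_000)
heavy_user = metered_monthly_cost(rates, input_tokens=60_000_000, output_tokens=10_000_000)

print(f"light user: ${light_user:.2f} -> {cheaper_envelope(light_user, SEAT_PRICE)}")
print(f"heavy user: ${heavy_user:.2f} -> {cheaper_envelope(heavy_user, SEAT_PRICE)}")
```

Because the metered side is expected to keep falling, rerunning the same comparison with updated rates at each re-baseline can move a user from one envelope to the other.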
For vendors
To ride deflation you need per-model rate cards that can change without breaking contracts, plus structural discounts (caching, batch) to cut effective price without touching headline rates. To monetise access, a premium seat tier with priority or uncapped usage captures the value that falling tokens leave on the table.
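As a rough sketch of the two vendor-side mechanisms above: a dated rate card lets the headline price change by adding an entry rather than rewriting a contract, and structural discounts lower the effective price while the headline stays put. Everything here is hypothetical, including the model name, the rates, the discount levels, and the assumption that caching and batch discounts stack multiplicatively; real rate cards may combine them differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RateCardEntry:
    """One dated headline rate for one model (all values hypothetical)."""
    model: str
    effective_from: date
    input_per_million: float  # USD per 1M input tokens


def rate_on(entries: list[RateCardEntry], model: str, day: date) -> RateCardEntry:
    """Return the most recent entry for `model` that is in force on `day`."""
    in_force = [e for e in entries if e.model == model and e.effective_from <= day]
    if not in_force:
        raise LookupError(f"no rate for {model!r} on {day}")
    return max(in_force, key=lambda e: e.effective_from)


def effective_input_price(
    headline_per_million: float,
    cached_share: float,
    cache_discount: float,
    batch_share: float,
    batch_discount: float,
) -> float:
    """Blended USD per 1M input tokens after caching and batch discounts.

    Assumption: the two discounts apply independently across traffic
    and stack multiplicatively.
    """
    for value in (cached_share, cache_discount, batch_share, batch_discount):
        if not 0.0 <= value <= 1.0:
            raise ValueError("shares and discounts must be between 0 and 1")
    cache_multiplier = 1.0 - cached_share * cache_discount
    batch_multiplier = 1.0 - batch_share * batch_discount
    return headline_per_million * cache_multiplier * batch_multiplier


# Hypothetical rate card: the headline changes by appending a dated entry.
card = [
    RateCardEntry("example-model", date(2025, 1, 1), 3.00),
    RateCardEntry("example-model", date(2025, 7, 1), 2.00),
]

headline = rate_on(card, "example-model", date(2025, 3, 15)).input_per_million
blended = effective_input_price(
    headline,
    cached_share=0.5,    # assumed: half of input tokens hit the cache
    cache_discount=0.5,  # assumed discount on cached tokens
    batch_share=0.2,     # assumed: a fifth of traffic goes through batch
    batch_discount=0.5,  # assumed batch discount
)
print(f"headline ${headline:.2f}/1M, effective ${blended:.3f}/1M")
```

Keeping the dated entries separate from the discount parameters means either one can move without the other, which is the property the paragraph above asks for.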
Outlook — what to watch
Expect another leg down: open-weight models (DeepSeek, Llama-class) keep resetting the floor, and caching/batch discounts are spreading. The deflation only stalls if frontier capability stops commoditising — watch whether any vendor can hold a premium token price on a genuinely differentiated model. Access tiers still have room to climb.
Bottom line
Compute is deflating relentlessly while access inflates. Every frontier vendor in the corpus has cut per-token prices at least once; the value is migrating from the token to the seat.
FAQ
Are AI API prices going up or down?
Down. Every frontier vendor in the corpus has cut per-token API prices at least once — usually at each model release — and layered caching and batch discounts on top. Consumer subscriptions are the exception; those are rising.
Why are ChatGPT and Claude subscriptions getting more expensive if tokens are cheaper?
The token (raw compute) and the seat (human access) sit on different curves. Vendors pass compute deflation to API developers but capture value from power users through premium ~$200/mo access tiers.
How often should I re-check my model costs?
At least twice a year. A model you priced six months ago is usually cheaper now, and a newer one is often cheaper and better: Claude 3.5 Sonnet outperformed Claude 3 Opus at one-fifth of the price.
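A twice-yearly re-check can be as simple as flagging assumptions older than about six months and measuring how far the current rate has moved. The sketch below is illustrative only: the model name, the rates, the dates, and the 182-day threshold are assumptions, not figures from any vendor.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CostAssumption:
    """A token price recorded in a budget or forecast (hypothetical values)."""
    model: str
    usd_per_million_input: float
    priced_on: date


def is_stale(assumption: CostAssumption, today: date, max_age_days: int = 182) -> bool:
    """True when the assumption is older than roughly six months."""
    return (today - assumption.priced_on).days > max_age_days


def price_drift(assumed_rate: float, current_rate: float) -> float:
    """Fractional change from the assumed rate; negative means cheaper now."""
    if assumed_rate <= 0:
        raise ValueError("assumed_rate must be positive")
    return (current_rate - assumed_rate) / assumed_rate


# Assumed example: a rate recorded in January, re-checked in September.
assumption = CostAssumption("example-model", 3.00, date(2025, 1, 15))
today = date(2025, 9, 1)
current_rate = 1.50  # hypothetical current list price

stale = is_stale(assumption, today)
drift = price_drift(assumption.usd_per_million_input, current_rate)
print(f"stale={stale}, drift={drift:+.0%}")
```

Fixed dates are used here so the example is reproducible; a real check would pass `date.today()`.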