About
Zhipu AI — internationally branded Z.ai, and listed in Hong Kong as Beijing Zhipu Huazhang Technology (02513.HK) — is a Chinese foundation-model lab behind the GLM model family. It serves two buyers: developers who call GLM models over a per-million-token API (USD on the international z.ai platform, RMB on China-native open.bigmodel.cn), and coders who buy the flat GLM Coding Plan subscription to drive agentic coding inside tools like Claude Code, Cline, and Cursor. Enterprise and MaaS (model-as-a-service) deployments — fine-tuning, private/sovereign hosting, full dev kits — are sold through a sales motion.
Zhipu spun out of Tsinghua University’s Knowledge Engineering Group in 2019 and has become one of China’s “AI tigers” — the cohort of frontier Chinese labs that also includes Moonshot, MiniMax, Baichuan, and 01.AI. Its trajectory is unusually entangled with geopolitics: in January 2025 it became the first Chinese foundation-model lab added to the US Commerce Department’s Entity List, a move Zhipu publicly “strongly disagrees” with, stressing it depends on no US large-model technology. A year later, on 8 January 2026, Zhipu became the first foundation-model AI company to IPO anywhere in the world, debuting on the Hong Kong Stock Exchange and raising roughly US$560M at a ~US$6.7B valuation. Sovereignty is therefore not a marketing flourish here — it is a structural feature of the company.
The GLM catalog spans open-weight flagships (GLM-4.5, GLM-4.6, the lighter GLM-4.5-Air), free Flash tiers (GLM-4.5-Flash, GLM-4.7-Flash), vision models (GLM-4.5V), agent products (AutoGLM, GLM-PC), and a newer GLM-5 family positioned on agentic-coding benchmarks against Claude Opus. GLM-4.5 and GLM-4.6 are open-weighted under a permissive (MIT-style) license, so the weights are free to download — and, exactly as with Mistral, Zhipu monetizes hosted inference per token rather than the artifact itself. The headline strategic move is price: free Flash models, a per-token card that sits below Western frontier labs, and a coding subscription that openly undercuts Claude Code.
Pricing summary : a cheap per-token API plus a flat coding subscription
Zhipu runs a per-token GLM API with free Flash tiers, plus a separate flat GLM Coding Plan subscription. The dimensions are:
- GLM API tokens — separate input and output rates per million tokens, USD on z.ai: GLM-4.6 and GLM-4.5 at $0.60 in / $2.20 out, GLM-4.5-Air at $0.20 / $1.10, GLM-4.5V at $0.60 / $1.80, GLM-4-32B at $0.10 / $0.10. Cached input is steeply discounted ($0.03–$0.11).
- Free Flash tiers — GLM-4.5-Flash and GLM-4.7-Flash cost nothing to call, and new accounts get a 20-million-token grant. This descends directly from Zhipu making GLM-4-Flash free in 2024.
- GLM Coding Plan: a flat subscription billed quarterly at Lite $30/quarter ($10/mo), Pro $90/quarter ($30/mo), and Max $240/quarter ($80/mo), with prompt quotas per 5 hours and per week rather than a token meter.
- RMB-native card: developers in China buy the same models in yuan on open.bigmodel.cn (GLM-4.5 at 0.8 yuan in / 2 yuan out per 1M tokens; GLM-4-Flash free), with the Coding Plan from 20 yuan/mo.
- Enterprise / MaaS — fine-tuning, private/sovereign deployment, and the full dev kit are quoted by sales.
What makes this different: Zhipu exposes raw per-million-token billing publicly and gives its Flash models away free, then layers a flat coding subscription on top that is explicitly cheaper than Western coding agents — a price-war posture, not just a price list.
Pricing by product
GLM API — text & vision models (per million tokens, z.ai USD)
| Model | Input /M | Output /M | Cached input /M | Key mechanics |
|---|---|---|---|---|
| GLM-4.6 | $0.60 | $2.20 | $0.11 | Flagship; larger context, agentic/coding |
| GLM-4.5 | $0.60 | $2.20 | $0.11 | Open-weight flagship |
| GLM-4.5-Air | $0.20 | $1.10 | $0.03 | Lightweight, cost-sensitive default |
| GLM-4.5V | $0.60 | $1.80 | $0.11 | Vision / multimodal |
| GLM-4-32B-0414-128K | $0.10 | $0.10 | — | Flat in/out, 128K context |
| GLM-4.5-Flash | Free | Free | Free | Free tier; no usage cost |
| GLM-4.7-Flash | Free | Free | Free | Free flagship-Flash |
Cached-input pricing is “limited-time free” on several models; the table shows standing list rates. New accounts receive a 20-million-token grant.
GLM API — China-native card (per million tokens, RMB ¥)
| Model | Input /M | Output /M | Key mechanics |
|---|---|---|---|
| GLM-4.5 | 0.8 yuan | 2 yuan | China-native (open.bigmodel.cn) |
| GLM-4-FlashX | 0.1 yuan | 0.1 yuan | Ultra-cheap fast tier |
| GLM-4-Flash | Free | Free | Free to the public since 2024 |
RMB and USD cards cover overlapping models but differ by currency, promo, and geography. Figures are per 1M tokens.
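To make the cross-currency comparison concrete, here is a minimal sketch that converts the RMB list price for GLM-4.5 into dollars and sets it beside the z.ai list price. The exchange rate (`ASSUMED_YUAN_PER_USD`) and the helper names are assumptions made for illustration; the source states no FX basis, and promos or regional eligibility may change what a buyer actually pays.

```python
# List prices quoted in the two cards above, per 1M tokens: (input, output).
RMB_CARD = {"GLM-4.5": (0.8, 2.0)}     # yuan, open.bigmodel.cn
USD_CARD = {"GLM-4.5": (0.60, 2.20)}   # dollars, z.ai

# Hypothetical exchange rate, NOT from the source; substitute a current rate.
ASSUMED_YUAN_PER_USD = 7.0


def rmb_to_usd(price_yuan, yuan_per_usd):
    """Convert a yuan price to dollars at the given exchange rate."""
    return price_yuan / yuan_per_usd


def compare(model, yuan_per_usd):
    """Return both cards for one model, expressed in USD per 1M tokens."""
    rmb_in, rmb_out = RMB_CARD[model]
    return {
        "rmb_card_in_usd": (
            round(rmb_to_usd(rmb_in, yuan_per_usd), 4),
            round(rmb_to_usd(rmb_out, yuan_per_usd), 4),
        ),
        "usd_card": USD_CARD[model],
    }


result = compare("GLM-4.5", ASSUMED_YUAN_PER_USD)
print(result)
```

The output depends entirely on the assumed rate, which is why a stated FX basis matters when the two cards are compared.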
GLM Coding Plan (flat subscription, USD)
| Tier | Price | Quota (approx.) | Key mechanics |
|---|---|---|---|
| Lite | $30/quarter (~$10/mo) | ~80 prompts / 5h · ~400 / week · 100 web searches/mo | Solo devs, students; ~3x Claude Pro usage |
| Pro | $90/quarter (~$30/mo) | ~400 prompts / 5h · ~2,000 / week · 1,000 web searches/mo | Heavy daily coders |
| Max | $240/quarter (~$80/mo) | ~1,600 prompts / 5h · ~8,000 / week · 4,000 web searches/mo | High-frequency, complex projects |
All tiers include GLM-5.1, GLM-5-Turbo, GLM-4.7, and GLM-4.5-Air, and plug into Claude Code, Cline, Kilo Code, OpenCode, Cursor, and Roo Code. A first-purchase launch promo of $3 (Lite) and $15 (Pro) was removed on 2026-02-11.
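One way to read the tier quotas is as a lookup: given a developer's peak prompts per 5 hours, prompts per week, and monthly web searches, pick the cheapest tier that covers all three. The sketch below uses the approximate quotas from the table and the quarterly list prices ($30 / $90 / $240). The `Tier` class and `cheapest_tier` function are hypothetical names introduced here, not part of any Zhipu tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_quarter: int
    prompts_per_5h: int
    prompts_per_week: int
    web_searches_per_month: int


# Approximate quotas and list prices as quoted in this article, cheapest first.
TIERS = [
    Tier("Lite", 30, 80, 400, 100),
    Tier("Pro", 90, 400, 2000, 1000),
    Tier("Max", 240, 1600, 8000, 4000),
]


def cheapest_tier(peak_prompts_5h, prompts_week, searches_month=0):
    """Return the cheapest tier whose quotas cover the workload, or None."""
    for tier in TIERS:
        if (
            peak_prompts_5h <= tier.prompts_per_5h
            and prompts_week <= tier.prompts_per_week
            and searches_month <= tier.web_searches_per_month
        ):
            return tier
    return None


print(cheapest_tier(60, 350))   # fits inside Lite
print(cheapest_tier(60, 900))   # weekly volume pushes the choice to Pro
```

Note that the binding constraint is often the weekly cap rather than the 5-hour cap: a workload that never exceeds 60 prompts in a session still lands on Pro once it passes ~400 prompts a week.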
Sales motions across products: PLG / self-serve for the free Flash tiers, the pay-as-you-go API, and the GLM Coding Plan; sales-led for Enterprise / MaaS (private deployment, fine-tuning, sovereign hosting).
Hidden costs : What Zhipu AI users actually pay
Zhipu’s headline rates are low and public, but the real bill is shaped by three things the sticker doesn’t show: the output-token premium (output is ~3.7x input on GLM-4.6), the per-5-hour / weekly prompt quotas on the Coding Plan (not a token meter), and the currency you’re billed in (USD vs RMB). Two archetypes show how the total assembles.
Archetype 1: a developer running a coding/RAG agent on the GLM API. The agent answers with GLM-4.6 and consumes roughly 50M input + 15M output tokens/mo, with about 10M of that input served from cache.
| Line item | Monthly cost |
|---|---|
| GLM-4.6 input — ~40M fresh tok @ $0.60/M | $24.00 |
| GLM-4.6 cached input — ~10M tok @ $0.11/M | $1.10 |
| GLM-4.6 output — ~15M tok @ $2.20/M | $33.00 |
| Estimated total | ~$58/mo |
The lesson: on GLM-4.6 the $2.20/M output rate is ~3.7x the input rate, so output-heavy workloads (code generation, long answers) dominate — but caching input at $0.11/M (versus $0.60 fresh) shaves a meaningful slice off repeated context. Even so, the absolute total is low: this is a frontier-class flagship at a fraction of Western per-token rates.
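The arithmetic behind the Archetype 1 table can be reproduced in a few lines. This is a sketch using the GLM-4.6 list rates quoted above; the function and variable names are illustrative, and token volumes are expressed in millions.

```python
# GLM-4.6 list rates from the z.ai card, in USD per 1M tokens.
RATES = {"input": 0.60, "cached_input": 0.11, "output": 2.20}


def monthly_cost(millions_by_kind):
    """Cost per line item, given token volume in millions for each kind."""
    return {kind: round(m * RATES[kind], 2) for kind, m in millions_by_kind.items()}


# Archetype 1: 40M fresh input, 10M cached input, 15M output per month.
usage = {"input": 40, "cached_input": 10, "output": 15}
lines = monthly_cost(usage)
total = round(sum(lines.values()), 2)
output_share = round(lines["output"] / total, 2)

print(lines)         # {'input': 24.0, 'cached_input': 1.1, 'output': 33.0}
print(total)         # 58.1
print(output_share)  # 0.57
```

Output accounts for about 57% of the bill while making up less than a quarter of the tokens, which is the output premium in action.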
Archetype 2: a solo developer who’d otherwise pay for Claude Code. They buy one GLM Coding Plan Lite seat instead of paying as they go, and use GLM-4.6 / GLM-5-Turbo inside Cline all day.
| Line item | Monthly cost |
|---|---|
| GLM Coding Plan — Lite (billed $30/quarter) | ~$10 |
| Overflow API tokens beyond quota (occasional) | ~$0–$5 |
| Estimated total | ~$10–$15/mo |
Here the surprise is how little it is: the Coding Plan converts a token meter into a flat quota, and Zhipu pitches Lite as roughly 3x Claude Pro usage at half the price. The catch is the quota shape — caps are stated as “~80 prompts per 5 hours” rather than tokens, so a heavy agentic session can hit the ceiling before a month’s worth of value is consumed, nudging users toward Pro or Max.
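To see how quickly a heavy session can reach the ceiling, the sketch below models the Lite cap of ~80 prompts per 5 hours as a sliding window. The enforcement mechanism is an assumption: the source does not say whether Zhipu uses a sliding window, fixed reset times, or something else, and `RollingQuota` is a hypothetical class written for this illustration.

```python
from collections import deque


class RollingQuota:
    """Sliding-window prompt counter (an assumed model, not Zhipu's actual logic)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.sent = deque()  # timestamps of accepted prompts, oldest first

    def try_send(self, now):
        # Discard prompts that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False  # throttled
        self.sent.append(now)
        return True


FIVE_HOURS = 5 * 60 * 60
quota = RollingQuota(limit=80, window_seconds=FIVE_HOURS)

# An agent that issues one prompt every 2 minutes, starting at t=0.
first_blocked = None
for i in range(150):
    t = i * 120
    if not quota.try_send(t):
        first_blocked = t
        break

print(first_blocked / 3600)  # hours until the first throttled prompt
```

Under these assumptions the agent is throttled after 2 hours 40 minutes and cannot send again until the oldest prompt ages out at the 5-hour mark.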
Pricing evolution : Zhipu AI pricing history and changes
Zhipu’s pricing has moved along two tracks. The API has billed per million tokens — and given away Flash tiers — since the 2024 price war; the coding side launched as a flat subscription in 2025 with an aggressive promo, then normalized in early 2026. Geopolitics (Entity List, Hong Kong IPO) runs underneath as a strategic, not a sticker, force. The dated milestones below are reconstructed from primary announcements and contemporaneous press.
Cadence
| Quarter | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2024 Q3 | 1 | 0 | 2024-08 GLM-4-Flash API made free to the public |
| 2024 Q4 | 0 | 1 | AutoGLM / GLM-PC agent products + GLM-OS concept |
| 2025 Q1 | 0 | 0 | 2025-01-15 Added to the US Entity List |
| 2025 Q3 | 1 | 1 | GLM-4.5 per-token API + GLM Coding Plan launch ($3 promo) |
| 2025 Q4 | 0 | 1 | GLM-4.6 ships at the same headline rate |
| 2026 Q1 | 1 | 0 | 2026-01-08 HK IPO (02513.HK); 2026-02-11 Coding Plan promo removed |
Tracked range: 2024 Q3–2026 Q2. Quarters not listed had no publicly announced price or SKU change. Dated milestones below cite primary/secondary sources.
Notable changes
- 2024-08 — GLM-4-Flash API made free to the public, opening Zhipu’s free-Flash posture in China’s domestic price war (AIbase).
- 2024-11 — AutoGLM and GLM-PC agent products launch under a “GLM-OS” vision; CogAgent-9B base model later open-sourced.
- 2025-01-15 — Zhipu added to the US Entity List, the first Chinese foundation-model lab listed; it states it “strongly disagrees” (reported by SCMP, Reuters).
- 2025-07 — GLM-4.5 launches with open weights at $0.60 in / $2.20 out per 1M tokens (Air at $0.20 / $1.10), setting the modern USD card.
- 2025-09 — GLM Coding Plan launches with a $3 (Lite) / $15 (Pro) first-purchase promo, wired into Claude Code, Cline, and others (Cline/Roo Code announcements).
- 2025-10 — GLM-4.6 ships with a larger context window at the same $0.60 / $2.20 rate — capability gains without a price increase.
- 2026-01-08 — Hong Kong IPO (02513.HK): first foundation-model lab to list globally, ~US$560M raised at ~US$6.7B valuation (reported by CNBC, Futu).
- 2026-02-11 — All Coding Plan first-purchase discounts removed; standing prices settle at ~$10 / $30 / $80 per month (Lite/Pro/Max).
The free-Flash and coding-undercut strategy in detail
Two pricing decisions define Zhipu. First, free Flash: making GLM-4-Flash free in 2024 and keeping GLM-4.5-Flash and GLM-4.7-Flash free in 2026 (plus a 20M-token signup grant) treats the bottom of the model ladder as a customer-acquisition channel, not a revenue line. Second, the coding undercut: the GLM Coding Plan converts the token meter into a flat quota and is pitched explicitly against Claude Code and Cursor — roughly 3x Claude Pro usage at half the price for Lite. Together they are a deliberate price-war posture: give the cheap models away, sell the agentic coding workflow flat, and let the per-token API sit comfortably below Western frontier rates. The Entity-List and sovereignty context sharpens this — for buyers wary of US-dependent stacks, low price and a non-US frontier model are a combined value proposition.
What’s unique : Zhipu AI’s distinctive pricing mechanics
1. Free flagship-Flash as an acquisition channel. Zhipu doesn’t just have a free trial — it makes entire capable models (GLM-4.5-Flash, GLM-4.7-Flash) permanently free to call, with a 20-million-token grant on signup. The free tier is the funnel: it seeds the open-weight ecosystem and converts users into paid GLM-4.6 / Coding Plan customers once they outgrow Flash. Few frontier labs give away a current-generation model with no usage cost.
2. A flat coding subscription that deliberately undercuts the West. The GLM Coding Plan converts per-token inference into a flat quarterly fee with prompt quotas (per 5 hours and per week, not tokens) and is priced, from ~$10/mo, explicitly against Claude Code and Cursor. It’s a packaging arbitrage: bundle agentic coding compute into a low flat fee and win share on price, accepting thinner per-unit margin for volume.
3. Dual-currency, dual-sovereignty price cards. The same models carry a USD card on international z.ai and an RMB card on China-native open.bigmodel.cn. This isn’t just localization — being on the US Entity List and listed in Hong Kong, Zhipu turns its non-US, sovereign foundation-model status into part of the value, especially for buyers who want frontier capability outside a US-controlled stack.
4. Capability gains without price hikes. GLM-4.6 shipped with more context and better coding than GLM-4.5 at the identical $0.60 / $2.20 rate, and the Entity-List year saw price cuts (free Flash, coding promos) rather than increases. Zhipu uses falling effective price as a competitive weapon, pushing more capability through the same or lower per-million-token rate.
Strengths & weaknesses
| Strengths | Weaknesses |
|---|---|
| Frontier-class GLM-4.6 at $0.60 in / $2.20 out — well below Western flagship rates | Output-token premium (~3.7x input on GLM-4.6) can surprise output-heavy workloads |
| Permanently free Flash models + 20M-token signup grant lower the trial barrier to zero | Coding Plan quotas are stated in “prompts per 5 hours,” not tokens — harder to predict than a meter |
| Open weights (GLM-4.5 / 4.6) let buyers self-host, a credible lock-in hedge | Two currency cards (USD z.ai / RMB bigmodel.cn) with differing promos add comparison friction |
| GLM Coding Plan undercuts Claude Code / Cursor (~3x Claude Pro usage at half price) | US Entity List status complicates procurement for US-aligned enterprises |
| Cached input at $0.03–$0.11/M sharply discounts repeated context | Enterprise / MaaS (private deployment, fine-tuning) is fully sales-gated — no public floor |
| Capability rose (GLM-4.6) at the same rate, and promos cut effective price over time | Frequent model version turnover (GLM-4.5 → 4.6 → 5 → 5.1) makes historical price tracking harder |
Billing UX : usage tracking and overage controls
- Per-token usage dashboard — the z.ai / bigmodel.cn console tracks API consumption by model and endpoint, with separate input, cached-input, and output lines so callers can see where spend accrues.
- Free Flash + token grant — GLM-4.5-Flash and GLM-4.7-Flash run with no usage cost, and the 20M-token signup grant lets new accounts validate workloads before paying.
- Coding Plan quotas — the GLM Coding Plan exposes prompt quotas per 5 hours and per week (plus monthly web-search/reader caps) rather than a token balance, so heavy users see throttling against a rolling window.
- Cached-input control — repeated context is billed at the cached rate ($0.03–$0.11/M) automatically, rewarding prompt reuse without a separate SKU.
- Dual-currency billing — international customers are billed in USD on z.ai; China-native customers in RMB on open.bigmodel.cn, each with its own promos and payment rails.
- Enterprise controls — MaaS / private deployments add fine-tuning, dedicated capacity, and sovereign hosting, quoted and managed through sales.
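The effect of automatic cached-input billing can be expressed as a blended input rate: a weighted average of the cached and fresh rates by cache-hit ratio. The sketch below uses the GLM-4.6 list rates ($0.60 fresh, $0.11 cached); the function name and the hit ratios are illustrative assumptions.

```python
def blended_input_rate(fresh_rate, cached_rate, hit_ratio):
    """Effective USD per 1M input tokens when hit_ratio of input is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cached_rate + (1.0 - hit_ratio) * fresh_rate


# GLM-4.6 list rates from the z.ai card.
FRESH, CACHED = 0.60, 0.11

for ratio in (0.0, 0.2, 0.5, 0.8):
    print(ratio, round(blended_input_rate(FRESH, CACHED, ratio), 3))
```

At a 20% hit ratio (the 10M-of-50M split in Archetype 1) the blended rate is about $0.502/M, and 50M input tokens cost $25.10, matching the two input lines in that table.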
Strategic wins : Why Zhipu AI’s pricing decisions worked
1. Free Flash as a growth engine
Making GLM-4-Flash free in 2024 — and keeping current Flash models free in 2026 — turned the bottom of the model ladder into a customer-acquisition channel. Developers adopt the free model, build on it, and convert to GLM-4.6 or the Coding Plan when they outgrow it. Giving away a capable current-gen model is a freemium bet that distribution beats short-term inference revenue. See usage-based pricing strategy for why seeding adoption can dominate early monetization.
2. A flat coding subscription priced against the West
By converting per-token coding into a flat quarterly subscription with prompt quotas and pitching it at ~3x Claude Pro usage for half the price, Zhipu created a clean price story aimed straight at Claude Code and Cursor. The flat plan is legible to indie developers who fear token bill-shock, and the deliberate undercut anchors Zhipu as the value option, mirroring the shift away from rigid per-seat economics toward flexible, value-anchored coding pricing. Choosing the right usage metric (a flat prompt quota rather than raw tokens) is what makes the plan easy for buyers to grasp in the first place.
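A rough way to judge the flat plan against the meter is to ask how many pay-as-you-go tokens the Lite fee would buy. The sketch below computes that break-even volume at GLM-4.6 list rates. The output fraction is a hypothetical workload mix, caching is ignored, and the models bundled in the Coding Plan may differ from GLM-4.6, so treat the result as an order-of-magnitude guide only.

```python
def breakeven_million_tokens(monthly_fee, input_rate, output_rate, output_fraction):
    """Millions of total tokens per month at which API spend equals the flat fee."""
    if not 0.0 <= output_fraction <= 1.0:
        raise ValueError("output_fraction must be between 0 and 1")
    # Average USD per 1M tokens for the given input/output mix.
    blended = (1.0 - output_fraction) * input_rate + output_fraction * output_rate
    return monthly_fee / blended


LITE_MONTHLY_USD = 10.0          # ~$10/mo equivalent of $30/quarter
ASSUMED_OUTPUT_FRACTION = 0.25   # hypothetical: one output token per three input tokens

breakeven = breakeven_million_tokens(
    LITE_MONTHLY_USD, 0.60, 2.20, ASSUMED_OUTPUT_FRACTION
)
print(round(breakeven, 2))  # millions of tokens per month
```

With this mix the blended rate is about $1.00/M, so a developer who expects to use well over roughly 10M tokens a month within the plan's quotas comes out ahead on the subscription.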
3. Turning sovereignty into a pricing dimension
The Entity-List listing and Hong Kong IPO made Zhipu’s non-US, sovereign status concrete. For buyers who want frontier capability outside a US-controlled stack, “cheap and sovereign” is a combined value metric — low price plus an open-weight, non-US foundation model. Zhipu monetizes that positioning the way Mistral packages EU sovereignty, as a structural reason to choose it over the incumbents.
Areas to improve : Gaps in Zhipu AI’s pricing approach
1. Translate Coding Plan quotas into tokens
The GLM Coding Plan gates usage as “~80 prompts per 5 hours” rather than a token allowance, so a buyer can’t map their actual workload to a tier without trial-and-error. A published token-equivalent — even approximate — would let developers self-select between Lite, Pro, and Max without hitting a wall mid-session. The opacity invites exactly the unpredictability a flat plan is supposed to remove.
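As an illustration of what a published token-equivalent could look like, the sketch below multiplies each tier's approximate prompt quotas by an assumed average prompt size. The figure of 20,000 tokens per prompt is purely hypothetical; Zhipu publishes no such number, and real agentic prompts vary widely.

```python
# Approximate Coding Plan quotas quoted in this article (prompts, not tokens).
QUOTAS = {
    "Lite": {"per_5h": 80, "per_week": 400},
    "Pro": {"per_5h": 400, "per_week": 2000},
    "Max": {"per_5h": 1600, "per_week": 8000},
}

# Hypothetical average tokens consumed per prompt; NOT a published figure.
ASSUMED_TOKENS_PER_PROMPT = 20_000


def token_equivalents(tokens_per_prompt):
    """Translate each tier's prompt caps into rough token caps."""
    return {
        tier: {window: prompts * tokens_per_prompt for window, prompts in caps.items()}
        for tier, caps in QUOTAS.items()
    }


table = token_equivalents(ASSUMED_TOKENS_PER_PROMPT)
for tier, caps in table.items():
    print(tier, caps)
```

Changing the assumed prompt size rescales every row, which is exactly why buyers cannot do this mapping reliably on their own.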
2. Reconcile the USD and RMB cards
Running separate z.ai (USD) and bigmodel.cn (RMB) price cards with different promos forces international buyers to guess which is canonical and whether they’re getting the China-native rate. A single comparison view — or an explicit “international vs China” toggle with a stated FX basis — would cut the friction of cross-currency evaluation.
3. Expose an Enterprise / MaaS floor
Private deployment, fine-tuning, and sovereign hosting are fully sales-gated with no public anchor. A published starting price or a worked MaaS example would shorten the evaluation cycle for mid-market buyers who outgrow the Coding Plan but can’t justify a sales call. Compare how peers stage enterprise transparency.
Key takeaways
- Give the cheap models away, sell the workflow. Free Flash tiers plus a 20M-token grant turn the bottom of the ladder into a funnel; revenue lands on GLM-4.6 tokens and the flat Coding Plan. The free thing seeds adoption; the paid thing is capability and convenience.
- A flat quota can be a weapon against a token meter. Packaging agentic coding as a flat quarterly subscription with prompt quotas, pitched at 3x Claude Pro usage for half the price, wins price-sensitive developers that a pure per-token rival can’t easily match.
- Push more capability through the same price. Shipping GLM-4.6 at GLM-4.5’s exact rate makes falling effective price the competitive story, without a visible discount line.
- Currency and geography are pricing levers. Dual USD/RMB cards let Zhipu meet buyers where they are — and turn non-US, sovereign status into part of the value for customers wary of US-controlled stacks.
- Open weights hedge the lock-in objection. Open-weighting GLM-4.5 / 4.6 lets enterprises self-host the same models they could buy as a service, so the per-token price competes on convenience rather than capture.
UBP implications
- Free flagship tiers reset the floor. When a frontier lab makes a current-gen model permanently free, the priced dimension shifts up the ladder to the better model and the managed workflow. UBP practitioners should expect “free at the bottom” to become a default acquisition pattern, not a loss leader exception.
- Flat quotas and token meters will coexist. Zhipu shows the same provider can sell raw tokens and a flat prompt-quota coding plan side by side. The lesson is to match the packaging to the buyer’s risk tolerance — meters for developers who want control, flat quotas for those who want predictability.
- Sovereignty and currency are emerging value metrics. As geopolitics fragments the AI stack, where a model is hosted and which currency you pay in become priceable dimensions — an early signal that non-functional attributes (jurisdiction, independence) will increasingly shape AI pricing alongside tokens.
Sources
- Z.ai model API pricing (USD) (accessed 2026-06-11)
- Z.ai developer docs — pricing overview (accessed 2026-06-11)
- GLM Coding Plan subscription (accessed 2026-06-11)
- open.bigmodel.cn pricing (China, RMB) (accessed 2026-06-11)
- AIbase — GLM-4-Flash API free to the public (2024-08) (accessed 2026-06-11)
- CNBC — China AI tiger Zhipu Hong Kong debut (2026-01) (accessed 2026-06-11)
- SCMP — US adds Zhipu to trade blacklist (2025-01) (accessed 2026-06-11)
Bottom line
Zhipu AI (Z.ai) prices a frontier GLM model family below the Western field: GLM-4.6 at $0.60 in / $2.20 out per million tokens, free Flash tiers with a 20M-token grant, and a flat GLM Coding Plan from ~$10/mo that openly undercuts Claude Code. Open weights hedge lock-in, dual USD/RMB cards meet buyers by geography, and capability has risen (GLM-4.6) without a price hike. The friction is quota opacity on the coding plan and sales-gated enterprise — but as the first foundation-model lab to IPO globally, and a US-Entity-Listed Tsinghua spin-out, Zhipu makes cheap-and-sovereign a single value proposition.
Want to compare Zhipu AI against other foundation-model providers? See Mistral AI and OpenAI, or browse the full pricing blueprint.
Pricing timeline : Major events on a vertical axis
Each milestone below corresponds to a public pricing change, product launch, or material adjustment, listed newest first.
Live snapshot: dual USD/RMB cards + free Flash tiers
Captured live: GLM-4.6/4.5 at $0.60 in / $2.20 out (cached $0.11), GLM-4.5-Air $0.20 / $1.10, GLM-4.5V $0.60 / $1.80, GLM-4-32B $0.10 / $0.10, GLM-4.5-Flash & GLM-4.7-Flash free, 20M-token signup grant; China-native bigmodel.cn shows RMB equivalents (GLM-4.5 0.8/2 yuan per 1M); GLM Coding Plan ~$10/$30/$80/mo.
Coding Plan first-purchase discounts removed
Zhipu removes all GLM Coding Plan first-purchase discounts (ending the $3/$15 promo). Standing prices settle at roughly Lite $30/quarter (~$10/mo), Pro $90/quarter (~$30/mo), Max $240/quarter (~$80/mo). (Source: vibecoding.app, z.ai, 2026-02.)
Hong Kong IPO — first foundation-model lab to list globally
Zhipu (02513.HK) debuts on the Hong Kong Stock Exchange, becoming the first foundation-model AI company to go public globally. It raised roughly HK$4.3B (about US$560M) at a ~US$6.7B valuation; the HK retail tranche was oversubscribed ~1,159x. (Source: CNBC, Futu, 2026-01.)
GLM-4.6 released at the same headline rate
GLM-4.6 ships with a larger context window and improved coding/agentic performance, priced at the same $0.60 in / $2.20 out per 1M tokens as GLM-4.5 — capability gains delivered without a price increase. (Source: z.ai docs, OpenRouter, 2025-10.)
GLM Coding Plan launches with a $3 promo
Zhipu launches the GLM Coding Plan — a flat coding subscription wired into Claude Code, Cline, Kilo Code and others — with an aggressive first-purchase promo of $3 (Lite) and $15 (Pro), explicitly positioned to undercut Claude Code and Cursor. (Source: z.ai, Cline/Roo Code announcements, 2025-09.)
GLM-4.5 launches with open weights and per-token API
Zhipu releases GLM-4.5 (and GLM-4.5-Air) with open weights under a permissive license, listed on z.ai at $0.60 in / $2.20 out per 1M tokens (Air at $0.20 / $1.10). Establishes the modern USD price card. (Source: z.ai, Zhipu blog, 2025.)
Added to the US Entity List
The US Commerce Department adds Zhipu (Beijing Zhipu Huazhang) to the BIS Entity List — the first Chinese foundation-model lab listed. Zhipu states it 'strongly disagrees.' The listing sharpens Zhipu's sovereign-AI, no-US-dependency positioning rather than its sticker prices. (Source: SCMP, Reuters, 2025-01.)
AutoGLM / GLM-PC agent products and GLM-OS concept launch
Zhipu introduces AutoGLM and GLM-PC under a 'GLM-OS' vision and later open-sources the CogAgent-9B base model, widening the product surface from raw API into agent products that later anchor the GLM Coding Plan. (Source: Zhipu newsroom, 2024-11/12.)
GLM-4-Flash API made free to the public
Zhipu opens its GLM-4-Flash API for free to all developers, an aggressive move in China's domestic model price war. This established the 'free Flash' anchor that persists in 2026 (GLM-4.5-Flash and GLM-4.7-Flash remain free). (Source: AIbase, 2024-08.)
- Zhipu spun out of Tsinghua University's Knowledge Engineering Group in 2019 and is one of China's 'AI tigers' alongside Moonshot, MiniMax, Baichuan and 01.AI.
- In January 2025 Zhipu became the first Chinese foundation-model lab added to the US Commerce Department's Entity List; it said it 'strongly disagrees' and stressed it relies on no US large-model technology.
- On 8 January 2026 Zhipu (02513.HK) became the first foundation-model AI company to IPO anywhere in the world, raising about US$560M at a ~US$6.7B valuation on the Hong Kong exchange.
Questions & answers
- What is Zhipu AI's pricing model?
- Zhipu (Z.ai) runs a per-token GLM API plus a flat coding subscription. API rates are per million tokens (GLM-4.6 and GLM-4.5 at $0.60 input / $2.20 output, GLM-4.5-Air at $0.20 / $1.10), while the GLM Coding Plan is a flat subscription billed quarterly, from about $10/mo.
- Does Zhipu AI offer a free tier?
- Yes. GLM-4.5-Flash and GLM-4.7-Flash are free to call with no usage cost, and new accounts receive a 20-million-token grant. Zhipu first made its GLM-4-Flash API free to the public in August 2024.
- How much does the GLM Coding Plan cost?
- The GLM Coding Plan is billed quarterly: Lite at about $30/quarter (~$10/mo), Pro at $90/quarter (~$30/mo), and Max at $240/quarter (~$80/mo). A first-purchase launch promo of $3 (Lite) and $15 (Pro) was removed on 2026-02-11.
- How much does the Zhipu GLM API cost per million tokens?
- On z.ai (USD): GLM-4.6 and GLM-4.5 are $0.60 in / $2.20 out, GLM-4.5-Air is $0.20 / $1.10, GLM-4.5V is $0.60 / $1.80, and GLM-4-32B is $0.10 / $0.10. Cached input runs $0.03–$0.11. Flash models are free.
- What is the difference between z.ai and bigmodel.cn pricing?
- z.ai is Zhipu's international platform priced in USD; open.bigmodel.cn is the China-native platform priced in RMB (e.g. GLM-4.5 at 0.8 yuan input / 2 yuan output per 1M tokens). The model lineup overlaps but currency and some promos differ by geography.
- Can I self-host Zhipu's GLM models?
- Yes for the open-weight ones. Zhipu open-weighted GLM-4.5 and GLM-4.6 under permissive (MIT-style) licenses, so you can download and run them yourself, while Zhipu monetizes hosted inference per token and the managed coding subscription.