What is Zhipu AI's pricing model?

Zhipu (Z.ai) runs a per-token GLM API plus a flat coding subscription. API rates are per million tokens — GLM-5.2 at $1.4 input / $4.4 output, GLM-4.7 / GLM-4.6 / GLM-4.5 at $0.6 / $2.2, GLM-4.5-Air at $0.2 / $1.1 — while the GLM Coding Plan is a fixed monthly fee listed at $18 to $160.

Does Zhipu AI offer a free tier?

Yes. GLM-4.7-Flash and GLM-4.5-Flash (text) and GLM-4.6V-Flash (vision) are free to call with no usage cost across input, cached input and output. Zhipu first made its GLM-4-Flash API free to the public in August 2024.

How much does the GLM Coding Plan cost?

List prices are $18/month (Lite), $72/month (Pro) and $160/month (Max). Billing term sets the discount: monthly −10% ($16.2 / $64.8 / $144), quarterly −20% ($14.4 / $57.6 / $128) and yearly −30% ($12.6 / $50.4 / $112 per month).

How much does the Zhipu GLM API cost per million tokens?

On z.ai (USD): GLM-5.2 and GLM-5.1 are $1.4 in / $4.4 out, GLM-5 is $1 / $3.2, GLM-5-Turbo is $1.2 / $4.0, GLM-4.7 / GLM-4.6 / GLM-4.5 are $0.6 / $2.2, GLM-4.5-Air is $0.2 / $1.1, and GLM-4-32B-0414-128K is $0.1 / $0.1. Cached input runs $0.01–$0.45.

Does Zhipu charge for anything besides tokens?

Yes. Web Search is $0.01 per use, images are $0.015 (GLM-Image) or $0.01 (CogView-4) each, video generation is $0.2–$0.4 per video, GLM-ASR-2512 speech is $0.03 per 1M tokens, and agents run $0.7 per 1M tokens (Slide/Poster) to $3 per 1M tokens (translation).

Can I self-host Zhipu's GLM models?

Yes for the open-weight ones. Zhipu open-weighted GLM-4.5 and GLM-4.6 under permissive (MIT-style) licenses, so you can download and run them yourself, while Zhipu monetizes hosted inference per token and the managed coding subscription.

Zhipu AI Pricing

AI Summary

Zhipu AI (Z.ai) bills its GLM models per million tokens: GLM-5.2 and GLM-5.1 at $1.4 input / $4.4 output, GLM-5 at $1 / $3.2, GLM-4.7 / GLM-4.6 / GLM-4.5 at $0.6 / $2.2, and GLM-4.5-Air at $0.2 / $1.1.
GLM-4.7-Flash, GLM-4.5-Flash and GLM-4.6V-Flash are free to call, cached input runs as low as $0.01 per 1M tokens, and cached-input storage is listed 'Limited-time Free'.
The GLM Coding Plan lists at $18/month (Lite), $72/month (Pro) and $160/month (Max), discounted 10% on monthly, 20% on quarterly and 30% on yearly billing — $12.6 / $50.4 / $112 per month at the annual rate.
Beyond tokens, Z.ai meters web search at $0.01 per use, image generation from $0.01 per image, video generation from $0.2 per video, speech recognition at $0.03 per 1M tokens, and agents at $0.7–$3 per 1M tokens.
Zhipu is a Tsinghua spin-out placed on the US Entity List in January 2025 and the first foundation-model lab to IPO globally (Hong Kong, 02513.HK, January 2026), and it raised a further ~$4B in a July 2026 Hong Kong share sale — making sovereignty and a deep war chest part of the story.

Pricing summary

Zhipu AI (Z.ai) 2026 — free Flash models, a cheap per-token GLM API, and a coding subscription that undercuts the West

A pure per-million-token API (with free Flash tiers) sits beside a flat GLM Coding Plan, priced in USD on z.ai and RMB on China-native bigmodel.cn.

Free Flash

Free

Developers on the Flash model line

Pay-as-you-go

GLM API

from $0.07 /M in

Developers calling GLM models per token

GLM Coding Plan

$18 /mo

Agentic coding in Claude Code, OpenClaw, Cline

Enterprise / MaaS

Enterprises wanting private GLM deployments

Media, audio & agent SKUs

from $0.01 /image

Teams generating images, video, speech and slides

Coding Plan billing terms

$12.6 /mo

Yearly billing is the cheapest way in

All figures in USD from z.ai and the Z.AI developer docs. Flash models (GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash) are free. The China-native open.bigmodel.cn RMB card is geo-restricted and did not render this cycle. Full per-model tables below.

About

Zhipu AI — internationally branded Z.ai, and listed in Hong Kong as Beijing Zhipu Huazhang Technology (02513.HK) — is a Chinese foundation-model lab behind the GLM model family. It serves two buyers: developers who call GLM models over a per-million-token API (USD on the international z.ai platform, RMB on China-native open.bigmodel.cn), and coders who buy the flat GLM Coding Plan subscription to drive agentic coding inside tools like Claude Code, Cline, and Cursor. Enterprise and MaaS (model-as-a-service) deployments — fine-tuning, private/sovereign hosting, full dev kits — are sold through a sales motion.

Zhipu spun out of Tsinghua University’s Knowledge Engineering Group in 2019 and has become one of China’s “AI tigers” — the cohort of frontier Chinese labs that also includes Moonshot, MiniMax, Baichuan, and 01.AI. Its trajectory is unusually entangled with geopolitics: in January 2025 it became the first Chinese foundation-model lab added to the US Commerce Department’s Entity List, a move Zhipu publicly “strongly disagrees” with, stressing it depends on no US large-model technology. A year later, on 8 January 2026, Zhipu became the first foundation-model AI company to IPO anywhere in the world, debuting on the Hong Kong Stock Exchange and raising roughly US$560M at a ~US$6.7B valuation. It followed that debut with a ~US$4B Hong Kong share sale in July 2026 — months after listing and after a reported ~1,500% run-up in the stock — a war chest that underwrites its aggressive domestic pricing, though no rate change was tied to the raise. Sovereignty is therefore not a marketing flourish here — it is a structural feature of the company.

The GLM catalog now spans a GLM-5 flagship line (GLM-5.2, billed as the latest open-source flagship with 1M-token context; GLM-5.1, which Z.ai says matches Claude Opus 4.6; GLM-5; and the OpenClaw-tuned GLM-5-Turbo), the older GLM-4.x line (GLM-4.7, GLM-4.6, GLM-4.5, GLM-4.5-Air), free Flash tiers (GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash), a vision line (GLM-5V-Turbo, GLM-4.6V, GLM-OCR, GLM-4.5V), image and video generation (GLM-Image, CogView-4, CogVideoX-3, Vidu Q1/2), speech recognition (GLM-ASR-2512), and packaged agents (Translation, Slide/Poster, Video Effect Template). GLM-4.5 and GLM-4.6 are open-weighted under a permissive (MIT-style) license, so the weights are free to download — and, exactly as with Mistral, Zhipu monetizes hosted inference per token rather than the artifact itself.

The strategic posture is still price, but the shape has changed. Cheap and free models remain at the bottom of the ladder, while the GLM-5 line carries a premium per-token rate and the GLM Coding Plan — once a sub-$10-a-month curiosity — now lists at $18 to $160 a month with a billing-term discount ladder. Zhipu has moved from “undercut everything” to “free at the bottom, market rate at the top.”

Pricing summary : a cheap per-token API plus a flat coding subscription

Zhipu runs a per-token GLM API with free Flash tiers, plus a separate flat GLM Coding Plan subscription and a set of per-artifact media SKUs. The dimensions are:

GLM API tokens — separate Input, Cached Input and Output rates per 1M tokens, in USD on z.ai. GLM-5.2 and GLM-5.1 at $1.4 in / $4.4 out, GLM-5 at $1 / $3.2, GLM-5-Turbo at $1.2 / $4.0, GLM-4.7 / GLM-4.6 / GLM-4.5 at $0.6 / $2.2, GLM-4.5-X at $2.2 / $8.9, GLM-4.5-Air at $0.2 / $1.1, GLM-4.7-FlashX at $0.07 / $0.4, GLM-4-32B-0414-128K at $0.1 / $0.1.
Cached input — a third, steeply discounted column: $0.01 (GLM-4.7-FlashX) to $0.45 (GLM-4.5-X), with $0.11 on the GLM-4.x flagships and $0.26 on GLM-5.2. Cached Input Storage is listed as “Limited-time Free” on every model that has it.
Free Flash tiers — GLM-4.7-Flash and GLM-4.5-Flash (text) and GLM-4.6V-Flash (vision) are Free across input, cached input, cached-input storage and output. This descends directly from Zhipu making GLM-4-Flash free in 2024.
GLM Coding Plan seats — a flat subscription listed at $18/month (Lite), $72/month (Pro) and $160/month (Max), with the billing term setting the discount: monthly −10%, quarterly −20%, yearly −30%. Quotas are stated as relative multiples (Pro = 5x Lite usage, Max = 20x Lite usage) rather than tokens.
Per-artifact and per-use meters — Web Search at $0.01 / use, images at $0.015 (GLM-Image) or $0.01 (CogView-4), videos at $0.2–$0.4 per clip, GLM-ASR-2512 speech at $0.03 / MTok, and agents at $0.7 / MTok (Slide/Poster) to $3 / MTok (translation).
RMB-native card — China developers buy on open.bigmodel.cn in yuan; that surface is geo-restricted and did not render from non-CN egress this cycle, so current RMB rates are unknown.
Enterprise / MaaS — fine-tuning, private/sovereign deployment, and the full dev kit are quoted by sales.

What makes this different: Zhipu exposes raw per-million-token billing publicly and gives three current models away free, but the paid ladder has climbed — the GLM-5 line prices above the GLM-4.x flagships and the coding subscription now carries a list price and a term-discount ladder rather than a single cheap flat fee.

Pricing by product

GLM API — text models (per 1M tokens, z.ai USD)

Model	Input	Cached input	Output	Key mechanics
GLM-5.2	$1.4	$0.26	$4.4	Latest open-source flagship; 1M-token context
GLM-5.1	$1.4	$0.26	$4.4	Long-horizon / 8-hour autonomous work
GLM-5	$1	$0.2	$3.2	General GLM-5 flagship
GLM-5-Turbo	$1.2	$0.24	$4.0	Tuned for the OpenClaw agent scenario
GLM-4.7	$0.6	$0.11	$2.2	Prior-generation coding/agent line
GLM-4.6	$0.6	$0.11	$2.2	Open-weight flagship
GLM-4.5	$0.6	$0.11	$2.2	Open-weight flagship
GLM-4.5-X	$2.2	$0.45	$8.9	Most expensive text SKU on the card
GLM-4.5-Air	$0.2	$0.03	$1.1	Lightweight, cost-sensitive default
GLM-4.5-AirX	$1.1	$0.22	$4.5	Accelerated Air variant
GLM-4.7-FlashX	$0.07	$0.01	$0.4	Cheapest paid text model
GLM-4-32B-0414-128K	$0.1	—	$0.1	Flat in/out; no cached-input line
GLM-4.7-Flash	Free	Free	Free	Free 30B-class Flash model
GLM-4.5-Flash	Free	Free	Free	Free tier; no usage cost

A fourth column on the docs price card, Cached Input Storage, is listed as “Limited-time Free” on every model that carries a cached-input rate — so context-cache retention is not billed today but is explicitly flagged as time-limited.

GLM API — vision models (per 1M tokens, z.ai USD)

Model	Input	Cached input	Output	Key mechanics
GLM-5V-Turbo	$1.2	$0.24	$4	Top vision SKU
GLM-4.6V	$0.3	$0.05	$0.9	Mid-tier multimodal
GLM-OCR	$0.03	—	$0.03	Flat in/out; docs show "" for both cached-input columns
GLM-4.6V-FlashX	$0.04	$0.004	$0.4	Cheapest paid vision model
GLM-4.5V	$0.6	$0.11	$1.8	Prior-generation vision flagship
GLM-4.6V-Flash	Free	Free	Free	Free vision tier

Built-in tools, media and agents (per artifact or per use)

SKU	Price	Unit	Key mechanics
Web Search	$0.01	per use	Metered per tool call, not per token
GLM-Image	$0.015	per image	Zhipu’s own image model
CogView-4	$0.01	per image	Cheaper image generation
CogVideoX-3	$0.2	per video	In-house video generation
Vidu Q1 (Text / Image / Start-End)	$0.4	per video	Premium video tier
Vidu 2 (Image / Start-End)	$0.2	per video	Cheaper video tier; Reference variant $0.4
GLM-ASR-2512	$0.03	per 1M tokens	Docs state “approximately $0.0024/minute”
GLM Slide/Poster Agent (beta)	$0.7	per 1M tokens	Packaged agent, token-metered
General-Purpose Translation	$3	per 1M tokens	Most expensive agent SKU
Popular Special Effects Video Templates	$0.2	per video	Template-driven short-video agent

GLM Coding Plan (flat subscription, USD)

Tier	List price	Best rate (yearly, −30%)	Included	Key mechanics
Lite	$18 / month	$12.6 / month, renewing $151.2 / year from the 2nd year	Base usage allowance; rolling access to the latest flagship models; 20+ coding tools including Claude Code	Lightweight iteration on small repos
Pro	$72 / month	$50.4 / month, renewing $604.8 / year from the 2nd year	Everything in Lite plus 5x Lite usage; curated MCP tools; priority access to new flagship models; faster generation	Marked “Popular”; day-to-day work on mid-sized repos
Max	$160 / month	$112 / month, renewing $1344 / year from the 2nd year	Everything in Pro plus 20x Lite usage; first access to new flagship models; dedicated resources at peak times	Advanced users on mid-to-large repos

The billing-term toggle sets the discount, and the discounted rate is what renews: Monthly −10% → $16.2 / $64.8 / $144 per month (page footnote: “Continuous monthly subscriptions enjoy a 10% discount”); Quarterly −20% → $14.4 / $57.6 / $128 per month, renewing $43.2 / $172.8 / $384 per quarter; Yearly −30% → $12.6 / $50.4 / $112 per month. All plans include the Vision Analysis, Web Search, Web Reader and Zread MCP tools, and a referral programme pays “up to 20% back on every purchase”.

GLM API — China-native card (RMB)

Chinese developers buy the same model family in yuan on open.bigmodel.cn. That surface is a geo-restricted SPA and timed out from non-CN egress during this capture cycle, so current RMB rates are unknown and are deliberately not restated here.

Sales motions across products: PLG / self-serve for the free Flash tiers, the pay-as-you-go API, the media and agent SKUs, and the GLM Coding Plan; sales-led for Enterprise / MaaS (private deployment, fine-tuning, sovereign hosting).

Hidden costs : What Zhipu AI users actually pay

Zhipu’s headline rates are low and public, but the real bill is shaped by three things the sticker doesn’t show: the output-token premium (output is ~3.7x input on GLM-4.6), the per-5-hour / weekly prompt quotas on the Coding Plan (not a token meter — the docs cap Lite at “~80 prompts every 5 hours”), and the currency you’re billed in (USD vs RMB). Two archetypes show how the total assembles.

Archetype 1 — a developer running a coding/RAG agent on the GLM API. Answering with GLM-4.6 at roughly 50M input + 15M output tokens/mo, with a chunk of that input served from cache.

Line item	Monthly cost
GLM-4.6 input — ~40M fresh tok @ $0.60/M	$24.00
GLM-4.6 cached input — ~10M tok @ $0.11/M	$1.10
GLM-4.6 output — ~15M tok @ $2.20/M	$33.00
Estimated total	~$58/mo

The lesson: on GLM-4.6 the $2.20/M output rate is ~3.7x the input rate, so output-heavy workloads (code generation, long answers) dominate — but caching input at $0.11/M (versus $0.60 fresh) shaves a meaningful slice off repeated context. Even so, the absolute total is low: this is a frontier-class flagship at a fraction of Western per-token rates.

Archetype 2 — a solo developer who’d otherwise pay for Claude Code. One GLM Coding Plan Lite seat instead of pay-as-you-go, using GLM-4.6 / GLM-5-Turbo inside Cline all day.

Line item	Monthly cost
GLM Coding Plan — Lite, quarterly billing −20% (billed $43.2/quarter)	$14.40
Overflow API tokens beyond quota (occasional)	~$0–$5
Estimated total	~$14–$19/mo

Here the surprise is still how little it is: the Coding Plan converts a token meter into a flat quota, and no billing term charges the $18 list price — monthly is −10% ($16.2), quarterly −20% ($14.4) and yearly −30% ($12.6). The catch is the quota shape — the docs cap Lite at “~80 prompts every 5 hours” and “~400 prompts” weekly rather than a token allowance, so a heavy agentic session can hit the ceiling before a month’s worth of value is consumed, nudging users toward Pro (~400 prompts / 5 hours) or Max (~1,600).

Want to estimate your own Zhipu AI bill? Use the Zhipu AI pricing calculator to model your costs based on token volume, cached input, and coding-seat count.

Pricing evolution : Zhipu AI pricing history and changes

Zhipu’s pricing has moved along two tracks. The API has billed per million tokens — and given away Flash tiers — since the 2024 price war; the coding side launched as a flat subscription in 2025 with an aggressive promo, then normalized in early 2026. Geopolitics and capital (Entity List, the Hong Kong IPO, and a follow-on ~$4B share sale) run underneath as a strategic, not a sticker, force. The dated milestones below are reconstructed from primary announcements and contemporaneous press.

Cadence

Quarter	Price changes	Product / SKU additions	Notes
2024 Q3	1	0	2024-08 GLM-4-Flash API made free to the public
2024 Q4	0	1	AutoGLM / GLM-PC agent products + GLM-OS concept
2025 Q1	0	0	2025-01-15 Added to the US Entity List
2025 Q3	1	1	GLM-4.5 per-token API + GLM Coding Plan launch ($3 promo)
2025 Q4	0	1	GLM-4.6 ships at the same headline rate
2026 Q1	1	0	2026-01-08 HK IPO (02513.HK); 2026-02-11 Coding Plan promo removed
2026 Q3	1	1	2026-07-08 ~US$4B HK share sale (funding, no rate change); 2026-07-22 Coding Plan repriced to $18 / $72 / $160 a month with a term-discount ladder; docs price card adds the GLM-5 line, media, audio and agent SKUs

Tracked range: 2024 Q3–2026 Q3. Quarters not listed had no publicly announced price or SKU change. Dated milestones below cite primary/secondary sources.

Notable changes

2024-08 — GLM-4-Flash API made free to the public, opening Zhipu’s free-Flash posture in China’s domestic price war (AIbase).
2024-11 — AutoGLM and GLM-PC agent products launch under a “GLM-OS” vision; CogAgent-9B base model later open-sourced.
2025-01-15 — Zhipu added to the US Entity List, the first Chinese foundation-model lab listed; it states it “strongly disagrees” (reported by SCMP, Reuters).
2025-07 — GLM-4.5 launches with open weights at $0.60 in / $2.20 out per 1M tokens (Air at $0.20 / $1.10), setting the modern USD card.
2025-09 — GLM Coding Plan launches with a $3 (Lite) / $15 (Pro) first-purchase promo, wired into Claude Code, Cline, and others (Cline/Roo Code announcements).
2025-10 — GLM-4.6 ships with a larger context window at the same $0.60 / $2.20 rate — capability gains without a price increase.
2026-01-08 — Hong Kong IPO (02513.HK): first foundation-model lab to list globally, ~US$560M raised at ~US$6.7B valuation (reported by CNBC, Futu).
2026-02-11 — All Coding Plan first-purchase discounts removed; standing prices settle at ~$10 / $30 / $80 per month (Lite/Pro/Max).
2026-07-08 — Zhipu raised ~US$4B in a Hong Kong share sale, months after its January IPO and following a reported ~1,500% share-price run-up (reported by Reuters). No pricing change accompanied it; the raise reads as runway to sustain the free-Flash and low-token-price posture — though the paid ladder actually rose two weeks later, so the capital did not translate into deeper cuts.
2026-07-22 — Coding Plan repriced to list $18 / $72 / $160 per month with a billing-term discount ladder (monthly −10%, quarterly −20%, yearly −30%); the developer docs restate quotas as ~80 / ~400 / ~1,600 prompts per 5 hours and the API price card adds the GLM-5 line plus image, video, audio and agent SKUs.

The free-Flash and coding-undercut strategy in detail

Two pricing decisions define Zhipu. First, free Flash: making GLM-4-Flash free in 2024 and keeping GLM-4.5-Flash and GLM-4.7-Flash free in 2026 (plus a 20M-token signup grant) treats the bottom of the model ladder as a customer-acquisition channel, not a revenue line. Second, the coding undercut: the GLM Coding Plan converts the token meter into a flat quota and is pitched explicitly against Claude Code and Cursor — roughly 3x Claude Pro usage at half the price for Lite. Together they are a deliberate price-war posture: give the cheap models away, sell the agentic coding workflow flat, and let the per-token API sit comfortably below Western frontier rates. The Entity-List and sovereignty context sharpens this — for buyers wary of US-dependent stacks, low price and a non-US frontier model are a combined value proposition.

What’s unique : Zhipu AI’s distinctive pricing mechanics

1. Free flagship-Flash as an acquisition channel. Zhipu doesn’t just have a free trial — it makes entire capable models (GLM-4.5-Flash, GLM-4.7-Flash) permanently free to call, with a 20-million-token grant on signup. The free tier is the funnel: it seeds the open-weight ecosystem and converts users into paid GLM-4.6 / Coding Plan customers once they outgrow Flash. Few frontier labs give away a current-generation model with no usage cost.

2. A flat coding subscription that deliberately undercuts the West. The GLM Coding Plan converts per-token inference into a flat prompt quota (~80 prompts per 5 hours on Lite, not tokens) and is priced — from $12.6/mo on yearly billing, $18/mo list — explicitly against Claude Code and Cursor. It’s a packaging arbitrage: bundle agentic coding compute into a low flat fee and win share on price, accepting thinner per-unit margin for volume.

3. Dual-currency, dual-sovereignty price cards. The same models carry a USD card on international z.ai and an RMB card on China-native open.bigmodel.cn. This isn’t just localization — being on the US Entity List and listed in Hong Kong, Zhipu turns its non-US, sovereign foundation-model status into part of the value, especially for buyers who want frontier capability outside a US-controlled stack.

4. Capability gains without price hikes. GLM-4.6 shipped with more context and better coding than GLM-4.5 at the identical $0.60 / $2.20 rate, and the Entity-List year saw price cuts (free Flash, coding promos) rather than increases. Zhipu uses falling effective price as a competitive weapon, pushing more capability through the same or lower per-million-token rate.

Strengths & weaknesses

Strengths	Weaknesses
Frontier-class GLM-4.6 at $0.60 in / $2.20 out — well below Western flagship rates	Output-token premium (~3.7x input on GLM-4.6) can surprise output-heavy workloads
Permanently free Flash models + 20M-token signup grant lower the trial barrier to zero	Coding Plan quotas are stated in “prompts per 5 hours,” not tokens — harder to predict than a meter
Open weights (GLM-4.5 / 4.6) let buyers self-host, a credible lock-in hedge	Two currency cards (USD z.ai / RMB bigmodel.cn) with differing promos add comparison friction
GLM Coding Plan undercuts Claude Code / Cursor (~3x Claude Pro usage at half price)	US Entity List status complicates procurement for US-aligned enterprises
Cached input at $0.03–$0.11/M sharply discounts repeated context	Enterprise / MaaS (private deployment, fine-tuning) is fully sales-gated — no public floor
Capability rose (GLM-4.6) at the same rate, and promos cut effective price over time	Frequent model renames (GLM-4.5 → 4.6 → 5 → 5.1) make historical price tracking harder

Billing UX : usage tracking and overage controls

Billing-period toggle with published discounts — the GLM Coding Plan page carries a Monthly −10% / Quarterly −20% / Yearly −30% segmented control. Each card shows the discounted rate struck against the $18 / $72 / $160 list price and the renewal line (“$16.2 / month from 2nd month”, “$43.2 / quarter from 2nd quarter”, “$151.2 / year from 2nd year”), so the post-promo price is visible before purchase.
Four-column docs price card — the Z.AI developer docs Pricing page breaks every model into Input · Cached Input · Cached Input Storage · Output, and separates Text, Vision, Built-in Tools, Image, Video, Audio and Agents into their own tables. Cached Input Storage is labelled “Limited-time Free”, an explicit forward warning that cache retention may become billable.
Context Caching capability page — caching is a documented API capability (under Capabilities → Context Caching), so the discounted cached-input rate is something callers opt into and control rather than an opaque discount.
Free Flash models — GLM-4.7-Flash, GLM-4.5-Flash and GLM-4.6V-Flash are Free on every billing column, letting new accounts validate a workload before any spend.
API Keys, Payment Method and Billing pages — the docs top nav exposes API Keys and Payment Method, and the z.ai footer links API Keys, Billing and FAQ, keeping key management and payment rails one click from the price card.
Coding-plan balance FAQ — the subscribe-page FAQ addresses the awkward edges directly: why error “1113 Insufficient Balance” still appears after purchasing a coding package, why account balance is still deducted, how to check whether the coding package was applied to a charge, and how to cancel auto-renewal.
Referral credit programme — “Invite friends, Earn Credits — up to 20% Bonus, No Limits” pays up to 20% back on every purchase as non-expiring credit, an in-product discount lever that sits outside the term-discount ladder.
Enterprise controls — MaaS / private deployments add fine-tuning, dedicated capacity, and sovereign hosting, quoted and managed through sales.

Strategic wins : Why Zhipu AI’s pricing decisions worked

1. Free Flash as a growth engine

Making GLM-4-Flash free in 2024 — and keeping current Flash models free in 2026 — turned the bottom of the model ladder into a customer-acquisition channel. Developers adopt the free model, build on it, and convert to GLM-4.6 or the Coding Plan when they outgrow it. Giving away a capable current-gen model is a freemium bet that distribution beats short-term inference revenue. See usage-based pricing strategy for why seeding adoption can dominate early monetization.

2. A flat coding subscription priced against the West

By converting per-token coding into a flat quarterly quota and pitching it at ~3x Claude Pro usage for half the price, Zhipu created a clean price story aimed straight at Claude Code and Cursor. The flat plan is legible to indie developers who fear token bill-shock, and the deliberate undercut anchors Zhipu as the value option — mirroring the shift away from rigid per-seat economics toward flexible, value-anchored coding pricing. Choosing the right usage metric — a flat prompt quota rather than raw tokens — is what makes the plan legible to buyers in the first place.

3. Turning sovereignty into a pricing dimension

The Entity-List listing and Hong Kong IPO made Zhipu’s non-US, sovereign status concrete. For buyers who want frontier capability outside a US-controlled stack, “cheap and sovereign” is a combined value metric — low price plus an open-weight, non-US foundation model. Zhipu monetizes that positioning the way Mistral packages EU sovereignty, as a structural reason to choose it over the incumbents.

Areas to improve : Gaps in Zhipu AI’s pricing approach

1. Translate Coding Plan quotas into tokens

The GLM Coding Plan gates usage as “~80 prompts per 5 hours” rather than a token allowance, so a buyer can’t map their actual workload to a tier without trial-and-error. A published token-equivalent — even approximate — would let developers self-select between Lite, Pro, and Max without hitting a wall mid-session. The opacity invites exactly the unpredictability a flat plan is supposed to remove.

2. Reconcile the USD and RMB cards

Running separate z.ai (USD) and bigmodel.cn (RMB) price cards with different promos forces international buyers to guess which is canonical and whether they’re getting the China-native rate. A single comparison view — or an explicit “international vs China” toggle with a stated FX basis — would cut the friction of cross-currency evaluation.

3. Expose an Enterprise / MaaS floor

Private deployment, fine-tuning, and sovereign hosting are fully sales-gated with no public anchor. A published starting price or a worked MaaS example would shorten the evaluation cycle for mid-market buyers who outgrow the Coding Plan but can’t justify a sales call. Compare how peers stage enterprise transparency.

Monetization stack & signals : how Zhipu AI builds & buys its revenue engine

Buys 1 Builds 0

The read — where the monetization investment is going

On international z.ai, Zhipu's docs point to a bought card/PayPal payment rail and a balance-then-card deduction order, but the usage meter behind its per-token API is unnamed — likely in-house for a frontier lab running its own inference. The domestic open.bigmodel.cn rail (Alipay/WeChat) is undisclosed in English, so the engine is only partly visible.

Stack — build vs buy

Buys (vendor) · 1

PayPal Payments inferred Docs Jun 2026

“If still insufficient, the remaining amount will be charged from your linked payment method (e.g., bank card or PayPal).”

Unconfirmed · 1

Metering Metering inferred Docs Jun 2026

“Per-million-token API billing with separate input/output/cached lines and a credits-then-cash-then-card deduction order implies a usage meter behind the price; no vendor is named.”

Signals reviewed Jun 2026 · derived from product docs

Key takeaways

Give the cheap models away, sell the workflow. Free Flash tiers plus a 20M-token grant turn the bottom of the ladder into a funnel; revenue lands on GLM-4.6 tokens and the flat Coding Plan. The free thing seeds adoption; the paid thing is capability and convenience.
A flat quota can be a weapon against a token meter. Packaging agentic coding as a flat per-quarter quota — pitched at 3x Claude Pro usage for half the price — wins price-sensitive developers that a pure per-token rival can’t easily match.
Push more capability through the same price. Shipping GLM-4.6 at GLM-4.5’s exact rate makes falling effective price the competitive story, without a visible discount line.
Currency and geography are pricing levers. Dual USD/RMB cards let Zhipu meet buyers where they are — and turn non-US, sovereign status into part of the value for customers wary of US-controlled stacks.
Open weights hedge the lock-in objection. Open-weighting GLM-4.5 / 4.6 lets enterprises self-host the same models they could buy as a service, so the per-token price competes on convenience rather than capture.

UBP implications

Free flagship tiers reset the floor. When a frontier lab makes a current-gen model permanently free, the priced dimension shifts up the ladder to the better model and the managed workflow. UBP practitioners should expect “free at the bottom” to become a default acquisition pattern, not a loss leader exception.
Flat quotas and token meters will coexist. Zhipu shows the same provider can sell raw tokens and a flat prompt-quota coding plan side by side. The lesson is to match the packaging to the buyer’s risk tolerance — meters for developers who want control, flat quotas for those who want predictability.
Sovereignty and currency are emerging value metrics. As geopolitics fragments the AI stack, where a model is hosted and which currency you pay in become priceable dimensions — an early signal that non-functional attributes (jurisdiction, independence) will increasingly shape AI pricing alongside tokens.

Sources

Z.ai model API pricing (USD) (accessed 2026-07-22)
Z.ai developer docs — pricing overview (accessed 2026-07-22)
GLM Coding Plan subscription (accessed 2026-07-22)
Z.ai developer docs — GLM Coding Plan overview (price and prompt quotas) (accessed 2026-07-23)
Z.ai developer docs — GLM Coding Plan FAQ (usage limits) (accessed 2026-07-23)
Z.ai developer docs — quick start (accessed 2026-07-23)
open.bigmodel.cn pricing (China, RMB) (accessed 2026-07-22)
AIbase — GLM-4-Flash API free to the public (2024-08) (accessed 2026-07-22)
CNBC — China AI tiger Zhipu Hong Kong debut (2026-01) (accessed 2026-07-22)
SCMP — US adds Zhipu to trade blacklist (2025-01) (accessed 2026-07-22)
Browse the pricing blueprint corpus

Bottom line

Zhipu AI (Z.ai) prices a frontier GLM model family below the Western field: GLM-4.6 at $0.60 in / $2.20 out per million tokens, free Flash tiers with a 20M-token grant, and a flat GLM Coding Plan listed at $18/mo (as low as $12.6/mo on yearly billing) that openly undercuts Claude Code. Open weights hedge lock-in, dual USD/RMB cards meet buyers by geography, and capability has risen (GLM-4.6) without a price hike. The friction is quota opacity on the coding plan and sales-gated enterprise — but as the first foundation-model lab to IPO globally, and a US-Entity-Listed Tsinghua spin-out, Zhipu makes cheap-and-sovereign a single value proposition.

Want to compare Zhipu AI against other foundation-model providers? See Mistral AI and OpenAI, or browse the full pricing blueprint.

Pricing timeline : Major events on a vertical axis

Each milestone below corresponds to a public pricing change, product launch, or material adjustment. Major events use a filled marker; minor adjustments use a faded one.

GLM Coding Plan repriced to $18/$72/$160 and the API card expands to GLM-5.2

Jul 2026

The GLM Coding Plan moves from a quarterly ~$10/$30/$80-a-month card to list prices of $18 (Lite), $72 (Pro) and $160 (Max) per month, with a billing-term discount ladder — monthly −10% ($16.2/$64.8/$144), quarterly −20% ($14.4/$57.6/$128), yearly −30% ($12.6/$50.4/$112). Quotas are restated as relative multiples (Pro = 5x Lite usage, Max = 20x Lite usage) instead of prompts-per-5-hours. In parallel the docs price card adds a premium top end — GLM-5.2 and GLM-5.1 at $1.4 in / $4.4 out, GLM-5 at $1 / $3.2, GLM-5-Turbo at $1.2 / $4.0, GLM-4.5-X at $2.2 / $8.9 — plus per-image, per-video, per-use web search and per-agent SKUs.

captured 2026-07-22

Live snapshot: dual USD/RMB cards + free Flash tiers

Jun 2026

Captured live: GLM-4.6/4.5 at $0.60 in / $2.20 out (cached $0.11), GLM-4.5-Air $0.20 / $1.10, GLM-4.5V $0.60 / $1.80, GLM-4-32B $0.10 / $0.10, GLM-4.5-Flash & GLM-4.7-Flash free, 20M-token signup grant; China-native bigmodel.cn shows RMB equivalents (GLM-4.5 0.8/2 yuan per 1M); GLM Coding Plan ~$10/$30/$80/mo.

Coding Plan first-purchase discounts removed

Feb 2026

Zhipu removes all GLM Coding Plan first-purchase discounts (ending the $3/$15 promo). Standing prices settle at roughly Lite $30/quarter (~$10/mo), Pro $90/quarter (~$30/mo), Max $240/quarter (~$80/mo). (Source: vibecoding.app, z.ai, 2026-02.)

Hong Kong IPO — first foundation-model lab to list globally

Jan 2026

Zhipu (02513.HK) debuts on the Hong Kong Stock Exchange, becoming the first foundation-model AI company to go public globally. It raised roughly HK$4.3B (about US$560M) at a ~US$6.7B valuation; the HK retail tranche was oversubscribed ~1,159x. (Source: CNBC, Futu, 2026-01.)

GLM-4.6 released at the same headline rate

Oct 2025

GLM-4.6 ships with a larger context window and improved coding/agentic performance, priced at the same $0.60 in / $2.20 out per 1M tokens as GLM-4.5 — capability gains delivered without a price increase. (Source: z.ai docs, OpenRouter, 2025-10.)

GLM Coding Plan launches with a $3 promo

Sep 2025

Zhipu launches the GLM Coding Plan — a flat coding subscription wired into Claude Code, Cline, Kilo Code and others — with an aggressive first-purchase promo of $3 (Lite) and $15 (Pro), explicitly positioned to undercut Claude Code and Cursor. (Source: z.ai, Cline/Roo Code announcements, 2025-09.)

GLM-4.5 launches with open weights and per-token API

Jul 2025

Zhipu releases GLM-4.5 (and GLM-4.5-Air) with open weights under a permissive license, listed on z.ai at $0.60 in / $2.20 out per 1M tokens (Air at $0.20 / $1.10). Establishes the modern USD price card. (Source: z.ai, Zhipu blog, 2025.)

Added to the US Entity List

Jan 2025

The US Commerce Department adds Zhipu (Beijing Zhipu Huazhang) to the BIS Entity List — the first Chinese foundation-model lab listed. Zhipu states it 'strongly disagrees.' The listing sharpens Zhipu's sovereign-AI, no-US-dependency positioning rather than its sticker prices. (Source: SCMP, Reuters, 2025-01.)

AutoGLM / GLM-PC agent products and GLM-OS concept launch

Nov 2024

Zhipu introduces AutoGLM and GLM-PC under a 'GLM-OS' vision and later open-sources the CogAgent-9B base model, widening the product surface from raw API into agent products that later anchor the GLM Coding Plan. (Source: Zhipu newsroom, 2024-11/12.)

GLM-4-Flash API made free to the public

Aug 2024

Zhipu opens its GLM-4-Flash API for free to all developers, an aggressive move in China's domestic model price war. This established the 'free Flash' anchor that persists in 2026 (GLM-4.5-Flash and GLM-4.7-Flash remain free). (Source: AIbase, 2024-08.)

Trivia

· Zhipu spun out of Tsinghua University's Knowledge Engineering Group in 2019 and is one of China's 'AI tigers' alongside Moonshot, MiniMax, Baichuan and 01.AI.
· In January 2025 Zhipu became the first Chinese foundation-model lab added to the US Commerce Department's Entity List — it said it 'strongly disagrees' and stressed it relies on no US large-model technology.
· On 8 January 2026 Zhipu (02513.HK) became the first foundation-model AI company to IPO anywhere in the world, raising about US$560M at a ~US$6.7B valuation on the Hong Kong exchange.

Questions & answers

What is Zhipu AI's pricing model?: Zhipu (Z.ai) runs a per-token GLM API plus a flat coding subscription. API rates are per million tokens — GLM-5.2 at $1.4 input / $4.4 output, GLM-4.7 / GLM-4.6 / GLM-4.5 at $0.6 / $2.2, GLM-4.5-Air at $0.2 / $1.1 — while the GLM Coding Plan is a fixed monthly fee listed at $18 to $160.
Does Zhipu AI offer a free tier?: Yes. GLM-4.7-Flash and GLM-4.5-Flash (text) and GLM-4.6V-Flash (vision) are free to call with no usage cost across input, cached input and output. Zhipu first made its GLM-4-Flash API free to the public in August 2024.
How much does the GLM Coding Plan cost?: List prices are $18/month (Lite), $72/month (Pro) and $160/month (Max). Billing term sets the discount: monthly −10% ($16.2 / $64.8 / $144), quarterly −20% ($14.4 / $57.6 / $128) and yearly −30% ($12.6 / $50.4 / $112 per month).
How much does the Zhipu GLM API cost per million tokens?: On z.ai (USD): GLM-5.2 and GLM-5.1 are $1.4 in / $4.4 out, GLM-5 is $1 / $3.2, GLM-5-Turbo is $1.2 / $4.0, GLM-4.7 / GLM-4.6 / GLM-4.5 are $0.6 / $2.2, GLM-4.5-Air is $0.2 / $1.1, and GLM-4-32B-0414-128K is $0.1 / $0.1. Cached input runs $0.01–$0.45.
Does Zhipu charge for anything besides tokens?: Yes. Web Search is $0.01 per use, images are $0.015 (GLM-Image) or $0.01 (CogView-4) each, video generation is $0.2–$0.4 per video, GLM-ASR-2512 speech is $0.03 per 1M tokens, and agents run $0.7 per 1M tokens (Slide/Poster) to $3 per 1M tokens (translation).
Can I self-host Zhipu's GLM models?: Yes for the open-weight ones. Zhipu open-weighted GLM-4.5 and GLM-4.6 under permissive (MIT-style) licenses, so you can download and run them yourself, while Zhipu monetizes hosted inference per token and the managed coding subscription.