How much does the Gemini API cost per million tokens?

Prices range from $0.075/1M input tokens (Gemini 2.0 Flash Lite) to $2/1M input tokens (Gemini 3.1 Pro Preview). Output tokens are typically 4–6× input prices. Gemini 2.5 Pro costs $1.25 input / $10 output per 1M tokens (standard tier, ≤200K context).

Does Google Gemini have a free tier for API access?

Yes. AI Studio offers a free tier with rate-limited access to Gemini 2.0 Flash and 2.5 Flash — no credit card required. Free limits are approximately 15 RPM and 1M TPM for Flash models. This free tier is not available on Vertex AI.

What is the difference between Gemini API on AI Studio vs Vertex AI?

AI Studio provides a simpler developer onboarding with a free tier and lower operational overhead. Vertex AI adds enterprise features: VPC Service Controls, dedicated quotas, SLA guarantees, Priority and Flex/Batch tiers, fine-tuning, and full Google Cloud billing integration. Pricing is the same at standard rates.

What is the Gemini context caching discount?

Cached input tokens cost 10% of the standard input rate — a 90% discount. For example, Gemini 2.5 Pro cached input costs $0.125/1M tokens vs $1.25/1M for standard input. This significantly reduces costs for applications that reuse long system prompts or documents across many queries.

What is the Flex/Batch API and how much does it save?

The Flex/Batch API allows asynchronous, non-time-critical inference at 50% of the standard price. For example, Gemini 2.5 Pro drops from $1.25/$10 to $0.625/$5 per 1M input/output tokens. Ideal for data processing pipelines and offline workloads that can tolerate latency.

What does Google AI Pro include and cost?

Google AI Pro (renamed from Google One AI Premium at I/O 2025) costs $19.99/month and includes the consumer Gemini app with higher usage limits, 5TB of Google Drive/Gmail/Photos storage, and Gemini in Google Workspace apps. As of July 2026 a cheaper Google AI Plus tier sits below it (400GB storage, 2× free-tier limits and Pro-model access; priced at ₹399/month in India, with the USD price not yet verified). Google AI Ultra starts at $100/month (with a $200/month premium tier, reduced from $249.99 at I/O 2026) and adds 20TB storage and higher limits. All are consumer subscriptions unrelated to the developer API pricing.

Google Gemini Pricing

AI Summary

Google Gemini API operates on a pure pay-per-token model with no subscription or seat fee — developers pay only for input and output tokens consumed, billed through Google Cloud.
The Gemini API offers a free tier via AI Studio (no credit card required) with rate-limited access to Gemini 2.0 Flash and 2.5 Flash models; paid usage kicks in when free quotas are exceeded.
Vertex AI hosts the full Gemini model family — from Gemini 2.0 Flash Lite ($0.075/$0.30 per 1M input/output tokens) up to Gemini 3.1 Pro Preview ($2/$12 per 1M tokens standard) — with Standard, Priority (1.8×), and Flex/Batch (0.5×) pricing tiers. In July 2026 Google added AlphaEvolve, DeepMind's evolutionary coding agent, as a separately-billed Vertex AI SKU priced at the underlying Gemini model rate plus a 2× agent surcharge (3× the standard model price all-in): a Gemini 3.1 Pro Preview AlphaEvolve run costs $6/1M input and $36/1M output, making it Google's first agent priced as an explicit multiplier on model tokens. Vertex AI hosts the full Gemini model family — from Gemini 2.0 Flash Lite ($0.075/$0.30 per 1M input/output tokens) up to Gemini 3.1 Pro Preview ($2/$12 per 1M tokens standard) — with Standard, Priority (1.8×), and Flex/Batch (0.5×) pricing tiers. In July 2026 Google added AlphaEvolve, DeepMind's evolutionary coding agent, as a separately-billed Vertex AI SKU priced at the underlying Gemini model rate plus a 2× agent surcharge (3× the standard model price all-in): a Gemini 3.1 Pro Preview AlphaEvolve run costs $6/1M input and $36/1M output, making it Google's first agent priced as an explicit multiplier on model tokens.
Context caching provides a 90% discount on cached input tokens, making long-context applications significantly cheaper; a long-context surcharge applies when total input exceeds 200K tokens.
Grounding with Google Search adds per-query costs ($35/1K prompts for Gemini 2.5/2.0 beyond free daily limits; $14/1K for Gemini 3 models) — a hidden dimension missing from basic token-cost estimates.
On the consumer side, Google added a new entry-level Google AI Plus tier in July 2026 (400GB storage, 2× free-tier usage limits, Pro-model access; captured in India at ₹399/month with the USD price not yet verified) below Google AI Pro ($19.99/month, formerly Google One AI Premium, 5TB storage) and Google AI Ultra (from $100/month, with a $200/month premium tier, 20TB storage and Deep Think) — all separate consumer products from the developer API, now forming a four-rung paid ladder. On the consumer side, Google added a new entry-level Google AI Plus tier in July 2026 (400GB storage, 2× free-tier usage limits, Pro-model access; captured in India at ₹399/month with the USD price not yet verified) below Google AI Pro ($19.99/month, formerly Google One AI Premium, 5TB storage) and Google AI Ultra (from $100/month, with a $200/month premium tier, 20TB storage and Deep Think) — all separate consumer products from the developer API, now forming a four-rung paid ladder.

Pricing summary

Google Gemini 2026 — API & Consumer Pricing

Free tier via AI Studio; paid API from $0.075/1M tokens; Priority (1.8×) and Flex/Batch (0.5×) on Vertex AI

AI Studio Free Tier

Free

Developers prototyping, students, hobbyists

Most developer choice

Gemini API Pay-as-you-go

From $0.075 /1M tokens

Production apps, startups, API-first teams

Guaranteed capacity

Vertex AI Priority

1.8× standard rate

High-throughput production workloads

Consumer subscription

Google AI Pro

$19.99 /mo

Non-developers: knowledge workers, students

Flex / Batch API

0.5× standard rate

Offline pipelines, data processing, bulk evaluation

Tuned Models

1.5× base model rate

Teams fine-tuning Gemini for domain-specific use

Gemini API prices shown are standard Vertex AI rates. AI Studio free tier has rate limits and may use data for model improvement. API/Vertex prices verified July 2026 (unchanged); consumer USD figures per validated May 2026 source.

About

Google Gemini is Google’s flagship generative AI model family, developed by Google DeepMind and accessible to developers through the Gemini API (via AI Studio) and Google Cloud Vertex AI. Launched in December 2023 as a replacement for the PaLM 2 and Bard-era models, Gemini is Google’s direct response to OpenAI’s GPT-4 and Anthropic’s Claude — a multimodal model family capable of processing text, images, audio, video, and code in a single unified architecture.

The Gemini family spans multiple capability tiers: Lite models for cost-efficient inference, Flash models for speed-and-cost balance, and Pro models for maximum quality. As of mid-2026, the Gemini 3 series (3.1 Pro, 3.5 Flash, 3 Flash, 3.1 Flash-Lite) represents the current frontier, alongside the Gemini 2.5 and 2.0 families which remain in production use.

Google’s AI revenue is integrated into Google Cloud, which reported $43B in revenue in 2025 (up 28% YoY), with Gemini API and Vertex AI contributing meaningfully to that growth. The consumer Gemini app (formerly Bard) is monetized through the Google One AI Premium subscription at $19.99/month, bundled with Google Workspace AI features.

Google’s market position is unique: it controls the entire stack from TPU hardware to foundation models to the consumer app surface, and leverages Search and Workspace distribution that no competitor can match. This vertical integration creates a pricing dynamic where developer API pricing is often set below cost to drive platform adoption, while the real monetization occurs through enterprise Google Cloud and Workspace relationships. Compare this approach to Perplexity AI’s freemium + search API model for a contrasting pure-play AI monetization strategy.

Pricing summary : pure pay-per-token across a three-tier speed/quality ladder

Google Gemini’s developer pricing is pure usage-based with no subscription or seat fee — you pay only for tokens consumed, billed through Google Cloud. The pricing architecture has three access modes: a free tier on AI Studio (rate-limited, no credit card), standard pay-as-you-go on Vertex AI, and enterprise-grade access with Priority throughput guarantees or Flex/Batch discounts.

The model portfolio forms a deliberate cost ladder. Gemini 3.1 Flash-Lite at $0.25/$1.50 per 1M input/output tokens is Google’s cheapest production model — suited for classification, summarization, and routing tasks. Gemini 2.5 Flash at $0.30/$2.50 offers a better quality/cost balance for general reasoning. Gemini 2.5 Pro at $1.25/$10 is the power tier for complex tasks. Gemini 3.1 Pro Preview at $2/$12 is Google’s current frontier, still in preview as of May 2026.

What makes this different: Google’s context caching mechanic is structurally different from competitors: cached input costs just 10% of the standard rate, creating a 90% discount on repeated context. For applications with long, reused system prompts or documents, this single feature can reduce per-query costs by 40–70% at scale. No other major AI provider offers a comparable caching discount depth. This aligns with the principles in understanding usage-based pricing models — pricing dimensions that map directly to underlying resource consumption.

Pricing by product

Gemini API: Current model pricing (Standard tier, pay-as-you-go)

Model	Input ≤200K (per 1M)	Input >200K (per 1M)	Cached Input (per 1M)	Output (per 1M)	Best for
Gemini 3.1 Pro Preview	$2.00	$4.00	$0.20	$12.00	Frontier quality, complex reasoning
Gemini 3.5 Flash	$1.50	$1.50	$0.15	$9.00	High-quality speed-optimized
Gemini 3 Flash Preview	$0.50	$0.50	$0.05	$3.00	Balanced cost/quality
Gemini 3.1 Flash-Lite	$0.25	$0.25	$0.025	$1.50	High-volume, cost-first
Gemini 2.5 Pro	$1.25	$2.50	$0.125	$10.00	Best current production quality
Gemini 2.5 Flash	$0.30	$0.30	$0.03	$2.50	Developer default (price/perf)
Gemini 2.5 Flash-Lite	$0.10	$0.10	$0.01	$0.40	Lowest-cost production tier
Gemini 2.0 Flash	$0.10	$0.10	$0.025	$0.40	Stable production (deprecated — shut down June 1, 2026)
Gemini 2.0 Flash-Lite	$0.075	$0.075	—	$0.30	Entry-level production (deprecated — shut down June 1, 2026)

Vertex AI pricing tiers (multipliers applied to standard rate)

Tier	Multiplier	Use case	Availability
Standard	1.0×	Default pay-as-you-go	All models
Priority	1.8×	Guaranteed capacity, low latency SLA	Gemini 2.5, 3.x
Flex/Batch	0.5×	Async / offline workloads	Gemini 2.0, 2.5, 3.x

AlphaEvolve agent (Vertex AI — new July 2026)

AlphaEvolve, Google DeepMind’s evolutionary coding/optimization agent, appeared on the Vertex AI pricing page in July 2026 as a separately-billed agentic SKU. Its cost is the underlying Gemini model rate plus an AlphaEvolve agent surcharge (the agent charge is 2× the base model rate, so the all-in total is 3× the standard model price):

Underlying model	Direction	Gemini model (per 1M)	AlphaEvolve agent (per 1M)	Total (per 1M)
Gemini 3.1 Pro Preview	Input	$2.00	$4.00	$6.00
Gemini 3.1 Pro Preview	Output + thinking	$12.00	$24.00	$36.00
Gemini 3.5 Flash	Input	$1.50	$3.00	$4.50
Gemini 3.5 Flash	Output + thinking	$9.00	$18.00	$27.00

Grounding and multimodal add-ons

Feature	Free allowance	Paid rate
Grounding with Google Search (Gemini 2.5/2.0 Flash)	1,500 queries/day	$35/1K queries
Grounding with Google Search (Gemini 2.5/2.0 Pro)	10,000 queries/day	$35/1K queries
Grounding with Google Search (Gemini 3.x)	5,000 queries/month	$14/1K queries
Web Grounding for Enterprise	None listed	$45/1K prompts
Grounding with Google Maps	5,000 queries/month	$14–$25/1K queries
Grounding with Your Data	None listed	$2.50/1K prompts

Consumer access (non-developer)

Product	Price	Includes
Gemini App (basic)	Free	Latest Gemini Flash model in consumer chat; limited to web/mobile app
Google AI Plus	USD unknown (₹399/mo India)	New entry paid tier (below Pro): 400 GB storage, 2× higher usage limits vs free, Omni in Gemini, AI Inbox in Gmail (rolling out), access to Pro model. USD figure renders via JS/images and was not captured this run
Google AI Pro	$19.99/month	Consumer Gemini app with 4× higher limits, 5TB storage, Gemini in Workspace, YouTube Premium Lite, $10/mo Google Cloud credits (renamed from Google One AI Premium)
Google AI Ultra (entry)	$100/month	5× higher Gemini/Antigravity limits vs Pro, 20TB storage, YouTube Premium individual, Deep Think, $40/mo Google Cloud credits (new tier introduced at I/O 2026)
Google AI Ultra (premium)	$200/month	20× higher Gemini/Antigravity limits vs Pro, 30TB storage, Project Genie access, $100/mo Google Cloud credits (reduced from $249.99 at I/O 2026)

Note: the consumer Google AI plan page (one.google.com) rendered in India/INR locale on the 2026-07-06 automated capture; USD subscription figures render via JS/images and are validated separately (blog.google + press, capture 2026-05-30). The Google AI Plus tier is newly present on the plan page as of this run — its USD price is not yet verified and is left unknown rather than converted from INR.

Sales motions across products: Pure self-serve PLG for AI Studio and Gemini API (pay-as-you-go); sales-led for Vertex AI enterprise commitments, dedicated capacity, and Google Workspace enterprise contracts. Consumer Google AI (One) plans are self-serve subscription.

Hidden costs : what surprises Gemini API buyers beyond base token rates

Archetype A: Developer building a RAG chatbot on Gemini 2.5 Pro

A team building a document Q&A application, processing 1M queries/month with a 50K-token system prompt sent fresh each query:

Line item	Monthly cost
Gemini 2.5 Pro — input tokens (1M queries × 50K tokens each)	$62,500
Gemini 2.5 Pro — output tokens (avg 500 tokens per query)	$5,000
Grounding with Google Search (1M queries, paid tier)	$35,000
Estimated total (no caching)	~$102,500

With context caching enabled (same 50K prompt, cached after first call each session):

Line item	Monthly cost
Gemini 2.5 Pro — cached input (50K × 1M at $0.125/1M)	$6,250
Gemini 2.5 Pro — output tokens	$5,000
Grounding with Google Search	$35,000
Estimated total (with caching)	~$46,250

Context caching alone saves ~54% — but grounding costs are additive and can dominate the bill. The Google Gemini pricing calculator can model your specific token volumes and caching ratios.

Archetype B: Startup on AI Studio free tier going to production

Hidden cost	Impact
Free tier rate limits (~15 RPM)	Forces a move to paid Vertex AI when traffic grows — costs jump from $0 to metered instantly
Long-context surcharge (>200K tokens)	If any input exceeds 200K, all tokens in that request are charged at the higher rate — not just the overflow
Grounding with Google Search	Not included in base token pricing; 1,500–10,000 free queries/day then billed separately
Non-global endpoint premium (July 2026)	Gemini 3 models accessed from regional endpoints carry a 10% surcharge vs global endpoints
Image token calculation complexity	Image input tokens vary by resolution (560–2,000+ tokens per image); audio at separate per-token rates

Use the Google Gemini pricing calculator to model your expected monthly spend before moving a prototype into production — the gap between free tier and first paid bill can be significant.

Pricing evolution : how Gemini pricing has changed since launch

Cadence

Quarter	Price changes	Product / SKU additions	Notes
2023 Q4	0	1	Gemini Pro API launched free; Gemini 1.0 Ultra in Bard
2024 Q1	0	1	Google One AI Premium ($19.99/mo) launched with Gemini Advanced
2024 Q2	0	2	Gemini 1.5 Pro and 1.5 Flash launched at Google I/O; 1M token context
2024 Q4	2	0	Gemini 1.5 Pro price cut ~65% ($3.50→$1.25 input); 1.5 Flash cut ~50%
2025 Q1	0	2	Gemini 2.0 Flash ($0.15/$0.60) and Flash-Lite ($0.075/$0.30) launched
2025 Q2	0	2	Gemini 2.5 Pro and 2.5 Flash launched at Google I/O 2025; Priority/Batch tiers
2026 Q2	0	4	Gemini 3 family launched (3.1 Pro, 3.5 Flash, 3 Flash, 3.1 Flash-Lite); regional pricing introduced
2026 Q3	0	2	Google AI Plus consumer tier added below Pro; 2.0 Flash/Flash-Lite shut down (June 1); Imagen 4 / Veo 3 / Veo 2 deprecated; AlphaEvolve agent SKU added to Vertex AI (model rate + 2× agent surcharge)

Tracked range: 2023 Q4–2026 Q3. Quarters not listed above were verified stable with no price changes.

Notable changes

2023-12-13 — Gemini 1.0 Pro API launched free via Google AI Studio; Gemini Ultra available only through Google One AI Premium ($19.99/mo). No pay-as-you-go API at launch.
2024-02-15 — Google One AI Premium launched at $19.99/month with Gemini Advanced (Ultra-class model), marking Google’s first consumer AI subscription and the first Gemini monetization event.
2024-05-14 — Google I/O: Gemini 1.5 Pro ($3.50/$10.50 per 1M input/output) and Gemini 1.5 Flash ($0.35/$1.05 per 1M) launched with 1M-token context windows. Both available via Gemini API with free tier on AI Studio.
2024-11-19 — Aggressive price cuts: Gemini 1.5 Pro input dropped from $3.50 to $1.25/1M (64% cut); output from $10.50 to $5/1M (52% cut). Gemini 1.5 Flash input from $0.075 to $0.0375/1M (50% cut). This directly responded to GPT-4o and Claude 3.5 Sonnet competitive pressure and aligned with broader AI token cost deflation trends across the market.
2025-01-15 — Gemini 2.0 Flash launched at $0.15/$0.60 per 1M input/output with native multimodal capabilities (audio output, image generation). Flash-Lite at $0.075/$0.30 as the entry-level option. Batch API discount introduced at 50%.
2025-05-20 — Gemini 2.5 Pro ($1.25/$10) and 2.5 Flash ($0.30/$2.50) launched with improved reasoning and Priority (1.8×) and Flex/Batch (0.5×) pricing tiers formally introduced as named SKUs on Vertex AI.
2026-04-01 — Gemini 3 model family launched with image-native generation models (Gemini 3 Pro Image, 3.1 Flash Image). Regional endpoint pricing announced effective July 1, 2026 — first geographic pricing split for Gemini.
2026-07-06 — Consumer packaging move: a new Google AI Plus tier appeared below AI Pro (400 GB storage, 2× free-tier usage limits, Pro-model access), captured only in INR (₹399/mo) with the USD price still unverified. On the developer side there was no price change — API/Vertex token rates were verified unchanged vs 2026-05-29 — but the legacy entry-level models were retired: Gemini 2.0 Flash and 2.0 Flash-Lite reached end-of-life (shut down June 1, 2026), moving the production cost floor up to Gemini 2.5 Flash-Lite ($0.10/$0.40 per 1M). Imagen 4, Veo 3 and Veo 2 were newly flagged deprecated with 2026 shut-down dates.
2026-07-14 — First agentic SKU on the Vertex AI price sheet: AlphaEvolve (DeepMind’s evolutionary coding/optimization agent) is now billed as the underlying Gemini model rate plus a 2× agent surcharge — 3× the standard model price all-in (e.g. Gemini 3.1 Pro Preview at $6/1M input, $36/1M output; Gemini 3.5 Flash at $4.50/$27). This is a pricing-mechanic change, not a price cut or hike: headline Gemini API/Vertex token rates and the consumer AI Plus/Pro/Ultra lineup were all verified unchanged vs 2026-07-06. It is the first time Google prices an agent as a distinct metered line item layered on model tokens, rather than folding agentic scaffolding into the base rate.

What’s unique : differentiators in Gemini’s pricing mechanics

1. 90% context caching discount — the deepest in the market. Google’s context caching gives a 90% discount on cached input tokens (10% of standard rate). Competitors like Anthropic offer prompt caching at ~90% discount too, but Google’s implementation is available across more model tiers and integrates directly with the 200K long-context boundary. For a real-world developer sending a 100K-token knowledge base with every query, caching turns Gemini 2.5 Pro from a $1.25/1M cost to $0.125/1M on those tokens — a difference that materially changes unit economics for AI agent workflows and RAG architectures.

2. Three throughput tiers priced explicitly: Standard / Priority / Flex. Google is unique in pricing throughput guarantees as explicit rate multipliers (1.8× for Priority, 0.5× for Flex/Batch). Most competitors hide throughput guarantees inside enterprise contracts or tier-based rate limits. Making the throughput economics visible and self-serve allows engineering teams to make price-vs-latency tradeoffs without a sales call — a practice consistent with choosing the right usage metric for production AI infrastructure.

3. Long-context all-or-nothing pricing cliff at 200K tokens. When a request exceeds 200K input tokens, Google charges all tokens (both input and output) at the higher long-context rate — not just the overflow. This is a non-obvious pricing mechanic: a 201K-token request costs materially more than a 199K-token request even though only 1,000 tokens crossed the threshold. This creates a strong incentive to chunk or truncate context at the boundary, and represents a meaningful hidden cost for unprepared teams.

4. Free tier via AI Studio with no credit card. Google provides free access to Gemini 2.0 Flash and 2.5 Flash through AI Studio with no payment method required — a developer acquisition strategy that no major competitor fully matches at the same breadth. This free-to-paid pathway is a key part of how AI companies shift from per-user to usage models, using free utility to create developer stickiness before introducing paid tiers.

5. Grounding with Google Search as a separately billed capability. Unlike most LLM APIs where web retrieval is either included or not supported, Google offers Grounding with Google Search as an additive per-query charge. This is the only AI API on the market where you can buy “real-time internet access” on a usage-based basis. The business logic is clear: Google Search is a premium monetizable asset, and attaching it to AI queries creates a secondary revenue stream alongside token costs.

6. Agent-as-multiplier pricing: the AlphaEvolve surcharge (new July 2026). With the AlphaEvolve SKU (added 2026-07-14), Google prices an agent not as a flat seat or an outcome fee but as a transparent multiple of the underlying model rate — a 2× agent surcharge on top of standard tokens, so the all-in cost is exactly 3× the base model price ($6/$36 per 1M on Gemini 3.1 Pro Preview, $4.50/$27 on Gemini 3.5 Flash). This makes the agent’s marginal cost fully legible: a buyer already modeling Gemini token spend can price an AlphaEvolve run by simply tripling it. It mirrors the same “value expressed as a rate multiplier” logic Google already uses for Priority (1.8×) and Flex/Batch (0.5×) throughput, now extended up the stack from how fast tokens are served to what agentic work is done with them — and it keeps agent monetization on the same usage-based rails as the rest of the platform rather than moving to a separate contract.

Strengths & weaknesses

Strengths	Weaknesses
90% context caching discount dramatically reduces costs for long-context repeated queries	Long-context surcharge (>200K) applies to ALL tokens, not just overflow — a non-intuitive pricing cliff
Free tier via AI Studio with no credit card removes adoption friction for developers globally	No volume discounts published; enterprise pricing requires Google Cloud contracts, not self-serve
Explicit Priority (1.8×) and Flex/Batch (0.5×) tiers let teams make throughput tradeoffs without sales calls	Grounding with Google Search is separately billed — easy to miss in initial cost estimates
Broadest model portfolio: 8+ Gemini models plus Gemma, covering every cost/quality tradeoff point	Regional endpoint pricing (July 2026) adds 10% to Gemini 3 costs outside global endpoints — a surprise for non-global deployments
Native multimodal: text, image, audio, video in one API with unified token billing	Image token calculation varies by resolution (560–2,000+ tokens per image) — hard to predict costs without the countTokens API
Batch API at 50% discount is broadly available and well-documented	AI Studio free tier data may be used for model improvement — a compliance concern for sensitive data use cases
AlphaEvolve agent (July 2026) is priced transparently as a fixed multiple of the model rate, so its marginal cost is easy to model	The AlphaEvolve agent surcharge is steep — 2× the model rate on top of tokens (3× all-in) with no published outcome guarantee, so agentic runs cost triple the raw inference and must clear a high ROI bar

Billing UX : developer experience with Gemini API billing controls

Billing via Google Cloud — All Gemini API paid usage is billed through a Google Cloud project. Developers need a GCP account with billing enabled; there is no standalone Gemini API billing portal separate from Cloud.
AI Studio free tier — Available without a credit card or GCP billing account. Rate limits (approximately 15 RPM for Flash models) apply; exceeding them returns 429 errors rather than billing overages.
Usage dashboard — Google Cloud Console provides token usage, request counts, and cost breakdown by model and project. Granularity is at the hourly level; real-time spend is visible with slight delay.
Budget alerts — GCP budget alerts allow teams to set monthly spend thresholds with email/Pub-Sub notifications. Spend caps are advisory (not hard limits by default); hard spend caps require custom quota limits set separately.
Quota management — Default quotas are set per-project per-model. Increasing quotas requires a quota request in the GCP Console — self-serve up to a threshold, then Google Cloud support review for higher limits.
Batch API billing — Batch requests are queued and billed at 50% of standard rates upon completion. No real-time billing; costs appear after batch job finishes.
Context caching billing — Cached context is billed at creation time (full token cost), then subsequent cache hits at 10% of normal input rate. Cache TTL is configurable (default varies by model). Stale caches that expire before sufficient queries arrive do not pay back their creation cost — a minor but real UX friction.
Vertex AI commitment discounts — Committed Use Discounts (CUDs) are available on Vertex AI for predictable workloads but require annual commitment contracts and are negotiated through Google Cloud sales rather than self-serve.
Payment methods — Credit card, bank account, invoicing (enterprise). Enterprise Google Cloud customers may use consolidated invoicing across all Cloud services.

Strategic wins : where Google Gemini’s pricing decisions have excelled

1. The November 2024 price cuts reframed Gemini’s competitive position

When Google cut Gemini 1.5 Pro pricing by 64% in November 2024, it was not just a defensive response to GPT-4o — it was a market-defining signal. At $1.25/1M input tokens, Gemini 1.5 Pro undercut OpenAI’s GPT-4o ($2.50/1M input) by 50% while offering comparable quality and a larger context window. The cut triggered a pricing reset across the AI market, with Anthropic and OpenAI both reducing prices within months. Google’s willingness to sacrifice margin on API pricing reflects its core strategic bet: make the Gemini ecosystem so cheap that every developer defaults to Gemini, generating long-term lock-in via Google Cloud and Workspace. See AI companies shifting from per-user licenses for why platform adoption is the real prize.

2. The 90% caching discount creates a usage-based loyalty mechanism

Context caching is both a genuine technical feature and a brilliant pricing strategy. Developers who architect applications around Gemini’s caching system — sending long documents, system prompts, or few-shot examples once and reusing them across queries — see dramatic cost reductions that are specific to Gemini’s implementation. This creates switching costs that are invisible in a benchmark comparison: a production application optimized for Gemini caching would need re-engineering to migrate to an API without equivalent discounts. It is a usage-based pricing mechanic that creates loyalty without any contractual lock-in.

3. AI Studio’s no-credit-card free tier dominates developer acquisition

By offering a genuinely useful free tier with no payment method required, Google captures developer attention at the earliest possible moment in the decision journey. A student in India, a startup founder in Nigeria, or a researcher at a university can start building with Gemini 2.5 Flash today without a GCP account or credit card. This zero-friction acquisition strategy has made Gemini the default experimentation platform for a generation of developers who later bring it into their organizations. The PLG model applied to API products works when the free tier is genuinely useful — and Gemini 2.5 Flash free is genuinely useful.

4. Explicit throughput pricing (Priority / Flex) removes enterprise friction

By publishing Priority (1.8×) and Flex/Batch (0.5×) pricing as explicit, self-serve options rather than burying them in enterprise contracts, Google lets engineering teams make real-time economic decisions about throughput tradeoffs. A team running nightly data enrichment can switch to Flex/Batch and cut costs 50% with a single API parameter change — no procurement, no negotiation. This designing value-based pricing approach creates a natural upsell: teams that adopt Flex/Batch for batch workloads naturally consider Priority for latency-critical paths, expanding total spend with Google.

5. The July 2026 Google AI Plus tier lengthens the consumer paid ladder downward

The developer API is Google’s platform-adoption engine, but the consumer Gemini app is where Google harvests direct subscription revenue — and the Google AI Plus tier added on 2026-07-06 sharpens that ladder. By slotting a cheaper rung below AI Pro (400 GB storage, 2× free-tier limits, Pro-model access) rather than only pushing free users straight to the $19.99 Pro plan, Google captures willingness-to-pay it was previously leaving on the table — the price-sensitive user who wants Pro-model access without Pro’s 5 TB and Workspace bundle. That the tier surfaced first in India at ₹399/mo (USD still unverified) is itself a signal: the lower rung targets high-volume, price-elastic markets where a Pro-priced jump suppresses conversion. It mirrors the same free-to-paid ramp logic Google already runs on the API side — meet the user at a lower entry point, then expand — applied to the consumer surface.

Areas to improve : gaps and friction in Google Gemini’s pricing approach

1. The 200K all-or-nothing long-context pricing cliff creates unpredictable cost spikes

When a request exceeds 200K input tokens, Google charges all tokens at the higher long-context rate — not just the overflow. This means a document that is 201K tokens costs materially more than a 199K-token document, even though only 1,000 extra tokens crossed the boundary. For developers working with variable-length documents (legal contracts, scientific papers, code repositories), this creates billing unpredictability that is difficult to model upfront. The fair fix is linear scaling: charge the >200K rate only on tokens that actually exceed the threshold. Until then, developers must either chunk aggressively or accept unpredictable bills — a real cost unpredictability problem that affects production planning.

2. Grounding with Google Search is easy to underestimate in cost models

The Grounding with Google Search feature adds $35 per 1,000 queries (for Gemini 2.5/2.0 models beyond the free daily limit) on top of token costs. For an application making 1 million search-grounded queries per month, grounding adds $35,000 — potentially exceeding the base token cost entirely. This additive cost is not prominently surfaced in the standard pricing documentation and is frequently missed by developers building RAG or agentic applications that rely on fresh web data. Google should integrate grounding costs directly into the pricing calculator and show estimated combined costs (tokens + grounding) in the AI Studio billing dashboard.

3. No published volume discounts create negotiation opacity for mid-market teams

Unlike Anthropic (which publishes committed-use discounts) or AWS (with Reserved Instances), Google’s Gemini API pricing has no published volume discount schedule. Teams spending $5,000–$50,000/month must negotiate privately with Google Cloud sales to receive any commitment discounts — and those discounts are not self-serve. This creates opacity for mid-market engineering teams who want price predictability but don’t have the leverage of enterprise contracts. Introducing a published tiered discount schedule (e.g., 10% off at $5K/month, 20% at $20K/month) would align with usage-based billing best practices and reduce churn to competitors who offer more transparent volume economics.

Monetization stack & signals : how Google builds & buys its revenue engine

Buys 0 Builds 3

The read — where the monetization investment is going

A hyperscaler meters its own AI: Gemini API and Vertex AI consumption are metered and billed entirely through Google Cloud Billing — the in-house engine that already invoices every other GCP service. The consumer Gemini app monetizes via Google One subscriptions, where third-party channels (Pixel Pass, Apple billing) are treated as the exception. No third-party billing/metering vendor sits in the developer path.

Stack — build vs buy

Builds in-house · 3

Google Cloud Billing Billing Docs Jun 2026

“The Gemini API uses Cloud Billing accounts for billing services, which you can set up directly in AI Studio.”
Google Cloud Billing (Vertex AI consumption) Metering Docs Jun 2026

“Gemini 2.0 is billed based on tokens, and you can calculate the number of input tokens in your request prior to sending the request using the SDK tokenizer or the countTokens API.”
Google One Billing inferred Docs Jun 2026

“Users not eligible to purchase AI credits include those subscribed through a third party, Pixel Pass, or Apple billing.”

Signals reviewed Jun 2026 · derived from product docs

Key takeaways

Context caching is Gemini’s most underutilized cost-reduction lever. The 90% discount on cached input tokens is the most impactful pricing feature in the Gemini API, but adoption is underestimated because it requires architectural deliberateness — you must explicitly design your application to cache and reuse context. Teams that invest in this pattern see 40–70% cost reductions on long-context workloads.
The 200K token cliff creates a non-linear billing risk that demands defensive engineering. Because all tokens are charged at long-context rates when input exceeds 200K, developers should implement hard context-length guards in their application code. This is a billing behavior unique to Google’s implementation and should be a standard check in any Gemini integration review.
The free-to-paid transition is a silent inflection point. AI Studio’s free tier is generous enough that many prototypes run entirely for free — until traffic grows and rate limits kick in. Teams should plan the Vertex AI migration in advance rather than reactively, because the jump from free-tier to production billing requires GCP account setup, quota requests, and billing configuration that can add days of friction.
Grounding with Google Search is a separate revenue stream that must be explicitly budgeted. Any application design that includes web-grounded answers must model search query costs separately from token costs. At $35/1K queries, high-traffic grounded applications can see search costs exceed token costs. Build grounding budgets as a dedicated line item, not an afterthought.
Google’s pricing power comes from platform integration, not from the API price itself. The real lock-in for Google Gemini is not the token price but the integration depth: Google Search grounding, Google Maps data, Vertex AI infrastructure, Workspace embedding, and GCP billing consolidation. A team fully integrated into this stack faces significant switching costs even if a competitor offers a lower per-token rate — a lesson for outcome-based AI pricing that platform value outlasts point pricing.

UBP implications

The multi-dimensional billing model (tokens + modality + grounding + throughput tier) represents the mature form of AI API pricing. Google’s pricing architecture is one of the most sophisticated in the market: token costs vary by input type (text, audio, image, video), context length, caching, throughput tier, and geographic endpoint. This complexity is the natural evolution of usage-based aggregation as AI products add more billable dimensions. Teams building AI-powered SaaS should study this model as a template for multi-dimensional usage billing.
Free tiers are infrastructure, not charity — they drive platform adoption at zero marginal cost. Google’s AI Studio free tier is a deliberate customer acquisition investment: the marginal cost of serving a developer on the free tier (rate-limited to 15 RPM) is negligible, while the lifetime value of a developer who brings Gemini into their organization is significant. For SaaS companies building usage-based products, this is the PLG playbook applied to API monetization — give away the trial unit economics to capture the compounding adoption.
Rate-multiplier pricing lets Google monetize new dimensions of value without leaving usage-based rails. Standard API pricing assumes all queries are equal; Google instead expresses added value as explicit multipliers on the same token meter buyers already forecast. Throughput is the first axis — some queries are worth paying 80% more for (Priority, 1.8×) and others can be deferred for 50% savings (Flex, 0.5×). The AlphaEvolve SKU (July 2026) extends the same logic up the stack to agentic work: a 2× agent surcharge on top of standard tokens (3× all-in) prices the extra value of agent orchestration — planning, iterating, self-evaluating — rather than inventing a per-task, per-outcome, or per-seat agent fee. For companies pricing their own AI-powered features, this is a template for both segmentation (latency-sensitive user-facing paths at a premium, batch work at a discount) and agentic expansion revenue: layer a legible surcharge on the underlying usage metric the feature consumes, so customers can price a run by scaling their existing token math instead of confronting a new billing model.

Sources

Gemini Developer API pricing — ai.google.dev (accessed 2026-05-29)
Gemini API pricing (developer docs) — ai.google.dev/gemini-api/docs/pricing (accessed 2026-07-14)
Vertex AI / Agent Platform Generative AI Pricing — Google Cloud (accessed 2026-07-14)
Google AI subscriptions (I/O 2026 update) — Google blog (accessed 2026-07-14)
Google AI plans — Google One (accessed 2026-05-30)
The Google AI Ultra plan now starts at $100 a month — Engadget (accessed 2026-05-30)
Gemini model family overview — Google DeepMind (accessed 2026-05-29)

Bottom line

Google Gemini offers the most technically sophisticated AI API pricing in the market: 90% context caching discounts, explicit Priority/Flex throughput tiers, and a free-with-no-credit-card AI Studio tier that removes adoption barriers globally. The pricing architecture rewards developers who invest in understanding it — the difference between a naïve and an optimized Gemini deployment can be a 50–70% cost reduction. But the complexity is also the risk: the 200K token pricing cliff, additive grounding costs, and no published volume discounts create budget unpredictability for teams that don’t model all dimensions upfront. Google’s real strategic bet is not winning on per-token price — it’s making the Gemini API indispensable through deep Google Cloud and Workspace integration, where token pricing is just the entry fee to a much stickier platform.

Browse the full pricing blueprint to compare Google Gemini against OpenAI, Anthropic, and other AI infrastructure providers.

Pricing timeline : Major events on a vertical axis

Each milestone below corresponds to a public pricing change, product launch, or material adjustment. Major events use a filled marker; minor adjustments use a faded one.

AlphaEvolve agent added to Vertex AI pricing

Jul 2026

Google added AlphaEvolve, DeepMind's evolutionary coding/optimization agent, as a separately-billed agentic SKU on the Vertex AI pricing page. Its cost is the underlying Gemini model rate plus an AlphaEvolve agent surcharge (agent = 2× base model, so the all-in total is 3× standard): e.g. Gemini 3.1 Pro Preview totals $6/1M input and $36/1M output; Gemini 3.5 Flash totals $4.50/1M input and $27/1M output. Headline Gemini API and Vertex token prices were verified unchanged vs 2026-07-06; the consumer AI Plus/Pro/Ultra lineup was also unchanged.

AlphaEvolve agent added to Vertex AI pricing screenshot 1

AlphaEvolve agent added to Vertex AI pricing screenshot 2

Google AI Plus consumer tier added; 2.0 Flash line retired

Jul 2026

A new Google AI Plus consumer tier appeared below AI Pro (400 GB storage, 2× free-tier usage limits, Pro-model access; captured only in INR at ₹399/mo, USD unverified) — extending the consumer paid ladder downward to a cheaper entry point. Same run: Gemini 2.0 Flash and 2.0 Flash-Lite reached end-of-life (shut down June 1, 2026), retiring the legacy entry-level API models; the production floor is now Gemini 2.5 Flash-Lite ($0.10/$0.40 per 1M). Imagen 4, Veo 3 and Veo 2 newly flagged deprecated with 2026 shut-down dates. Headline API/Vertex token prices verified unchanged vs 2026-05-29.

Google AI Plus consumer tier added; 2.0 Flash line retired screenshot 1

Google AI Plus consumer tier added; 2.0 Flash line retired screenshot 2

Gemini 3 Model Family — Regional Pricing

Apr 2026

Gemini 3 model family launched: Gemini 3.1 Pro Preview ($2/$12 per 1M), Gemini 3.5 Flash ($1.50/$9.00 per 1M), Gemini 3 Flash Preview ($0.50/$3.00 per 1M), and Gemini 3.1 Flash-Lite ($0.25/$1.50 per 1M). Regional (non-global) pricing introduced with 10% premium effective July 2026.

Gemini 3 Model Family — Regional Pricing screenshot 1

Gemini 3 Model Family — Regional Pricing screenshot 2

Gemini 3 Model Family — Regional Pricing screenshot 3

Gemini 3 Model Family — Regional Pricing screenshot 4

Gemini 3 Model Family — Regional Pricing screenshot 5

Gemini 2.5 Pro and Flash at Google I/O 2025

May 2025

Gemini 2.5 Pro and 2.5 Flash launched at Google I/O 2025. 2.5 Pro priced at $1.25/$10 per 1M input/output (≤200K context). 2.5 Flash at $0.30/$2.50 per 1M. Both offered with Priority (1.8×) and Flex/Batch (0.5×) tiers on Vertex AI.

Gemini 2.0 Flash and Flash Lite Released

Jan 2025

Gemini 2.0 Flash launched at $0.15/1M input and $0.60/1M output tokens — matching Gemini 1.5 Flash performance at similar price with added multimodal capabilities including native audio and image output. 2.0 Flash Lite added at $0.075/$0.30 as entry-level option.

Gemini 1.5 Pricing Cut ~50–65%

Nov 2024

Gemini 1.5 Pro pricing reduced significantly. Input prices fell from $3.50 to $1.25/1M tokens (short context), output from $10.50 to $5/1M. Gemini 1.5 Flash dropped from $0.075 to $0.0375/1M input tokens. This ~50–65% cut positioned Gemini aggressively vs OpenAI GPT-4o and Anthropic Claude 3.5.

Gemini 1.5 Pro and Flash — 1M Context Windows

May 2024

Google I/O 2024: Gemini 1.5 Pro and Gemini 1.5 Flash announced with 1M token context windows. Gemini 1.5 Flash introduced as a speed-optimized, cost-efficient model at $0.35/1M input (short context). Both models offered via Gemini API with a free tier on AI Studio.

Gemini Advanced via Google One AI Premium

Feb 2024

Gemini 1.0 Ultra launched as 'Gemini Advanced' inside the Google One AI Premium subscription at $19.99/month, bundled with 2TB storage. No standalone Ultra API made available.

Google Gemini Family Launched

Dec 2023

Google launched Gemini, its multimodal AI family (Ultra, Pro, Nano). The Gemini Pro API became available to developers free via Google AI Studio. This marked the rebrand of Bard's underlying model to Gemini.

Trivia

· Google Gemini 2.5 Flash-Lite outputs tokens at just $0.40/1M — cheaper per output token than any other frontier-grade model from OpenAI or Anthropic as of mid-2026, enabling cost-effective large-scale deployments.
· Google's context caching gives a 90% discount on cached input tokens, meaning a developer who sends the same 100K-token system prompt 1,000 times per day saves roughly $11,250/month compared to charging all tokens at standard rate.
· The Gemini API free tier via AI Studio requires no credit card, making it one of the most accessible no-commitment AI API tiers in the market — ideal for student developers, hobbyists, and prototype builders worldwide.

Questions & answers

How much does the Gemini API cost per million tokens?: Prices range from $0.075/1M input tokens (Gemini 2.0 Flash Lite) to $2/1M input tokens (Gemini 3.1 Pro Preview). Output tokens are typically 4–6× input prices. Gemini 2.5 Pro costs $1.25 input / $10 output per 1M tokens (standard tier, ≤200K context).
Does Google Gemini have a free tier for API access?: Yes. AI Studio offers a free tier with rate-limited access to Gemini 2.0 Flash and 2.5 Flash — no credit card required. Free limits are approximately 15 RPM and 1M TPM for Flash models. This free tier is not available on Vertex AI.
What is the difference between Gemini API on AI Studio vs Vertex AI?: AI Studio provides a simpler developer onboarding with a free tier and lower operational overhead. Vertex AI adds enterprise features: VPC Service Controls, dedicated quotas, SLA guarantees, Priority and Flex/Batch tiers, fine-tuning, and full Google Cloud billing integration. Pricing is the same at standard rates.
What is the Gemini context caching discount?: Cached input tokens cost 10% of the standard input rate — a 90% discount. For example, Gemini 2.5 Pro cached input costs $0.125/1M tokens vs $1.25/1M for standard input. This significantly reduces costs for applications that reuse long system prompts or documents across many queries.
What is the Flex/Batch API and how much does it save?: The Flex/Batch API allows asynchronous, non-time-critical inference at 50% of the standard price. For example, Gemini 2.5 Pro drops from $1.25/$10 to $0.625/$5 per 1M input/output tokens. Ideal for data processing pipelines and offline workloads that can tolerate latency.
What does Google AI Pro include and cost?: Google AI Pro (renamed from Google One AI Premium at I/O 2025) costs $19.99/month and includes the consumer Gemini app with higher usage limits, 5TB of Google Drive/Gmail/Photos storage, and Gemini in Google Workspace apps. As of July 2026 a cheaper Google AI Plus tier sits below it (400GB storage, 2× free-tier limits and Pro-model access; priced at ₹399/month in India, with the USD price not yet verified). Google AI Ultra starts at $100/month (with a $200/month premium tier, reduced from $249.99 at I/O 2026) and adds 20TB storage and higher limits. All are consumer subscriptions unrelated to the developer API pricing.