AI Summary
About
Google Gemini is Google’s flagship generative AI model family, developed by Google DeepMind and accessible to developers through the Gemini API (via AI Studio) and Google Cloud Vertex AI. Launched in December 2023 as a replacement for the PaLM 2 and Bard-era models, Gemini is Google’s direct response to OpenAI’s GPT-4 and Anthropic’s Claude — a multimodal model family capable of processing text, images, audio, video, and code in a single unified architecture.
The Gemini family spans multiple capability tiers: Lite models for cost-efficient inference, Flash models for speed-and-cost balance, and Pro models for maximum quality. As of mid-2026, the Gemini 3 series (3.1 Pro, 3.5 Flash, 3 Flash, 3.1 Flash-Lite) represents the current frontier, alongside the Gemini 2.5 and 2.0 families which remain in production use.
Google’s AI revenue is integrated into Google Cloud, which reported $43B in revenue in 2025 (up 28% YoY), with Gemini API and Vertex AI contributing meaningfully to that growth. The consumer Gemini app (formerly Bard) is monetized through the Google One AI Premium subscription at $19.99/month, bundled with Google Workspace AI features.
Google’s market position is unique: it controls the entire stack from TPU hardware to foundation models to the consumer app surface, and leverages Search and Workspace distribution that no competitor can match. This vertical integration creates a pricing dynamic where developer API pricing is often set below cost to drive platform adoption, while the real monetization occurs through enterprise Google Cloud and Workspace relationships. Compare this approach to Perplexity AI’s freemium + search API model for a contrasting pure-play AI monetization strategy.
Pricing summary : pure pay-per-token across a three-tier speed/quality ladder
Google Gemini’s developer pricing is pure usage-based with no subscription or seat fee — you pay only for tokens consumed, billed through Google Cloud. The pricing architecture has three access modes: a free tier on AI Studio (rate-limited, no credit card), standard pay-as-you-go on Vertex AI, and enterprise-grade access with Priority throughput guarantees or Flex/Batch discounts.
The model portfolio forms a deliberate cost ladder. Gemini 3.1 Flash-Lite at $0.25/$1.50 per 1M input/output tokens is Google’s cheapest production model — suited for classification, summarization, and routing tasks. Gemini 2.5 Flash at $0.30/$2.50 offers a better quality/cost balance for general reasoning. Gemini 2.5 Pro at $1.25/$10 is the power tier for complex tasks. Gemini 3.1 Pro Preview at $2/$12 is Google’s current frontier, still in preview as of May 2026.
What makes this different: Google’s context caching mechanic is structurally different from competitors: cached input costs just 10% of the standard rate, creating a 90% discount on repeated context. For applications with long, reused system prompts or documents, this single feature can reduce per-query costs by 40–70% at scale. No other major AI provider offers a comparable caching discount depth. This aligns with the principles in understanding usage-based pricing models — pricing dimensions that map directly to underlying resource consumption.
Pricing by product
Gemini API: Current model pricing (Standard tier, pay-as-you-go)
| Model | Input ≤200K (per 1M) | Input >200K (per 1M) | Cached Input (per 1M) | Output (per 1M) | Best for |
|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview | $2.00 | $4.00 | $0.20 | $12.00 | Frontier quality, complex reasoning |
| Gemini 3.5 Flash | $1.50 | $1.50 | $0.15 | $9.00 | High-quality speed-optimized |
| Gemini 3 Flash Preview | $0.50 | $0.50 | $0.05 | $3.00 | Balanced cost/quality |
| Gemini 3.1 Flash-Lite | $0.25 | $0.25 | $0.025 | $1.50 | High-volume, cost-first |
| Gemini 2.5 Pro | $1.25 | $2.50 | $0.13 | $10.00 | Best current production quality |
| Gemini 2.5 Flash | $0.30 | $0.30 | $0.03 | $2.50 | Developer default (price/perf) |
| Gemini 2.5 Flash-Lite | $0.10 | $0.10 | $0.01 | $0.40 | Lowest-cost production tier |
| Gemini 2.0 Flash | $0.10 | $0.10 | $0.025 | $0.40 | Stable production (deprecated, shuts down June 2026) |
| Gemini 2.0 Flash-Lite | $0.075 | $0.075 | — | $0.30 | Entry-level production |
Vertex AI pricing tiers (multipliers applied to standard rate)
| Tier | Multiplier | Use case | Availability |
|---|---|---|---|
| Standard | 1.0× | Default pay-as-you-go | All models |
| Priority | 1.8× | Guaranteed capacity, low latency SLA | Gemini 2.5, 3.x |
| Flex/Batch | 0.5× | Async / offline workloads | Gemini 2.0, 2.5, 3.x |
Grounding and multimodal add-ons
| Feature | Free allowance | Paid rate |
|---|---|---|
| Grounding with Google Search (Gemini 2.5/2.0 Flash) | 1,500 queries/day | $35/1K queries |
| Grounding with Google Search (Gemini 2.5/2.0 Pro) | 10,000 queries/day | $35/1K queries |
| Grounding with Google Search (Gemini 3.x) | 5,000 queries/month | $14/1K queries |
| Web Grounding for Enterprise | None listed | $45/1K prompts |
| Grounding with Google Maps | 5,000 queries/month | $14–$25/1K queries |
| Grounding with Your Data | None listed | $2.50/1K prompts |
Consumer access (non-developer)
| Product | Price | Includes |
|---|---|---|
| Gemini App (basic) | Free | Latest Gemini Flash model in consumer chat; limited to web/mobile app |
| Google AI Pro | $19.99/month | Consumer Gemini app with higher limits, 5TB storage, Gemini in Workspace, YouTube Premium Lite (renamed from Google One AI Premium) |
| Google AI Ultra (entry) | $100/month | 5× higher Gemini/Antigravity limits vs Pro, 20TB storage, YouTube Premium, Deep Think (new tier introduced at I/O 2026) |
| Google AI Ultra (premium) | $200/month | 20× higher Gemini/Antigravity limits vs Pro, Project Genie access (reduced from $249.99 at I/O 2026) |
Sales motions across products: Pure self-serve PLG for AI Studio and Gemini API (pay-as-you-go); sales-led for Vertex AI enterprise commitments, dedicated capacity, and Google Workspace enterprise contracts. Consumer Google One AI Premium is self-serve subscription. All prices accessed 2026-05-29.
Hidden costs : what surprises Gemini API buyers beyond base token rates
Archetype A: Developer building a RAG chatbot on Gemini 2.5 Pro
A team building a document Q&A application, processing 1M queries/month with a 50K-token system prompt sent fresh each query:
| Line item | Monthly cost |
|---|---|
| Gemini 2.5 Pro — input tokens (1M queries × 50K tokens each) | $62,500 |
| Gemini 2.5 Pro — output tokens (avg 500 tokens per query) | $5,000 |
| Grounding with Google Search (1M queries, paid tier) | $35,000 |
| Estimated total (no caching) | ~$102,500 |
With context caching enabled (same 50K prompt, cached after first call each session):
| Line item | Monthly cost |
|---|---|
| Gemini 2.5 Pro — cached input (50K × 1M at $0.13/1M) | $6,500 |
| Gemini 2.5 Pro — output tokens | $5,000 |
| Grounding with Google Search | $35,000 |
| Estimated total (with caching) | ~$46,500 |
Context caching alone saves ~54% — but grounding costs are additive and can dominate the bill. The Google Gemini pricing calculator can model your specific token volumes and caching ratios.
Archetype B: Startup on AI Studio free tier going to production
| Hidden cost | Impact |
|---|---|
| Free tier rate limits (~15 RPM) | Forces a move to paid Vertex AI when traffic grows — costs jump from $0 to metered instantly |
| Long-context surcharge (>200K tokens) | If any input exceeds 200K, all tokens in that request are charged at the higher rate — not just the overflow |
| Grounding with Google Search | Not included in base token pricing; 1,500–10,000 free queries/day then billed separately |
| Non-global endpoint premium (July 2026) | Gemini 3 models accessed from regional endpoints carry a 10% surcharge vs global endpoints |
| Image token calculation complexity | Image input tokens vary by resolution (560–2,000+ tokens per image); audio at separate per-token rates |
Use the Google Gemini pricing calculator to model your expected monthly spend before moving a prototype into production — the gap between free tier and first paid bill can be significant.
Pricing evolution : how Gemini pricing has changed since launch
Cadence
| Quarter | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2023 Q4 | 0 | 1 | Gemini Pro API launched free; Gemini 1.0 Ultra in Bard |
| 2024 Q1 | 0 | 1 | Google One AI Premium ($19.99/mo) launched with Gemini Advanced |
| 2024 Q2 | 0 | 2 | Gemini 1.5 Pro and 1.5 Flash launched at Google I/O; 1M token context |
| 2024 Q4 | 2 | 0 | Gemini 1.5 Pro price cut ~65% ($3.50→$1.25 input); 1.5 Flash cut ~50% |
| 2025 Q1 | 0 | 2 | Gemini 2.0 Flash ($0.15/$0.60) and Flash-Lite ($0.075/$0.30) launched |
| 2025 Q2 | 0 | 2 | Gemini 2.5 Pro and 2.5 Flash launched at Google I/O 2025; Priority/Batch tiers |
| 2026 Q2 | 0 | 4 | Gemini 3 family launched (3.1 Pro, 3.5 Flash, 3 Flash, 3.1 Flash-Lite); regional pricing introduced |
Tracked range: 2023 Q4–2026 Q2. Quarters not listed above were verified stable with no price changes.
Notable changes
- 2023-12-13 — Gemini 1.0 Pro API launched free via Google AI Studio; Gemini Ultra available only through Google One AI Premium ($19.99/mo). No pay-as-you-go API at launch.
- 2024-02-15 — Google One AI Premium launched at $19.99/month with Gemini Advanced (Ultra-class model), marking Google’s first consumer AI subscription and the first Gemini monetization event.
- 2024-05-14 — Google I/O: Gemini 1.5 Pro ($3.50/$10.50 per 1M input/output) and Gemini 1.5 Flash ($0.35/$1.05 per 1M) launched with 1M-token context windows. Both available via Gemini API with free tier on AI Studio.
- 2024-11-19 — Aggressive price cuts: Gemini 1.5 Pro input dropped from $3.50 to $1.25/1M (64% cut); output from $10.50 to $5/1M (52% cut). Gemini 1.5 Flash input from $0.075 to $0.0375/1M (50% cut). This directly responded to GPT-4o and Claude 3.5 Sonnet competitive pressure and aligned with broader AI token cost deflation trends across the market.
- 2025-01-15 — Gemini 2.0 Flash launched at $0.15/$0.60 per 1M input/output with native multimodal capabilities (audio output, image generation). Flash-Lite at $0.075/$0.30 as the entry-level option. Batch API discount introduced at 50%.
- 2025-05-20 — Gemini 2.5 Pro ($1.25/$10) and 2.5 Flash ($0.30/$2.50) launched with improved reasoning and Priority (1.8×) and Flex/Batch (0.5×) pricing tiers formally introduced as named SKUs on Vertex AI.
- 2026-04-01 — Gemini 3 model family launched with image-native generation models (Gemini 3 Pro Image, 3.1 Flash Image). Regional endpoint pricing announced effective July 1, 2026 — first geographic pricing split for Gemini.
What’s unique : differentiators in Gemini’s pricing mechanics
1. 90% context caching discount — the deepest in the market. Google’s context caching gives a 90% discount on cached input tokens (10% of standard rate). Competitors like Anthropic offer prompt caching at ~90% discount too, but Google’s implementation is available across more model tiers and integrates directly with the 200K long-context boundary. For a real-world developer sending a 100K-token knowledge base with every query, caching turns Gemini 2.5 Pro from a $1.25/1M cost to $0.13/1M on those tokens — a difference that materially changes unit economics for AI agent workflows and RAG architectures.
2. Three throughput tiers priced explicitly: Standard / Priority / Flex. Google is unique in pricing throughput guarantees as explicit rate multipliers (1.8× for Priority, 0.5× for Flex/Batch). Most competitors hide throughput guarantees inside enterprise contracts or tier-based rate limits. Making the throughput economics visible and self-serve allows engineering teams to make price-vs-latency tradeoffs without a sales call — a practice consistent with choosing the right usage metric for production AI infrastructure.
3. Long-context all-or-nothing pricing cliff at 200K tokens. When a request exceeds 200K input tokens, Google charges all tokens (both input and output) at the higher long-context rate — not just the overflow. This is a non-obvious pricing mechanic: a 201K-token request costs materially more than a 199K-token request even though only 1,000 tokens crossed the threshold. This creates a strong incentive to chunk or truncate context at the boundary, and represents a meaningful hidden cost for unprepared teams.
4. Free tier via AI Studio with no credit card. Google provides free access to Gemini 2.0 Flash and 2.5 Flash through AI Studio with no payment method required — a developer acquisition strategy that no major competitor fully matches at the same breadth. This free-to-paid pathway is a key part of how AI companies shift from per-user to usage models, using free utility to create developer stickiness before introducing paid tiers.
5. Grounding with Google Search as a separately billed capability. Unlike most LLM APIs where web retrieval is either included or not supported, Google offers Grounding with Google Search as an additive per-query charge. This is the only AI API on the market where you can buy “real-time internet access” on a usage-based basis. The business logic is clear: Google Search is a premium monetizable asset, and attaching it to AI queries creates a secondary revenue stream alongside token costs.
Strengths & weaknesses
| Strengths | Weaknesses |
|---|---|
| 90% context caching discount dramatically reduces costs for long-context repeated queries | Long-context surcharge (>200K) applies to ALL tokens, not just overflow — a non-intuitive pricing cliff |
| Free tier via AI Studio with no credit card removes adoption friction for developers globally | No volume discounts published; enterprise pricing requires Google Cloud contracts, not self-serve |
| Explicit Priority (1.8×) and Flex/Batch (0.5×) tiers let teams make throughput tradeoffs without sales calls | Grounding with Google Search is separately billed — easy to miss in initial cost estimates |
| Broadest model portfolio: 8+ Gemini models plus Gemma, covering every cost/quality tradeoff point | Regional endpoint pricing (July 2026) adds 10% to Gemini 3 costs outside global endpoints — a surprise for non-global deployments |
| Native multimodal: text, image, audio, video in one API with unified token billing | Image token calculation varies by resolution (560–2,000+ tokens per image) — hard to predict costs without the countTokens API |
| Batch API at 50% discount is broadly available and well-documented | AI Studio free tier data may be used for model improvement — a compliance concern for sensitive data use cases |
Billing UX : developer experience with Gemini API billing controls
- Billing via Google Cloud — All Gemini API paid usage is billed through a Google Cloud project. Developers need a GCP account with billing enabled; there is no standalone Gemini API billing portal separate from Cloud.
- AI Studio free tier — Available without a credit card or GCP billing account. Rate limits (approximately 15 RPM for Flash models) apply; exceeding them returns 429 errors rather than billing overages.
- Usage dashboard — Google Cloud Console provides token usage, request counts, and cost breakdown by model and project. Granularity is at the hourly level; real-time spend is visible with slight delay.
- Budget alerts — GCP budget alerts allow teams to set monthly spend thresholds with email/Pub-Sub notifications. Spend caps are advisory (not hard limits by default); hard spend caps require custom quota limits set separately.
- Quota management — Default quotas are set per-project per-model. Increasing quotas requires a quota request in the GCP Console — self-serve up to a threshold, then Google Cloud support review for higher limits.
- Batch API billing — Batch requests are queued and billed at 50% of standard rates upon completion. No real-time billing; costs appear after batch job finishes.
- Context caching billing — Cached context is billed at creation time (full token cost), then subsequent cache hits at 10% of normal input rate. Cache TTL is configurable (default varies by model). Stale caches that expire before sufficient queries arrive do not pay back their creation cost — a minor but real UX friction.
- Vertex AI commitment discounts — Committed Use Discounts (CUDs) are available on Vertex AI for predictable workloads but require annual commitment contracts and are negotiated through Google Cloud sales rather than self-serve.
- Payment methods — Credit card, bank account, invoicing (enterprise). Enterprise Google Cloud customers may use consolidated invoicing across all Cloud services.
Strategic wins : where Google Gemini’s pricing decisions have excelled
1. The November 2024 price cuts reframed Gemini’s competitive position
When Google cut Gemini 1.5 Pro pricing by 64% in November 2024, it was not just a defensive response to GPT-4o — it was a market-defining signal. At $1.25/1M input tokens, Gemini 1.5 Pro undercut OpenAI’s GPT-4o ($2.50/1M input) by 50% while offering comparable quality and a larger context window. The cut triggered a pricing reset across the AI market, with Anthropic and OpenAI both reducing prices within months. Google’s willingness to sacrifice margin on API pricing reflects its core strategic bet: make the Gemini ecosystem so cheap that every developer defaults to Gemini, generating long-term lock-in via Google Cloud and Workspace. See AI companies shifting from per-user licenses for why platform adoption is the real prize.
2. The 90% caching discount creates a usage-based loyalty mechanism
Context caching is both a genuine technical feature and a brilliant pricing strategy. Developers who architect applications around Gemini’s caching system — sending long documents, system prompts, or few-shot examples once and reusing them across queries — see dramatic cost reductions that are specific to Gemini’s implementation. This creates switching costs that are invisible in a benchmark comparison: a production application optimized for Gemini caching would need re-engineering to migrate to an API without equivalent discounts. It is a usage-based pricing mechanic that creates loyalty without any contractual lock-in.
3. AI Studio’s no-credit-card free tier dominates developer acquisition
By offering a genuinely useful free tier with no payment method required, Google captures developer attention at the earliest possible moment in the decision journey. A student in India, a startup founder in Nigeria, or a researcher at a university can start building with Gemini 2.5 Flash today without a GCP account or credit card. This zero-friction acquisition strategy has made Gemini the default experimentation platform for a generation of developers who later bring it into their organizations. The PLG model applied to API products works when the free tier is genuinely useful — and Gemini 2.5 Flash free is genuinely useful.
4. Explicit throughput pricing (Priority / Flex) removes enterprise friction
By publishing Priority (1.8×) and Flex/Batch (0.5×) pricing as explicit, self-serve options rather than burying them in enterprise contracts, Google lets engineering teams make real-time economic decisions about throughput tradeoffs. A team running nightly data enrichment can switch to Flex/Batch and cut costs 50% with a single API parameter change — no procurement, no negotiation. This designing value-based pricing approach creates a natural upsell: teams that adopt Flex/Batch for batch workloads naturally consider Priority for latency-critical paths, expanding total spend with Google.
Areas to improve : gaps and friction in Google Gemini’s pricing approach
1. The 200K all-or-nothing long-context pricing cliff creates unpredictable cost spikes
When a request exceeds 200K input tokens, Google charges all tokens at the higher long-context rate — not just the overflow. This means a document that is 201K tokens costs materially more than a 199K-token document, even though only 1,000 extra tokens crossed the boundary. For developers working with variable-length documents (legal contracts, scientific papers, code repositories), this creates billing unpredictability that is difficult to model upfront. The fair fix is linear scaling: charge the >200K rate only on tokens that actually exceed the threshold. Until then, developers must either chunk aggressively or accept unpredictable bills — a real cost unpredictability problem that affects production planning.
2. Grounding with Google Search is easy to underestimate in cost models
The Grounding with Google Search feature adds $35 per 1,000 queries (for Gemini 2.5/2.0 models beyond the free daily limit) on top of token costs. For an application making 1 million search-grounded queries per month, grounding adds $35,000 — potentially exceeding the base token cost entirely. This additive cost is not prominently surfaced in the standard pricing documentation and is frequently missed by developers building RAG or agentic applications that rely on fresh web data. Google should integrate grounding costs directly into the pricing calculator and show estimated combined costs (tokens + grounding) in the AI Studio billing dashboard.
3. No published volume discounts create negotiation opacity for mid-market teams
Unlike Anthropic (which publishes committed-use discounts) or AWS (with Reserved Instances), Google’s Gemini API pricing has no published volume discount schedule. Teams spending $5,000–$50,000/month must negotiate privately with Google Cloud sales to receive any commitment discounts — and those discounts are not self-serve. This creates opacity for mid-market engineering teams who want price predictability but don’t have the leverage of enterprise contracts. Introducing a published tiered discount schedule (e.g., 10% off at $5K/month, 20% at $20K/month) would align with usage-based billing best practices and reduce churn to competitors who offer more transparent volume economics.
Key takeaways
-
Context caching is Gemini’s most underutilized cost-reduction lever. The 90% discount on cached input tokens is the most impactful pricing feature in the Gemini API, but adoption is underestimated because it requires architectural deliberateness — you must explicitly design your application to cache and reuse context. Teams that invest in this pattern see 40–70% cost reductions on long-context workloads.
-
The 200K token cliff creates a non-linear billing risk that demands defensive engineering. Because all tokens are charged at long-context rates when input exceeds 200K, developers should implement hard context-length guards in their application code. This is a billing behavior unique to Google’s implementation and should be a standard check in any Gemini integration review.
-
The free-to-paid transition is a silent inflection point. AI Studio’s free tier is generous enough that many prototypes run entirely for free — until traffic grows and rate limits kick in. Teams should plan the Vertex AI migration in advance rather than reactively, because the jump from free-tier to production billing requires GCP account setup, quota requests, and billing configuration that can add days of friction.
-
Grounding with Google Search is a separate revenue stream that must be explicitly budgeted. Any application design that includes web-grounded answers must model search query costs separately from token costs. At $35/1K queries, high-traffic grounded applications can see search costs exceed token costs. Build grounding budgets as a dedicated line item, not an afterthought.
-
Google’s pricing power comes from platform integration, not from the API price itself. The real lock-in for Google Gemini is not the token price but the integration depth: Google Search grounding, Google Maps data, Vertex AI infrastructure, Workspace embedding, and GCP billing consolidation. A team fully integrated into this stack faces significant switching costs even if a competitor offers a lower per-token rate — a lesson for outcome-based AI pricing that platform value outlasts point pricing.
UBP implications
-
The multi-dimensional billing model (tokens + modality + grounding + throughput tier) represents the mature form of AI API pricing. Google’s pricing architecture is one of the most sophisticated in the market: token costs vary by input type (text, audio, image, video), context length, caching, throughput tier, and geographic endpoint. This complexity is the natural evolution of usage-based aggregation as AI products add more billable dimensions. Teams building AI-powered SaaS should study this model as a template for multi-dimensional usage billing.
-
Free tiers are infrastructure, not charity — they drive platform adoption at zero marginal cost. Google’s AI Studio free tier is a deliberate customer acquisition investment: the marginal cost of serving a developer on the free tier (rate-limited to 15 RPM) is negligible, while the lifetime value of a developer who brings Gemini into their organization is significant. For SaaS companies building usage-based products, this is the PLG playbook applied to API monetization — give away the trial unit economics to capture the compounding adoption.
-
Throughput pricing (Priority / Flex) reveals a latent dimension of AI API value. Standard API pricing assumes all queries are equal. Google’s explicit throughput tiers expose that some queries are worth paying 80% more for (Priority) and others can be deferred for 50% savings (Flex). For companies pricing their own AI-powered features, this suggests a natural segmentation: latency-sensitive user-facing features at a premium, background enrichment and batch operations at discount — a usage metric selection that captures value variation across the customer journey.
Sources
- Gemini Developer API pricing — ai.google.dev (accessed 2026-05-29)
- Vertex AI / Agent Platform Generative AI Pricing — Google Cloud (accessed 2026-05-29)
- Google AI subscriptions (I/O 2026 update) — Google blog (accessed 2026-05-30)
- Google AI plans — Google One (accessed 2026-05-30)
- The Google AI Ultra plan now starts at $100 a month — Engadget (accessed 2026-05-30)
- Gemini model family overview — Google DeepMind (accessed 2026-05-29)
Bottom line
Google Gemini offers the most technically sophisticated AI API pricing in the market: 90% context caching discounts, explicit Priority/Flex throughput tiers, and a free-with-no-credit-card AI Studio tier that removes adoption barriers globally. The pricing architecture rewards developers who invest in understanding it — the difference between a naïve and an optimized Gemini deployment can be a 50–70% cost reduction. But the complexity is also the risk: the 200K token pricing cliff, additive grounding costs, and no published volume discounts create budget unpredictability for teams that don’t model all dimensions upfront. Google’s real strategic bet is not winning on per-token price — it’s making the Gemini API indispensable through deep Google Cloud and Workspace integration, where token pricing is just the entry fee to a much stickier platform.
Browse the full pricing blueprint to compare Google Gemini against OpenAI, Anthropic, and other AI infrastructure providers.
Pricing timeline : Major events on a vertical axis
Each milestone below corresponds to a public pricing change, product launch, or material adjustment. Major events use a filled marker; minor adjustments use a faded one.
Gemini 3 Model Family — Regional Pricing
Gemini 3 model family launched: Gemini 3.1 Pro Preview ($2/$12 per 1M), Gemini 3.5 Flash ($1.50/$9.00 per 1M), Gemini 3 Flash Preview ($0.50/$3.00 per 1M), and Gemini 3.1 Flash-Lite ($0.25/$1.50 per 1M). Regional (non-global) pricing introduced with 10% premium effective July 2026.
Gemini 2.5 Pro and Flash at Google I/O 2025
Gemini 2.5 Pro and 2.5 Flash launched at Google I/O 2025. 2.5 Pro priced at $1.25/$10 per 1M input/output (≤200K context). 2.5 Flash at $0.30/$2.50 per 1M. Both offered with Priority (1.8×) and Flex/Batch (0.5×) tiers on Vertex AI.
Gemini 2.0 Flash and Flash Lite Released
Gemini 2.0 Flash launched at $0.15/1M input and $0.60/1M output tokens — matching Gemini 1.5 Flash performance at similar price with added multimodal capabilities including native audio and image output. 2.0 Flash Lite added at $0.075/$0.30 as entry-level option.
Gemini 1.5 Pricing Cut ~50–65%
Gemini 1.5 Pro pricing reduced significantly. Input prices fell from $3.50 to $1.25/1M tokens (short context), output from $10.50 to $5/1M. Gemini 1.5 Flash dropped from $0.075 to $0.0375/1M input tokens. This ~50–65% cut positioned Gemini aggressively vs OpenAI GPT-4o and Anthropic Claude 3.5.
Gemini 1.5 Pro and Flash — 1M Context Windows
Google I/O 2024: Gemini 1.5 Pro and Gemini 1.5 Flash announced with 1M token context windows. Gemini 1.5 Flash introduced as a speed-optimized, cost-efficient model at $0.35/1M input (short context). Both models offered via Gemini API with a free tier on AI Studio.
Gemini Advanced via Google One AI Premium
Gemini 1.0 Ultra launched as 'Gemini Advanced' inside the Google One AI Premium subscription at $19.99/month, bundled with 2TB storage. No standalone Ultra API made available.
Google Gemini Family Launched
Google launched Gemini, its multimodal AI family (Ultra, Pro, Nano). The Gemini Pro API became available to developers free via Google AI Studio. This marked the rebrand of Bard's underlying model to Gemini.
- · Google Gemini 2.5 Flash-Lite outputs tokens at just $0.40/1M — cheaper per output token than any other frontier-grade model from OpenAI or Anthropic as of mid-2026, enabling cost-effective large-scale deployments.
- · Google's context caching gives a 90% discount on cached input tokens, meaning a developer who sends the same 100K-token system prompt 1,000 times per day saves roughly $11,250/month compared to charging all tokens at standard rate.
- · The Gemini API free tier via AI Studio requires no credit card, making it one of the most accessible no-commitment AI API tiers in the market — ideal for student developers, hobbyists, and prototype builders worldwide.
Questions & answers
- How much does the Gemini API cost per million tokens?
- Prices range from $0.075/1M input tokens (Gemini 2.0 Flash Lite) to $2/1M input tokens (Gemini 3.1 Pro Preview). Output tokens are typically 4–6× input prices. Gemini 2.5 Pro costs $1.25 input / $10 output per 1M tokens (standard tier, ≤200K context).
- Does Google Gemini have a free tier for API access?
- Yes. AI Studio offers a free tier with rate-limited access to Gemini 2.0 Flash and 2.5 Flash — no credit card required. Free limits are approximately 15 RPM and 1M TPM for Flash models. This free tier is not available on Vertex AI.
- What is the difference between Gemini API on AI Studio vs Vertex AI?
- AI Studio provides a simpler developer onboarding with a free tier and lower operational overhead. Vertex AI adds enterprise features: VPC Service Controls, dedicated quotas, SLA guarantees, Priority and Flex/Batch tiers, fine-tuning, and full Google Cloud billing integration. Pricing is the same at standard rates.
- What is the Gemini context caching discount?
- Cached input tokens cost 10% of the standard input rate — a 90% discount. For example, Gemini 2.5 Pro cached input costs $0.13/1M tokens vs $1.25/1M for standard input. This significantly reduces costs for applications that reuse long system prompts or documents across many queries.
- What is the Flex/Batch API and how much does it save?
- The Flex/Batch API allows asynchronous, non-time-critical inference at 50% of the standard price. For example, Gemini 2.5 Pro drops from $1.25/$10 to $0.625/$5 per 1M input/output tokens. Ideal for data processing pipelines and offline workloads that can tolerate latency.
- What does Google AI Pro include and cost?
- Google AI Pro (renamed from Google One AI Premium at I/O 2025) costs $19.99/month and includes the consumer Gemini app with higher usage limits, 5TB of Google Drive/Gmail/Photos storage, and Gemini in Google Workspace apps. Google AI Ultra starts at $100/month (with a $200/month premium tier, reduced from $249.99 at I/O 2026) and adds 20TB storage and higher limits. Both are consumer subscriptions unrelated to the developer API pricing.