All companies
technology

Moonshot AI pricing

platform.moonshot.ai facts checked analysis reviewed
Estimate your Moonshot AI cost — model your usage, see overages, and find the cheapest plan. Open calculator →
Quick summary
Sales motion
Product segment
Region
Product
Kimi assistant + Kimi/Moonshot open-weight LLM API
Industry
technology
Commits
None
In this page
AI Summary
  • Moonshot AI prices two surfaces from its Kimi platform: a tempo-named consumer assistant ($0 Free to $199/mo Vivace) and a per-token developer API.
  • The legacy moonshot-v1 API is context-length-tiered: the same model costs $0.20/$2.00 per 1M at 8k context, $1.00/$3.00 at 32k, and $2.00/$5.00 at 128k (in/out).
  • Kimi K2 open-weight models drove an aggressive price war — K2.5 runs $0.60 in / $3.00 out per 1M, with cache-hit input as low as $0.10/M (roughly 83% off).
  • Moonshot pioneered context caching for LLM APIs in 2024; cache hits cut input cost ~80–85% across the K2 family, the signature cost lever on the price sheet.
  • Kimi K2.6 (April 2026) raised input 58% from $0.60 to $0.95/M and output to $4.00/M — a deliberate commercialization signal ahead of a reported Hong Kong IPO.
Pricing summary
Moonshot AI / Kimi 2026 — two priced surfaces: assistant subscriptions + per-token API
A tempo-named Kimi assistant ($0 to $199/mo) sits alongside a context-length-tiered, cache-discounted per-token API spanning open-weight Kimi K2 and legacy moonshot-v1 models.
Free (Adagio)
$0
Individuals using the Kimi assistant
Allegretto
$39 /mo
Heavy users needing Agent Swarm
Allegro / Vivace
$99–$199 /mo
Maximum quotas + heaviest agent/code use
API — Kimi K2 (open-weight)
from $0.10 /M tok
Developers calling open-weight Kimi K2 models
API — moonshot-v1 (context-tiered)
from $0.20 /M tok
Developers on the legacy context-tiered API
Kimi assistant prices are USD/mo (annual saves 20%). API prices are USD per 1M tokens; context caching cuts input ~80–85%. Full per-model table below. RMB ¥ rates apply on the China-native card.

About

Moonshot AI (月之暗面, “Dark Side of the Moon”) is a Beijing-based foundation-model company best known for Kimi, its long-context assistant, and the Kimi K2 family of open-weight models. It serves two distinct buyers: individuals and teams who subscribe to the Kimi assistant for chat, deep research, and agentic coding, and developers who call Kimi/Moonshot models over a per-token API. The company monetizes hosted inference and a managed assistant on top of weights it releases openly — a structurally similar bet to Europe’s Mistral AI, executed inside China’s brutally competitive model-price market.

Founded in March 2023 by Tsinghua schoolmates Yang Zhilin, Zhou Xinyu, and Wu Yuxin, Moonshot launched the Kimi assistant in October 2023 with a long-context hook — the first product to support 200,000-character Chinese input, later expanded to 2 million characters in a March 2024 upgrade that went viral across Chinese social media. That long-context positioning is the company’s founding differentiator and shows up directly in the price sheet: the legacy moonshot-v1 API charges by context window.

Moonshot is among China’s top-funded LLM startups, backed by Alibaba, Tencent, IDG Capital, 5Y Capital, Meituan, and China Mobile. Its valuation climbed from roughly $4.3B early on to about $10B (February 2026) and then to around $20B after raising ~$2B in May 2026 — a trajectory tied to demand for its open-weight models and a reported Hong Kong IPO targeted for the second half of 2026. The July 2025 release of Kimi K2 — a 1-trillion-parameter Mixture-of-Experts model with 32B active parameters — became the fastest-downloaded model on Hugging Face one day after launch and helped ignite an aggressive China model-price war.


Pricing summary : subscription assistant plus a context-tiered, cache-discounted API

Moonshot AI runs a two-track model: flat-rate freemium subscriptions for the Kimi assistant and pure usage-based pricing for the developer API, billed per million tokens. The dimensions are:

  • Kimi assistant seats — a tempo-named ladder: Free/Adagio ($0), Moderato ($19/mo), Allegretto ($39/mo), Allegro ($99/mo), Vivace ($199/mo). Annual plans save 20%.
  • API tokens (Kimi K2 family) — per-million-token input/output rates: Kimi K2.5 at $0.60 in / $3.00 out, Kimi K2.6 at $0.95 in / $4.00 out, with legacy K2 models at $0.55 in / $2.20 out before their May 2026 end-of-life.
  • API tokens (moonshot-v1, context-tiered) — the same model priced by context window: $0.20/$2.00 at 8k, $1.00/$3.00 at 32k, $2.00/$5.00 at 128k (in/out per 1M).
  • Context caching — cache-hit prompt prefixes billed roughly 80–85% below cache-miss input (e.g. $0.10/M cache-hit vs $0.60/M cache-miss on K2.5).
  • Recharge tiers — API rate limits scale with cumulative recharge; periodic top-up bonus programs add 20–30% voucher credit on large single top-ups.

What makes this different: Moonshot prices per-million-token billing publicly, but adds two China-market signatures — context-length as a price axis on the legacy API and a deep context-caching discount it pioneered in 2024 — on top of open weights you can download and run yourself.


Pricing by product

Kimi assistant — consumer & power-user plans (USD/mo)

TierPriceIncludedKey mechanics
Free (Adagio)$0Long-context Kimi assistant; 6 agent uses; 200 Pro Data requestsEntry point; the original viral free product
Moderato$19 / moKimi K2.6 in chat + agent tasks; Deep Research; Kimi Code 1דPopular” power-user tier
Allegretto$39 / moEverything in Moderato plus Agent Swarm, Kimi Claw, 5× code creditsAgentic workloads
Allegro$99 / moEnhanced Agent Swarm; 15× code creditsHeavy agent + code use
Vivace$199 / moMaximum quotas; up to 240 Agent Swarm uses; 30× code creditsTop consumer tier

Annual plans save 20%. The original Kimi assistant launched free in October 2023 and added a viral “tipping/recharge” feature in May 2024 to buy priority during capacity crunches.

API — Kimi K2 family (per 1M tokens, USD)

ModelInput /MOutput /MCache-hit input /MKey mechanics
Kimi K2.5$0.60$3.00$0.10Open-weight; vision + agent swarm; 262K context
Kimi K2 Thinking$0.60$2.50Reasoning + tool use; 262K context
Kimi K2.6$0.95$4.00$0.16Latest; +58% input vs K2.5
Legacy Kimi K2 (0711 / 0905 / Turbo)$0.55$2.20$0.15End-of-life 2026-05-25

Cache-hit input is billed roughly 80–85% below the cache-miss rate across the family. Models are open-weight on Hugging Face under a modified MIT license.

API — moonshot-v1 (context-length-tiered, per 1M tokens, USD)

ModelInput /MOutput /MKey mechanics
moonshot-v1-8k$0.20$2.00Same model, 8k context window
moonshot-v1-32k$1.00$3.004× context, 5× input price
moonshot-v1-128k$2.00$5.0016× context, 10× input price

The legacy moonshot-v1 family prices the same model by context window — context length is literally the meter. The newer Kimi K2 models fold long context into a single per-model rate instead.

Sales motions across products: PLG / self-serve for the Kimi assistant tiers and the entire pay-as-you-go API; recharge tiers and top-up bonuses reward larger prepaid balances. There is no public sales-gated enterprise floor — the public rate card is the anchor.


Hidden costs : What Moonshot AI (Kimi) users actually pay

Moonshot’s headline rates are among the lowest in the market, but the real bill is shaped by three things the sticker price doesn’t show: whether your prompts actually hit the cache, which context tier you land on with the legacy API, and the output-token premium that dominates generation-heavy workloads. Two archetypes show how the total assembles.

Archetype 1 — a developer running a long-context RAG agent on Kimi K2.5. Assume ~60M input + ~15M output tokens a month, with a stable system prompt and document context that hits the cache about 60% of the time.

Line itemMonthly cost
K2.5 input — 24M cache-miss tok @ $0.60/M$14.40
K2.5 input — 36M cache-hit tok @ $0.10/M$3.60
K2.5 output — 15M tok @ $3.00/M$45.00
Estimated total~$63/mo

The lesson: cache hits collapse the input line — the 36M cached tokens cost less than $4, versus ~$21.60 if they all missed. But output is the dominant meter at $3.00/M (5× the cache-miss input rate), so generation-heavy work (long answers, code) drives the bill regardless of how well you cache.

Archetype 2 — a legacy app pinned to moonshot-v1-128k for full-document context. A team feeds whole documents into the 128k context window, running ~20M input + ~5M output tokens a month.

Line itemMonthly cost
moonshot-v1-128k input — 20M tok @ $2.00/M$40.00
moonshot-v1-128k output — 5M tok @ $5.00/M$25.00
Estimated total~$65/mo

Here the surprise is the context-tier tax: the same workload on moonshot-v1-8k would cost $0.20/$2.00 (about $14/mo), so reaching for the 128k window is a roughly 4–5× premium. The fix is usually to migrate to a Kimi K2 model, which folds long context into one flat per-model rate plus caching rather than charging an escalating context tier.

Want to estimate your own Moonshot AI bill? Use the Moonshot AI pricing calculator to model your costs based on token volume, cache-hit rate, context tier, and assistant seats.


Pricing evolution : Moonshot AI pricing history and changes

Moonshot’s pricing evolved along two tracks. The Kimi assistant went from a free, long-context novelty (2023) to a tempo-named subscription ladder; the API moved from context-tiered moonshot-v1 rates (2024) to an aggressive open-weight Kimi K2 price war (2025), then its first material price increase with K2.6 (2026). The dated milestones below are reconstructed from primary announcements and contemporaneous press.

Cadence

QuarterPrice changesProduct / SKU additionsNotes
2023 Q401Kimi assistant launches free, built around long context
2024 Q1112M-character context goes viral; context-tiered moonshot-v1 API live
2024 Q211Viral tipping/recharge feature; context caching pioneered
2025 Q311Kimi K2 open-weight launch ignites a price war
2026 Q101Kimi K2.5 adds vision + agent swarm; deep cache discount
2026 Q211Kimi K2.6 raises input 58%; legacy K2 hits end-of-life

Tracked range: 2023 Q4–2026 Q2. Quarters not listed had no publicly announced price or SKU change. Dated milestones below cite primary/secondary sources; the China-native RMB card may move on a different cadence than the international USD card.

Notable changes

  • 2023-10 — Kimi assistant launches free, built around 200,000-character long context (Wikipedia).
  • 2024-03 — Lossless context expands to 2,000,000 Chinese characters and goes viral; context-tiered moonshot-v1 API (8k/32k/128k) is live.
  • 2024-05 — Viral tipping/recharge feature lets users pay for priority during capacity crunches; API rate limits tier by cumulative recharge.
  • 2024-07 — Context caching pioneered, cutting cache-hit input ~80–85% below cache-miss.
  • 2025-07Kimi K2 open-weight (1T-param MoE) launches at aggressive rates and tops Hugging Face downloads, pressuring OpenAI/Anthropic on price (HPCwire).
  • 2026-02Kimi K2.5 adds vision and agent swarm at $0.60 in / $3.00 out, cache-hit input $0.10/M; top-up bonus program reinforces recharge.
  • 2026-04Kimi K2.6 raises input 58% ($0.60 to $0.95/M) and output to $4.00/M — read as a pre-IPO commercialization signal (KuCoin); legacy K2 reaches end-of-life 2026-05-25.

The Kimi K2.6 price increase in detail

The April 2026 K2.6 release broke a two-year pattern of cuts. Input rose 58% from $0.60 to $0.95 per 1M tokens, output climbed to $4.00/M (21 to 27 yuan RMB), and even the cache-hit input rate ticked up (0.7 to 1.1 yuan RMB). After years of using aggressive pricing to win developer share, Moonshot raised rates on its flagship model — which observers read as deliberate proof of “commercialization capability” ahead of a reported Hong Kong listing in the second half of 2026, following competitors Zhipu and MiniMax to public markets. The strategic read: a lab that can raise prices on a leading model without losing its developer base is a more investable lab, and the open-weight escape hatch (download K2.6 yourself) blunts the backlash a closed-weight increase would trigger.


What’s unique : Moonshot AI’s distinctive pricing mechanics

1. Context length as a price axis. The legacy moonshot-v1 family charges different per-token rates for the same model depending on the context window you request — $0.20/$2.00 at 8k up to $2.00/$5.00 at 128k per 1M tokens. Most vendors charge one rate and cap context; Moonshot, whose founding differentiator was long context, turned the context window itself into a billable tier. Kimi K2 later folded this into a single rate, but the mechanic remains a signature of how Moonshot thinks about value.

2. Context caching as the headline cost lever. Moonshot pioneered context caching for LLM APIs in 2024, billing cache-hit prompt prefixes ~80–85% below cache-miss (e.g. $0.10/M vs $0.60/M on K2.5). For agents and RAG systems with stable system prompts and reused documents, caching — not the sticker rate — is what actually determines the bill, and Moonshot was early enough that the primitive shaped how the broader token economics market now prices repeated context.

3. Open weights plus a recharge-tiered, price-war API. Kimi K2 models ship open-weight under a modified MIT license, yet Moonshot monetizes hosted inference at rates aggressive enough to pressure OpenAI and Anthropic. Layered on top is a recharge model — API rate limits scale with cumulative top-ups, and periodic bonus programs grant 20–30% voucher credit on large single top-ups — a consumer-style prepaid mechanic carried into a developer API. The April 2026 K2.6 increase shows the strategy maturing from share-grab toward margin.


Strengths & weaknesses

StrengthsWeaknesses
Among the lowest public per-million-token rates — Kimi K2.5 at $0.60 in / $3.00 out undercuts most Western frontier modelsOutput-token premium dominates: K2.5 output ($3.00/M) is 5× the cache-miss input rate, surprising generation-heavy workloads
Deep context-caching discount (~80–85% off) rewards stable system prompts and RAG contextReal cost depends heavily on cache-hit rate, which is hard to predict before deployment
Open weights under modified MIT license — a credible self-host hedge against lock-inLegacy moonshot-v1 context-tier “tax” can 4–5× the bill for full-document workloads vs the 8k tier
Fully public rate card — no “contact sales” wall for either the assistant or the APIFirst price increase (K2.6, +58% input) signals the era of relentless cuts is ending
Free Kimi assistant and a low $19 entry tier seed adoptionChina-native RMB card and international USD card can diverge; currency and availability vary by region
Recharge bonuses (20–30%) reward committed developers without formal commit contractsLegacy K2 end-of-life (2026-05-25) forces migration; frequent model churn complicates price tracking

Billing UX : usage tracking and overage controls

  • Recharge-tiered rate limits — API rate limits scale with cumulative recharge, so heavier prepaid balances unlock higher throughput rather than a fixed quota.
  • Top-up bonus vouchers — periodic programs grant 20–30% bonus credit on large single top-ups, rewarding committed prepaid spend.
  • Context-caching controls — developers structure prompts to maximize cache hits, since cache-hit input is billed ~80–85% below cache-miss; the billing distinction is exposed per request.
  • OpenAI-compatible API — model swapping via a standard model: field (e.g. kimi-k2.6) lets teams trial Moonshot against incumbents with minimal code change.
  • Open-weight escape hatch — because the K2 family is downloadable under a modified MIT license, teams can self-host to cap cost rather than scaling hosted spend.
  • Consumer annual discount — Kimi assistant annual plans save 20% versus monthly, the standard commit-for-discount lever on the consumer side.
  • Currency split — an international USD card (platform.kimi.ai) sits alongside a China-native RMB ¥ card, with rates and availability that can differ by region.

Strategic wins : Why Moonshot AI’s pricing decisions worked

1. Turning long context into a price axis, then a brand

Moonshot built its early reputation on long context, then priced it directly — the moonshot-v1 context tiers made “how much context you need” a literal billing dimension. That tied the headline differentiator to the revenue line and trained the market to associate Kimi with long-context value before the price war began. See choosing the right usage metric for why aligning the meter with the differentiator compounds.

2. Pioneering context caching as a cost lever

By shipping context caching in 2024 — billing cache hits ~80–85% cheaper — Moonshot gave developers a way to slash real costs on stable, repeated context, which made aggressive headline rates even more attractive for agentic and RAG workloads. It was early enough to influence how the broader market now prices reused context, a structural win beyond any single rate. This is the usage-based pricing discipline of pricing the marginal cost, not the gross call.

3. Open weights plus a price war to win developer share

Releasing Kimi K2 open-weight and pricing hosted inference aggressively enough to top Hugging Face downloads turned model R&D into distribution — developers evangelize the open models while Moonshot monetizes managed inference and the assistant. Pairing that with recharge bonuses built a committed-spend base, mirroring the shift away from rigid licensing toward flexible, usage-anchored monetization.


Areas to improve : Gaps in Moonshot AI’s pricing approach

1. Make total cost predictable before deployment

Because the bill hinges on cache-hit rate and context tier, the headline rate understates real spend. A “estimated cost per workload” calculator that factors expected cache hits and context length — surfaced before commit — would reduce the bill-shock and unpredictability risk that usage pricing is meant to remove, especially for teams new to context caching.

2. Publish hard assistant quotas, not just tempo names

The Kimi assistant tiers (Adagio through Vivace) lead with agent-use counts and credit multipliers, but the underlying message and long-context caps are not always stated numerically per tier. Concrete per-tier quotas would let buyers self-select without fear of silent throttling during capacity crunches — the very problem the 2024 tipping feature exposed.

3. Reconcile the USD and RMB cards

The international USD card and China-native RMB ¥ card can diverge in rate, model availability, and cadence (the K2.6 increase was expressed partly in yuan). A clearer, reconciled cross-currency view would help global teams evaluating Moonshot against other AI companies’ transparency avoid surprises tied to which card they land on.


Key takeaways

  1. Price the differentiator. Moonshot turned long context — its founding hook — into a literal billing axis on the moonshot-v1 API, tying the brand promise to the revenue line before the price war started.
  2. Caching is the real meter. A ~80–85% cache-hit discount means the sticker rate is only half the story; for agents and RAG, how well you cache determines the bill more than which model you pick.
  3. Open weights make aggressive pricing safer. Releasing Kimi K2 open-weight gave Moonshot distribution and an escape hatch that blunts backlash — even when it later raised prices on K2.6.
  4. A price increase can be a signal, not just a number. The +58% K2.6 input bump was read as commercialization proof ahead of a reported IPO — pricing as a market message, not only a margin move.
  5. Recharge mechanics carry consumer psychology into a dev API. Tiered rate limits and top-up bonuses reward committed prepaid spend without formal commit contracts — a consumer-style lever applied to developer monetization.

UBP implications

  1. Context windows can be a billable dimension. Moonshot shows that “how much context you need” is itself a value metric you can tier and charge for — a reminder for UBP designers to look past tokens-per-call to the structural dimensions buyers actually scale on.
  2. Price the marginal cost, not the gross call. Context caching prices reused context near its true marginal cost, which makes aggressive headline rates sustainable. UBP practitioners should isolate the cheap-to-serve portion of usage and price it accordingly rather than charging a flat rate for everything.
  3. Open weights reshape the value question from “what” to “where.” When the model is free to download, the priced dimension becomes managed inference, caching, and the assistant — not the model itself. UBP design has to follow the cost driver customers can’t trivially replicate, an early signal of the move toward outcome-shaped pricing.

Sources


Bottom line

Moonshot AI prices two surfaces from its Kimi platform: a tempo-named assistant ($0 Free to $199/mo Vivace) and an aggressively low per-token API. Two China-market signatures define it — context length as a price axis on the legacy moonshot-v1 models, and a deep context-caching discount (~80–85% off) Moonshot pioneered in 2024 — layered over open weights you can download and self-host. The April 2026 Kimi K2.6 increase (+58% input) marks the end of relentless cuts and a turn toward commercialization ahead of a reported IPO. The main friction is cost predictability: the real bill depends on cache-hit rate and context tier far more than the headline rate suggests.

Want to compare Moonshot AI against other foundation-model providers? See Mistral AI and OpenAI, or browse the full pricing blueprint.

Pricing timeline : Major events on a vertical axis

Each milestone below corresponds to a public pricing change, product launch, or material adjustment. Major events use a filled marker; minor adjustments use a faded one.

Live snapshot: two-track Kimi pricing reconstructed

Captured rates: Kimi assistant Free/$19/$39/$99/$199 per month; moonshot-v1 context-tiered API ($0.20/$2.00 at 8k up to $2.00/$5.00 at 128k); Kimi K2.5 $0.60/$3.00 (cache-hit $0.10) and K2.6 $0.95/$4.00 (cache-hit $0.16) per 1M tokens. The moonshot.cn capture returned a stale nginx page; prices reconstructed from the international card mirrors + press.

Kimi K2.6 raises prices 58% — a pre-IPO commercialization signal

Kimi K2.6 reaches GA around 2026-04-22 with the first material price increase: input up 58% from $0.60 to $0.95/M, output to $4.00/M (21 to 27 yuan RMB), cache-hit input to $0.16/M (0.7 to 1.1 yuan RMB). Read by observers as commercialization proof ahead of a reported Hong Kong listing. Legacy K2 models hit end-of-life 2026-05-25. (Source: KuCoin, OpenRouter; accessed 2026-06-11.)

Kimi K2.5 adds vision + agent swarm; deep cache discount

Kimi K2.5 launches open-weight with vision and agent-swarm capabilities at $0.60/M input (cache miss) / $3.00/M output, with cache-hit input at just $0.10/M — about 83% off. A top-up bonus program (20–30% voucher bonuses on large single top-ups) reinforces the recharge model. (Source: InfoQ, eesel, Kimi on X; accessed 2026-06-11.)

Kimi K2 Thinking adds reasoning at flat low rates

Kimi K2 Thinking ships on 2025-11-06 with a 262,144-token context and interleaved reasoning plus tool use, priced at $0.60/M input and $2.50/M output — holding the aggressive low-rate posture while moving up the capability curve. (Source: OpenRouter; accessed 2026-06-11.)

Kimi K2 open-weight launch ignites a price war

On 2025-07-11 Moonshot releases Kimi K2 open-weight — a 1-trillion-parameter Mixture-of-Experts model (32B active) — at aggressive API rates (around $0.15/M cache-hit input, $2.50/M output). It becomes the fastest-downloaded model on Hugging Face one day after launch and pressures OpenAI/Anthropic on price. (Source: HPCwire, Hugging Face; accessed 2026-06-11.)

Context caching pioneered to cut repeated-prompt cost

Moonshot rolls out context caching for its API — billing cache-hit prompt prefixes at a fraction of the cache-miss rate (roughly 80–85% off). It becomes a defining cost lever and an industry-influencing primitive other LLM vendors later adopt. (Source: aggregator/press reconstruction; accessed 2026-06-11.)

Viral 'tipping/recharge' reward to buy priority

Amid capacity crunches, Kimi introduces a paid tipping/recharge feature letting users pay to jump the queue and get priority during peak load — an early, much-discussed test of individual willingness-to-pay. API rate limits are also tiered by cumulative recharge. (Source: Baidu Baike, contemporaneous reporting; accessed 2026-06-11.)

2M-character context goes viral; per-token API live

Kimi expands lossless context from 200,000 to 2,000,000 Chinese characters, which goes viral across Chinese social media and drives a user surge. The moonshot-v1 developer API bills per million tokens with context-length-tiered rates (8k/32k/128k priced separately). (Source: Wikipedia, contemporaneous press; accessed 2026-06-11.)

Kimi assistant launches free, built around long context

Moonshot AI (founded March 2023) releases the Kimi assistant on 2023-10-09 — a free consumer chatbot positioned around long-context, the world's first product supporting 200,000-character Chinese input. Monetization is deferred; the product grows on the long-context hook. (Source: Wikipedia, Baidu Baike; accessed 2026-06-11.)

Trivia
  • · Moonshot AI's name (月之暗面, 'Dark Side of the Moon') comes from founder Yang Zhilin's favorite Pink Floyd album — the company launched on the album's 50th anniversary in March 2023.
  • · The legacy moonshot-v1 API charges different per-token rates for the SAME model depending on context window: $0.20/$2.00 per 1M at 8k, but $2.00/$5.00 at 128k — context length is literally the price axis.
  • · Moonshot pioneered context caching for LLM APIs in 2024; cache-hit input is billed ~80–85% below cache-miss (e.g. $0.10/M vs $0.60/M on Kimi K2.5).

Questions & answers

What is Moonshot AI's pricing model?
Moonshot runs a two-track model: subscriptions for the Kimi assistant (Free, then $19/$39/$99/$199 per month) and a usage-based per-token API. The legacy moonshot-v1 API is context-length-tiered (8k/32k/128k each priced differently), while Kimi K2 open-weight models are billed per million tokens with a steep context-caching discount.
Does Moonshot AI (Kimi) offer a free tier?
Yes. The Kimi assistant has a Free plan (named Adagio) at $0/mo with limited agent uses and Pro Data requests. The original Kimi assistant launched free in 2023 and became famous for long-context (up to 2 million Chinese characters).
How much does the Kimi K2 API cost per million tokens?
Kimi K2.5 is about $0.60 input / $3.00 output per 1M tokens, with cache-hit input as low as $0.10/M. The newer Kimi K2.6 is $0.95 input / $4.00 output per 1M (cache-hit input $0.16/M). Legacy K2 models ran $0.55 in / $2.20 out before their May 2026 end-of-life.
Why does Moonshot charge different rates for the same model?
The legacy moonshot-v1 family is context-length-tiered: moonshot-v1-8k is $0.20 in / $2.00 out, moonshot-v1-32k is $1.00 in / $3.00 out, and moonshot-v1-128k is $2.00 in / $5.00 out per 1M tokens. You pay more per token for a longer context window, a signature mechanic Moonshot built around long-context as its core differentiator.
What is context caching and how much does it save?
Moonshot pioneered context caching for LLM APIs in 2024. Repeated prompt prefixes that hit the cache are billed at a fraction of the cache-miss rate — roughly 80–85% off across the K2 family (for example $0.10/M cache-hit vs $0.60/M cache-miss on K2.5).
Are Kimi models open-weight?
Yes. Kimi K2 (a 1-trillion-parameter Mixture-of-Experts model launched July 2025) and its successors are released open-weight on Hugging Face under a modified MIT license, while Moonshot monetizes hosted inference per token and the managed Kimi assistant.