How much does Cohere's Command API cost?

Command A costs $2.50 per million input tokens and $10 per million output tokens with a 256K context window. Command R costs $0.15 per million input tokens and $0.60 per million output tokens with a 128K context window. Command R7B costs $0.0375 per million input and $0.15 per million output — the lowest-cost option in the Command family.

Does Cohere have a free tier?

Yes. Cohere offers a free trial API tier with rate-limited access to all endpoints — no credit card required. Trial accounts have lower requests-per-minute limits but can access Command, Embed, and Rerank APIs for testing and prototyping.

How does Cohere Rerank pricing work?

Cohere Rerank is billed at $2 per 1,000 queries (i.e., $0.002 per rerank call), regardless of how many documents are in the candidate set. This per-query billing maps directly to application events rather than raw token consumption, making cost forecasting straightforward for RAG pipelines.

Can I run Cohere models privately on my own cloud?

Yes. Cohere's defining differentiator is private deployment — models can be deployed on AWS Bedrock, Azure AI, Google Vertex AI, Oracle Cloud Infrastructure, or on-premises. Enterprise private-deployment contracts are sales-led and custom-priced based on usage commitments.

How does Cohere's Embed v4.0 pricing compare to v3?

Embed v4.0 costs $0.12 per million tokens — slightly more than Embed v3 at $0.10 per million tokens — but adds multimodal support (images and text), a 128K context window for long-document embedding, and improved retrieval accuracy for enterprise search use cases.

What is Cohere's enterprise pricing model?

Enterprise pricing for private deployment is custom and sales-negotiated. Cohere offers dedicated models in a customer's VPC or on-premises, with pricing tied to usage volume commitments. Public API usage is pay-as-you-go with no seat or subscription fees; enterprise contracts typically include committed-use discounts.

Cohere Pricing

AI Summary

Cohere operates a pure usage-based API model: pay per token for text generation (Command family) and embeddings, pay per query for Rerank — no seat fees or subscription base charge outside of enterprise contracts.
Command A (March 2025) is Cohere's flagship publicly-priced agentic model at $2.50/$10 per 1M input/output tokens with a 256K context window; Command R at $0.15/$0.60 per 1M targets cost-sensitive RAG workloads. Command A+ — Cohere's newest, most capable model — is positioned above it but is sold via enterprise/sales contract rather than a public per-token rate.
Cohere's strategic differentiation is private deployment: enterprises can run Cohere models on their own cloud (AWS, Azure, GCP, OCI) or on-premises, making it the only frontier AI company where data never leaves the customer's infrastructure.
A free trial tier is available with rate-limited access (no credit card required), giving developers access to all API endpoints to test before committing to production usage.
Cohere generates revenue through API usage fees and enterprise private-deployment licenses, with key customers including Oracle, Salesforce, HCL, and LivePerson.
Cohere's Embed v3 and Rerank APIs have become de-facto standards for enterprise RAG pipelines, with Embed v4.0 (2025) extending to multimodal inputs and 128K context for long-document retrieval.

Pricing summary

Cohere 2026 — Usage-based API pricing

Pure pay-per-token for Command and Embed; per-query for Rerank; enterprise private deployment on custom contract

Trial

Free

Developers evaluating the API

$0.60/1M output

Production (Command R)

$0.15 /1M input tokens

Cost-sensitive RAG and retrieval workloads

$10.00/1M output

Production (Command A)

$2.50 /1M input tokens

Agentic enterprise workloads

Enterprise Private Deploy

Custom

Regulated industries, data-sovereignty requirements

Embed v4.0

$0.12 /1M tokens

Semantic search and RAG pipelines

Rerank

$2.00 /1K queries

RAG pipelines requiring result ordering

All production pricing is pure pay-as-you-go with no minimum spend. Trial tier is rate-limited; production tier unlocks higher quotas. Enterprise private-deployment contracts are sales-negotiated.

About

Cohere is a Toronto-headquartered enterprise AI company founded in 2019 by Aidan Gomez (a co-author of the “Attention Is All You Need” transformer paper), Ivan Zhang, and Nick Frosst (a former Google Brain researcher). Cohere builds large language models for enterprise use, delivering them through a public API and, distinctively, through private deployment directly into customers’ own cloud infrastructure or on-premises data centers.

Cohere’s product portfolio spans three API families: Command (text generation and chat for reasoning, summarization, and agentic workflows), Embed (vector embeddings for semantic search and retrieval-augmented generation), and Rerank (relevance scoring for search result ordering). As of 2026 the Command line has expanded beyond Command A to include Command A+ (Cohere’s newest and most capable model, available open-weights), plus specialized Command A Reasoning and Command A Vision variants and a Transcribe speech model — though Cohere no longer posts current per-token rates for these on its public pricing page (see Pricing summary below). The company targets data-sovereignty-sensitive industries — financial services, healthcare, government, and regulated manufacturing — where sending data to a third-party API is legally or operationally constrained.

By 2026 Cohere has raised over $1B in total funding across its Series A through D, with the $500M Series D in July 2024 bringing its valuation to $5B. Strategic investors include Nvidia, Salesforce Ventures, Oracle, and Fujitsu — all cloud and enterprise compute partners with commercial distribution agreements, not just financial backers. Key enterprise customers include Oracle (which bundles Cohere in OCI AI Services), Salesforce (which integrates Command in Einstein AI), HCL Technologies, and LivePerson.

Pricing summary : pure usage-based with no subscription floor

Cohere runs a pure pay-as-you-go model for its public API: no monthly seat fee, no minimum spend, no subscription tier. Developers pay per token for Command and Embed, and per query for Rerank. Billing starts only when a request is made; the trial tier is free with rate limits and no credit card required.

The public API has two distinct pricing bands within the Command family. The efficiency tier (Command R, Command R7B) starts at $0.15 per million input tokens — targeting cost-sensitive RAG, classification, and structured-extraction workflows where generation length is moderate. The capability tier (Command A, Command R+) starts at $2.50 per million input tokens — targeting reasoning-intensive, agentic, and long-document tasks where quality and context length matter more than cost per token.

As of 2026, Cohere’s public pricing page (cohere.com/pricing) no longer lists current per-token rates — it surfaces only “legacy model” rates for existing customers and per-instance Model Vault pricing (dedicated managed deployment, e.g. Embed 4 from $4.00/hour or $2,500/month, Rerank 4 Pro up to $10.00/hour or $6,500/month). The active per-token rates below are confirmed via model-pricing databases (LiteLLM, Vercel AI Gateway, Azure AI catalog) that mirror Cohere’s live API rate card. Newer SKUs — Command A+, Command A Reasoning, Command A Vision, Transcribe — are sales-led with no public per-token rate.

What makes this different: Cohere is the only major frontier AI company that treats private deployment as the primary enterprise product rather than a premium add-on. While OpenAI and Anthropic sell API access as the default and offer private deployment as a rare enterprise arrangement, Cohere actively routes enterprise deals toward in-VPC or on-premises licensing. This fundamentally changes the pricing conversation for compliance-sensitive buyers: instead of negotiating API terms, they negotiate a software license for compute they already control. See how AI companies structure enterprise pricing for broader context on this shift.

Pricing by product

Command (Text Generation / Chat)

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context window	Best for
Command A (command-a-03-2025)	$2.50	$10.00	256,000 tokens	Agentic tasks, complex reasoning
Command R+ (command-r-plus-08-2024)	$2.50	$10.00	128,000 tokens	Enterprise RAG, long-document analysis
Command R (command-r-08-2024)	$0.15	$0.60	128,000 tokens	Cost-efficient RAG and structured tasks
Command R7B (command-r7b-12-2024)	$0.0375	$0.15	128,000 tokens	High-throughput, low-cost workflows

Embed (Vector Embeddings)

Model	Input (per 1M tokens)	Output	Context window	Best for
Embed v4.0	$0.12	Free	128,000 tokens	Long-document, multimodal retrieval
Embed v3 English	$0.10	Free	512 tokens	English-language semantic search
Embed v3 Multilingual	$0.10	Free	512 tokens	Cross-lingual retrieval (100+ languages)

Rerank (Relevance Scoring)

Model	Price	Max documents	Context	Best for
Rerank v4.0 Pro	$2.50 per 1K queries	100	32,768 tokens	High-accuracy RAG pipelines
Rerank v4.0 Fast	$2.00 per 1K queries	100	32,768 tokens	Latency-sensitive applications
Rerank v3.5	$2.00 per 1K queries	100	4,096 tokens	Standard RAG, broad availability

Enterprise Private Deployment

Deployment	Availability	Pricing
AWS Bedrock	Command R, R+, R7B, Embed, Rerank	AWS-billed on-demand (comparable to API rates)
Azure AI Studio	Command R, R+, A, Embed v3/v4, Rerank	Azure-billed pay-as-you-go
Google Vertex AI	Command R, R+	GCP-billed pay-as-you-go
Oracle Cloud (OCI)	Command A, R+, Embed, Rerank	Bundled in OCI AI Services
Model Vault (Cohere-managed)	Embed 4, Rerank 3.5 / 4 Fast / 4 Pro	Per-instance: $4.00–$10.00/hour or $2,500–$6,500/month
On-premises / VPC	Full model family	Custom license — sales-led

Sales motions across products: Trial and public API are self-serve with no sales contact required. Production API is PLG/self-serve for developers up to any spend level. Enterprise private deployment (VPC, on-premises, or committed-use discounts) is exclusively sales-led with custom contracts.

Hidden costs : what Cohere users actually pay beyond the token rate

Archetype A: Mid-size SaaS company building a RAG chatbot on Command R

A 20-person engineering team building an internal knowledge chatbot, processing 50M input tokens and 10M output tokens per month on Command R:

Line item	Monthly cost
Command R: 50M input tokens × $0.15	$7.50
Command R: 10M output tokens × $0.60	$6.00
Embed v3 (indexing 5M tokens/mo)	$0.50
Rerank: 500K queries × $0.002	$1.00
Estimated total	~$15/month

The token economics on Command R are genuinely low. The real cost multiplier for RAG architectures is the Rerank query volume — if the application runs reranking on every user query, and monthly active users are high, Rerank can exceed Command R costs at scale.

Archetype B: Enterprise deploying Command A for agentic document workflows

A 500-person financial services firm using Command A for multi-step analysis workflows — 10M input tokens and 5M output tokens per month:

Line item	Monthly cost
Command A: 10M input tokens × $2.50	$25.00
Command A: 5M output tokens × $10.00	$50.00
Embed v4.0: 20M tokens (document indexing)	$2.40
Rerank v4.0 Pro: 200K queries × $0.0025	$0.50
Estimated total	~$78/month

At higher agentic volumes, the output token cost dominates. Command A’s $10/1M output rate means a workflow that generates 10,000-token responses per task incurs $0.10 per agentic run — comparable to legacy enterprise software transaction fees but for genuinely novel reasoning capability.

Hidden costs to watch:

Agentic output amplification — multi-step agentic tasks can produce 5–20× more output tokens than a single chat turn, making Command A cost curves non-linear with agentic complexity.
Context window padding — always-on system prompts or long conversation histories count as input tokens on every call. A 10,000-token system prompt adds $0.025 to every Command A call.
Rerank at query scale — at 1M user queries/month, Rerank adds $2,000/month regardless of how cheap Command R is for generation.
Cloud provider mark-ups — AWS Bedrock and Azure AI add their own infrastructure margin on top of Cohere’s rates; direct API is typically cheaper for pure token costs.

Estimate your own Cohere API costs with the Cohere pricing calculator — model token volumes, rerank queries, and embedding usage to project monthly spend.

Pricing evolution : how Cohere’s pricing has changed since 2021

Cadence

Quarter	Product / SKU additions	Notes
2021 Q4	2	Generate and Embed APIs launched with pay-as-you-go pricing
2022 Q4	2	Classify and Summarize endpoints added; free trial tier formalized
2023 Q2	1	Rerank API launched at $2/1K queries; Command model family named
2024 Q1	2	Command R launched ($0.15/$0.60 per 1M tokens); Rerank v3 refresh
2024 Q2	1	Command R+ launched ($2.50/$10 per 1M tokens); Azure/AWS day-one
2024 Q3	0	Command R and R+ 08-2024 refreshes; no pricing change
2024 Q4	1	Command R7B launched ($0.0375/$0.15 per 1M) — lowest cost in family
2025 Q1	2	Command A launched ($2.50/$10 per 1M, 256K ctx); Embed v4.0 ($0.12/1M, multimodal)
2025 Q2	2	Rerank v4.0 Pro and Fast launched on Azure AI at $0.0025 and $0.002/query

Tracked range: 2021 Q4–2025 Q2. Cohere has never raised publicly posted per-token prices on an existing model — it cuts prices by launching cheaper new SKUs rather than repricing older models.

Notable changes

2023 Q2 — Rerank API launched at $2 per 1,000 queries — a per-query billing unit novel in the LLM API space where token-based billing was the norm. This positioned Rerank as an application-event-priced product rather than a raw compute product.
2024-03 — Command R launched at $0.15/$0.60 per 1M tokens. This was the first Cohere model priced below $1/1M input, signaling a deliberate efficiency tier designed to compete with open-source models on cost while maintaining API convenience.
2024-04 — Command R+ launched simultaneously on Cohere API, Azure AI Studio, and AWS Bedrock — the first time Cohere launched a model across all three major clouds on day one. This multi-cloud-first launch strategy reflects Cohere’s enterprise distribution model.
2024-12 — Command R7B launched at $0.0375/1M input and $0.15/1M output — by far the cheapest model in the Command family, with input priced under four cents per million tokens. It targets high-throughput, cost-sensitive applications such as classification, extraction, and intent detection where per-call cost matters most.
2025-02 — Command A launched with a 256K context window — double Command R+‘s 128K — without a price increase over Command R+. The $2.50/$10 rate for a 256K-context agentic model represents a significant capability-per-dollar improvement over the prior tier.

What’s unique : Cohere’s distinctive pricing mechanics and positioning

1. Private deployment as the primary enterprise product, not an afterthought. Cohere is the only major frontier AI company that architecturally positions running models in a customer’s own infrastructure as the default enterprise path. OpenAI, Anthropic, and Google treat their cloud APIs as the core product; private deployment is rare, expensive, and structurally more complex. Cohere inverts this: its strategic cloud partnerships (AWS, Azure, GCP, OCI) are distribution channels for customers who want Cohere’s models without sending data to Cohere’s servers. This reflects a genuinely different philosophy about enterprise AI procurement — one that resonates with regulated industries.

2. Rerank billed per query, not per token. Cohere’s Rerank API charges $2 per 1,000 queries, regardless of how many documents are in the candidate set or how long they are. This per-event billing maps directly to an application action (a search call) rather than to raw compute. Most LLM APIs force developers to translate compute consumption (tokens) into application-layer costs themselves; Rerank’s per-query pricing eliminates that mapping and makes cost forecasting in a usage-based billing system straightforward.

3. Two-tier capability-vs-efficiency pricing with no feature lock-in. Cohere’s Command family splits into a capability tier (Command A/R+ at $2.50/$10) and an efficiency tier (Command R at $0.15/$0.60, Command R7B at $0.0375/$0.15) — but both tiers expose identical API endpoints. There is no feature lock — efficiency-tier models support function calling, RAG, and structured output just like capability-tier models. This lets developers choose the right usage metric based on quality-vs-cost tradeoffs at the task level, not at the plan level.

4. Command R7B is one of the cheapest production LLM APIs on the market. Command R7B (December 2024) prices input at just $0.0375/1M and output at $0.15/1M — under four cents per million input tokens, an order of magnitude below the efficiency-tier Command R. It is purpose-built for high-throughput, cost-sensitive applications: classification, extraction, and intent detection workflows where many requests are processed and per-call cost dominates the economics.

5. No price increases on existing models — ever. Since Cohere launched its first public API in 2021, it has not raised the listed price of any existing model. Price reductions happen through new, cheaper SKUs (Command R7B, Command R being cheaper than the prior generation Command), not through price cuts on existing models, which would trigger renegotiation risk in existing enterprise contracts. This approach to pricing evolution allows Cohere to lower effective costs for new customers while preserving revenue predictability with existing ones.

Strengths & weaknesses

Strengths	Weaknesses
Private deployment as a first-class product differentiates from every major competitor	No consumer-facing product — 100% enterprise focus limits brand awareness and PLG growth loops
Rerank per-query billing is the most intuitive cost structure in RAG pipelines	Rerank query costs compound unexpectedly at high user-query volumes
Embed v4.0 with 128K context and multimodal support at $0.12/1M is best-in-class for long-doc retrieval	Free trial rate limits are restrictive enough to slow developer evaluation cycles
Two-tier Command pricing (efficiency vs capability) allows cost optimization without switching providers	Command A at $10/1M output is expensive for high-volume agentic workloads vs open-source alternatives
No price increases on existing models since 2021 — stable for enterprise budget planning	Pricing page requires authentication to view full details — less transparent than OpenAI/Anthropic public pages
Strategic cloud partnerships (AWS, Azure, GCP, OCI) mean Cohere is available in existing procurement frameworks	Enterprise-only sales motion for private deployment slows adoption by mid-market teams that can’t engage a sales cycle

Billing UX : developer billing controls and payment experience

Self-serve trial — API keys are generated instantly on signup at dashboard.cohere.com. No credit card required for the trial tier. Developers can make real API calls to Command, Embed, and Rerank within minutes of registration.
Trial rate limits — Trial accounts operate under lower requests-per-minute and tokens-per-minute quotas. Specific limits are documented in the Cohere dashboard but are restrictive enough that high-throughput testing requires upgrading to production.
Production upgrade — Adding a credit card in the dashboard unlocks production-tier rate limits and usage-based billing. Billing transitions automatically — no plan selection UI, no seat count to configure.
Usage dashboard — The Cohere dashboard provides per-endpoint token consumption, request counts, and spend to date. Granularity is by day; no real-time or sub-minute spend tracking is documented.
Spend alerts — Cohere does not publicly document spend-cap configuration or email alerts for approaching budget thresholds — a gap common across early-stage API providers. Developers must monitor usage programmatically or via dashboard polling.
Billing units — Command and Embed are billed in tokens (input and output tracked separately). Rerank is billed in queries. Both appear as separate line items in the usage dashboard.
Cloud marketplace billing — AWS Bedrock, Azure AI, and GCP Vertex AI route Cohere charges through the respective cloud invoice — meaning teams already consolidated on a cloud vendor can access Cohere without a separate billing relationship.
Billing cycle — Production bills are issued at the end of each calendar month, or earlier whenever the account reaches $250 in outstanding balance — whichever comes first.
Trial call cap — Beyond per-minute rate limits, trial keys (and production keys on newer Chat model variants) are capped at 1,000 API calls per month.
Enterprise invoicing — Enterprise private-deployment contracts use annual invoicing with committed usage tiers. Credit card billing is not used for enterprise contracts.
Payment methods — Public API: credit card (Visa, Mastercard, Amex). Enterprise: invoice, wire transfer, purchase order.

Strategic wins : where Cohere’s pricing decisions have paid off

1. Private deployment unlocked regulated-industry revenue that API-only competitors cannot access

Cohere’s decision to make private deployment central — rather than peripheral — to its enterprise product enabled it to win regulated-industry accounts that legally cannot route production data through a third-party API. Financial services firms operating under GDPR, CCPA, or banking secrecy laws; healthcare organizations subject to HIPAA; and government agencies with data-residency requirements all represent buyers for whom OpenAI’s API is not an option, regardless of capability. Cohere’s OCI partnership (Oracle bundling Cohere in its Sovereign AI offering) and its presence in AWS GovCloud are direct expressions of this strategy. This positioning aligns with the broader shift in AI outcome-based procurement where compliance is a purchasing prerequisite, not a feature.

2. The two-tier Command pricing structure created a land-and-expand motion

By pricing Command R at $0.15/1M input — cheap enough to be developer-expensable — and Command A at $2.50/1M input (enterprise-budget territory), Cohere created a natural adoption escalator. Developers prototype on Command R, generate internal success stories, and then propose Command A for production workloads requiring higher quality or longer context. This usage-based expansion pattern — start cheap, expand on demonstrated value — mirrors classic SaaS land-and-expand motions but applied to token consumption rather than seats. The same API contract, the same SDK, the same integration — just a model parameter change to move up-market.

3. Day-one multi-cloud launch turned cloud partners into distribution channels

Command R+ launching simultaneously on Cohere’s API, Azure AI Studio, and AWS Bedrock in April 2024 was not a technical accident — it was a distribution strategy. Enterprise buyers who have committed cloud spend on AWS or Azure can access Cohere through existing procurement relationships and consume models against existing cloud credits. This eliminated the “new vendor onboarding” friction that kills many enterprise AI evaluations. The OCI partnership extended this further: Oracle’s enterprise sales force now actively positions Cohere as part of Oracle’s Sovereign AI offering, giving Cohere access to Oracle’s 25,000+ enterprise accounts without a direct Cohere sales motion. Understanding how AI companies use cloud partnerships as pricing levers is key context here.

4. Rerank per-query pricing created a natural upsell layer in the RAG stack

When Cohere launched the Rerank API in 2023, it made a structurally important pricing decision: charge per query, not per token. This meant that every time a developer adds Rerank to an existing Embed + Command RAG pipeline, they add a clean, predictable cost line ($2/1K queries) that maps directly to user interactions. Unlike adding another token-consuming model step (which compounds in complex ways), Rerank’s per-query price is easy to justify in a cost-per-user calculation. This made Rerank adoption a business case conversation rather than an engineering cost discussion — and drove Rerank’s position as the near-universal choice for production RAG pipelines built on Cohere infrastructure.

Areas to improve : gaps in Cohere’s pricing approach

1. Pricing page behind authentication creates a transparency disadvantage

Cohere’s detailed pricing page requires authentication to view — a significant friction point compared to Anthropic and OpenAI, which publish full pricing tables publicly. Developers evaluating APIs before building a proof-of-concept cannot compare Cohere’s token costs without creating an account first. This is a meaningful pricing transparency disadvantage in a market where engineers comparison-shop on pricing pages before deciding which API to prototype with. The fix is straightforward: publish the token rate table publicly. The numbers are already available through third-party sources; the authentication gate costs Cohere evaluation cycles without protecting any commercially sensitive information.

2. No spend-cap configuration or budget alerts creates cost-control anxiety for developer teams

Cohere’s dashboard shows historical usage but does not document a way to set hard spending caps or configure threshold alerts. A developer who accidentally runs a tight loop calling Command A with long system prompts could accumulate hundreds of dollars in charges before noticing. This is a cost predictability gap that is standard in cloud infrastructure (AWS, GCP, Azure all have budget alert systems) but absent in many early-stage API providers. Adding configurable spending caps — even just email alerts at 50%/80%/100% of a monthly budget — would meaningfully reduce developer anxiety and increase API adoption from cost-sensitive teams building with their own credit cards. See usage-based pricing implementation best practices for how to structure these controls.

3. Trial tier rate limits are restrictive enough to impede honest evaluation

Cohere’s trial tier is genuinely free, but the rate limits are low enough that developers cannot run realistic load tests or benchmark Command models against their actual workloads. This forces a premature credit-card commitment before a team has validated that Cohere’s quality-vs-cost profile works for their use case. A more generous trial allocation — for example, $10 in free credits (consumable at full production rate limits) rather than indefinite rate-limited access — would better serve the PLG conversion funnel. Compare this to how effective free tiers work: the goal is to demonstrate value at scale, not just at toy volumes. Cohere’s current trial design shows the API works, but not whether it works for the buyer’s actual workload.

Monetization stack & signals : how Cohere builds & buys its revenue engine

Buys 3 Builds 0 12 open roles

The read — where the monetization investment is going

Cohere buys an enterprise quote-to-cash stack rather than building one, and hires overwhelmingly into go-to-market — vertical and sovereign-region AEs plus a small revenue-accounting layer — while the per-token usage meter behind its own API stays undisclosed.

Stack — build vs buy

Buys (vendor) · 3

Salesforce CRM Job post Jun 2026

“...owning and evolving our full GTM tech stack including Salesforce, Marketo, CPQ, and our ERP integrations.”
CPQ CPQ inferred Job post Jun 2026

“...owning and evolving our full GTM tech stack including Salesforce, Marketo, CPQ, and our ERP integrations.”
NetSuite Revenue recognition Job post 1 Job post 2 Jun 2026

“Design, build, and maintain integrations across the stack including our NetSuite ↔ Salesforce order-to-cash integration.”

Unconfirmed · 1

Token usage metering Metering inferred Docs Jun 2026

“Usage of production keys is metered at the price points listed on the pricing page — a metering layer is operationally implied by the per-token API, but Cohere has disclosed no vendor or in-house build.”

Open roles in the revenue & lifecycle org — 12

View open roles

Senior Account Executive - Life Sciences Customer success Jun 16, 2026
VP Sales, EMEA Customer success Jun 16, 2026
Account Executive Customer success Jun 16, 2026
Senior Account Executive - US Public Sector Customer success Jun 16, 2026
Product Manager, Native Experience & Growth RetentionGrowth Jun 16, 2026
Customer Success Manager (Montreal - French Speaking) Retention Jun 16, 2026
Product Manager, Integrations Retention Jun 16, 2026
RevOps GTM Systems Architect RevOps May 31, 2026
RevOps Analyst (Analytics) RevOpsDeal desk May 31, 2026
RevOps - Compensation & GTM Integration RevOps May 31, 2026
Technical Revenue Manager Deal desk seen Apr 13, 2026
Senior Revenue Accountant Monetization seen Jan 29, 2026
+21 more matched roles

Signals reviewed Jun 2026 · derived from public job posts, product docs

Job postings fill and close over time — once a posting is filled we keep it as a dated citation (the quoted evidence remains); use View open roles for current listings.

Key takeaways

Private deployment is the only enterprise AI pricing moat that compliance-sensitive buyers cannot work around. Cohere’s decision to make in-VPC and on-premises deployment a first-class product — not an enterprise exception — gave it access to regulated-industry deals that API-only competitors structurally cannot win. For AI platform companies, understanding the regulatory procurement environment of your target buyer is as important as model quality.
Per-query billing on Rerank is more intuitive for application developers than per-token billing. By pricing Rerank at $2 per 1,000 queries rather than per token, Cohere made cost-per-user-interaction calculation trivial. Application-layer billing units — queries, requests, events — convert more naturally into product metrics than raw compute units. Teams designing usage-based billing for AI products should ask whether a per-event price is more intuitive than per-token for their specific product action.
Two-tier capability-vs-efficiency pricing enables land-and-expand without feature lock-in. Command R’s low cost enables developer adoption; Command A’s higher capability captures enterprise upgrade budgets — and both use identical API interfaces. This structural escalation without feature lock-in is cleaner than the common tiered SaaS pattern where moving up a tier requires renegotiating features, not just changing a model parameter.
Multi-cloud day-one launch turns cloud partners into distribution channels, not just infrastructure vendors. Cohere’s simultaneous launch on AWS, Azure, and GCP means enterprises with existing cloud commitments can access Cohere models against pre-committed spend — eliminating new vendor procurement friction. AI companies with genuine enterprise distribution ambitions should treat cloud marketplace presence as a pricing and GTM decision, not just a technical one.
A dedicated low-cost SKU (Command R7B: $0.0375/1M input, $0.15/1M output) creates a specialized tier for high-throughput workloads. By pricing R7B an order of magnitude below the efficiency-tier Command R, Cohere gives cost-sensitive, high-volume use cases — classification, extraction, intent detection — their own purpose-built model. This shows that billing unit design and SKU segmentation can target specific use-case profiles, not just reflect raw compute costs.

UBP implications

Per-query billing for search/ranking operations is the right unit for retrieval-layer APIs. Cohere’s Rerank API demonstrates that the billing unit should map to the application action, not the underlying compute primitive. As AI products add retrieval, reranking, and tool-call layers, per-event pricing at each layer is more composable and forecastable than attempting to roll all costs into a single per-token rate. Designing value metrics for multi-layer AI pipelines requires choosing the right unit at each layer independently.
Private-deployment licensing is a distinct pricing model from API-usage pricing — and requires different contract mechanics. Cohere’s enterprise private-deployment contracts involve committed usage volumes, annual terms, and infrastructure-cost sharing — none of which map cleanly to standard API pay-as-you-go invoicing. For AI platform companies, building a second pricing motion (software license for self-hosted) alongside the primary API motion requires separate billing systems, separate sales motions, and separate unit economics. Cohere’s operational complexity is meaningfully higher than pure-API peers as a result.
Two-tier model pricing (efficiency vs capability) is the emerging standard for enterprise AI APIs, but the expansion mechanic only works if both tiers share identical API interfaces. The Command R → Command A expansion motion works because changing the model parameter is the only required code change. If the capability tier had different API endpoints, authentication, or feature sets, the expansion would require re-integration work — killing the land-and-expand motion. When designing tiered AI API pricing, preserving interface compatibility across tiers is a prerequisite for usage-based expansion revenue.

Sources

LiteLLM model pricing database — Cohere models (accessed 2026-05-30)
Cohere pricing page (accessed 2026-06-05, re-verified first-party) — confirms Command R+ 08-2024 at $2.50/$10 per 1M (the capability-tier rate shared with Command A); still shows only legacy-model rates for existing customers plus Model Vault per-instance pricing, with no public current per-token rate for Command A, Command R 08-2024, Command R7B, Embed v4 API, or Rerank per-query
Cohere — How does Cohere’s pricing work? (accessed 2026-05-30)
Cohere Python SDK README — platform and model references (accessed 2026-05-29)
Vercel AI Gateway — Cohere model pricing (accessed 2026-05-30)
AWS Bedrock — Cohere model pricing (bedrock/cohere.* entries) (accessed 2026-05-30)
Azure AI — Cohere Rerank v3.5/v4.0 and Embed pricing (accessed 2026-05-30)

Bottom line

Cohere has built the most enterprise-defensible pricing architecture in the frontier AI market: a pure usage-based public API with no subscription floor, layered on top of a private-deployment licensing model that no data-sovereignty-constrained buyer can substitute with a competitor’s cloud API. Command A at $2.50/$10 per million tokens with 256K context is competitive with Anthropic and OpenAI for agentic enterprise workloads; Command R at $0.15/$0.60 is among the cheapest capable RAG models available through a managed API. The Rerank per-query billing model and the Embed v4.0 multimodal embeddings complete a vertically integrated RAG stack that Cohere can sell as a unit — making its pricing architecture a genuine competitive moat rather than just a commodity token rate sheet.

Compare Cohere’s pricing architecture against other enterprise AI platform providers in the pricing blueprint — including the Perplexity AI blueprint for a contrasting consumer-first freemium model.

Pricing timeline : Major events on a vertical axis

Each milestone below corresponds to a public pricing change, product launch, or material adjustment. Major events use a filled marker; minor adjustments use a faded one.

Embed v4.0 and Rerank v4.0 Multimodal Launch

Jun 2025

Embed v4.0 launched at $0.12/1M tokens with multimodal support and 128K context window for long-document retrieval. Rerank v4.0 Pro and Fast variants launched on Azure AI at $0.0025 and $0.002 per query respectively.

captured 2025-06-01

Command A Launched — 256K Context Agentic Model

Feb 2025

Command A (command-a-03-2025) launched with 256K context window, agentic task optimization, and $2.50/$10 per million token pricing. Positioned as the successor to Command R+ for enterprise agentic workflows.

Command R/R+ Refresh and R7B Added

Aug 2024

Command R and Command R+ refreshed (08-2024 versions) with improved performance at identical pricing. Command R7B launched in December 2024 at $0.0375/$0.15 per million tokens — lowest cost in the Command line.

Series D ($500M) at $5B Valuation

Jul 2024

Cohere raised $500M Series D at a $5B valuation from Nvidia, Salesforce, Oracle, Fujitsu, and others. Announced expanded private-deployment partnerships with all major cloud providers.

Command R+ Launched on Azure and AWS

Apr 2024

Command R+ launched at $2.50/$10 per million tokens with 128K context. Made available on Azure AI Studio and AWS Bedrock simultaneously — first Cohere model with multi-cloud day-one launch.

Command R Launched at $0.15/1M Input

Mar 2024

Command R launched with 128K context window at $0.15/$0.60 per million input/output tokens — the first Cohere model priced below $1/1M input, targeting cost-sensitive RAG deployments.

Series C ($270M) — Rerank API Launch

Jun 2023

Cohere raised $270M in a combined Series C from Inovia Capital, NVIDIA, Salesforce Ventures, and others. Launched Rerank API with per-query pricing at $2 per 1,000 queries. Command model family formalized.

Series B ($125M) at $2.1B Valuation

Nov 2022

Cohere raised a $125M Series B at a $2.1B valuation. Expanded API to include Classify, Summarize, and Detect Language endpoints alongside Generate and Embed.

Public API Launch with Free Trial

Nov 2021

Cohere launched its first public API for text generation and embedding, offering a free trial tier with rate limits and pay-as-you-go production pricing.

Cohere Founded

Oct 2019

Cohere founded by Aidan Gomez, Ivan Zhang, and Nick Frosst. Initial focus on NLP APIs for enterprise text understanding.

Trivia

· Cohere was co-founded by Aidan Gomez, who was a co-author of the seminal 'Attention Is All You Need' paper that introduced the transformer architecture — the technology underpinning virtually every major LLM today.
· Cohere's North Star is private deployment: unlike OpenAI and Anthropic, Cohere actively champions running models on-premises or in a customer's own VPC, making it the only major frontier AI company to treat cloud-API access as secondary to enterprise ownership.
· Command R7B (December 2024) is the lowest-cost model in the Command family at $0.0375/1M input and $0.15/1M output — under four cents per million input tokens, making it one of the cheapest production LLM APIs available anywhere.

Questions & answers

How much does Cohere's Command API cost?: Command A costs $2.50 per million input tokens and $10 per million output tokens with a 256K context window. Command R costs $0.15 per million input tokens and $0.60 per million output tokens with a 128K context window. Command R7B costs $0.0375 per million input and $0.15 per million output — the lowest-cost option in the Command family.
Does Cohere have a free tier?: Yes. Cohere offers a free trial API tier with rate-limited access to all endpoints — no credit card required. Trial accounts have lower requests-per-minute limits but can access Command, Embed, and Rerank APIs for testing and prototyping.
How does Cohere Rerank pricing work?: Cohere Rerank is billed at $2 per 1,000 queries (i.e., $0.002 per rerank call), regardless of how many documents are in the candidate set. This per-query billing maps directly to application events rather than raw token consumption, making cost forecasting straightforward for RAG pipelines.
Can I run Cohere models privately on my own cloud?: Yes. Cohere's defining differentiator is private deployment — models can be deployed on AWS Bedrock, Azure AI, Google Vertex AI, Oracle Cloud Infrastructure, or on-premises. Enterprise private-deployment contracts are sales-led and custom-priced based on usage commitments.
How does Cohere's Embed v4.0 pricing compare to v3?: Embed v4.0 costs $0.12 per million tokens — slightly more than Embed v3 at $0.10 per million tokens — but adds multimodal support (images and text), a 128K context window for long-document embedding, and improved retrieval accuracy for enterprise search use cases.
What is Cohere's enterprise pricing model?: Enterprise pricing for private deployment is custom and sales-negotiated. Cohere offers dedicated models in a customer's VPC or on-premises, with pricing tied to usage volume commitments. Public API usage is pay-as-you-go with no seat or subscription fees; enterprise contracts typically include committed-use discounts.