Cartesia pricing

Quick summary
Product: Real-time voice AI platform (Sonic TTS, voice cloning, voice agents)
Industry: technology
Commits: Available (annual)
AI Summary
  • Cartesia operates a credit-based freemium subscription model with four self-serve tiers — Free (20K credits/mo), Pro ($4/mo for 100K credits), Startup ($39/mo for 1.25M credits), and Scale ($239/mo for 8M credits, fully self-serve, not sales-gated) — plus a custom Enterprise tier with volume commitments. Every plan ships unlimited workspace seats and voice slots.
  • Credits price by seconds of generated audio on the flagship Sonic-3.5 text-to-speech model, with separate consumption rates for premium features — the voice changer runs 15 credits per second and localizing a voice is a 225-credit one-time cost — so buyers cannot forecast a bill without modelling their feature mix, not just their volume.
  • Voice Agents (the Line product, built on Ink-2 speech-to-text) are billed at a flat $0.06 per minute of call duration on every plan, plus $0.014 per minute of telephony when using a Cartesia-provided phone number. Agents draw down a prepaid dollar balance ($1/mo on Free up to $299/mo on Scale), not bundled minutes.
  • Cartesia's subscription prices sit deliberately below ElevenLabs at the prosumer end — Pro is $4/mo versus ElevenLabs' $5 Starter — while the $239 Scale tier (8M credits) targets production teams without forcing an annual sales contract, a self-serve ceiling most voice-AI rivals gate behind sales.
  • Voice cloning has a two-tier structure: instant voice cloning (included from the $4 Pro tier) versus professional/pro voice cloning (introduced at the $39 Startup tier). Localizing a voice is billed as a 225-credit one-time cost rather than a flat dollar setup fee.
  • Cartesia advertises on-premise and virtual-private-cloud deployment of its Sonic and Ink models for Enterprise customers — a rare offering in voice AI driven by call-center and healthcare prospects with strict data-residency requirements that ElevenLabs and OpenAI Voice do not meet.
Pricing summary
Cartesia 2026 — Voice AI pricing overview
Free → Pro $4/mo → Startup $39/mo → Scale $239/mo → Enterprise (custom)
Free (Free): Developers evaluating Sonic and Ink; hobby projects
Enterprise (Custom): Regulated industries; on-prem / VPC deployments
All four self-serve tiers (Free → Scale) are credit-metered for TTS/STT. Voice Agents bill at a flat $0.06/min of call duration on every plan, plus $0.014/min telephony for a Cartesia-provided number, drawn from a prepaid dollar balance. Enterprise uses custom credits and volume pricing.

About

Cartesia is a San Francisco-based voice AI startup founded in 2024 by Karan Goel and Albert Gu, the latter best known as co-author of the Mamba state-space model paper. The company’s core bet is that state-space models (SSMs) beat transformer architectures for streaming, low-latency audio generation. That thesis is embodied in its flagship Sonic text-to-speech model, which advertises sub-90ms model latency for real-time conversational use cases.

Cartesia raised a $27M seed round led by Index Ventures in March 2024, followed by a $64M Series A in March 2025 also led by Index. Lightspeed Venture Partners, Conviction, and a roster of AI researchers participated. The company is private and does not disclose revenue; industry estimates put late-2025 ARR somewhere below $50M, small relative to ElevenLabs but growing fast on the back of the voice-agents wave.

The product surface has three pillars: the Sonic TTS API (credit-metered usage), instant and professional voice cloning, and a packaged Voice Agents product that bundles STT, TTS, and turn detection into a per-minute conversational unit. Cartesia competes directly with ElevenLabs at the top of the market, PlayHT and Resemble AI in the prosumer tier, and Vapi/Retell on the voice-agents side. Its differentiator is latency plus on-premise availability, two attributes that matter to healthcare, financial-services, and regulated enterprise prospects that SaaS-only competitors cannot serve.


Pricing summary : How Cartesia’s credit-based freemium stack works

Cartesia runs a credit-based freemium subscription for its core TTS and STT products, with a separate flat per-minute price for Voice Agents drawn from a prepaid dollar balance. The four self-serve tiers (Free, Pro, Startup, Scale) are flat monthly subscriptions that bundle a fixed credit allowance plus a prepaid agent balance, and every plan includes unlimited workspace seats and voice slots. Above Scale, the Enterprise tier converts to custom credit and agent volumes with negotiated volume pricing.

The “credit” is a deliberately abstracted unit — credits are spent per second of generated audio on the flagship Sonic-3.5 text-to-speech model, with premium features priced separately (the voice changer is 15 credits per second of audio; localizing a voice is a one-time 225-credit cost). This is similar to how credit-based AI pricing models trade transparency for flexibility: Cartesia can tune per-feature credit rates without renegotiating contract pricing, but customers cannot easily forecast bills without modelling their feature mix.
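
To illustrate why feature mix matters as much as volume, here is a minimal Python sketch of a monthly credit estimate. The base rate of roughly 12.5 credits per second is not a published figure; it is inferred from this page's own plan numbers (8M credits ≈ 10,667 TTS minutes). The function name, and the assumption that voice-changer seconds are metered on top of base synthesis, are hypothetical.

```python
# Hypothetical estimator built from figures quoted on this page, not an official SDK.
BASE_TTS_CREDITS_PER_SEC = 12.5      # assumption: 8,000,000 credits / (10,667 min * 60 s)
VOICE_CHANGER_CREDITS_PER_SEC = 15   # stated rate for the voice changer
LOCALIZATION_CREDITS = 225           # stated one-time cost per localized voice


def estimate_credits(tts_seconds, voice_changer_seconds=0, localized_voices=0):
    """Return estimated credits for a month, given a feature mix."""
    return (
        tts_seconds * BASE_TTS_CREDITS_PER_SEC
        + voice_changer_seconds * VOICE_CHANGER_CREDITS_PER_SEC
        + localized_voices * LOCALIZATION_CREDITS
    )


# One hour of TTS, ten minutes of voice changer, one localized voice.
print(estimate_credits(3600, voice_changer_seconds=600, localized_voices=1))
```

Under these assumptions the example mix comes to 54,225 credits, of which 9,225 come from premium features rather than narration length.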

Voice Agents (the Line product) are billed separately at a flat $0.06 per minute of call duration on every plan and are NOT priced in credits — a deliberate packaging choice that simplifies forecasting for conversational use cases at the cost of obscuring which component (Ink-2 STT, Sonic-3.5 TTS, or turn-detection) drives the unit cost. This dual-axis model — credits for batch TTS, dollars-per-minute for real-time agents — mirrors the broader shift from per-user licensing to usage-based AI billing playing out across the category.

What makes this different: Cartesia is one of the few voice AI vendors advertising on-premise Sonic and Ink deployment in its Enterprise tier. ElevenLabs, OpenAI Voice, and PlayHT are all SaaS-only. That single capability — the models running inside a hospital’s or bank’s network — is the wedge that justifies enterprise pricing above the self-serve ceiling.


Pricing by product

Sonic-3.5 TTS / Ink-2 STT (self-serve credit tiers)

| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Free | $0 | 20,000 credits/mo (~27 TTS min); $1/mo prepaid agents; 2 concurrent requests | No credit card; unlimited seats; Sonic-3.5 + Ink-2 |
| Pro | $4/mo | 100,000 credits/mo (~133 TTS min); $5/mo prepaid agents; commercial license | Adds instant voice cloning; 3 concurrent requests |
| Startup | $39/mo | 1.25M credits/mo (~1,667 TTS min); $49/mo prepaid agents; organizations | Adds pro voice cloning; 5 concurrent requests |
| Scale | $239/mo | 8M credits/mo (~10,667 TTS min); $299/mo prepaid agents; priority support | Self-serve ceiling; high concurrency limits |
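
For a rough comparison across the paid tiers, the sketch below derives the effective price per 1M credits and per TTS minute from the table figures. It is a simplification: it attributes the whole subscription price to TTS credits and ignores the bundled prepaid agent balance, so it likely overstates the TTS unit cost. The `PLANS` structure and function name are illustrative only.

```python
# Figures copied from the self-serve tier table: (monthly price in USD, credits, approx. TTS minutes).
PLANS = {
    "Pro": (4, 100_000, 133),
    "Startup": (39, 1_250_000, 1_667),
    "Scale": (239, 8_000_000, 10_667),
}


def unit_costs(price, credits, tts_minutes):
    """Return (USD per 1M credits, USD per TTS minute), ignoring the agent balance."""
    per_million_credits = price * 1_000_000 / credits
    per_tts_minute = price / tts_minutes
    return per_million_credits, per_tts_minute


for name, plan in PLANS.items():
    per_million, per_minute = unit_costs(*plan)
    print(f"{name}: ${per_million:.2f} per 1M credits, ${per_minute:.4f} per TTS minute")
```

On these numbers the per-credit price falls from $40.00 per 1M credits on Pro to about $29.88 on Scale, so most of the volume discount arrives between Pro and Startup.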

Enterprise (custom)

| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Enterprise | Custom | Custom credits & agent usage; on-prem / VPC; DPAs, BAAs, SSO | Volume pricing; custom concurrency limits; security reviews |

Voice Agents — Line (flat per-minute, all plans)

| Component | Rate | Notes |
|---|---|---|
| Call duration | $0.06/min | Same flat rate on Free, Pro, Startup, and Scale |
| Telephony | $0.014/min | Only when using a Cartesia-provided phone number |
| Agent funding | Prepaid dollar balance | $1/mo (Free) → $5 (Pro) → $49 (Startup) → $299 (Scale) |
| LLM usage during calls | Free (UI-created agents) | Free for a limited time per Cartesia |
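
Because agents draw down a dollar balance rather than bundled minutes, it can help to convert each plan's prepaid balance into call minutes. The sketch below does that with the rates in the table; the function name is hypothetical, and it assumes the balance is spent only on call duration and (optionally) telephony.

```python
CALL_RATE = 0.06        # USD per minute of call duration, all plans
TELEPHONY_RATE = 0.014  # USD per minute, only with a Cartesia-provided number
PREPAID_BALANCE = {"Free": 1, "Pro": 5, "Startup": 49, "Scale": 299}  # USD per month


def minutes_covered(balance_usd, cartesia_number=False):
    """Return how many call minutes a prepaid balance pays for."""
    rate = CALL_RATE + (TELEPHONY_RATE if cartesia_number else 0.0)
    return balance_usd / rate


for plan, balance in PREPAID_BALANCE.items():
    without_phone = minutes_covered(balance)
    with_phone = minutes_covered(balance, cartesia_number=True)
    print(f"{plan}: ~{without_phone:.0f} min, or ~{with_phone:.0f} min with telephony")
```

On these rates the Scale balance covers roughly 4,983 minutes of calls, or about 4,041 minutes when a Cartesia number is used, so production agent workloads exceed the bundled balance quickly.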

Voice cloning & feature credits

| Service | Cost | Notes |
|---|---|---|
| Instant voice cloning | Included | Self-serve from the $4 Pro tier |
| Professional (pro) voice cloning | Included | Introduced at the $39 Startup tier |
| Voice changer | 15 credits / second of audio | Drawn from the plan credit allowance |
| Localizing a voice | 225 credits (one-time) | Per-voice localization cost |

Sales motions across products: self-serve PLG for Free, Pro, Startup, and Scale (all four are dashboard-purchasable, including Voice Agents); sales-led only for the custom Enterprise tier. All prices accessed 2026-05-30; Voice Agent call duration is a flat $0.06/min on every plan.


Hidden costs : What Cartesia users actually pay beyond the headline credit allowance

Archetype A: Solo developer building a voice-cloned podcast workflow (Startup plan)

A solo creator generating ~5 hours of narration per month using instant voice cloning:

| Line item | Monthly cost |
|---|---|
| Startup subscription ($39/mo) | $39.00 |
| Base allowance: 1.25M credits (~1,667 TTS min) | included |
| ~5 hrs (300 min) of TTS, well within ~1,667-min allowance | within allowance |
| Localizing one voice into a second language (225 credits, one-time) | within allowance |
| Estimated total | ~$39 |

For a pure-TTS creator, the Startup plan’s ~1,667 included minutes are hard to exhaust at podcast volumes, so the headline $39 is usually the whole bill. The forecasting trap is feature credits, not volume: the voice changer alone burns 15 credits per second of audio, so heavy use of premium features — not raw narration length — is what pushes a user toward overage. This is the most common form of AI cost unpredictability on credit-based platforms.
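
The arithmetic behind Archetype A can be checked with a short sketch. It assumes about 750 credits per TTS minute, a rate inferred from the plan table (1.25M credits ≈ 1,667 minutes) rather than a published number, and it treats voice-changer usage as the only premium feature drawing on the remaining credits.

```python
CREDITS_PER_TTS_MIN = 750        # assumption inferred from the plan table
VOICE_CHANGER_PER_SEC = 15       # stated voice changer rate
LOCALIZATION_CREDITS = 225       # stated one-time localization cost
STARTUP_ALLOWANCE = 1_250_000    # Startup plan credits per month

narration = 300 * CREDITS_PER_TTS_MIN          # 5 hours of narration
used = narration + LOCALIZATION_CREDITS        # plus one localized voice
remaining = STARTUP_ALLOWANCE - used

# Hours of voice changer audio the leftover credits would cover.
changer_hours = remaining / VOICE_CHANGER_PER_SEC / 3600

print(used, remaining, round(changer_hours, 1))  # 225225 1024775 19.0
```

Narration and localization use under a fifth of the allowance, while roughly 19 hours of voice changer audio would consume the rest, which is why feature usage rather than narration length drives overage risk.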

Archetype B: Mid-market customer-service team running voice agents (Scale plan)

A 50-agent contact-center deployment, averaging 4 hours/day of conversational voice per agent:

| Line item | Monthly cost |
|---|---|
| Scale subscription ($239/mo) | $239 |
| Voice-agent minutes: 50 agents × 4 hrs × 60 min × 20 days = 240,000 min × $0.06 | $14,400 |
| Telephony (Cartesia numbers): 240,000 min × $0.014 | $3,360 |
| TTS credits for IVR + scripted prompts (within 8M Scale allowance) | included |
| Estimated total (approximately) | ~$18,000 |

At 240k minutes, the $239 base subscription is rounding error. Real cost sits in the flat $0.06/min agent rate plus $0.014/min telephony, both drawn from the prepaid agent balance. Because the per-minute rate is identical on every plan, the only lever at this volume is the custom Enterprise tier, where Cartesia advertises volume pricing on credits and agent usage.
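
The Archetype B total can be reproduced with the sketch below. It uses `Decimal` to avoid floating-point rounding on currency; the function name and parameters are illustrative, and the 20 working days per month is the same assumption used in the table.

```python
from decimal import Decimal

SCALE_SUBSCRIPTION = Decimal("239")   # USD per month
CALL_RATE = Decimal("0.06")           # USD per minute of call duration
TELEPHONY_RATE = Decimal("0.014")     # USD per minute on Cartesia-provided numbers


def monthly_agent_bill(agents, hours_per_day, workdays, cartesia_numbers=True):
    """Return (minutes, call cost, telephony cost, total) for one month."""
    minutes = agents * hours_per_day * 60 * workdays
    call_cost = minutes * CALL_RATE
    telephony_cost = minutes * TELEPHONY_RATE if cartesia_numbers else Decimal("0")
    total = SCALE_SUBSCRIPTION + call_cost + telephony_cost
    return minutes, call_cost, telephony_cost, total


minutes, call_cost, telephony_cost, total = monthly_agent_bill(50, 4, 20)
print(minutes, call_cost, telephony_cost, total)
```

The result is 240,000 minutes and a total of $17,999, matching the table's rounded ~$18,000; the subscription is about 1.3% of that figure.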

Want to estimate your own Cartesia bill? Use the Cartesia pricing calculator to model costs across voice cloning, TTS character volume, and agent-minute consumption.


Pricing evolution : Cartesia’s journey from pure-usage API to credit-based SaaS

Cadence

| Quarter | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2024 Q1 | 0 | 1 | Company launch + Sonic announced; $27M seed |
| 2024 Q2 | 0 | 1 | Sonic API GA; pure usage at ~$65/1M chars |
| 2024 Q3 | 1 | 2 | Tiered subscriptions introduced: Free, Creator ($5), Pro ($49) |
| 2025 Q1 | 1 | 2 | Sonic-2 launch; clone multiplier formalized; Business ($299) added |
| 2025 Q1 | 0 | 1 | Voice Agents beta announced alongside Series A |
| 2025 Q3 | 0 | 2 | Scale + Enterprise tiers formalized; on-prem SKU added |
| 2026 Q1 | 1 | 1 | Voice Agents GA; per-minute rates published; 1k included minutes added to Business |

Tracked range: 2024 Q1–2026 Q2. Quarters not listed verified stable (0 changes, 0 additions). Sonic-2 voice-clone multiplier (2×) is technically a price change disguised as a product launch.

Notable changes

  • 2024-09-30 — Moved from pure-usage API ($65/1M chars) to a freemium subscription stack. First major pricing-architecture decision, made roughly six months post-launch.
  • 2025-01-22 — Sonic-2 launch silently introduced the 2× credit multiplier for cloned voices. No public announcement; documented in API reference only. This is the kind of packaging change that benefits the vendor and surprises customers.
  • 2025-03-25 — Voice Agents launched as a separate per-minute SKU rather than rolled into the credit system — a packaging decision that has held since.
  • 2025-09-15 — Scale tier and on-prem Enterprise SKU formalized. This was the first explicit sales-led motion at Cartesia and signaled enterprise readiness.
  • 2026-02-10 — Voice Agents GA with published per-minute pricing ($0.12-0.15) and 1,000 bundled minutes on Business. This is the most recent pricing inflection.

What’s unique : Cartesia’s distinctive pricing mechanics

1. Credits as a per-model multiplier abstraction. Cartesia’s credit unit lets the company adjust per-model costs (2× for clones, 3× for Sonic Turbo) without changing the headline subscription price. This protects margin on premium models but creates aggregation complexity for customers who mix voices. ElevenLabs uses a simpler character-only model; PlayHT uses words. Cartesia’s credit abstraction is closer to OpenAI’s old “tokens” model in spirit — flexible for the vendor, harder to forecast for the buyer.

2. Dual-axis billing: credits for TTS, minutes for Voice Agents. Most voice AI vendors bill everything in the same unit (characters or minutes). Cartesia uses credits for batch TTS and minutes for agents — recognizing that a 2-minute customer-service conversation is operationally different from a 500-character notification. This composite billing approach aligns price to use case but doubles the forecasting work for buyers who do both.

3. On-premise as the enterprise wedge. Cartesia is one of the few major voice AI vendors offering on-prem Sonic deployment for healthcare and financial-services prospects. ElevenLabs is SaaS-only. OpenAI Voice is SaaS-only. PlayHT is SaaS-only. This single capability supports a separate Enterprise SKU with custom pricing, converting a regulatory constraint into a revenue line. See Deepgram’s blueprint for a comparable speech-side play.

4. Instant Voice Clone bundled into prosumer tier ($4/mo). Most competitors gate voice cloning to mid-tier ($22/mo at ElevenLabs Creator; sales-gated at PlayHT). Cartesia bundles IVC into the $4 Pro tier with five clones included. This is a deliberate prosumer-segment play, betting that creators who clone voices at $4 will upgrade to Startup ($39) when they scale: a textbook PLG monetization funnel.

5. Series A pricing power locked in fast. The $64M Series A in March 2025 (12 months after seed) capitalized Cartesia before competitors could undercut on price. Most voice AI startups still target $40-60 per 1M characters; Cartesia’s effective rate on the Scale tier ($239 / 8M credits) works out to ~$30 per 1M credits before clone multipliers, competitive without being self-destructive. The fast follow-on removed pressure to slash prices for runway.


Strengths & weaknesses

| Strengths | Weaknesses |
|---|---|
| Generous free tier (20k credits) builds developer goodwill without credit card friction | Credit multipliers (2× clones, 3× Turbo) are documented but easy to miss; predictable cause of bill shock |
| Sub-90ms latency is genuinely category-leading and supports real-time use cases | Voice Agents’ flat per-minute rate hides which component (STT, TTS, turn detection) drives cost; less transparent than credit-metered TTS |
| On-prem Enterprise SKU is rare in voice AI and supports HIPAA/SOC 2 use cases competitors cannot serve | Sales-gated Professional Voice Clone setup fee not published; opacity that ElevenLabs has moved away from |
| Dual-axis pricing (credits + minutes) maps to actual use cases (batch vs conversational) | No published volume pricing tiers above Scale; Enterprise requires a sales call even for predictable workloads |
| Annual commit discounts available through Scale tier give cost-sensitive customers a path | No published education or nonprofit discount tier; missing a key prosumer wedge ElevenLabs uses well |
| Startup ($39) tier delivers ~1.25M credits; competitive against ElevenLabs Creator ($22 for 100k chars) at scale | Credit-unit obfuscation makes apples-to-apples comparisons against character-priced competitors hard for buyers |

Billing UX : Cartesia’s account controls and developer console

  • Self-serve upgrades: Plan changes for Free → Pro → Startup → Scale are handled entirely in the dashboard at play.cartesia.ai. No sales call is required up to Scale.
  • Annual billing: Annual plans are offered at an approximately 15-20% discount on self-serve tiers; Enterprise is annual-only.
  • Credit overage: Overage credits are sold in prepaid packs (typical: 100k credits for ~$4, 1M for ~$35). Auto-renewal of overage packs is an opt-in setting.
  • Usage dashboard: Real-time credit consumption is visible in the developer console, broken out by voice/model. Voice-agent minutes are shown in a separate panel.
  • Spend caps and alerts: Configurable monthly spend cap on overage; email alerts at 50%, 80%, and 100% of allowance. This is more graduated than most competitors (ElevenLabs only alerts at 100%).
  • Payment methods: Credit card and ACH/wire for Scale and Enterprise. PayPal/Apple Pay not supported as of 2026.
  • Receipts and invoicing: Stripe-powered receipts for self-serve; NetSuite-routed invoices for Scale/Enterprise with NET 30 terms typical.
  • Cancellation: In-app cancellation for self-serve tiers; annual commit cancellation requires written notice 30+ days before renewal.
  • API key management: Multiple API keys per workspace with scoped credentials; per-key usage tracking, useful for aggregating consumption across teams.
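
As an illustration of the graduated alerting described in the list above, the following sketch reports which thresholds a given level of consumption would have crossed. Only the 50/80/100% thresholds come from the text; the function is hypothetical and does not represent Cartesia's actual alerting logic.

```python
ALERT_THRESHOLDS = (50, 80, 100)  # percent of monthly allowance, as listed above


def alerts_fired(credits_used, allowance):
    """Return the alert thresholds (in percent) that usage has reached."""
    # Integer comparison avoids floating-point error at exact boundaries.
    return [pct for pct in ALERT_THRESHOLDS if credits_used * 100 >= pct * allowance]


# 6.5M of an 8M allowance is 81.25%, so the 50% and 80% alerts have fired.
print(alerts_fired(6_500_000, 8_000_000))
```

A team that mirrors these thresholds in its own monitoring gets two warnings before the allowance runs out, rather than a single notice at exhaustion.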

Strategic wins : Why Cartesia’s pricing decisions worked

1. Free tier with no credit card removed friction at the top of the funnel

20,000 free credits monthly, enough to actually evaluate Sonic in a real prototype, combined with no credit card requirement, created one of the most frictionless voice AI onboarding flows in the category. This is the same playbook ElevenLabs used in 2023 (and has since tightened); Cartesia is running it harder. The PLG-style free-tier play is the dominant reason Cartesia shows up on indie-developer benchmark posts more often than its market share would suggest.

2. $4 Pro tier captured prosumer segment competitors ignored

Pricing Pro at $4/month, below the typical prosumer floor of $20, opened a segment ElevenLabs and PlayHT had effectively abandoned. The bet was that $4/month creators eventually upgrade to the $39 Startup tier as their projects grow, a classic tiered monetization ladder. Even if the conversion rate is low, the LTV math works because the $4 tier is profitable at moderate volumes given the credit allowance.

3. On-prem Enterprise SKU unlocked regulated-industry revenue

Cartesia’s decision to offer on-premise Sonic deployment for HIPAA-regulated healthcare prospects converted a technical capability (model portability via SSM efficiency) into a revenue line. ElevenLabs, OpenAI Voice, and PlayHT cannot serve these customers at all. This is a textbook example of turning compliance constraints into pricing power — the contracts are large, sticky, and effectively uncontested.

4. Series A timing locked in pricing before commoditization pressure

Raising $64M just 12 months after seed gave Cartesia runway to hold pricing while transformer-based TTS commoditized. Most competitors are now forced into per-character price wars; Cartesia can hold the line on credit pricing and let cheaper alternatives compete for the prosumer floor. The funding-pricing relationship is rarely discussed but consistently determines who survives the next pricing reset.


Areas to improve : Gaps in Cartesia’s pricing approach

1. Credit multiplier opacity creates predictable bill shock

The 2× credit cost for cloned voices and 3× for Sonic Turbo is documented but buried in API reference docs. A user planning around the headline “8M credits” on Scale who switches to cloned voices mid-month will hit overage at the equivalent of 4M credits of standard usage, not 8M. The fix: surface the effective allowance for each plan and add a real-time projection in the usage dashboard. See bill shock patterns in AI billing for why this matters more than it looks.
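
A small sketch makes the effect of multipliers concrete. It assumes the 2× and 3× multipliers work exactly as described in the paragraph above and that usage can be expressed as shares per multiplier; the function name and the `usage_mix` shape are hypothetical.

```python
def effective_allowance(plan_credits, usage_mix):
    """Return the allowance in standard-rate units.

    usage_mix maps a credit multiplier to its share of usage; shares should sum to 1.
    """
    blended_multiplier = sum(mult * share for mult, share in usage_mix.items())
    return plan_credits / blended_multiplier


# All usage on cloned voices (2x): the 8M headline behaves like 4M.
print(effective_allowance(8_000_000, {2: 1.0}))

# Half standard (1x), half cloned (2x): roughly 5.33M.
print(round(effective_allowance(8_000_000, {1: 0.5, 2: 0.5})))
```

Showing this blended figure next to the headline allowance is the kind of projection the suggested dashboard fix would provide.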

2. Voice Agents per-minute pricing not published with full transparency

The flat $0.06 per-minute rate is public, but Cartesia does not publish how it splits across Ink-2 STT, Sonic-3.5 TTS, and turn detection, and LLM usage during calls is free only for a limited time. Buyers comparing against Vapi ($0.08-0.10/min published) or Retell ($0.07/min published) cannot do apples-to-apples math on the full stack without knowing what each vendor bundles. The fix: publish a per-component rate card.

3. No published volume tiers between Scale and Enterprise

The gap between Scale ($239/mo, 8M credits) and Enterprise (sales-led, custom) is unbridged for customers consuming 30-100M credits/month who do not want to commit annually. ElevenLabs offers a $1,320/month Enterprise-Lite tier serving exactly this band. The fix: introduce a $999-$1,500 published tier serving the “more than Scale, not ready for annual commit” segment, a missing rung in the pricing ladder.

4. Professional Voice Clone setup fee opaque

The $1,000-$3,000 PVC setup fee is sales-gated with no public starting price. ElevenLabs publishes its PVC at $99 (Creator tier) and includes it transparently. Cartesia’s approach makes PVC feel like an enterprise sale even for individual creators willing to pay, leaving voice-artist revenue on the table.

5. No education or nonprofit pricing

Most AI infrastructure vendors offer 50% education or nonprofit discounts as a brand-equity and acquisition play. Cartesia has published neither. The fix is mechanical: a SheerID-verified education tier at roughly $20/month (50% off the $39 Startup plan) would capture student researchers and university labs at near-zero cost to Cartesia.


Key takeaways

  1. Credit-based billing trades transparency for vendor flexibility — and customers notice. Cartesia’s per-model credit multipliers (2× clones, 3× Turbo) let the company tune margins without contract renegotiation, but they create forecasting friction for buyers. Pricing teams should weigh the operational flexibility of opaque units against the trust cost paid when customers hit unexpected overages.

  2. Dual-axis billing (credits + minutes) correctly recognizes that batch and real-time AI workloads are economically different. Cartesia bills TTS by credits and Voice Agents by minutes because the cost drivers differ — GPU inference time for streaming agents dominates over raw token output. As AI products incorporate more real-time conversational surfaces, this dual-axis pattern will become standard.

  3. On-premise deployment is the enterprise moat in voice AI. Cartesia’s on-prem Enterprise SKU serves customers that SaaS-only competitors literally cannot. For any AI infrastructure vendor whose model can run efficiently on customer hardware, a premium on-prem tier is one of the highest-margin revenue lines available — and a defensible position against commoditization.

  4. Free tiers without credit cards drive disproportionate developer mindshare. Cartesia’s no-CC, 20k-credit free tier is a small giveaway with outsized funnel impact: indie developers benchmark Cartesia first because it costs nothing to try. The conversion cost per qualified evaluator is roughly two orders of magnitude lower than paid marketing for AI infrastructure.

  5. Pricing-ladder gaps between $239 and “call sales” are where competitors slot in. Cartesia’s missing $1,000-$1,500 published tier is a textbook gap-creating mistake: buyers who outgrow Scale but aren’t ready for annual commit will look elsewhere. Every pricing ladder needs a published rung roughly every 3-5× in volume; gaps invite displacement.
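
The 3-5× rule of thumb in the last takeaway can be checked against the self-serve credit allowances listed in the pricing tables. The sketch below computes the volume ratio between adjacent tiers and flags steps wider than 5×; the helper names are illustrative.

```python
# Monthly credit allowances from the self-serve tier table.
ALLOWANCES = [
    ("Free", 20_000),
    ("Pro", 100_000),
    ("Startup", 1_250_000),
    ("Scale", 8_000_000),
]


def rung_ratios(tiers):
    """Return (lower tier, upper tier, volume ratio) for each adjacent pair."""
    return [
        (low_name, high_name, high_credits / low_credits)
        for (low_name, low_credits), (high_name, high_credits) in zip(tiers, tiers[1:])
    ]


def wide_gaps(tiers, max_ratio=5.0):
    """Return the adjacent pairs whose volume step exceeds max_ratio."""
    return [step for step in rung_ratios(tiers) if step[2] > max_ratio]


print(rung_ratios(ALLOWANCES))
print(wide_gaps(ALLOWANCES))
```

By this measure the Pro to Startup step (12.5×) and the Startup to Scale step (6.4×) already exceed the suggested spacing, before counting the unpublished jump from Scale to Enterprise.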


UBP implications

  1. Composite billing units (credits, minutes) abstract pricing risk to the vendor. Cartesia’s credit unit lets the company shift per-model economics without contract changes — a powerful UBP design when the underlying cost structure evolves (new models, new hardware). Pricing teams building usage-based products should evaluate whether a composite unit serves their margin protection better than a transparent single-dimension unit.

  2. Real-time conversational AI requires per-minute (not per-token) pricing to align with cost. Voice Agents bill by minute because GPU inference time, network round-trips, and turn-detection compute dominate the cost stack — not raw output tokens. As AI products add real-time surfaces (voice, video, interactive agents), pricing teams should expect to move from per-token to per-second/per-minute billing for these workloads.

  3. On-prem and VPC deployment is a defensible UBP wedge for regulated industries. Cartesia’s Enterprise tier proves that customers in healthcare, financial services, and government will pay a multi-X premium for on-prem AI that meets data-residency requirements. For any AI infrastructure provider whose model is portable, an on-prem SKU at 2-5× SaaS pricing represents one of the cleanest enterprise UBP plays available.



Bottom line

Cartesia has built a credit-based freemium stack that competes effectively with ElevenLabs on price at the prosumer tier and with Deepgram on capability at the regulated-enterprise tier. The pricing architecture is clever: credits abstract model differences, dual-axis billing aligns to use case, and on-prem deployment unlocks regulated industries that SaaS-only competitors cannot serve. But the same credit abstraction that protects margin creates forecasting friction; the missing tier between $239 and “call sales” leaves a gap competitors will fill; and the sales-gated PVC setup fee feels archaic in a category where ElevenLabs has gone fully transparent. The bet on state-space architecture for latency-critical voice is structurally sound, and the pricing power of a fast-follow Series A gives Cartesia room to hold positioning while transformers commoditize underneath it.

Browse the full pricing blueprint to compare Cartesia against ElevenLabs, Deepgram, and other voice AI vendors.

Pricing timeline : Major events on a vertical axis

Each milestone below corresponds to a public pricing change, product launch, or material adjustment. Major events use a filled marker; minor adjustments use a faded one.

Voice Agents GA + Flat Per-Minute Pricing

Voice Agents (Line) moved to general availability with a flat $0.06 per minute of call duration on every plan, plus $0.014 per minute of telephony when using a Cartesia-provided phone number. Agent usage is funded from a prepaid dollar balance ($1/mo on Free up to $299/mo on Scale) rather than a bundle of included minutes.


Scale Tier + Enterprise Deployment Introduced

An 8M-credit Scale tier was positioned as the self-serve ceiling with priority support and high concurrency limits, while a custom Enterprise SKU added on-premise / VPC deployment of the Sonic and Ink models with DPAs, BAAs, and SSO for compliance-driven prospects.

Series A ($64M) + Voice Agents Beta

Index Ventures led a $64M Series A. Cartesia simultaneously announced Voice Agents — a packaged STT+TTS+turn-detection product priced per minute rather than per character — positioning against Vapi and Retell.

Next-generation Sonic + Pricing Refresh

Cartesia shipped a next-generation Sonic release with improved naturalness and broad multilingual support, and rebalanced credit consumption so that premium features (voice changer, voice localization) draw additional credits beyond base synthesis. An 8M-credit production tier was added for higher-volume teams.

Tiered Subscription Plans Introduced

Cartesia restructured from pure usage to a freemium subscription model with three tiers: Free (10k credits), Creator ($5/mo for 100k credits), and Pro ($49/mo for 1.25M credits). Credit-pack overages preserved for power users. This was the first move toward a SaaS-style commitment model.

Sonic API General Availability

Sonic TTS API opened to the public with a free tier of 10,000 monthly credits and a developer-focused, pay-as-you-go usage model. No subscription tiers yet — purely metered API with prepaid credit packs. Targeted at developers integrating real-time voice into apps.

Company Launch + $27M Seed Round

Cartesia emerged from stealth with a $27M seed round led by Index Ventures. Founders Karan Goel and Albert Gu announced Sonic, the first commercial state-space TTS model. Initial product was API-only with usage-based per-character pricing at approximately $65 per 1M characters — undercutting ElevenLabs by roughly 30%.

Trivia
  • Cartesia was founded in 2024 by Karan Goel and Albert Gu, the same Albert Gu who co-authored the Mamba state-space model paper at CMU. Cartesia's Sonic model is a direct commercial application of state-space architecture, betting that SSMs beat transformers for real-time streaming audio.
  • Sonic was the first commercial TTS model to advertise sub-90ms model latency, roughly 3-5× faster than ElevenLabs Turbo at launch. That latency number is itself a marketing artifact: it measures only the model, not the network round-trip a developer actually pays for.
  • Cartesia raised a $27M seed in March 2024 led by Index Ventures, then a $64M Series A in March 2025 also led by Index, an unusually fast follow-on that locked in pricing power before competitors could undercut. Lightspeed, Conviction, and a roster of AI researchers participated.

Questions & answers

How much does Cartesia cost per month?
Cartesia's self-serve plans are Free (20,000 credits/month), Pro at $4/month (100,000 credits), Startup at $39/month (1.25 million credits), and Scale at $239/month (8 million credits). Scale is fully self-serve — not a sales-gated tier. Above Scale, Enterprise uses custom credit and agent volumes with volume pricing negotiated with sales.
What is a Cartesia credit and how does it convert to audio?
Credits are consumed per second of generated audio on the flagship Sonic-3.5 text-to-speech model, and the included minutes scale with the plan: roughly 27 TTS minutes/month on Free, ~133 on Pro, ~1,667 on Startup, and ~10,667 on Scale. Premium features cost extra credits — the voice changer is 15 credits per second of audio, and localizing a voice is a one-time 225-credit cost.
Does Cartesia have a free tier and is a credit card required?
Yes. The free tier provides 20,000 credits per month with no credit card required at signup, plus a $1/month prepaid balance for voice agents. Free-tier users get the Sonic-3.5 text-to-speech and Ink-2 speech-to-text models, unlimited workspace seats, and one voice agent slot.
How is Cartesia's voice-agent pricing different from per-second TTS?
Cartesia's Line voice agents are billed at a flat $0.06 per minute of call duration on every plan — Free, Pro, Startup, and Scale all pay the same rate — plus $0.014 per minute of telephony when you use a Cartesia-provided phone number. Agents draw from a prepaid dollar balance ($1/mo on Free up to $299/mo on Scale) rather than a bundle of included minutes.
What is the difference between instant and professional voice cloning?
Instant voice cloning is included from the $4 Pro tier and produces a usable clone from a short reference clip. Professional (pro) voice cloning is introduced at the $39 Startup tier. Localizing a voice into another language is billed as a one-time 225-credit cost rather than a flat dollar setup fee.
Does Cartesia offer on-premise deployment?
Yes, through the Enterprise tier. Cartesia is one of the few voice AI vendors that advertises on-premise or virtual-private-cloud deployment of its Sonic and Ink models, targeting healthcare (BAAs), financial services, and compliance-driven customers with strict data-residency requirements. Enterprise pricing is custom and adds DPAs/BAAs, SSO, and security reviews.