About
Cartesia is a San Francisco-based voice AI startup founded in 2024 by Karan Goel and Albert Gu, the latter best known as co-author of the Mamba state-space model paper. The company’s core bet is that state-space models (SSMs) beat transformer architectures for streaming, low-latency audio generation, a thesis embodied in its flagship Sonic text-to-speech model, which advertises sub-90ms model latency for real-time conversational use cases.
Cartesia raised a $27M seed round led by Index Ventures in March 2024, followed by a $64M Series A in March 2025 also led by Index. Lightspeed Venture Partners, Conviction, and a roster of AI researchers participated. The company is private and does not disclose revenue; industry estimates put late-2025 ARR below $50M, small relative to ElevenLabs but growing fast on the back of the voice-agents wave.
The product surface has three pillars: the Sonic TTS API (credit-based usage), instant and professional voice cloning, and a packaged Voice Agents product that bundles STT, TTS, and turn-detection into a per-minute conversational unit. Cartesia competes directly with ElevenLabs at the top of the market, PlayHT and Resemble AI in the prosumer tier, and Vapi/Retell on the voice-agents side. Its differentiator is latency plus on-premise availability, two attributes that matter to healthcare, financial-services, and regulated-enterprise prospects that SaaS-only competitors cannot serve.
Pricing summary : How Cartesia’s credit-based freemium stack works
Cartesia runs a credit-based freemium subscription for its core TTS and STT products, with a separate flat per-minute price for Voice Agents drawn from a prepaid dollar balance. The four self-serve tiers (Free, Pro, Startup, Scale) are flat monthly subscriptions that bundle a fixed credit allowance plus a prepaid agent balance, and every plan includes unlimited workspace seats and voice slots. Above Scale, the Enterprise tier converts to custom credit and agent volumes with negotiated volume pricing.
The “credit” is a deliberately abstracted unit: credits are spent per second of generated audio on the flagship Sonic-3.5 text-to-speech model, with premium features priced separately (the voice changer is 15 credits per second of audio; localizing a voice is a one-time 225-credit cost). This is the standard trade-off of credit-based AI pricing, which swaps transparency for flexibility: Cartesia can tune per-feature credit rates without renegotiating contract pricing, but customers cannot easily forecast bills without modeling their feature mix.
Voice Agents (the Line product) are billed separately at a flat $0.06 per minute of call duration on every plan and are NOT priced in credits — a deliberate packaging choice that simplifies forecasting for conversational use cases at the cost of obscuring which component (Ink-2 STT, Sonic-3.5 TTS, or turn-detection) drives the unit cost. This dual-axis model — credits for batch TTS, dollars-per-minute for real-time agents — mirrors the broader shift from per-user licensing to usage-based AI billing playing out across the category.
What makes this different: Cartesia is one of the few voice AI vendors advertising on-premise Sonic and Ink deployment in its Enterprise tier. ElevenLabs, OpenAI Voice, and PlayHT are all SaaS-only. That single capability — the models running inside a hospital’s or bank’s network — is the wedge that justifies enterprise pricing above the self-serve ceiling.
Pricing by product
Sonic-3.5 TTS / Ink-2 STT (self-serve credit tiers)
| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Free | $0 | 20,000 credits/mo (~27 TTS min); $1/mo prepaid agents; 2 concurrent requests | No credit card; unlimited seats; Sonic-3.5 + Ink-2 |
| Pro | $4/mo | 100,000 credits/mo (~133 TTS min); $5/mo prepaid agents; commercial license | Adds instant voice cloning; 3 concurrent requests |
| Startup | $39/mo | 1.25M credits/mo (~1,667 TTS min); $49/mo prepaid agents; organizations | Adds pro voice cloning; 5 concurrent requests |
| Scale | $239/mo | 8M credits/mo (~10,667 TTS min); $299/mo prepaid agents; priority support | Self-serve ceiling; high concurrency limits |
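The minute estimates in the table all follow from a single conversion rate. A minimal sketch, assuming roughly 750 credits per minute of Sonic-3.5 audio (about 12.5 credits per second): that rate is back-calculated from the table's own figures (20,000 credits ~ 27 minutes), not quoted from Cartesia's documentation, and the `PLANS` dictionary and `tts_minutes` helper are hypothetical names.

```python
# ASSUMPTION: ~750 credits per minute of Sonic-3.5 audio (12.5 credits/second),
# back-calculated from the table (e.g. 20,000 credits ~ 27 min); not a published rate.
CREDITS_PER_MINUTE = 750

# Self-serve tiers from the table: name -> (USD per month, monthly credits).
PLANS = {
    "Free": (0, 20_000),
    "Pro": (4, 100_000),
    "Startup": (39, 1_250_000),
    "Scale": (239, 8_000_000),
}


def tts_minutes(credits: int, credits_per_minute: float = CREDITS_PER_MINUTE) -> int:
    """Approximate whole minutes of base TTS audio that a credit allowance covers."""
    return round(credits / credits_per_minute)


if __name__ == "__main__":
    for name, (price, credits) in PLANS.items():
        print(f"{name:<8} ${price:>3}/mo  {credits:>9,} credits  ~{tts_minutes(credits):,} min")
```

This reproduces the table's ~27, ~133, ~1,667, and ~10,667 minutes; pass a different `credits_per_minute` if your voices or features consume credits at another rate.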
Enterprise (custom)
| Tier | Price | Included | Key mechanics |
|---|---|---|---|
| Enterprise | Custom | Custom credits & agent usage; on-prem / VPC; DPAs, BAAs, SSO | Volume pricing; custom concurrency limits; security reviews |
Voice Agents — Line (flat per-minute, all plans)
| Component | Rate | Notes |
|---|---|---|
| Call duration | $0.06/min | Same flat rate on Free, Pro, Startup, and Scale |
| Telephony | $0.014/min | Only when using a Cartesia-provided phone number |
| Agent funding | Prepaid dollar balance | $1/mo (Free) → $5 (Pro) → $49 (Startup) → $299 (Scale) |
| LLM usage during calls | Free (UI-created agents) | Free for a limited time per Cartesia |
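The per-minute rules above reduce to a few lines of arithmetic. This sketch estimates call cost and how many call minutes each plan's prepaid balance funds before a top-up. It assumes LLM usage is free (as the table states for UI-created agents), that billing is linear in minutes with no rounding increments, and that the prepaid balance is the only agent funding source; `call_cost` and `minutes_covered` are hypothetical helpers, not part of any Cartesia API.

```python
from decimal import Decimal

CALL_RATE = Decimal("0.06")        # USD per minute of call duration, every plan
TELEPHONY_RATE = Decimal("0.014")  # USD per minute, only on Cartesia-provided numbers

# Monthly prepaid agent balance by plan (USD), from the table above.
PREPAID_BALANCE = {
    "Free": Decimal("1"),
    "Pro": Decimal("5"),
    "Startup": Decimal("49"),
    "Scale": Decimal("299"),
}


def rate_per_minute(cartesia_number: bool) -> Decimal:
    """Per-minute agent rate; LLM usage assumed free, as for UI-created agents."""
    return CALL_RATE + (TELEPHONY_RATE if cartesia_number else Decimal("0"))


def call_cost(minutes: Decimal, cartesia_number: bool) -> Decimal:
    """Dollar cost of agent call minutes, before any Enterprise volume pricing."""
    return minutes * rate_per_minute(cartesia_number)


def minutes_covered(balance: Decimal, cartesia_number: bool) -> int:
    """Whole call minutes a prepaid balance funds before a top-up is needed."""
    return int(balance // rate_per_minute(cartesia_number))


if __name__ == "__main__":
    for plan, balance in PREPAID_BALANCE.items():
        print(plan, minutes_covered(balance, False), minutes_covered(balance, True))
```

Under these assumptions, Scale's $299 balance funds about 4,983 call minutes without Cartesia telephony, or about 4,040 with it, while the Free tier's $1 covers about 16 minutes.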
Voice cloning & feature credits
| Service | Cost | Notes |
|---|---|---|
| Instant voice cloning | Included | Self-serve from the $4 Pro tier |
| Professional (pro) voice cloning | Included | Introduced at the $39 Startup tier |
| Voice changer | 15 credits / second of audio | Drawn from the plan credit allowance |
| Localizing a voice | 225 credits (one-time) | Per-voice localization cost |
Sales motions across products: self-serve PLG for Free, Pro, Startup, and Scale (all four are purchasable from the dashboard, including Voice Agents usage); sales-led only for the custom Enterprise tier. All prices as accessed on 2026-05-30.
Hidden costs : What Cartesia users actually pay beyond the headline credit allowance
Archetype A: Solo developer building a voice-cloned podcast workflow (Startup plan)
A solo creator generating ~5 hours of narration per month using instant voice cloning:
| Line item | Monthly cost |
|---|---|
| Startup subscription ($39/mo) | $39.00 |
| Base allowance: 1.25M credits (~1,667 TTS min) | included |
| ~5 hrs (300 min) of TTS — well within ~1,667-min allowance | within allowance |
| Localizing one voice into a second language (225 credits, one-time) | within allowance |
| Estimated total | ~$39 |
For a pure-TTS creator, the Startup plan’s ~1,667 included minutes are hard to exhaust at podcast volumes, so the headline $39 is usually the whole bill. The forecasting trap is feature credits, not volume: the voice changer alone burns 15 credits per second of audio, so heavy use of premium features, not raw narration length, is what pushes a user toward overage. This is a common form of cost unpredictability on credit-based AI platforms.
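The Archetype A budget can be checked in credits rather than minutes, which makes the feature-credit trap visible. A minimal sketch: the 15 credits/second voice-changer rate and the one-time 225-credit localization come from the text, while the ~750 credits per narration minute is inferred from the plan table and is an assumption. The sketch also assumes voice-changer minutes are counted separately from narration minutes and do not also incur base credits; the helper names are hypothetical.

```python
# From the text: voice changer = 15 credits per second; localization = 225 credits
# per voice, one time. ASSUMPTION: ~750 credits per minute of base narration,
# inferred from the plan table (1.25M credits ~ 1,667 min).
BASE_CREDITS_PER_MINUTE = 750
VOICE_CHANGER_CREDITS_PER_SECOND = 15
LOCALIZATION_CREDITS = 225
STARTUP_ALLOWANCE = 1_250_000


def monthly_credits(narration_min: float, voice_changer_min: float = 0,
                    voices_localized: int = 0) -> float:
    """Credits used in a month; voice-changer minutes are assumed separate from narration."""
    return (
        narration_min * BASE_CREDITS_PER_MINUTE
        + voice_changer_min * 60 * VOICE_CHANGER_CREDITS_PER_SECOND
        + voices_localized * LOCALIZATION_CREDITS
    )


def headroom(used: float, allowance: int = STARTUP_ALLOWANCE) -> float:
    """Credits left before overage; negative means the allowance is exceeded."""
    return allowance - used


if __name__ == "__main__":
    used = monthly_credits(narration_min=300, voices_localized=1)  # Archetype A
    left = headroom(used)
    # Minutes of voice-changer output the remaining credits would cover.
    print(used, left, left / (60 * VOICE_CHANGER_CREDITS_PER_SECOND))
```

For the archetype (300 narration minutes, one localized voice) this gives 225,225 credits and leaves 1,024,775 of the 1.25M allowance; at 900 credits per minute, that headroom covers roughly 1,138 minutes of voice-changer output before overage.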
Archetype B: Mid-market customer-service team running voice agents (Scale plan)
A 50-agent contact-center deployment, with each agent averaging 4 hours/day of conversational voice over 20 working days:
| Line item | Monthly cost |
|---|---|
| Scale subscription ($239/mo) | $239 |
| Voice-agent minutes: 50 agents × 4 hrs × 20 days = 240,000 min × $0.06 | $14,400 |
| Telephony (Cartesia numbers): 240,000 min × $0.014 | $3,360 |
| TTS credits for IVR + scripted prompts (within 8M Scale allowance) | included |
| Estimated total | ~$18,000 |
At 240k minutes, the $239 base subscription is a rounding error. Real cost sits in the flat $0.06/min agent rate plus $0.014/min telephony, both drawn from the prepaid agent balance. Because the per-minute rate is identical on every plan, the only lever at this volume is the custom Enterprise tier, where Cartesia advertises volume pricing on credits and agent usage.
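The Archetype B arithmetic is easy to reproduce and vary. This sketch uses `Decimal` to keep currency math exact; `contact_center_bill` is a hypothetical helper. Like the table, it does not net off the prepaid agent balance bundled with Scale and applies no Enterprise volume discount.

```python
from decimal import Decimal


def contact_center_bill(agents: int, hours_per_day: int, days: int,
                        subscription: Decimal = Decimal("239"),
                        call_rate: Decimal = Decimal("0.06"),
                        telephony_rate: Decimal = Decimal("0.014")) -> dict:
    """Monthly bill for a voice-agent deployment on Cartesia numbers (Scale defaults)."""
    minutes = agents * hours_per_day * 60 * days
    call = minutes * call_rate            # flat call-duration charge
    telephony = minutes * telephony_rate  # Cartesia-provided numbers only
    return {
        "minutes": minutes,
        "call": call,
        "telephony": telephony,
        "total": subscription + call + telephony,
    }


if __name__ == "__main__":
    bill = contact_center_bill(agents=50, hours_per_day=4, days=20)  # Archetype B
    for item, value in bill.items():
        print(f"{item:>9}: {value:,}")
```

With the table's inputs the total is $17,999 (the ~$18,000 above), and the $239 subscription is about 1.3% of it.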
Pricing evolution : Cartesia’s journey from pure-usage API to credit-based SaaS
Cadence
| Quarter | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2024 Q1 | 0 | 1 | Company launch + Sonic announced; $27M seed |
| 2024 Q2 | 0 | 1 | Sonic API GA; pure usage at ~$65/1M chars |
| 2024 Q3 | 1 | 2 | Tiered subscriptions introduced: Free, Creator ($5), Pro ($49) |
| 2025 Q1 | 1 | 3 | Sonic-2 launch; clone multiplier formalized; Business ($299) added; Voice Agents beta announced alongside Series A |
| 2025 Q3 | 0 | 2 | Scale + Enterprise tiers formalized; on-prem SKU added |
| 2026 Q1 | 1 | 1 | Voice Agents GA; flat $0.06/min call rate published, funded from a prepaid agent balance |
Tracked range: 2024 Q1–2026 Q2. Quarters not listed were verified stable (0 changes, 0 additions). The Sonic-2 voice-clone multiplier (2×) is technically a price change disguised as a product launch.
Notable changes
- 2024-09-30 — Moved from pure-usage API ($65/1M chars) to a freemium subscription stack. First major pricing-architecture decision, made roughly six months post-launch.
- 2025-01-22 — Sonic-2 launch silently introduced the 2× credit multiplier for cloned voices. No public announcement; documented in API reference only. This is the kind of packaging change that benefits the vendor and surprises customers.
- 2025-03-25 — Voice Agents launched as a separate per-minute SKU rather than rolled into the credit system — a packaging decision that has held since.
- 2025-09-15 — Scale tier and on-prem Enterprise SKU formalized. This was the first explicit sales-led motion at Cartesia and signaled enterprise readiness.
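- 2026-02-10 — Voice Agents GA with a published flat rate of $0.06 per minute of call duration on every plan (plus $0.014/min telephony on Cartesia-provided numbers), funded from a prepaid dollar balance rather than bundled minutes. This is the most recent pricing inflection.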
What’s unique : Cartesia’s distinctive pricing mechanics
1. Credits as a per-model multiplier abstraction. Cartesia’s credit unit lets the company adjust per-model costs (2× for clones, 3× for Sonic Turbo) without changing the headline subscription price. This protects margin on premium models but creates aggregation complexity for customers who mix voices. ElevenLabs uses a simpler character-only model; PlayHT uses words. Cartesia’s credit abstraction is closer to OpenAI’s old “tokens” model in spirit — flexible for the vendor, harder to forecast for the buyer.
2. Dual-axis billing: credits for TTS, minutes for Voice Agents. Most voice AI vendors bill everything in the same unit (characters or minutes). Cartesia uses credits for batch TTS and minutes for agents — recognizing that a 2-minute customer-service conversation is operationally different from a 500-character notification. This composite billing approach aligns price to use case but doubles the forecasting work for buyers who do both.
3. On-premise as the enterprise wedge. Cartesia is one of the few major voice AI vendors offering on-prem Sonic deployment for healthcare and financial-services prospects; ElevenLabs, OpenAI Voice, and PlayHT are all SaaS-only. This single capability supports a separate Enterprise SKU with custom pricing, converting a regulatory constraint into a revenue line. See Deepgram’s blueprint for a comparable speech-side play.
4. Instant Voice Clone bundled into the prosumer tier ($4/mo). Most competitors gate voice cloning to a mid-tier plan ($22/mo at ElevenLabs Creator; sales-gated at PlayHT). Cartesia bundles IVC into the $4 Pro tier. This is a deliberate prosumer-segment play, betting that creators who clone voices at $4 will upgrade to Startup ($39) when they scale: a textbook PLG monetization funnel.
5. Series A pricing power locked in fast. The $64M Series A in March 2025 (12 months after seed) capitalized Cartesia before competitors could undercut on price. Most voice AI startups still target $40-60 per 1M characters; Cartesia’s effective rate on the Business tier (8M credits / $299) works out to ~$37/1M characters before clone multipliers — competitive without being self-destructive. The fast follow-on removed pressure to slash prices for runway.
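Item 5's ~$37 figure is a straight division of price by credits. The sketch below reproduces it and applies item 1's 2× clone multiplier; it treats one credit as one character, as the Business-tier comparison does, which is an assumption given that current credits are metered per second of audio. `dollars_per_million` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def dollars_per_million(price: Decimal, credits: int, multiplier: int = 1) -> Decimal:
    """Effective USD per 1M billable units, where each unit costs `multiplier` credits."""
    units = Decimal(credits) / multiplier
    per_million = price / units * 1_000_000
    return per_million.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(dollars_per_million(Decimal("299"), 8_000_000))                # Business, standard voices
    print(dollars_per_million(Decimal("299"), 8_000_000, multiplier=2))  # Business, all cloned voices
    print(dollars_per_million(Decimal("239"), 8_000_000))                # current Scale tier
```

Business at $299 for 8M credits gives $37.38 per 1M, or $74.75 per 1M characters if every character uses a cloned voice; the current Scale tier ($239 for 8M credits) works out to $29.88 per 1M credits.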
Strengths & weaknesses
| Strengths | Weaknesses |
|---|---|
| Generous free tier (20,000 credits) builds developer goodwill without credit card friction | Credit multipliers (2× clones, 3× Turbo) are documented but easy to miss, a predictable cause of bill shock |
| Sub-90ms latency is genuinely category-leading and supports real-time use cases | Voice Agents’ flat $0.06/min rate does not break out STT, TTS, and turn-detection costs, so unit economics are less transparent than per-credit TTS |
| On-prem Enterprise SKU is rare in voice AI and supports HIPAA/SOC 2 use cases competitors cannot serve | Sales-gated Professional Voice Clone setup fee not published — opacity that ElevenLabs has moved away from |
| Dual-axis pricing (credits + minutes) maps to actual use cases (batch vs conversational) | No published volume pricing above Scale; Enterprise requires a sales call even for predictable workloads |
| Annual commit discounts available through Scale tier give cost-sensitive customers a path | No published education or nonprofit discount tier; missing a key prosumer wedge ElevenLabs uses well |
| Startup ($39) tier delivers 1.25M credits, competitive against ElevenLabs Creator ($22 for 100k chars) at scale | Credit-unit obfuscation makes apples-to-apples comparisons against character-priced competitors hard for buyers |
Billing UX : Cartesia’s account controls and developer console
- Self-serve upgrades — Plan changes for Free → Pro → Startup → Scale are handled entirely in the dashboard at play.cartesia.ai. No sales call is required up to Scale.
- Annual billing — Annual plans are offered with approximately a 15-20% discount on self-serve tiers; Enterprise contracts are annual-only.
- Credit overage — Overage credits sold in prepaid packs (typical: 100k credits for ~$4, 1M for ~$35). Auto-renewal of overage packs is an opt-in setting.
- Usage dashboard — Real-time credit consumption visible in the developer console, broken out by voice/model. Voice-agent minutes shown in a separate panel.
- Spend caps and alerts — Configurable monthly spend cap on overage; email alerts at 50%, 80%, 100% of allowance. This is more graduated than most competitors (ElevenLabs only alerts at 100%).
- Payment methods — Credit card on all plans, plus ACH/wire for Scale and Enterprise. PayPal and Apple Pay are not supported as of May 2026.
- Receipts and invoicing — Stripe-powered receipts for self-serve; NetSuite-routed invoices for Scale/Enterprise with NET 30 terms typical.
- Cancellation — In-app cancellation for self-serve tiers; annual commit cancellation requires written notice 30+ days before renewal.
- API key management — Multiple API keys per workspace with scoped credentials; per-key usage tracking — useful for aggregating consumption across teams.
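The graduated alerts and overage packs described above can be modeled in a few lines. This is a minimal sketch, not Cartesia's implementation: it assumes an alert fires the first time usage crosses 50%, 80%, or 100% of the allowance, and it uses the approximate "typical" pack prices quoted above (100k credits for ~$4, 1M for ~$35). `crossed_thresholds` and `cheapest_packs` are hypothetical names.

```python
# Alert thresholds as whole percentages of the monthly allowance (50%, 80%, 100%).
# Integer math avoids float rounding exactly at the boundaries.
ALERT_THRESHOLDS_PCT = (50, 80, 100)

# ASSUMPTION: the approximate "typical" overage pack prices quoted in the text.
LARGE_PACK = (1_000_000, 35)  # (credits, USD)
SMALL_PACK = (100_000, 4)     # (credits, USD)


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def crossed_thresholds(before: int, after: int, allowance: int) -> list[int]:
    """Alert thresholds (in %) first crossed when usage moves from before to after."""
    return [
        pct for pct in ALERT_THRESHOLDS_PCT
        if before * 100 < pct * allowance <= after * 100
    ]


def cheapest_packs(shortfall: int) -> tuple[int, int, int]:
    """Return (large packs, small packs, total USD) covering shortfall at least cost."""
    large_credits, large_price = LARGE_PACK
    small_credits, small_price = SMALL_PACK
    shortfall = max(shortfall, 0)
    best, best_cost = (0, 0, 0), None
    # Try each count of large packs and top up the remainder with small packs.
    for n_large in range(_ceil_div(shortfall, large_credits) + 1):
        remaining = max(0, shortfall - n_large * large_credits)
        n_small = _ceil_div(remaining, small_credits)
        cost = n_large * large_price + n_small * small_price
        if best_cost is None or cost < best_cost:
            best, best_cost = (n_large, n_small, cost), cost
    return best


if __name__ == "__main__":
    print(crossed_thresholds(600_000, 1_000_000, 1_250_000))  # Startup allowance
    print(cheapest_packs(150_000))
```

For example, a Startup user (1.25M credits) moving from 600k to 1M credits crosses the 50% and 80% alerts in one step, and a 150k-credit shortfall is cheapest as two 100k packs ($8) rather than one 1M pack ($35).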
Strategic wins : Why Cartesia’s pricing decisions worked
1. Free tier with no credit card removed friction at the top of the funnel
20,000 free credits monthly (up from 10,000 at launch), enough to actually evaluate Sonic in a real prototype, combined with no credit card requirement created one of the most frictionless voice AI onboarding flows in the category. This is the same playbook ElevenLabs used in 2023 (and has since tightened); Cartesia is running it harder. The PLG-style free-tier play is likely a major reason Cartesia shows up on indie-developer benchmark posts more often than its market share would suggest.
2. A $4 Pro tier captured a prosumer segment competitors ignored
Pricing the entry paid tier at $4/month (it launched as the $5 Creator plan), below the typical prosumer floor of $20, opened a segment ElevenLabs and PlayHT had effectively abandoned. The bet is that these creators eventually upgrade to the $39 Startup tier as their projects grow: a classic tiered monetization ladder. Even if the conversion rate is low, the LTV math can still work as long as the entry tier is profitable at moderate volumes given its credit allowance.
3. On-prem Enterprise SKU unlocked regulated-industry revenue
Cartesia’s decision to offer on-premise Sonic deployment for HIPAA-regulated healthcare prospects converted a technical capability (model portability via SSM efficiency) into a revenue line. ElevenLabs, OpenAI Voice, and PlayHT cannot serve these customers on-premise. This is a textbook example of turning compliance constraints into pricing power: the contracts tend to be large, sticky, and face little competition.
4. Series A timing locked in pricing before commoditization pressure
Raising $64M just 12 months after seed gave Cartesia runway to hold pricing while transformer-based TTS commoditized. Most competitors are now forced into per-character price wars; Cartesia can hold the line on credit pricing and let cheaper alternatives compete for the prosumer floor. The funding-pricing relationship is rarely discussed but consistently determines who survives the next pricing reset.
Areas to improve : Gaps in Cartesia’s pricing approach
1. Credit multiplier opacity creates predictable bill shock
The 2× credit cost for cloned voices and 3× for Sonic Turbo are documented but buried in API reference docs. A user planning around the headline 8M credits on Scale who moves all generation to cloned voices will exhaust the allowance at roughly 4M characters’ worth of audio, not 8M. The fix: surface the effective character-equivalent allowance for each plan and add a real-time projection in the usage dashboard. See bill shock patterns in AI billing for why this matters more than it looks.
2. Voice Agents per-minute pricing not published with full transparency
Cartesia publishes a flat $0.06 per minute of call duration (plus $0.014/min telephony on Cartesia-provided numbers), but it does not break out how much of that minute goes to Ink-2 STT, Sonic-3.5 TTS, or turn-detection. Buyers comparing against Vapi ($0.08-0.10/min published) or Retell ($0.07/min published) can compare headline rates but cannot see which component drives their unit cost. The fix: publish a per-component rate card. Sales-led deals do not need pre-call opacity to close.
3. No published volume tiers between Scale and Enterprise
The gap between Scale ($239/mo, 8M credits) and Enterprise (sales-led, custom) is unbridged for customers consuming 30-100M credits/month who do not want to commit annually. ElevenLabs offers a $1,320/month Enterprise-Lite tier serving exactly this band. The fix: introduce a $999-$1,500 published tier serving the “more than Scale, not ready for annual commit” segment, a missing rung in the pricing ladder.
4. Professional Voice Clone setup fee opaque
The $1,000-$3,000 PVC setup fee is sales-gated with no public starting price. ElevenLabs publishes its PVC at $99 (Creator tier) and includes it transparently. Cartesia’s approach makes PVC feel like an enterprise sale even for individual creators willing to pay, leaving voice-artist revenue on the table.
5. No education or nonprofit pricing
Many AI infrastructure vendors offer 50% education or nonprofit discounts as a brand-equity and acquisition play. Cartesia publishes neither. The fix is mechanical: a SheerID-verified education tier at about $20/month (50% off the $39 Startup tier) would capture student researchers and university labs at near-zero cost to Cartesia.
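Item 1's fix has two parts: an effective allowance that reflects the multiplier mix, and a real-time projection. A minimal sketch of both, assuming the 2× clone and 3× Turbo multipliers described above and a simple linear run-rate for the projection; a production dashboard might weight recent days more heavily. The function names are hypothetical.

```python
def effective_units(allowance: int, mix: dict[int, float]) -> float:
    """Base-rate units an allowance buys for a usage mix {multiplier: share}.

    Multipliers follow the text (1 = standard voice, 2 = cloned voice,
    3 = Sonic Turbo); shares are fractions of audio volume and must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    avg_multiplier = sum(multiplier * share for multiplier, share in mix.items())
    return allowance / avg_multiplier


def projected_month_end(used: int, day: int, days_in_month: int = 30) -> float:
    """Linear run-rate projection: usage so far scaled to a full month."""
    if day <= 0:
        raise ValueError("day must be at least 1")
    return used / day * days_in_month


if __name__ == "__main__":
    scale_allowance = 8_000_000
    print(effective_units(scale_allowance, {2: 1.0}))          # all cloned voices
    print(effective_units(scale_allowance, {1: 0.5, 2: 0.5}))  # 50/50 standard/clone
    projection = projected_month_end(3_000_000, day=10)
    print(projection, projection > scale_allowance)
```

On Scale's 8M credits, an all-clone workload yields 4M base units and a 50/50 standard/clone mix about 5.33M; a user who has spent 3M credits by day 10 projects to 9M by day 30, past the allowance.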
Key takeaways
- Credit-based billing trades transparency for vendor flexibility, and customers notice. Cartesia’s per-model credit multipliers (2× clones, 3× Turbo) let the company tune margins without contract renegotiation, but they create forecasting friction for buyers. Pricing teams should weigh the operational flexibility of opaque units against the trust cost paid when customers hit unexpected overages.
- Dual-axis billing (credits + minutes) correctly recognizes that batch and real-time AI workloads are economically different. Cartesia bills TTS by credits and Voice Agents by minutes because the cost drivers differ: GPU inference time for streaming agents dominates over raw token output. As AI products incorporate more real-time conversational surfaces, this dual-axis pattern will become standard.
- On-premise deployment is the enterprise moat in voice AI. Cartesia’s on-prem Enterprise SKU serves customers that SaaS-only competitors cannot. For any AI infrastructure vendor whose model can run efficiently on customer hardware, a premium on-prem tier is one of the highest-margin revenue lines available, and a defensible position against commoditization.
- Free tiers without credit cards drive disproportionate developer mindshare. Cartesia’s no-credit-card, 20,000-credit free tier is a small giveaway with outsized funnel impact: indie developers benchmark Cartesia first because it costs nothing to try. The acquisition cost per qualified evaluator is likely far lower than paid marketing for AI infrastructure.
- Pricing-ladder gaps between $239 and “call sales” are where competitors slot in. Cartesia’s missing $1,000-$1,500 published tier is a textbook gap-creating mistake: buyers who outgrow Scale but aren’t ready for an Enterprise commitment will look elsewhere. Every pricing ladder needs a published rung roughly every 3-5× in volume; gaps invite displacement.
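The 3-5× rule of thumb can be checked mechanically against the self-serve ladder. This sketch measures each rung by its included monthly credits from the pricing table and flags adjacent tiers more than 5× apart; both choices are assumptions. Enterprise has no published volume, so the gap above Scale that this takeaway describes cannot be measured this way. `ladder_gaps` is a hypothetical helper.

```python
# Included monthly credits per self-serve tier, from the pricing table.
RUNGS = [("Free", 20_000), ("Pro", 100_000), ("Startup", 1_250_000), ("Scale", 8_000_000)]


def ladder_gaps(rungs: list[tuple[str, int]],
                max_ratio: float = 5.0) -> list[tuple[str, str, float]]:
    """Return adjacent (lower, upper, ratio) pairs whose volume ratio exceeds max_ratio."""
    gaps = []
    for (lower, lower_vol), (upper, upper_vol) in zip(rungs, rungs[1:]):
        ratio = upper_vol / lower_vol
        if ratio > max_ratio:
            gaps.append((lower, upper, ratio))
    return gaps


if __name__ == "__main__":
    for lower, upper, ratio in ladder_gaps(RUNGS):
        print(f"{lower} -> {upper}: {ratio:.1f}x")
```

On the current table it flags Pro → Startup (12.5×) and Startup → Scale (6.4×); Free → Pro sits exactly at 5×.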
UBP implications
- Composite billing units (credits, minutes) let the vendor manage pricing risk behind an abstraction. Cartesia’s credit unit lets the company shift per-model economics without contract changes, a powerful UBP design when the underlying cost structure evolves (new models, new hardware). Pricing teams building usage-based products should evaluate whether a composite unit protects their margins better than a transparent single-dimension unit.
- Real-time conversational AI requires per-minute (not per-token) pricing to align with cost. Voice Agents bill by minute because GPU inference time, network round-trips, and turn-detection compute dominate the cost stack, not raw output tokens. As AI products add real-time surfaces (voice, video, interactive agents), pricing teams should expect to move from per-token to per-second/per-minute billing for these workloads.
- On-prem and VPC deployment is a defensible UBP wedge for regulated industries. Cartesia’s Enterprise tier suggests that customers in healthcare, financial services, and government will pay a substantial premium for on-prem AI that meets data-residency requirements. For any AI infrastructure provider whose model is portable, an on-prem SKU at 2-5× SaaS pricing represents one of the cleanest enterprise UBP plays available.
Sources
- Cartesia pricing page (accessed 2026-05-29)
- Cartesia documentation home (accessed 2026-05-29)
- Cartesia Sonic model overview (accessed 2026-05-29)
- Index Ventures — Cartesia Series A announcement (accessed 2026-05-29)
- Cartesia blog — Sonic-2 launch (accessed 2026-05-29)
- Cartesia API reference — credits and rate limits (accessed 2026-05-29)
- TechCrunch — Cartesia seed round coverage (accessed 2026-05-29)
- The Information — voice AI competitive landscape (accessed 2026-05-29)
- Comparable: ElevenLabs blueprint on UsagePricing (accessed 2026-05-29)
- Comparable: Deepgram blueprint on UsagePricing (accessed 2026-05-29)
Bottom line
Cartesia has built a credit-based freemium stack that competes effectively with ElevenLabs on price at the prosumer tier and with Deepgram on capability at the regulated-enterprise tier. The pricing architecture is clever: credits abstract model differences, dual-axis billing aligns to use case, and on-prem deployment unlocks regulated industries that SaaS-only competitors cannot serve. But the same credit abstraction that protects margin creates forecasting friction; the missing tier between $239 and “call sales” leaves a gap competitors will fill; and the sales-gated PVC setup fee feels archaic in a category where ElevenLabs has gone fully transparent. The bet on state-space architecture for latency-critical voice is structurally sound, and the pricing power of a fast-follow Series A gives Cartesia room to hold positioning while transformers commoditize underneath it.
Pricing timeline : Major events
Each milestone below corresponds to a public pricing change, product launch, or material adjustment.
Voice Agents GA + Flat Per-Minute Pricing
Voice Agents (Line) moved to general availability with a flat $0.06 per minute of call duration on every plan, plus $0.014 per minute of telephony when using a Cartesia-provided phone number. Agent usage is funded from a prepaid dollar balance ($1/mo on Free up to $299/mo on Scale) rather than a bundle of included minutes.
Scale Tier + Enterprise Deployment Introduced
An 8M-credit Scale tier was positioned as the self-serve ceiling with priority support and high concurrency limits, while a custom Enterprise SKU added on-premise / VPC deployment of the Sonic and Ink models with DPAs, BAAs, and SSO for compliance-driven prospects.
Series A ($64M) + Voice Agents Beta
Index Ventures led a $64M Series A. Cartesia simultaneously announced Voice Agents — a packaged STT+TTS+turn-detection product priced per minute rather than per character — positioning against Vapi and Retell.
Next-generation Sonic + Pricing Refresh
Cartesia shipped a next-generation Sonic release with improved naturalness and broad multilingual support, and rebalanced credit consumption so that premium features (voice changer, voice localization) draw additional credits beyond base synthesis. An 8M-credit production tier was added for higher-volume teams.
Tiered Subscription Plans Introduced
Cartesia restructured from pure usage to a freemium subscription model with three tiers: Free (10k credits), Creator ($5/mo for 100k credits), and Pro ($49/mo for 1.25M credits). Credit-pack overages preserved for power users. This was the first move toward a SaaS-style commitment model.
Sonic API General Availability
Sonic TTS API opened to the public with a free tier of 10,000 monthly credits and a developer-focused, pay-as-you-go usage model. No subscription tiers yet — purely metered API with prepaid credit packs. Targeted at developers integrating real-time voice into apps.
Company Launch + $27M Seed Round
Cartesia emerged from stealth with a $27M seed round led by Index Ventures. Founders Karan Goel and Albert Gu announced Sonic, the first commercial state-space TTS model. Initial product was API-only with usage-based per-character pricing at approximately $65 per 1M characters — undercutting ElevenLabs by roughly 30%.
- Cartesia was founded in 2024 by Karan Goel and Albert Gu, the same Albert Gu who co-authored the Mamba state-space model paper at CMU. Cartesia's Sonic model is a direct commercial application of state-space architecture, betting that SSMs beat transformers for real-time streaming audio.
- Sonic was the first commercial TTS model to advertise sub-90ms model latency, roughly 3-5× faster than ElevenLabs Turbo at launch. That latency number is itself a marketing artifact: it measures only the model, not the network round-trip a developer actually pays for.
- Cartesia raised a $27M seed in March 2024 led by Index Ventures, then a $64M Series A in March 2025 also led by Index, an unusually fast follow-on that locked in pricing power before competitors could undercut. Lightspeed, Conviction, and a roster of AI researchers participated.
Questions & answers
- How much does Cartesia cost per month?
- Cartesia's self-serve plans are Free (20,000 credits/month), Pro at $4/month (100,000 credits), Startup at $39/month (1.25 million credits), and Scale at $239/month (8 million credits). Scale is fully self-serve — not a sales-gated tier. Above Scale, Enterprise uses custom credit and agent volumes with volume pricing negotiated with sales.
- What is a Cartesia credit and how does it convert to audio?
- Credits are consumed per second of generated audio on the flagship Sonic-3.5 text-to-speech model, and the included minutes scale with the plan: roughly 27 TTS minutes/month on Free, ~133 on Pro, ~1,667 on Startup, and ~10,667 on Scale. Premium features cost extra credits — the voice changer is 15 credits per second of audio, and localizing a voice is a one-time 225-credit cost.
- Does Cartesia have a free tier and is a credit card required?
- Yes. The free tier provides 20,000 credits per month with no credit card required at signup, plus a $1/month prepaid balance for voice agents. Free-tier users get the Sonic-3.5 text-to-speech and Ink-2 speech-to-text models, unlimited workspace seats, and one voice agent slot.
- How is Cartesia's voice-agent pricing different from per-second TTS?
- Cartesia's Line voice agents are billed at a flat $0.06 per minute of call duration on every plan — Free, Pro, Startup, and Scale all pay the same rate — plus $0.014 per minute of telephony when you use a Cartesia-provided phone number. Agents draw from a prepaid dollar balance ($1/mo on Free up to $299/mo on Scale) rather than a bundle of included minutes.
- What is the difference between instant and professional voice cloning?
- Instant voice cloning is included from the $4 Pro tier and produces a usable clone from a short reference clip. Professional (pro) voice cloning is introduced at the $39 Startup tier. Localizing a voice into another language is billed as a one-time 225-credit cost rather than a flat dollar setup fee.
- Does Cartesia offer on-premise deployment?
- Yes, through the Enterprise tier. Cartesia is one of the few voice AI vendors that advertises on-premise or virtual-private-cloud deployment of its Sonic and Ink models, targeting healthcare (BAAs), financial services, and compliance-driven customers with strict data-residency requirements. Enterprise pricing is custom and adds DPAs/BAAs, SSO, and security reviews.