
Novita AI pricing

novita.ai
Quick summary
Product | Pay-as-you-go AI cloud: 200+ model inference APIs, on-demand GPUs, and per-second agent sandboxes under one API
Industry | technology
Commits | None
AI Summary
  • Novita AI is a pay-as-you-go AI cloud offering 200+ model inference APIs, on-demand and bare-metal GPUs, and secure per-second agent sandboxes under a single API.
  • Serverless inference is billed per million tokens (LLMs), per image, per video-second, or per 1M characters (audio), with no monthly minimum and a free-to-start tier.
  • LLM rates range from $0.02/M input (Llama 3.1 8B) to $1.6/M input and $3.2/M output (DeepSeek V4 Pro), with cache-read discounts on many models.
  • On-demand GPU instances start at $0.55/hr (L40S 48GB) and reach $2.59/hr (H100 SXM 80GB), with spot instances at roughly half price.
  • Bare-metal 8-GPU nodes are quoted at $1.70/GPU/hr (H100 SXM) and $4.77/GPU/hr (B200 SXM); dedicated endpoints bill per-second per replica and scale to zero.
  • Agent Sandbox uses pure per-second billing on vCPU and memory, with an Enterprise tier (99.95% uptime SLA, custom regions) sold via contact sales.
Pricing summary
Novita AI 2026 — pay-as-you-go AI cloud
Pure usage: per-token model inference, per-hour GPUs, and per-second agent sandboxes — free to start, no monthly minimum
Product | Starting price | Best for
Serverless Inference | From $0.02/M tokens | Developers calling 200+ open models via API
GPU Instances | From $0.55/hr/GPU | Teams renting on-demand or spot GPUs
Agent Sandbox | Per-second vCPU + RAM | AI agents executing code, browser, and computer-use tasks
Bare Metal GPU | $1.70–$4.77/GPU/hr | Large training / multi-node clusters
Enterprise (Sandbox) | Custom | Higher limits, custom regions, dedicated infra
All prices read verbatim from Novita's live pricing surfaces on 2026-06-02 (USD). The same H100 carries three prices depending on product (instance $2.59/hr/GPU · dedicated $1.99/GPU-hr · bare-metal $1.70/GPU/hr).

About

Novita AI is a pay-as-you-go AI cloud that bundles three products under a single API: serverless model inference across 200+ open models, GPU compute (on-demand, spot, and bare-metal), and a per-second Agent Sandbox runtime for executing AI-generated code, browser workflows, and computer-use tasks. Its positioning line — “200+ models, on-demand GPUs, and secure agent runtimes — unified under one API. Free to start, scales as you grow” — captures the strategy: be the cheapest, broadest place for developers to run open-weight models and the GPUs underneath them.

The company competes with inference aggregators (Together AI, Fireworks, DeepInfra, Replicate) on the model-API side and with GPU clouds (RunPod, Lambda, Vast.ai) on the compute side. Its differentiator is doing both at once and undercutting first-party model APIs: DeepSeek, Qwen, GLM, Kimi, MiniMax, and Llama families are all listed at aggressive per-million-token rates, frequently with cache-read discounts. Novita serves individual developers and prosumers (self-serve, free to start) up through SMBs and enterprises (dedicated endpoints, bare-metal clusters, and a contact-sales Enterprise sandbox tier with a 99.95% SLA).

Pricing transparency is a notable strength: nearly every dimension — token rates for 226 models, per-image and per-video-second media rates, on-demand and spot GPU hourly rates, and per-second sandbox examples — is published openly, with “contact sales” reserved for bare-metal nodes beyond H100/B200 and the Enterprise sandbox tier.

Novita’s archived pricing pages show it started life around 2023–2024 as a credit-funded Stable-Diffusion image-generation API (footer HQ in Singapore), then pivoted through 2025 into a full inference-plus-GPU cloud and relisted its HQ in San Francisco — see Pricing evolution. Public funding is undisclosed (Crunchbase and Tracxn list no disclosed rounds), and community signal is modest rather than viral: the company has no Hacker News thread above single digits and surfaces mostly in r/LocalLLaMA model-availability chatter, so its growth has been distribution-led (cheap open-model inference) rather than launch-hype-led.


Pricing summary : How Novita AI’s pure usage-based AI cloud bills

Novita AI uses a pure usage-based model — free to start, no seats, no monthly minimum — billed across four metering dimensions:

  1. Per-token model inference (serverless): LLMs are billed per million tokens, split input/output, e.g. Llama 3.1 8B at $0.02/M in · $0.05/M out, DeepSeek V3.1 at $0.27/M in · $1/M out, and DeepSeek V4 Pro at $1.6/M in · $3.2/M out, with discounted cache-read rates on many models (from about 8% to 50% of the input rate across the models in the LLM rate table). Image generation is per image (from $0.001/image), video per video or per second (e.g. $0.30/video for Hunyuan Video Fast), and audio per 1M characters ($15/1M characters for Fish Audio TTS).
  2. Per-hour GPU instances: On-demand from 0.55/hr/GPU (L40S 48GB) to 2.59/hr/GPU (H100 SXM 80GB), with spot pricing at roughly half (RTX 4090 0.67 on-demand vs 0.34 spot) — prices shown as Novita renders them, in USD with no $ glyph. Billed per second, scale to zero.
  3. Per-GPU-hour dedicated endpoints + bare metal: Isolated dedicated endpoints at $0.61 (RTX 4090) / $1.99 (H100) / $2.99 (H200) per GPU-hour; bare-metal 8-GPU nodes at $1.70/GPU/hr (H100 SXM) and $4.77/GPU/hr (B200 SXM).
  4. Per-second agent sandbox: Billed on allocated vCPU + memory, e.g. ~$0.0034 for a 5-minute task on 1 vCPU + 512 MiB RAM.
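The cache-read mechanic in item 1 can be modeled as a blended input price. The sketch below is a minimal Python illustration, assuming that cached prompt tokens are billed at the cache-read rate instead of (not on top of) the normal input rate, and ignoring cache-write charges; `effective_input_rate` and the 50% hit ratio are hypothetical, and the rates are copied from the serverless LLM rate table in Pricing by product.

```python
def effective_input_rate(input_rate: float, cache_read_rate: float, hit_ratio: float) -> float:
    """Blended $/M input tokens when `hit_ratio` of prompt tokens are cache reads.

    Assumption (not a documented Novita formula): cached tokens are billed at
    the cache-read rate instead of the input rate; cache writes are ignored.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return (1 - hit_ratio) * input_rate + hit_ratio * cache_read_rate


# (input $/M, cache-read $/M) as listed in this page's LLM rate table.
RATES = {
    "DeepSeek V3.1": (0.27, 0.135),
    "DeepSeek V4 Pro": (1.6, 0.135),
    "MiniMax-M2": (0.3, 0.03),
}

if __name__ == "__main__":
    for model, (inp, cached) in RATES.items():
        share = cached / inp  # cache-read price as a fraction of the input price
        blended = effective_input_rate(inp, cached, hit_ratio=0.5)
        print(f"{model}: cache read is {share:.0%} of input; "
              f"blended input at 50% hits = ${blended:.4f}/M")
```

At a 50% hit ratio this cuts DeepSeek V3.1's effective input price by 25% (to $0.2025/M) but DeepSeek V4 Pro's by about 46% (to $0.8675/M), because V4 Pro's cache-read rate is a much smaller fraction of its input rate.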

What makes this different: Novita prices the same physical GPU three ways depending on packaging (shared-serverless token rates, on-demand hourly instances, and reserved bare-metal per-GPU-hour), letting a customer move down the cost curve as commitment rises — without ever signing an annual contract.


Pricing by product

Serverless model inference (per-token LLMs)

Model | Context | Input | Output | Key mechanics
Llama 3.1 8B Instruct | 16,384 | $0.02/M | $0.05/M | Cheapest LLM in this table
Llama 3.3 70B Instruct | 131,072 | $0.135/M | $0.4/M | Popular mid-size open model
DeepSeek V3.1 | 131,072 | $0.27/M | $1/M | Cache read $0.135/M
DeepSeek V4 Pro | 1,048,576 | $1.6/M | $3.2/M | Flagship; cache read $0.135/M
DeepSeek V4 Flash | 1,048,576 | $0.14/M | $0.28/M | Low-cost long-context; cache read $0.028/M
Qwen3.7-Max | 1,000,000 | $1.25/M | $3.75/M | Cache read $0.125/M; cache write $1.5625/M
GLM-5.1 | 204,800 | $1.38/M | $4.4/M | Cache read $0.26/M
Kimi K2.6 | 262,144 | $0.8/M | $3.4/M | Cache read $0.16/M
MiniMax-M2 | 204,800 | $0.3/M | $1.2/M | Cache read $0.03/M
OpenAI GPT OSS 120B | 131,072 | $0.05/M | $0.25/M | Open-weight GPT OSS

Catalog lists 226 models across LLM, Image, Audio, Video, Embedding, Reranker, and Vision. Some models (Qwen3 Max, MiniMax-M3, MiMo-V2.5-Pro) show “Tiered pricing” instead of a flat rate. Batch inference carries an introductory 50% discount on input/output tokens for supported models.
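Workload cost for the flat-rate models above is simple arithmetic. A minimal sketch, assuming the introductory 50% batch discount applies equally to the input and output tokens of a supported model; `TokenRates`, `llm_cost`, and the token counts are illustrative, and models shown as “Tiered pricing” are out of scope because no flat rate is published for them.

```python
from dataclasses import dataclass

# Introductory batch discount on input/output tokens (supported models only).
BATCH_DISCOUNT = 0.5


@dataclass(frozen=True)
class TokenRates:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


def llm_cost(rates: TokenRates, input_tokens: int, output_tokens: int,
             batch: bool = False) -> float:
    """USD cost of a workload at flat per-million-token rates."""
    cost = (input_tokens * rates.input_per_m
            + output_tokens * rates.output_per_m) / 1_000_000
    if batch:
        # Assumption: the batch discount scales the whole input + output charge.
        cost *= 1 - BATCH_DISCOUNT
    return cost


LLAMA_31_8B = TokenRates(0.02, 0.05)
DEEPSEEK_V31 = TokenRates(0.27, 1.0)

if __name__ == "__main__":
    # 10M input + 2M output tokens on each model.
    print(f"Llama 3.1 8B:          ${llm_cost(LLAMA_31_8B, 10_000_000, 2_000_000):.2f}")
    print(f"DeepSeek V3.1:         ${llm_cost(DEEPSEEK_V31, 10_000_000, 2_000_000):.2f}")
    print(f"DeepSeek V3.1 (batch): ${llm_cost(DEEPSEEK_V31, 10_000_000, 2_000_000, batch=True):.2f}")
```

For 10M input and 2M output tokens this gives about $0.30 on Llama 3.1 8B, $4.70 on DeepSeek V3.1, and $2.35 for the same DeepSeek job run through batch inference.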

Serverless media inference (per-image / per-video / per-character)

Modality | Example API | Price | Key mechanics
Image | Text to Image (512×512, 5 steps) | $0.001/image | Scales with dimensions, steps, upscaling
Image | Flux.1 Kontext Pro | $0.036/image | Premium editing model
Image | Hunyuan Image 3 | $0.1/image | High-end generation
Video | Hunyuan Video Fast (5s, 720p) | $0.30/video | Per finished video
Video | Kling 2.5 Turbo (5s) | $0.35/video | $0.70 for 10s
Video | Kling v3.0 Pro (per second) | $0.224–$0.336/s | Audio variants priced higher
Audio | Fish Audio Text to Speech | $15/1M characters | Per-character TTS
Audio | MiniMax speech-2.6-hd | $100/1M characters | Premium HD voice
Audio | MiniMax Voice Cloning | $2.4/voice | Per cloned voice
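Each media modality uses a different billing unit, so a mixed workload has to be normalized unit by unit. A minimal sketch using the flat example rates above; for Kling v3.0 Pro it takes the low end of the quoted per-second range, and it does not model the dimension-, step-, and upscaling-dependent pricing of the base text-to-image API (the page points to Novita's Pricing Calculator for that). The function names are hypothetical.

```python
def image_cost(images: int, price_per_image: float) -> float:
    """Per-image billing (e.g. Flux.1 Kontext Pro at $0.036/image)."""
    return images * price_per_image


def video_cost_per_second(seconds: float, price_per_second: float) -> float:
    """Per-second video billing (e.g. Kling v3.0 Pro, $0.224-$0.336/s)."""
    return seconds * price_per_second


def tts_cost(characters: int, price_per_million_chars: float) -> float:
    """Per-character TTS billing quoted per 1M characters."""
    return characters * price_per_million_chars / 1_000_000


if __name__ == "__main__":
    items = {
        "1,000 Flux.1 Kontext Pro images": image_cost(1_000, 0.036),
        "10 s of Kling v3.0 Pro (low end)": video_cost_per_second(10, 0.224),
        "50,000 chars Fish Audio TTS": tts_cost(50_000, 15),
        "50,000 chars MiniMax speech-2.6-hd": tts_cost(50_000, 100),
    }
    for name, cost in items.items():
        print(f"{name}: ${cost:.2f}")
    print(f"Total: ${sum(items.values()):.2f}")
```

The same 50,000 characters cost $0.75 on Fish Audio and $5.00 on MiniMax speech-2.6-hd, a gap of roughly 6.7× that follows directly from the $15 vs $100 per-1M-character rates.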

GPU instances (on-demand vs spot, per hour)

GPU | VRAM | On-demand | Spot | Key mechanics
L40S 48GB | 48 GB | 0.55/hr/GPU | not listed | Cheapest on-demand instance
RTX 4090 24GB | 24 GB | 0.67/hr/GPU | 0.34/hr/GPU | Spot ~49% off
RTX 5090 32GB | 32 GB | 0.73/hr/GPU | 0.37/hr/GPU | Latest Blackwell consumer GPU
H100 SXM 80GB | 80 GB | 2.59/hr/GPU | 1.30/hr/GPU | Per-second billing, scale to zero

GPU instance rates above are shown exactly as Novita renders them on /en/gpus (USD per GPU per hour, no $ glyph on the source page).
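Because instances bill per second, a job's cost is the hourly rate prorated by runtime. A minimal sketch, assuming straight per-second proration with no minimum charge and ignoring spot preemption (an interrupted spot job may need to rerun, which this does not model); the dictionary mirrors the table above, with `None` where no spot rate is listed, and `instance_cost` / `spot_discount` are hypothetical names.

```python
GPU_RATES = {
    # GPU: (on-demand USD/hr/GPU, spot USD/hr/GPU or None if not listed)
    "L40S 48GB": (0.55, None),
    "RTX 4090 24GB": (0.67, 0.34),
    "RTX 5090 32GB": (0.73, 0.37),
    "H100 SXM 80GB": (2.59, 1.30),
}


def instance_cost(gpu: str, seconds: int, num_gpus: int = 1, spot: bool = False) -> float:
    """Prorate an hourly per-GPU rate to the seconds actually run."""
    on_demand, spot_rate = GPU_RATES[gpu]
    rate = spot_rate if spot else on_demand
    if rate is None:
        raise ValueError(f"no spot rate listed for {gpu}")
    return rate * num_gpus * seconds / 3600


def spot_discount(gpu: str) -> float:
    """Fraction saved by spot versus on-demand."""
    on_demand, spot_rate = GPU_RATES[gpu]
    if spot_rate is None:
        raise ValueError(f"no spot rate listed for {gpu}")
    return 1 - spot_rate / on_demand


if __name__ == "__main__":
    # A 90-minute run on one H100.
    print(f"On-demand: ${instance_cost('H100 SXM 80GB', 5400):.3f}")
    print(f"Spot:      ${instance_cost('H100 SXM 80GB', 5400, spot=True):.3f}")
    for gpu in ("RTX 4090 24GB", "RTX 5090 32GB", "H100 SXM 80GB"):
        print(f"{gpu}: spot saves {spot_discount(gpu):.1%}")
```

A 90-minute H100 run comes to about $3.885 on-demand versus $1.95 on spot, and every listed spot rate works out to a saving of roughly 49–50%.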

Dedicated endpoints (isolated GPUs, per GPU-hour)

GPU | VRAM | Price / GPU-hour | Key mechanics
RTX 4090 | 24 GB | $0.61 | Per-second billing on running replicas
NVIDIA H100 | 80 GB | $1.99 | Guaranteed performance, no sharing
NVIDIA H200 | 141 GB | $2.99 | Marked “Popular”; autoscaling + scale-to-zero
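“Per-second billing on running replicas” plus scale-to-zero means an endpoint's bill tracks replica uptime rather than wall-clock time. A minimal sketch, assuming each replica occupies one GPU (a hypothetical `gpus_per_replica` parameter covers larger replicas), no minimum charge, and no fee while scaled to zero; the replica runtimes are invented for illustration.

```python
DEDICATED_RATES = {"RTX 4090": 0.61, "H100": 1.99, "H200": 2.99}  # USD per GPU-hour


def endpoint_cost(gpu: str, replica_seconds: list[int], gpus_per_replica: int = 1) -> float:
    """Sum running seconds across replicas; time scaled to zero adds nothing."""
    rate = DEDICATED_RATES[gpu]
    return sum(replica_seconds) * gpus_per_replica * rate / 3600


if __name__ == "__main__":
    # Bursty day: replica 1 runs 6 h, autoscaling adds replica 2 for 2 h.
    bursty = endpoint_cost("H100", [6 * 3600, 2 * 3600])
    # Same endpoint pinned to one always-on replica for 24 h.
    always_on = endpoint_cost("H100", [24 * 3600])
    print(f"Bursty day: ${bursty:.2f}  vs  always-on: ${always_on:.2f}")
```

Under these assumptions the bursty H100 day costs $15.92 against $47.76 for an always-on replica, which is the practical meaning of “scale to zero, pay zero.”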

Dedicated image endpoints are sold as monthly subscriptions: Standard $559/month and Pro $1,199/month (both: exclusive high-performance GPU, unlimited images, 24/7 support, load 500 models), via contact sales.
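One way to read the subscription prices is as a break-even image volume against serverless per-image rates. A rough sketch, assuming the subscription's dedicated GPU can actually serve the model in question at that volume and ignoring throughput limits behind “unlimited images”; `breakeven_images` is a hypothetical helper, and it uses `Decimal` so that exact quotients (such as $559 / $0.1) are not pushed over an integer boundary by float rounding.

```python
from decimal import ROUND_CEILING, Decimal


def breakeven_images(monthly_fee: str, price_per_image: str) -> int:
    """Images per month at which a flat subscription matches serverless spend."""
    quotient = Decimal(monthly_fee) / Decimal(price_per_image)
    return int(quotient.to_integral_value(rounding=ROUND_CEILING))


if __name__ == "__main__":
    plans = {"Standard": "559", "Pro": "1199"}  # USD per month
    serverless = {"Flux.1 Kontext Pro": "0.036", "Hunyuan Image 3": "0.1"}  # USD per image
    for plan, fee in plans.items():
        for model, price in serverless.items():
            print(f"{plan} vs {model}: {breakeven_images(fee, price):,} images/month")
```

Against Flux.1 Kontext Pro, Standard breaks even at about 15,528 images a month; against Hunyuan Image 3, at 5,590. These are upper bounds on value rather than guarantees, since the page does not specify which models a subscription GPU can serve.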

Bare-metal GPU servers (8 GPUs per node, per GPU/hr)

GPU | Config | Price / GPU/hr | Key mechanics
H100 SXM | 8× per node, 80 GB HBM3 | $1.70 | “Best value”; NVLink 900 GB/s + RDMA
B200 SXM | 8× per node, 192 GB HBM3e | $4.77 | “Top performance”; 5th-gen NVLink 1.8 TB/s
H200 SXM | 8× per node, 141 GB HBM3e | Custom | Large-context / KV-cache workloads; contact us
RTX 5090 | 8× per node, 32 GB GDDR7 | Custom | Cost-efficient inference; contact us
RTX 4090 | 8× per node, 24 GB GDDR6X | Custom | Broadest software compatibility; contact us
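Bare-metal prices are quoted per GPU but sold as 8-GPU nodes, so the committed unit is the whole node. A sketch of node-level cost, assuming the node is billed for every hour of a 730-hour average month (8,760 hours / 12); the page does not state Novita's minimum term or billing cadence for bare metal, so `HOURS_PER_MONTH` and the function names are assumptions.

```python
GPUS_PER_NODE = 8
BARE_METAL_PER_GPU_HR = {"H100 SXM": 1.70, "B200 SXM": 4.77}  # published rates, USD
HOURS_PER_MONTH = 730  # assumption: 8,760 h / 12


def node_hourly(gpu: str) -> float:
    """Hourly cost of one full 8-GPU node."""
    return GPUS_PER_NODE * BARE_METAL_PER_GPU_HR[gpu]


def cluster_monthly(gpu: str, nodes: int = 1, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost for `nodes` nodes billed for `hours` hours each."""
    return node_hourly(gpu) * nodes * hours


if __name__ == "__main__":
    for gpu in BARE_METAL_PER_GPU_HR:
        print(f"{gpu}: ${node_hourly(gpu):.2f}/node-hr, "
              f"${cluster_monthly(gpu):,.2f}/node-month, "
              f"${cluster_monthly(gpu, nodes=4):,.2f}/month for 4 nodes (32 GPUs)")
```

At these rates one H100 node is $13.60/hr (about $9,928 per month) and one B200 node is $38.16/hr (about $27,857 per month), which is why bare metal only pays off for workloads that keep all eight GPUs busy.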

Agent Sandbox (per-second on vCPU + memory)

Configuration | Workload example | Estimated cost | Key mechanics
1 vCPU + 512 MiB RAM | Short-lived agent task (5 min) | ~$0.0034 | Billed per second
2 vCPU + 1 GiB RAM | Code execution job (1 hr) | ~$0.0821 | No plans, no lock-ins
8 vCPU + 8 GiB RAM | Multi-agent / RL workload (1 hr) | ~$0.3744 | Sub-200ms startup
Enterprise | Higher limits, custom regions | Custom (contact sales) | 99.95% uptime SLA; unlimited concurrency
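The three priced sandbox examples are consistent with a simple linear rate on vCPU and memory. The sketch below solves for implied per-vCPU-hour and per-GiB-hour rates from the two 1-hour examples and checks them against the 5-minute quote. This is a reverse-engineering exercise, not Novita's published rate card: it assumes no base fee, strictly linear pricing, and that the “~” figures are rounded.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("examples are linearly dependent")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


# 1-hour examples: 2 vCPU + 1 GiB ~ $0.0821; 8 vCPU + 8 GiB ~ $0.3744.
VCPU_HR, GIB_HR = solve_2x2(2, 1, 0.0821, 8, 8, 0.3744)


def sandbox_cost(vcpus: float, gib: float, seconds: float) -> float:
    """Estimated cost under the implied linear rates (assumption)."""
    return (vcpus * VCPU_HR + gib * GIB_HR) * seconds / 3600


if __name__ == "__main__":
    print(f"Implied: ${VCPU_HR:.4f}/vCPU-hr, ${GIB_HR:.4f}/GiB-hr")
    print(f"Per second: ${VCPU_HR / 3600:.3e}/vCPU-s, ${GIB_HR / 3600:.3e}/GiB-s")
    # Cross-check against the third published example (512 MiB = 0.5 GiB).
    print(f"1 vCPU + 512 MiB for 5 min: ${sandbox_cost(1, 0.5, 300):.4f}")
```

The fit gives roughly $0.0353 per vCPU-hour and $0.0115 per GiB-hour, and it reproduces the 5-minute quote (~$0.0034). That suggests, but does not prove, that all three examples come from one linear per-vCPU and per-GiB price.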

Sales motions across products: PLG / self-serve for serverless inference, GPU instances, dedicated endpoints, and standard sandbox usage; sales-led for bare-metal nodes beyond H100/B200, image endpoint subscriptions, and the Enterprise sandbox tier.


Hidden costs : What inference + GPU bills actually look like at volume

The “free to start” headline hides how quickly a production workload can compound across token, GPU, and sandbox dimensions. Two representative archetypes:

A startup running a DeepSeek-V3.1 chat product

Line item | Monthly cost
300M input tokens @ $0.27/M | $81
120M output tokens @ $1/M | $120
50M cache-read tokens @ $0.135/M | $6.75
1× dedicated H100 endpoint, ~200 hrs active @ $1.99/GPU-hr | $398
Total | ~$605.75

The dedicated endpoint — not the tokens — dominates the bill once you reserve isolated capacity, so teams must weigh shared serverless token rates against the latency guarantees of a dedicated GPU.
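To see why the reserved endpoint dominates, the line items can be reproduced and converted into a blended cost per million tokens processed. A minimal sketch that uses only the figures in the table above; counting cache-read tokens in the denominator is an illustrative choice, not a Novita metric.

```python
# Startup archetype line items (volumes in millions of tokens, rates from the table).
INPUT_M, OUTPUT_M, CACHE_M = 300, 120, 50

token_costs = {
    "input": INPUT_M * 0.27,
    "output": OUTPUT_M * 1.0,
    "cache_read": CACHE_M * 0.135,
}
dedicated = 200 * 1.99  # ~200 active hours on one dedicated H100 endpoint

token_total = sum(token_costs.values())
total = token_total + dedicated
tokens_processed_m = INPUT_M + OUTPUT_M + CACHE_M

if __name__ == "__main__":
    print(f"Monthly total: ${total:,.2f} (dedicated share {dedicated / total:.0%})")
    print(f"Blended, tokens only:   ${token_total / tokens_processed_m:.3f}/M")
    print(f"Blended, with endpoint: ${total / tokens_processed_m:.3f}/M")
```

With these numbers the endpoint is about two-thirds of the bill and roughly triples the blended rate, from about $0.44/M to about $1.29/M.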

An agent platform running browser + code sandboxes

Line item | Monthly cost
50,000 short coding tasks @ ~$0.0034 (1 vCPU/512MiB, 5 min) | $170
5,000 hour-long multi-agent jobs @ ~$0.3744 (8 vCPU/8GiB) | $1,872
Spot RTX 4090 for model serving, ~300 hrs @ $0.34/hr | $102
Total | ~$2,144

Per-second sandbox pricing looks trivial per task, but at agent-platform concurrency the long-running 8-vCPU jobs are the cost driver — exactly the line a usage-based pricing buyer should model before committing.
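Ranking line items by share of spend makes the cost driver explicit. A minimal sketch over the agent-platform figures above; `cost_shares` is a hypothetical helper, and the per-task prices are Novita's approximate (~) sandbox quotes, so the totals inherit that rounding.

```python
def cost_shares(items: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (name, cost, share of total) sorted by cost, largest first."""
    total = sum(items.values())
    rows = [(name, cost, cost / total) for name, cost in items.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


AGENT_PLATFORM = {
    "5-min coding tasks (50,000 x ~$0.0034)": 50_000 * 0.0034,
    "1-hr multi-agent jobs (5,000 x ~$0.3744)": 5_000 * 0.3744,
    "Spot RTX 4090 serving (300 h x $0.34)": 300 * 0.34,
}

if __name__ == "__main__":
    for name, cost, share in cost_shares(AGENT_PLATFORM):
        print(f"{share:6.1%}  ${cost:>9,.2f}  {name}")
    print(f"Total: ${sum(AGENT_PLATFORM.values()):,.2f}")
```

The 8-vCPU jobs come out at roughly 87% of spend even though there are a tenth as many of them as short coding tasks.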



Pricing evolution : From a credit-funded image API to a full AI cloud stack

Novita’s archived pricing pages tell a clean origin story: it started in 2023–2024 as a Stable-Diffusion image-generation API billed out of a prepaid credit balance (USDT/Stripe top-ups, “1/10 of DALL-E2 and MJ”), then pivoted through 2025 into a broad model-inference + GPU cloud, adding LLMs, Audio, GPUs, Dedicated Endpoints, and finally a per-second Agent Sandbox — moving its listed HQ from Singapore to San Francisco along the way.

Cadence

Quarter | Price changes | Product / SKU additions | Notes
2024 Q1 | 0 (baseline) | 0 | Credit-funded image API; text-to-image $0.0015/image (512×512), upscale from $0.0021/image; Singapore HQ.
2024 Q2 | 0 | 2 | LLM API and Audio (Text to Speech, Voice Cloning) added to the catalog; image rates unchanged.
2025 Q1 | 1 | 1 | Redesign to “Model APIs / GPUs” tabs; HQ moves to San Francisco; $10,000 startup-credits program; image shown at $0.003/image (1024×1024); Hunyuan/Wan video rates listed per-second.
2025 Q3 | 1 | 1 | Agent Sandbox launches; page splits into Serverless / Dedicated / GPUs / Sandbox tabs; image back to $0.001/image (512×512); full LLM rate card published (DeepSeek V3.1 $0.27/$1, Llama 3.1 8B $0.02/$0.05).
2025 Q4 | 1 | 1 | GLM-4.6, Kimi K2 and DeepSeek V3.2 Exp added; V3.2 Exp output cut to $0.41/M vs V3.1’s $1/M.

Tracked range: 2024-02-25 → 2026-06-02 via Wayback. Quarters not listed (2024 Q3–Q4, 2025 Q2, 2026 Q1) showed no material pricing or packaging change in the archived snapshots reviewed.

Notable changes

  • 2024-02-25: Earliest archived surface, a credit-funded image-generation API (txt2img $0.0015/image, USDT/Stripe top-ups) with the tagline “1/10 of DALL-E2 and MJ” and a Singapore HQ.
  • 2024-05-21: LLM API and Audio (TTS, Voice Cloning) added to the product catalog.
  • 2025-02-10: Major redesign to Model APIs + GPUs tabs; listed HQ moves to San Francisco; $10,000 startup-credits program launches.
  • 2025-08-04: Agent Sandbox launches; pricing splits into four endpoint tabs; batch inference advertised at an introductory 50% token discount.
  • 2025-09-11: Full serverless LLM rate card published (DeepSeek V3.1 $0.27/$1, Llama 3.1 8B $0.02/$0.05, GLM-4.5 $0.6/$2.2), with several small models listed Free.
  • 2025-10-03: GLM-4.6, Kimi K2, DeepSeek V3.2 Exp added; DeepSeek V3.2 Exp output-token price cut to $0.41/M.
  • 2026-04-21: Per the Novita blog, Kimi K2.6 launched at $0.95/$4.00 per M tokens; by the 2026-06-02 capture the same family is listed at $0.8/$3.4, a downward revision within ~6 weeks that is consistent with Novita’s habit of trimming rates after launch.

The model-API pivot in detail

The most consequential change is not a single price move but a product-category pivot. In early 2024 Novita’s entire pricing page was image-generation APIs billed from a prepaid credit wallet — the same packaging a hobbyist Stable-Diffusion service would use. By late 2025 the page had become a four-tab AI-cloud rate card spanning per-token LLM inference, per-hour GPUs, per-GPU-hour dedicated endpoints, and per-second agent sandboxes. The credit-wallet UX was replaced by pure pay-as-you-go metering, and the headline shifted from “cheaper than DALL-E” to “200+ models under one API.” Novita rode the open-weight model wave — DeepSeek, Qwen, GLM, Kimi — from a niche media tool into a general-purpose inference cloud in under two years.


What’s unique : One GPU, three prices, zero contracts

1. The same GPU is sold at three price points by packaging. An NVIDIA H100 is $2.59/hr as an on-demand instance, $1.99/GPU-hr as a dedicated inference endpoint, and $1.70/GPU/hr on an 8-GPU bare-metal node. Customers self-select down the cost curve as their commitment and isolation needs rise — with no annual contract required at any step.

2. Four metering dimensions under one API. Tokens, GPU hours, GPU-hours-per-replica, and sandbox vCPU-seconds are all billed independently, so a single customer can mix shared inference, reserved compute, and ephemeral agent runtimes on one account and one invoice.

3. Per-second billing reaches all the way to agent sandboxes. Most inference clouds stop at per-hour GPU billing; Novita extends pure-usage granularity to per-second vCPU+memory for agent execution, quoting a 5-minute task at fractions of a cent.

4. Aggressive undercutting of first-party model APIs. With 226 models and cache-read discounts (roughly 8% to 50% of the input rate on the models in the LLM rate table), Novita positions itself as the cheap default for open-weight DeepSeek, Qwen, GLM, Kimi, and Llama inference, so customers need not go to each model maker directly, and it leans into the token-cost deflation that keeps compressing per-million-token rates.


Strengths & weaknesses

Strengths | Weaknesses
Radical price transparency: 226 model rates, GPU rates, and sandbox examples all public | Pricing surface is sprawling; comparing the “same” GPU across products is confusing
Pure usage, free to start, no monthly minimum or seat fees | Some flagship models show only “Tiered pricing” with no public rate
Spot GPUs at ~50% of on-demand for cost-sensitive workloads | No committed-use discounts or annual commit tier published
Per-second granularity down to agent sandboxes | Bare-metal beyond H100/B200 and Enterprise sandbox are contact-sales only
One API + one invoice across inference, GPUs, and agent runtimes | No public uptime SLA except the Enterprise sandbox tier (99.95%)
Rapid catalog refresh: new flagship models added within days of release (DeepSeek V4, Kimi K2.6, GLM-4.7) | Undisclosed funding and thin public/community signal (no HN thread above single-digit points) make durability harder to assess

Billing UX : Tabbed pricing surface, spot toggle, and sandbox estimator

  • Tabbed pricing page — the /en/pricing surface splits into Serverless Endpoints, Dedicated Endpoints, Agent Sandbox, and GPUs tabs, each rendering its own rate tables.
  • Modality filters on serverless rates — LLM / Image / Audio / Video / Cache filters let users narrow the 226-model rate table to a single modality.
  • On-Demand vs Spot toggle — the GPU Instance page shows on-demand and spot rates side by side per GPU, surfacing the ~50% spot saving.
  • Pricing Calculator links — image and video rate tables explicitly point to a Pricing Calculator for dimension-dependent estimates (“varies based on image dimensions, inference steps, and upscaling”).
  • Sandbox cost estimator — the Agent Sandbox page lists sample configurations with workload examples and estimated per-task costs (e.g. ~$0.0034 / ~$0.0821 / ~$0.3744).
  • Scale-to-zero controls — dedicated endpoints advertise “per-second billing on running replicas only. Scale to zero, pay zero.”

Strategic wins : Why Novita’s packaging decisions work

1. Bundling inference and GPUs captures the whole workload

By selling both the model API and the GPU underneath it, Novita captures customers at whichever layer they prefer to operate — and can route them up or down the stack as needs change. This is the same full-funnel logic that makes hybrid platforms sticky, applied to pure usage.

2. Three-price GPU laddering monetizes commitment without contracts

Offering the same H100 at instance, dedicated, and bare-metal prices lets customers trade flexibility for cost on their own terms. Novita earns more from casual on-demand users while still winning price-sensitive reserved buyers — without the friction of an annual commitment.

3. Per-second sandbox pricing lands the agent wave early

Pricing agent execution per vCPU-second positions Novita squarely in front of the 2026 surge in coding agents, browser automation, and RL environments — a use case many GPU clouds price too coarsely to win, even as agentic workflows become a cost monster for buyers who don’t meter them tightly.


Areas to improve : Where the pricing surface creates friction

1. Consolidate the “same GPU, three prices” confusion

A buyer comparing H100 options sees $2.59, $1.99, and $1.70 across three pages with no single comparison view. A unified “choose your H100 packaging” matrix — instance vs dedicated vs bare-metal, with break-even hours — would convert better than forcing the buyer to reconcile three tabs, and would blunt the bill-shock and cost-unpredictability risk that scares finance teams away from raw usage pricing.
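The break-even view suggested above can be sketched in a few lines. This assumes on-demand instances and dedicated endpoints are billed only for active hours, while a bare-metal GPU is billed for every hour of a 730-hour month whether used or not, and that the buyer can fill a whole 8-GPU node; the page does not state bare-metal terms, so those are assumptions, and `breakeven_hours` is a hypothetical helper. The instance and the endpoint are also different products (raw GPU vs managed inference), so each comparison is only a cost view, not a feature comparison.

```python
H100_PER_GPU_HR = {
    "on-demand instance": 2.59,
    "dedicated endpoint": 1.99,
    "bare metal (8-GPU node)": 1.70,
}
HOURS_PER_MONTH = 730  # assumption: average month


def monthly_cost(packaging: str, active_hours: float) -> float:
    """Per-GPU monthly cost for a given number of active hours."""
    rate = H100_PER_GPU_HR[packaging]
    if packaging.startswith("bare metal"):
        return rate * HOURS_PER_MONTH  # assumption: reserved, billed all month
    return rate * active_hours  # metered only while running


def breakeven_hours(metered: str, reserved: str = "bare metal (8-GPU node)") -> float:
    """Active hours per month above which reserved bare metal is cheaper."""
    return monthly_cost(reserved, 0) / H100_PER_GPU_HR[metered]


if __name__ == "__main__":
    for option in ("on-demand instance", "dedicated endpoint"):
        print(f"Bare metal beats {option} above ~{breakeven_hours(option):.0f} h/month")
    for hours in (200, 480, 730):
        row = ", ".join(f"{p}: ${monthly_cost(p, hours):,.0f}" for p in H100_PER_GPU_HR)
        print(f"{hours} h -> {row}")
```

Under these assumptions bare metal undercuts the on-demand H100 only above roughly 479 active hours per GPU per month (about 66% utilization) and the dedicated endpoint only above roughly 624 hours.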

2. Publish rates for “Tiered pricing” models

Flagship models like Qwen3 Max and MiniMax-M3 show only “Tiered pricing” with no number, undercutting the otherwise-excellent transparency. Even a starting rate or a published tier table would remove a sales-friction point for the most in-demand models.

3. Offer a committed-use discount tier

Novita has no public annual-commit or reserved-capacity discount outside bare-metal. A self-serve committed-use option (prepay credits for a discount) would give predictable workloads a reason to consolidate spend on Novita, echoing how credit-pool models reward commitment.


Key takeaways

  1. Sell the same resource at multiple price points by packaging. Novita’s instance/dedicated/bare-metal ladder for one H100 shows how to monetize commitment without locking customers into contracts.
  2. Transparency is a competitive weapon in usage-based markets. Publishing 226 model rates plus GPU and sandbox examples lets developers self-qualify and reduces sales friction versus gated competitors.
  3. Extend metering granularity to match the workload. Per-second sandbox billing fits agent execution far better than per-hour GPU billing — the unit should mirror how the customer actually consumes.
  4. Cache-read discounts reshape effective token cost. Listing cache-read rates on many models (roughly 8% to 50% of the input rate on the models in the LLM rate table) materially changes real bills and should be modeled, not ignored.
  5. “Free to start” still needs a cost model. The headline hides four compounding dimensions; the buyers who win are the ones who model token, GPU, and sandbox spend together before scaling.

UBP implications

  1. Multi-dimensional metering is becoming table stakes for AI clouds. Novita bills on tokens, GPU hours, replica-hours, and vCPU-seconds simultaneously — a sign that single-metric usage pricing is insufficient for stacked AI infrastructure.
  2. Packaging, not just price, is the lever. The same silicon at three prices proves that how usage is packaged (shared vs dedicated vs reserved) is itself a pricing dimension, independent of the headline rate.
  3. Per-second billing is the new frontier for agent economies. As AI agents proliferate, the vendors that meter execution at second-level granularity will out-price those still selling hourly compute blocks.



Bottom line

Novita AI is one of the most transparent pure-usage AI clouds in the corpus: 226 model rates, on-demand and spot GPUs, bare-metal nodes, and per-second agent sandboxes are all published openly, with the same H100 laddered across three price points so customers buy exactly the commitment they want. The cost of that breadth is a sprawling, multi-tab pricing surface that asks buyers to reconcile token, GPU, and sandbox math themselves — a transparency-versus-clarity trade-off worth watching.


Pricing timeline : Major events, newest first

Each milestone below corresponds to a public pricing change, product launch, or material adjustment.

2026-06-02: Pricing snapshot (per-token inference, per-hour GPU, per-second sandbox)

Novita lists 226 models on per-token/per-image/per-video pricing; on-demand GPUs from $0.55/hr; bare-metal H100 $1.70 and B200 $4.77 per GPU/hr; dedicated endpoints H100 $1.99 / H200 $2.99 per GPU-hour; agent sandbox billed per-second on vCPU + memory.


2025-10-03: GLM-4.6, Kimi K2, DeepSeek V3.2 Exp added; output-token cuts

Catalog expands with zai-org/glm-4.6 ($0.6/$2.2), moonshotai/kimi-k2-instruct ($0.57/$2.3) and deepseek-v3.2-exp at $0.27/$0.41 — a sharp output-token cut versus V3.1's $1. DeepSeek V3.1 input/output held at $0.27/$1. Wayback web.archive.org/web/20251003030153/novita.ai/pricing.


2025-09-11: Full serverless LLM rate card published

Serverless Endpoints tab renders the full per-million-token catalog: DeepSeek V3.1 $0.27/$1, Llama 3.1 8B $0.02/$0.05, Llama 3.3 70B $0.13/$0.39, GLM-4.5 $0.6/$2.2, Qwen3-Coder-480B $0.29/$1.2, with several small models (Llama 3.2 1B, Qwen3 4B, Gemma 3 1B) listed Free. Wayback web.archive.org/web/20250911221805/novita.ai/pricing.


2025-08-04: Agent Sandbox launches; pricing splits into four endpoint tabs

Pricing page restructured into Serverless Endpoints / Dedicated Endpoints / GPUs / Agent Sandbox tabs — the per-second Agent Sandbox product is now live. Batch inference advertised at an introductory 50% token discount. Image rate back to $0.001/image (512×512, 5 steps); MiniMax speech-02-hd $80/1M characters, Voice-Cloning $2.4/voice. Wayback web.archive.org/web/20250804005313/novita.ai/pricing.


2025-02-10: Redesign to Model APIs + GPUs; HQ moves to San Francisco

Pricing page redesigned to a light-theme 'Model APIs / GPUs' tabbed layout; footer HQ changes to 156 2nd Street, San Francisco. A '$10,000 in credits' startup program launches. Image rate shown at $0.003/image (1024×1024, 20 steps); Text to Speech $15/1M characters. Wayback web.archive.org/web/20250210165912/novita.ai/pricing.


2024-05-21: LLM API and Audio added alongside image/video

An 'LLMs' tab and an LLM API product appear in the catalog; Audio (Text to Speech, Voice Cloning) is added. Still credit-based with the same image rates (txt2img $0.0015/image). Wayback web.archive.org/web/20240521203043/novita.ai/pricing.


2024-02-25: Credit-funded image-generation API (Singapore)

Earliest archived pricing surface: Novita was a Stable-Diffusion image API billed from a prepaid USDT/Stripe credit balance, tagline 'The price is only 1/10 of DALL-E2 and MJ.' Text-to-image quoted at $0.0015/image (512×512, 20 steps), upscale from $0.0021/image. Footer listed a Singapore HQ (14 Robinson Road). Wayback web.archive.org/web/20240225192128/novita.ai/pricing.

Trivia
  • Novita publishes per-second billing for both GPU instances and agent sandboxes; a 5-minute coding-agent task on 1 vCPU + 512 MiB RAM is quoted at roughly $0.0034.
  • The same NVIDIA H100 appears at three different prices depending on product: $2.59/hr as an on-demand GPU instance, $1.99/GPU-hour as a dedicated endpoint, and $1.70/GPU/hr on an 8-GPU bare-metal node.
  • Novita lists 226 models in its catalog and undercuts first-party APIs, pricing DeepSeek V3.1 at $0.27 input / $1 output per million tokens, below DeepSeek's own list rates.

Questions & answers

How does Novita AI pricing work?
Novita is pure pay-as-you-go. Model inference is billed per million tokens (LLMs), per image, per video-second, or per 1M characters (audio); GPUs are billed per hour (on-demand or spot); and agent sandboxes are billed per second on vCPU and memory. There is no monthly minimum and you can start for free.
How much does an NVIDIA H100 cost on Novita AI?
It depends on the product. An on-demand H100 SXM 80GB GPU instance is $2.59/hr ($1.30/hr spot), a dedicated inference endpoint on H100 80GB is $1.99 per GPU-hour, and an 8-GPU bare-metal H100 SXM node is $1.70 per GPU/hr.
Does Novita AI have a free tier?
Yes. Novita advertises 'Free to start, scales as you grow' — you can sign up and call the model APIs without a subscription, paying only for the tokens, images, GPU hours, or sandbox seconds you consume.
What is the cheapest LLM on Novita AI?
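Among the LLMs in the rate table, Llama 3.1 8B Instruct is the cheapest at $0.02/M input and $0.05/M output, and several very small models (Llama 3.2 1B, Qwen3 4B, Gemma 3 1B) were listed as Free in the 2025-09-11 archived rate card. Larger models like DeepSeek V4 Pro run $1.6/M input and $3.2/M output.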
How is the Novita Agent Sandbox billed?
The Agent Sandbox is billed per second based on the vCPU and memory you allocate, with no plans or lock-ins. Example quotes include ~$0.0034 for a 5-minute task on 1 vCPU + 512 MiB RAM and ~$0.3744 for a 1-hour multi-agent workload on 8 vCPU + 8 GiB RAM.
Does Novita AI offer bare-metal GPU servers?
Yes. Bare-metal nodes ship 8 GPUs per node with zero virtualization overhead and contractual SLAs. Published rates are $1.70/GPU/hr for H100 SXM and $4.77/GPU/hr for B200 SXM; H200, RTX 5090, and RTX 4090 nodes are quoted via contact sales.