AI Summary
About
Modal is a New York-based serverless compute platform founded in September 2021 by Erik Bernhardsson (creator of Luigi and Annoy, ex-Spotify ML lead) and Akshat Bubna. The product is an opinionated Python-native serverless cloud: customers wrap functions with @app.function() decorators specifying GPU type, container image, memory, and concurrency, and Modal handles container build, image registry, scheduling, autoscaling, and per-second billing without exposing Kubernetes or infrastructure abstractions. The platform supports both interactive workloads (Jupyter, web endpoints) and batch jobs (training, batch inference, ETL).
By 2026 Modal serves Suno, Substack, Ramp, Notion, Anthropic (for select internal workloads), and roughly 1,000 paying customers spanning AI-native startups, academic research labs, and mid-market product engineering teams running Python data pipelines. The company raised a $16M Series A in February 2023 led by Redpoint with Lux Capital participation, followed by a Series B (reportedly $31M+) in mid-2024. The startup credit program grants up to $10,000 to qualifying companies — one of the most generous in the serverless GPU category.
Modal competes with RunPod, Replicate, and Baseten (for AI inference), Lambda Labs, and general-purpose serverless platforms such as AWS Lambda paired with separate GPU options. Its differentiation is per-second billing granularity (finest in the category), Python-native developer ergonomics (@app.function() decorators rather than Docker + Kubernetes manifests), and a founder whose creation of widely-deployed open-source data tooling (Luigi, Annoy) gives the developer-experience pitch unusual credibility.
Pricing summary : How Modal’s per-second compute + plan-tier stack works
Modal runs a hybrid pricing model: pure-usage per-second compute (GPU, CPU, memory) plus flat monthly subscription fees that unlock concurrency limits, seat counts, and included credit grants. Starter is free with $30/month in compute credits, 3 workspace seats, 100 container concurrency, and 10 GPU concurrency — enough for evaluation and small production. Team is $250/month + compute with $100/month in credits, unlimited seats, 1,000 container concurrency, and 50 GPU concurrency. Enterprise is quote-based with volume discounts, dedicated capacity, VPC deployment, and AWS/GCP Marketplace billing.
The compute rate card is denominated per second across every SKU: T4 at $0.000164/sec, L4 at $0.000222, A10 at $0.000306, L40S at $0.000542, A100 40GB at $0.000583, A100 80GB at $0.000694, RTX PRO 6000 at $0.000842, H100 at $0.001097, H200 at $0.001261, B200 at $0.001736. CPU is $0.0000131/physical-core/sec (minimum 0.125 cores); memory is $0.00000222/GiB/sec; persistent volume storage is $0.09/GiB-month with 1 TiB/month free. Per-second granularity is 60 times finer than per-minute billing and 3,600 times finer than per-hour billing.
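To make the granularity point concrete, the following sketch rounds a short burst up to different billing increments and prices it at the H100 rate quoted above. The per-minute and per-hour increments are hypothetical comparison points rather than any specific competitor's terms, and `billed_cost` is an illustrative helper name.

```python
import math

H100_RATE_PER_SEC = 0.001097  # USD per second, from the rate card above


def billed_cost(runtime_sec: float, increment_sec: int, rate_per_sec: float) -> float:
    """Cost of a run when its duration is rounded up to the billing increment."""
    billed_seconds = math.ceil(runtime_sec / increment_sec) * increment_sec
    return billed_seconds * rate_per_sec


burst_sec = 5  # a container that runs for five seconds and then scales to zero
for label, increment in [("per-second", 1), ("per-minute", 60), ("per-hour", 3600)]:
    cost = billed_cost(burst_sec, increment, H100_RATE_PER_SEC)
    print(f"{label:>10}: ${cost:.4f}")
```

Under these assumptions the same five-second burst is billed as 5, 60, or 3,600 seconds, so the gap between increments matters most for workloads made of many short runs.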
What makes this different: Modal is the only Python-native serverless platform where @app.function() decorators are the primary interface; there is no Dockerfile or Kubernetes manifest in the canonical workflow. For data scientists and ML engineers who write Python but not YAML, this developer-experience choice is materially better than alternatives, and the founder’s open-source pedigree (Luigi, Annoy) makes the experience pitch credible.
Pricing by product
Compute per second (the canonical SKU)
| Resource | Per-second rate |
|---|---|
| Nvidia T4 (16GB) | $0.000164 |
| Nvidia L4 (24GB) | $0.000222 |
| Nvidia A10 (24GB) | $0.000306 |
| Nvidia L40S (48GB) | $0.000542 |
| Nvidia A100 (40GB) | $0.000583 |
| Nvidia A100 (80GB) | $0.000694 |
| Nvidia RTX PRO 6000 | $0.000842 |
| Nvidia H100 (80GB) | $0.001097 |
| Nvidia H200 (141GB) | $0.001261 |
| Nvidia B200 (180GB) | $0.001736 |
| CPU (physical core) | $0.0000131 (min 0.125 cores) |
| Memory | $0.00000222/GiB |
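Per-second rates are hard to compare with the hourly prices most GPU providers quote. A minimal sketch of the conversion, using the GPU rates from the table above (the dictionary keys are shorthand labels chosen for this example):

```python
SECONDS_PER_HOUR = 3600

# Per-second GPU rates in USD, copied from the table above.
GPU_RATE_PER_SEC = {
    "T4": 0.000164,
    "L4": 0.000222,
    "A10": 0.000306,
    "L40S": 0.000542,
    "A100-40GB": 0.000583,
    "A100-80GB": 0.000694,
    "RTX PRO 6000": 0.000842,
    "H100": 0.001097,
    "H200": 0.001261,
    "B200": 0.001736,
}


def hourly_rate(gpu: str) -> float:
    """Hourly-equivalent price for one GPU running a full hour."""
    return GPU_RATE_PER_SEC[gpu] * SECONDS_PER_HOUR


for gpu in GPU_RATE_PER_SEC:
    print(f"{gpu:<14} ${hourly_rate(gpu):.2f}/hour")
```

The hourly figure is only an equivalent for comparison; billing still accrues per second of use.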
Storage (persistent volumes)
| Resource | Rate |
|---|---|
| First 1 TiB/month | Free |
| Above 1 TiB | $0.09/GiB-month |
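The storage table can be read as a simple free-tier formula. The sketch below assumes that only the portion above 1 TiB is billed and that usage is measured as a single monthly figure; the source does not spell out either detail.

```python
FREE_TIER_GIB = 1024          # 1 TiB free per month
RATE_PER_GIB_MONTH = 0.09     # USD, applied above the free tier


def volume_storage_cost(stored_gib: float) -> float:
    """Monthly volume cost, assuming only usage above the free tier is billed."""
    billable_gib = max(0.0, stored_gib - FREE_TIER_GIB)
    return billable_gib * RATE_PER_GIB_MONTH


for size_gib in (50, 200, 1024, 3072):
    print(f"{size_gib:>5} GiB -> ${volume_storage_cost(size_gib):.2f}/month")
```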
Plan tiers
| Tier | Monthly fee | Included credits | Seats | Container concurrency | GPU concurrency |
|---|---|---|---|---|---|
| Starter | $0 | $30/month | 3 | 100 | 10 |
| Team | $250 + compute | $100/month | Unlimited | 1,000 | 50 |
| Enterprise | Custom | Custom | Unlimited | Custom | Custom |
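Because the tiers differ mainly in seat and concurrency limits, plan choice can be sketched as a lookup over the table above. The `Plan` class and `smallest_plan` helper are illustrative names, and the sketch treats the published limits as hard ceilings, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_fee: float
    monthly_credit: float
    max_seats: Optional[int]   # None means unlimited
    max_containers: int
    max_gpus: int


# Self-serve tiers from the table above, ordered from cheapest to most expensive.
PLANS = [
    Plan("Starter", 0.0, 30.0, 3, 100, 10),
    Plan("Team", 250.0, 100.0, None, 1000, 50),
]


def smallest_plan(seats: int, containers: int, gpus: int) -> str:
    """Return the first self-serve tier whose limits cover the requirement."""
    for plan in PLANS:
        seats_ok = plan.max_seats is None or seats <= plan.max_seats
        if seats_ok and containers <= plan.max_containers and gpus <= plan.max_gpus:
            return plan.name
    return "Enterprise"  # custom limits, quote-based


print(smallest_plan(seats=2, containers=20, gpus=4))
print(smallest_plan(seats=20, containers=200, gpus=8))
print(smallest_plan(seats=20, containers=200, gpus=64))
```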
Sales motions across products: PLG / self-serve for Starter and Team; sales-led for Enterprise commits, VPC deployment, and marketplace billing. All prices accessed 2026-05-29 from modal.com/pricing.
Hidden costs : What Modal customers actually pay beyond the per-second rates
Archetype A: Solo developer on Starter running batch inference on H100
A solo developer running occasional H100 batch jobs (~30 minutes/day, scale-to-zero in between):
| Line item | Monthly cost |
|---|---|
| H100 compute (30 min/day × 30 × 60 × $0.001097) | $59 |
| CPU + memory overhead (1 core, 16 GiB, ~5% overhead) | $4 |
| Volume storage for model weights (50 GiB, under 1 TiB free) | $0 |
| Egress (small response payloads, negligible) | <$1 |
| Subtotal compute | $63 |
| Starter credit ($30/mo) | -$30 |
| Estimated total | ~$33/month |
The Starter $30 credit covers roughly half the bill for typical small batch workloads. Pure scale-to-zero on per-second granularity means a 5-minute idle period costs nothing — a meaningful advantage over per-minute or per-hour platforms.
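The Archetype A estimate can be reproduced with a few lines of arithmetic. In this sketch the GPU line is computed from the rate card, while the $4 CPU and memory overhead is taken from the table as a given estimate; `starter_estimate` is a hypothetical helper, and egress is ignored as negligible.

```python
H100_RATE_PER_SEC = 0.001097   # USD per second, from the rate card
STARTER_CREDIT_USD = 30.0      # monthly credit on the Starter tier


def starter_estimate(minutes_per_day: float, days: int = 30, overhead_usd: float = 4.0) -> dict:
    """Monthly H100 bill on Starter, net of the included credit."""
    gpu_usd = minutes_per_day * 60 * days * H100_RATE_PER_SEC
    subtotal = gpu_usd + overhead_usd
    # Share of the bill covered by the credit (capped at 100%).
    if subtotal <= STARTER_CREDIT_USD:
        credit_share = 1.0
    else:
        credit_share = STARTER_CREDIT_USD / subtotal
    return {
        "gpu_usd": gpu_usd,
        "subtotal_usd": subtotal,
        "net_usd": max(0.0, subtotal - STARTER_CREDIT_USD),
        "credit_share": credit_share,
    }


estimate = starter_estimate(minutes_per_day=30)
for key, value in estimate.items():
    print(f"{key}: {value:.2f}")
```

Changing `minutes_per_day` shows how quickly the credit's share of the bill shrinks as usage grows.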
Archetype B: Mid-market team on Team plan running mixed inference + training
A 20-person AI engineering team on Team plan running production inference (A100 80GB during business hours) plus weekly training (H100 cluster):
| Line item | Monthly cost |
|---|---|
| Team plan fee | $250 |
| A100 80GB inference (8h/day × 30 × 60 × 60 × $0.000694) | $600 |
| H100 training (4h/week × 4 × 60 × 60 × $0.001097) | $63 |
| CPU + memory for orchestration (~$20) | $20 |
| Storage (200 GiB models, under 1 TiB free) | $0 |
| Subtotal | $933 |
| Team credit ($100/mo) | -$100 |
| Estimated total | ~$833/month |
For sustained mid-market workloads, Team plan economics are reasonable: the $250 plan fee covers seats and higher concurrency limits, the $100 credit offsets ~15% of compute, and per-second billing keeps idle costs near zero. The fixed $250 plan fee is the main differentiator from pure pay-as-you-go competitors.
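A short sketch that evaluates the Archetype B line-item formulas exactly as written (8 hours/day over 30 days of A100 80GB, 4 hours/week over 4 weeks of H100). The $20 orchestration figure is taken from the table as a given estimate, and the sketch assumes the monthly credit offsets usage but not the plan fee.

```python
SECONDS_PER_HOUR = 3600
RATE_PER_SEC = {"A100-80GB": 0.000694, "H100": 0.001097}  # USD, from the rate card

TEAM_PLAN_FEE = 250.0
TEAM_CREDIT = 100.0


def gpu_cost(gpu: str, hours: float) -> float:
    """Cost of running one GPU of the given type for a number of hours."""
    return hours * SECONDS_PER_HOUR * RATE_PER_SEC[gpu]


inference = gpu_cost("A100-80GB", hours=8 * 30)   # 8 h/day for 30 days
training = gpu_cost("H100", hours=4 * 4)          # 4 h/week for 4 weeks
orchestration = 20.0                              # CPU + memory estimate from the table

compute = inference + training + orchestration
total = TEAM_PLAN_FEE + max(0.0, compute - TEAM_CREDIT)

print(f"inference: ${inference:.2f}")
print(f"training:  ${training:.2f}")
print(f"compute:   ${compute:.2f}")
print(f"total:     ${total:.2f}")
print(f"credit share of compute: {TEAM_CREDIT / compute:.1%}")
```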
Want to estimate your own Modal bill? Use the Modal pricing calculator to model GPU seconds, CPU core-seconds, memory, and storage by workload profile.
Pricing evolution : Modal’s pricing history from Python serverless to multi-tier platform
Cadence
| Quarter | Price changes | Product / SKU additions | Notes |
|---|---|---|---|
| 2021 Q3 | 0 | 1 | Modal founded; closed beta |
| 2023 Q1 | 0 | 1 | Public beta + Series A ($16M) + per-second pricing |
| 2024 Q1 | 0 | 1 | Team plan launched at $250/mo + compute |
| 2024 Q2 | 1 | 0 | Series B + H100 per-second pricing published |
| 2025 Q1 | 0 | 1 | Persistent volumes + 1 TiB/mo free storage |
| 2025 Q3 | 0 | 1 | B200 + H200 + L40S per-second pricing added |
| 2025 Q4 | 0 | 0 | Startup credit program expanded to $10k |
| 2026 Q1 | 0 | 1 | AWS + GCP Marketplace billing for Enterprise |
Tracked range: 2021 Q3–2026 Q1. Quarters not listed above were verified stable (0 price changes, 0 SKU additions).
Notable changes
- 2023-02-15 — Public beta launch + Series A; established per-second CPU/GPU/memory billing as the canonical pricing model.
- 2024-01-22 — Team plan launched at $250/month + compute with $100/month in credits and higher concurrency limits.
- 2024-06-10 — Series B + H100 per-second pricing at $0.001097/sec ($3.95/hour) published as competitive differentiator.
- 2025-03-19 — Persistent volumes launched at $0.09/GiB-month with 1 TiB/month free; vertically integrated storage with compute.
- 2025-09-04 — B200, H200, L40S added at per-second rates; maintained per-second granularity across new generations.
- 2025-12-11 — Startup credit program expanded to $10,000 grants; most generous in the category.
- 2026-02-25 — AWS + GCP Marketplace billing added for Enterprise contracts.
What’s unique : Modal’s distinctive pricing mechanics
1. Per-second billing across every SKU (not just GPU). Modal bills GPU, CPU, and memory all per second — the finest billing granularity in any major cloud compute product. For bursty serverless workloads where containers spin up for 2–10 seconds and shut down, this granularity makes scale-to-zero economics genuinely cheap, in a way that per-minute or per-hour billing cannot match.
2. Python-decorator-as-deployment-API is the canonical interface. Most serverless platforms accept Docker images and Kubernetes manifests; Modal’s primary interface is @app.function(gpu="H100") in a Python file. For data scientists and ML engineers who write Python but not infrastructure-as-code, this is a meaningful developer-experience cost advantage — and the founder’s open-source pedigree (Luigi, Annoy) makes the experience pitch credible.
3. Subscription-plus-compute hybrid (Team $250/mo + compute). Most serverless GPU competitors offer pure pay-as-you-go. Modal’s Team plan adds a $250/month flat fee for seats, higher concurrency, Slack support, and $100 in compute credits. This subscription-plus-usage hybrid is unusual in the category and signals Modal’s bet that mid-market teams will pay for predictable plan economics even when usage is variable.
4. Generous startup credit program (up to $10k). Modal grants up to $10,000 in credits to qualifying startups and academic researchers — substantially more generous than competitors’ typical $30–$500 trial offers. This credit-led PLG strategy prioritizes long-term developer adoption over short-term revenue from evaluation traffic.
5. Vertical storage integration with free 1 TiB/month tier. Persistent volumes at $0.09/GiB-month with 1 TiB free per month absorb model-weight and dataset storage without forcing customers to S3. For ML workloads where checkpoints and datasets compose most storage, this bundling reduces vendor sprawl and simplifies developer workflow — at the cost of multi-cloud flexibility.
Strengths & weaknesses
| Strengths | Weaknesses |
|---|---|
| Finest billing granularity in cloud compute (per second, every SKU) | Network egress and function invocation charges not itemized on pricing page |
| Python-decorator-as-deployment-API is the most opinionated DX in serverless GPU | Python-only orientation limits TAM for teams using Go, Rust, or polyglot stacks |
| Founder credibility from Luigi + Annoy open-source pedigree | Team plan $250/mo + compute is higher than pure pay-as-you-go competitors |
| Startup credit program (up to $10k) among most generous in category | Storage tier ($0.09/GiB-month) above free 1 TiB is roughly 4× S3 Standard pricing |
| Subscription-plus-usage hybrid (Team) captures predictable mid-market spend | Cold-start latency on infrequently-used containers (~3–8 seconds) |
| Free 1 TiB/month storage absorbs ML model weights without S3 | Concurrency limits gate scale-up — exceeding requires plan upgrade or commit |
Billing UX : Modal’s account controls and payment experience
- Self-serve signup — Sign up at modal.com/signup with email or GitHub; Starter tier $30 credits applied automatically. No credit card required to deploy first function.
- Real-time per-second cost meter — Console shows per-function per-container cost in real time; each task has a live cost projection.
- Workspace and project organization — Workspace-level usage aggregation; per-environment (dev / staging / prod) separation supported.
- Spend alerts and caps — Configurable email and webhook alerts at $X spend per period; hard spend caps available on credit-card billing.
- Payment methods — Credit card and ACH on Starter and Team; wire transfer, invoice billing, and AWS/GCP Marketplace on Enterprise.
- Credit grants — Startup and academic credit grants up to $10k applied to workspace balance; offset against compute, storage, and Team plan fees.
- Annual commit pricing — Enterprise customers receive volume discounts in exchange for annual usage commitments and dedicated capacity reservations.
- Audit logging + RBAC — Workspace-level RBAC on Team+; SOC 2 audit-log exports on Enterprise.
- Multi-region availability — US standard; EU and APAC regions on Enterprise with VPC deployment.
Strategic wins : Why Modal’s pricing decisions worked
1. Per-second billing granularity as the structural cost moat
By billing GPU, CPU, and memory per second rather than per minute or hour, Modal made scale-to-zero economics genuinely cheap — and forced competitors to either match the granularity or accept the comparison gap. For bursty serverless workloads, the unit-economics advantage is real and reproducible. Most legacy cloud platforms cannot match per-second billing without re-architecting their metering infrastructure.
2. Python-decorator-as-deployment-API removed the YAML / Docker friction barrier
Most serverless platforms require Dockerfile + Kubernetes manifests; Modal’s @app.function() decorator hides the infrastructure entirely. For data scientists and ML engineers — Modal’s primary target persona — this is a meaningful productivity gain. The fact that Erik Bernhardsson created Luigi and Annoy makes the developer-experience promise credible in a way that pure-marketing positioning cannot replicate.
3. Generous startup credit program as the developer-led GTM engine
The up-to-$10k startup credit grants signal Modal’s bet on long-term developer adoption over short-term revenue from evaluation traffic. AI-native startups that build their initial product on Modal-funded compute tend to stay on Modal at scale — a classic PLG flywheel that competitors with smaller trial credits cannot match.
4. Team plan ($250/mo + compute) captures predictable mid-market revenue
Most serverless GPU competitors are pure pay-as-you-go, leaving predictable revenue uncaptured. The Team plan flat fee covers seats, concurrency, and Slack support — capturing customers who value predictability even at variable usage. This hybrid pricing is also a leading indicator of mature mid-market product-market fit.
Areas to improve : Gaps in Modal’s pricing approach
1. Egress and function-invocation charges are not itemized
The pricing page lists compute, memory, and storage rates but not network egress or per-function-invocation charges. For high-volume customers, this opacity creates bill-shock risk and erodes the per-second transparency advantage. Publishing egress rates (even at a “first X GB/month free, then $Y/GB” structure) would close the transparency gap versus AWS, GCP, and Azure rate cards.
2. Python-only orientation limits TAM
The @app.function() decorator pattern is Python-only. Teams using Go, Rust, or polyglot stacks cannot use Modal as their primary serverless platform. Adding a generic container interface (with Docker images as input, similar to RunPod and Replicate) would broaden TAM without abandoning the Python-native DX as the canonical path.
3. Storage above 1 TiB is expensive relative to S3
At $0.09/GiB-month, Modal’s storage tier is roughly 4× S3 Standard. For ML workloads with multi-TiB dataset storage, customers will need to either offload to S3 (introducing the vendor sprawl Modal’s free 1 TiB was designed to prevent) or accept significantly higher storage spend. A tiered storage rate (cheaper for archival, premium for hot) would close the cost gap.
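The free tier shifts where the storage premium starts to bite. The sketch below compares the two pricing shapes and solves for the break-even size. The S3 rate of $0.023 per GB-month is an assumption (a list price consistent with the "roughly 4×" figure), the GB/GiB difference is ignored, and S3 request and egress fees are left out.

```python
MODAL_RATE = 0.09        # USD per GiB-month above the free tier
MODAL_FREE_GIB = 1024    # 1 TiB free per month
S3_RATE = 0.023          # ASSUMPTION: S3 Standard list price, GB treated as GiB


def modal_cost(gib: float) -> float:
    """Monthly Modal volume cost, billing only usage above the free tier."""
    return max(0.0, gib - MODAL_FREE_GIB) * MODAL_RATE


def s3_cost(gib: float) -> float:
    """Monthly S3 storage cost at a flat rate, with no free tier."""
    return gib * S3_RATE


# Break-even: MODAL_RATE * (x - MODAL_FREE_GIB) == S3_RATE * x
break_even_gib = MODAL_RATE * MODAL_FREE_GIB / (MODAL_RATE - S3_RATE)

print(f"break-even at about {break_even_gib:.0f} GiB")
for size in (1024, 2048, 5120):
    print(f"{size:>5} GiB: Modal ${modal_cost(size):.2f} vs S3 ${s3_cost(size):.2f}")
```

Under these assumptions the bundled storage is cheaper up to roughly 1.3 TiB and more expensive beyond that, which is why the gap matters mainly for multi-TiB datasets.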
4. Concurrency limits create scale-up friction
Starter caps at 100 container concurrency / 10 GPU concurrency; Team at 1,000 / 50. Customers hitting these limits during burst events face either degraded service or a plan-upgrade scramble. Soft caps with overage billing (rather than hard caps with service throttling) would smooth the scale-up experience and reduce churn at the boundary.
Key takeaways
- Per-second billing granularity is a structural cost moat that legacy clouds cannot match. Modal’s decision to bill GPU, CPU, and memory per second forced a category-wide expectation shift — competitors now have to either match or justify per-minute / per-hour billing. For usage-based platforms where scale-to-zero is a marketing claim, billing granularity must match the claim.
- Founder open-source pedigree is the most credible developer-experience trust anchor. Erik Bernhardsson’s authorship of Luigi and Annoy gives the Python-native DX promise weight that pure-engineering teams cannot replicate. Infrastructure commercializations targeting developer audiences should consider founder open-source visibility as a structural advantage.
- Subscription-plus-usage hybrids capture predictable mid-market revenue. The Team plan ($250/mo + compute) is unusual among serverless GPU competitors and likely reflects mature mid-market product-market fit. Other usage-based platforms considering subscription components should look at Modal’s Team economics as a reference.
- Generous startup credit programs are PLG levers, not lost revenue. Modal’s up-to-$10k credit grants signal long-term thinking: AI-native startups that build on Modal-funded compute tend to stay at scale. Trial credit generosity is a strategic GTM choice, not just a marketing line item.
- Vertical bundling (storage with compute) simplifies developer experience at the cost of multi-cloud flexibility. Modal’s free 1 TiB/month storage absorbs ML model weights and reduces S3 dependence — a developer-experience win that comes at the cost of locking customers into Modal’s storage economics. For value-metric design, the bundling trade-off is structural.
UBP implications
- Per-second billing is becoming the new transparency floor for serverless compute. Per-minute or per-hour billing is increasingly perceived as legacy-cloud opacity. New usage-based platforms entering serverless GPU should default to per-second granularity as a baseline expectation.
- Subscription-plus-usage hybrids capture revenue that pure pay-as-you-go leaves on the table. Mid-market customers value predictable plan economics even when usage is variable; the Modal Team plan demonstrates the segment exists. Other usage-based platforms should consider hybrid SKUs for the predictable-spend mid-market.
- Developer-experience as a pricing-adjacent value metric. Modal’s Python-decorator API is not a pricing mechanic, but it shapes which customers can use the platform — and which can therefore become paying customers. For value-metric pricing, the developer-experience surface area is as load-bearing as the rate card.
Sources
- Modal pricing page (accessed 2026-05-30)
- Modal docs — billing (accessed 2026-05-30)
- Modal blog — Series B announcement (accessed 2026-05-29)
- Modal startup credit program (accessed 2026-05-29)
- Erik Bernhardsson — Luigi and Annoy author (accessed 2026-05-29)
Bottom line
Modal priced its serverless compute platform around three structural ideas: per-second billing granularity (finest in the category, GPU + CPU + memory) that makes scale-to-zero economics genuinely cheap, Python-decorator-as-deployment-API (@app.function(gpu="H100")) as the canonical interface for data scientists and ML engineers who write Python but not YAML, and founder credibility from Erik Bernhardsson’s open-source pedigree (Luigi, Annoy) that makes the developer-experience promise believable. The subscription-plus-usage hybrid (Team plan at $250/mo + compute) captures predictable mid-market revenue, and the up-to-$10k startup credit grant program drives long-term developer adoption.
For AI-native engineering teams that prioritize Python developer experience and per-second cost transparency over multi-cloud flexibility, Modal is the most ergonomic serverless GPU platform on the market. The remaining gaps (egress not itemized, Python-only orientation, storage above 1 TiB expensive, concurrency hard-caps) are TAM-expansion and transparency-polish problems rather than structural pricing flaws.
Compare with peers via the blueprint corpus, or model your own spend with the Modal pricing calculator.
Pricing timeline : Major events on a vertical axis
Each milestone below corresponds to a public pricing change, product launch, or material adjustment, listed newest first.
AWS + GCP Marketplace Billing for Enterprise
Modal added support for AWS Marketplace and GCP Marketplace billing on Enterprise contracts, letting customers consume Modal spend through existing cloud-provider committed-spend agreements. Reduces procurement friction for Fortune-500 buyers with EDP or CUD agreements.
Startup Credit Program Expanded to $10k
Modal expanded its startup credit grant program to up to $10,000 in credits for qualifying companies and academic researchers. Established Modal as the most generous credit-grant program among serverless GPU competitors, reflecting developer-led GTM emphasis.
B200 + H200 + L40S Added at Per-Second Rates
Modal added NVIDIA B200 (180GB) at $0.001736/sec, H200 (141GB) at $0.001261/sec, and L40S (48GB) at $0.000542/sec. The Blackwell-class addition came alongside continued per-second billing rather than per-hour, preserving granularity advantage at the high end.
Persistent Volumes + 1 TiB/mo Free Storage
Modal launched persistent volumes at $0.09/GiB-month with 1 TiB/month free. The free tier absorbed model-weight storage for most production workloads, vertically integrating storage with compute. Eliminated S3-dependence for many use cases.
Series B + H100 Per-Second Pricing
Modal raised a $31M+ Series B (reportedly led by Lux Capital with Redpoint and others) and published H100 per-second pricing at $0.001097/sec — roughly $3.95/hour, undercutting major competitors' on-demand H100 rates. Per-second granularity emphasized as a competitive differentiator.
Team Plan Launched at $250/mo + Compute
Modal introduced its Team plan at $250/month plus compute, with $100/month in included credits. The plan added unlimited workspace seats, 1,000 container concurrency, and 50 GPU concurrency — bridging the gap between free Starter and quote-based Enterprise tiers.
Public Beta + Series A ($16M)
Modal went into public beta and raised a $16M Series A led by Redpoint Ventures. Pricing model established: per-second CPU, GPU, and memory billing with a free tier of $30/month credits. Positioned as the developer-experience leader in serverless compute.
Modal Founded
Erik Bernhardsson (creator of Luigi and Annoy, ex-Spotify ML) and Akshat Bubna founded Modal Labs to build serverless infrastructure for Python data and ML workloads. Initial closed beta launched with the @app.function() decorator pattern as the canonical interface.
- Modal's per-second billing granularity ($0.001097/sec on H100) means a 5-second cold start costs about $0.0055. That is 60 times finer than per-minute billing and among the finest billing granularities in any cloud compute product.
- Modal was founded in 2021 by Erik Bernhardsson (ex-Spotify ML, creator of Luigi and Annoy), making it one of the few infrastructure platforms where the founder is the primary author of widely-deployed open-source data tooling, which lends unusual credibility to the developer-experience pitch.
- Modal grants up to $10,000 in credits to qualifying startups and academic researchers, one of the most generous credit programs in cloud compute, reflecting the founder's bet on developer-led GTM rather than enterprise sales-led growth.
Questions & answers
- How much does Modal cost per month?
- Modal's Starter tier is free with $30/month in compute credits, plus pay-as-you-go above that. Team plan is $250/month + compute (with $100/month in credits). A typical mid-volume workload running an A100 80GB ($0.000694/sec) for 4 hours/day would cost about $300/month in compute on top of the plan fee.
- What GPU rates does Modal publish?
- Modal publishes per-second rates: T4 $0.000164, L4 $0.000222, A10 $0.000306, L40S $0.000542, A100 40GB $0.000583, A100 80GB $0.000694, RTX PRO 6000 $0.000842, H100 $0.001097, H200 $0.001261, B200 $0.001736. All bill per second of active use — idle containers cost only memory and storage.
- What does Modal include in the free Starter tier?
- Starter is free with $30/month in compute credits, 3 workspace seats, 100 container concurrency, and 10 GPU concurrency. At GPU rates alone, the credits cover roughly 51 hours of T4 or 7.6 hours of H100. Sufficient for evaluation and small-scale production; no credit card required.
- How is Modal's CPU and memory billed?
- CPU is $0.0000131 per physical core per second (minimum 0.125 cores per task); memory is $0.00000222 per GiB per second. A typical lightweight function using 1 physical core and 4 GiB for 60 seconds costs about $0.0013 (roughly $0.0008 for CPU plus $0.0005 for memory). Persistent volume storage is $0.09/GiB-month with 1 TiB/month free.
- Does Modal offer startup credits?
- Yes — qualifying startups and academic researchers can receive up to $10,000 in Modal credits. Application is via Modal's startup program; eligibility typically requires early-stage status, ML/AI focus, or academic affiliation. Among the most generous credit grants in serverless GPU.
- How does Modal Team plan differ from Enterprise?
- Team ($250/mo + compute, $100/mo credits) offers unlimited seats, 1,000 container concurrency, 50 GPU concurrency, and standard compliance (SOC 2). Enterprise is quote-based and adds custom volume discounts, dedicated capacity reservations, advanced security (audit logs, RBAC), VPC deployment, and marketplace billing via AWS/GCP.
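Two of the answers above rest on simple rate-card arithmetic: how far a credit balance stretches on a given GPU, and what a small CPU-only function costs. The sketch below works both out. It assumes credits are spent on the GPU rate alone (ignoring the container's CPU and memory) and that the 0.125-core minimum is applied by rounding smaller requests up; both helper names are illustrative.

```python
SECONDS_PER_HOUR = 3600
CPU_RATE = 0.0000131      # USD per physical core per second
MEM_RATE = 0.00000222     # USD per GiB per second
MIN_CORES = 0.125         # minimum billable cores per task


def credit_hours(credit_usd: float, gpu_rate_per_sec: float) -> float:
    """GPU-hours a credit balance buys at the GPU rate alone."""
    return credit_usd / gpu_rate_per_sec / SECONDS_PER_HOUR


def cpu_memory_cost(cores: float, mem_gib: float, seconds: float) -> float:
    """CPU plus memory cost of one task, applying the minimum core count."""
    billed_cores = max(cores, MIN_CORES)
    return seconds * (billed_cores * CPU_RATE + mem_gib * MEM_RATE)


print(f"T4 hours on $30:   {credit_hours(30, 0.000164):.1f}")
print(f"H100 hours on $30: {credit_hours(30, 0.001097):.1f}")
print(f"1 core, 4 GiB, 60 s: ${cpu_memory_cost(1, 4, 60):.4f}")
```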