The Specialization Dilemma: Foundation Models Versus Vertical Fine-Tuning
Part 10 of the Future Ahead Series: Where AI Is Going and How It Will Transform Billing, Infrastructure, and Pricing Models
The Build-Versus-Buy Decision That Defines Your Economics
A chief product officer at a healthcare technology company is facing a decision that will determine whether her company can sustainably monetize AI. Her team wants to launch an AI clinical documentation assistant that helps physicians generate patient notes from consultation recordings. The engineering team presents two architecture approaches. The first approach uses GPT-5 via API with carefully crafted prompts and retrieval-augmented generation to pull in clinical guidelines. Development would take two months, cost structure would be straightforward at roughly four dollars per consultation in API fees, and quality would be good enough for eighty to ninety percent of cases. The second approach involves fine-tuning a specialized medical language model on their proprietary dataset of five hundred thousand physician-patient interactions, clinical workflows, and hospital-specific templates. Development would take six months and cost four hundred thousand dollars upfront for data preparation, model training, and validation. But once deployed, per-consultation costs would drop to forty cents, quality would be demonstrably superior on medical-specific tasks, and the model would understand their customers’ unique workflows in ways generic models never could.
The CFO wants the fast, low-upfront-cost approach. The engineering director argues for the custom model. The chief product officer has to decide, and the implications cascade through every dimension of the business. Pricing strategy will be completely different depending on which path they choose. Customer contracts will need different terms to account for either variable API costs or amortized training investments. Billing infrastructure will need to handle either token-based consumption or specialized service pricing. And critically, the competitive moat they can build depends entirely on whether they own differentiated AI or whether they’re just a wrapper around the same models their competitors can access.
This scenario is playing out across industries right now as companies confront the fundamental tradeoff between foundation models and vertical fine-tuned deployments. The choice isn’t just technical. It’s a strategic decision about business model, defensibility, and long-term unit economics. And crucially, each path requires fundamentally different billing infrastructure and pricing approaches. Let me walk you through what’s actually happening in the market, why companies are choosing each approach, what it means for billing and pricing, and where the industry is heading as both foundation models and vertical specialization mature.
Understanding Foundation Models: The General Intelligence Layer
Before we can meaningfully compare approaches, we need to understand what foundation models are and why they’ve become so dominant so quickly. Foundation models are large-scale AI systems trained on massive diverse datasets to perform a broad range of tasks across many domains. GPT-5, Claude 4.5, Gemini 3, Llama 4, these are all foundation models. They’re called “foundation” because they’re designed to serve as the base layer that can be adapted for countless specific applications through prompting, fine-tuning, or other techniques.
The defining characteristic of foundation models is their generality. They’re trained on text from across the internet, books, code repositories, scientific papers, social media, and countless other sources, giving them broad knowledge about language, reasoning, facts, and relationships. This breadth means they can tackle diverse tasks without modification. You can use the same GPT-5 model for writing marketing copy, analyzing financial documents, generating code, summarizing research papers, or answering customer support queries. The model doesn’t need to be retrained or specialized for each use case. You just provide different prompts or context.
The economic advantage of this generality is enormous. When OpenAI, Anthropic, or Google trains a foundation model, they’re investing hundreds of millions or billions of dollars in compute, data, and engineering. But that investment amortizes across millions of users and thousands of different applications. A single Claude 4.5 Opus deployment serves legal teams drafting contracts, engineers debugging code, marketers writing campaigns, researchers analyzing data, and countless other use cases simultaneously. The per-user cost of model development becomes negligible when spread across such massive scale.
For companies consuming foundation models via APIs, the value proposition is compelling. You get access to cutting-edge AI capabilities without any upfront investment in training infrastructure or data collection. Integration is typically straightforward, often just HTTP API calls with text prompts and responses. Time to market is measured in weeks, not months or years. And you benefit from continuous improvements as model providers release new versions with better capabilities, which you can access by just changing an API endpoint or model name.
The current state of foundation models as of early 2026 shows remarkable convergence on core capabilities alongside fierce competition on price and speed. Industry analysis suggests the gap between leading large language models is now small: proprietary models retain marginal edges in niches like creative writing or coding, but overall differences are diminishing over time. Moreover, the gap between open-source and proprietary models is rapidly closing, with models like DeepSeek R1 matching GPT-5 performance on many benchmarks while being dramatically cheaper to run.
This commoditization creates pressure on pricing. According to Stanford’s 2025 AI Index Report, the cost to use GPT-3.5-level capabilities has fallen two hundred eighty-fold from 2023 to 2025, from roughly twelve dollars per million tokens to under two dollars for comparable performance. Google’s Gemini 3 Flash achieves frontier model quality at fifty cents per million input tokens and three dollars per million output tokens, establishing new cost-efficiency benchmarks. The rapid price deflation we discussed in Part 3 of this series is accelerating as foundation model providers compete aggressively for market share.
For application layer companies building on foundation models, this pricing dynamic is double-edged. On one hand, falling API costs directly improve your margins if you can maintain customer pricing. On the other hand, the commoditization makes differentiation harder because your competitors have access to the same models at the same prices. Your competitive advantage can’t come from the AI itself; it has to come from your data, your workflows, your integrations, or your go-to-market execution. This lack of technical differentiation is driving many companies to consider vertical specialization as an alternative or complement to generic foundation models.
The billing infrastructure for foundation model consumption is relatively straightforward because it maps directly to the API providers’ metering and pricing. You track tokens consumed by feature and by customer, you apply the provider’s rate card, you add your markup, and you bill customers. The complexity we’ve discussed in previous articles about multi-model orchestration, reasoning versus inference, and feature-level attribution all apply, but the fundamental unit of measure, tokens or API calls, is established and well-understood. The infrastructure challenge is handling the volume of events and providing analytics, not inventing new billing paradigms.
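To make the pipeline concrete, here is a minimal sketch of that meter-then-bill flow. The rate card values, model names, and forty percent markup are all hypothetical placeholders, not any provider's actual prices:

```python
from collections import defaultdict

# Hypothetical per-million-token rate card; real provider prices vary
# and change frequently, so this would live in updatable configuration.
RATE_CARD = {
    "frontier-model": {"input": 2.00, "output": 8.00},  # USD per 1M tokens
    "fast-model":     {"input": 0.50, "output": 3.00},
}

MARKUP = 1.40  # assumed 40% markup over provider cost

def bill_events(events):
    """Aggregate metered usage events into per-customer, per-feature charges.

    Each event: (customer, feature, model, input_tokens, output_tokens).
    Returns {customer: {feature: billed_usd}}.
    """
    bills = defaultdict(lambda: defaultdict(float))
    for customer, feature, model, tok_in, tok_out in events:
        rates = RATE_CARD[model]
        # Provider cost for this event, then markup applied on top.
        cost = (tok_in * rates["input"] + tok_out * rates["output"]) / 1_000_000
        bills[customer][feature] += cost * MARKUP
    return bills

events = [
    ("acme", "summarize", "frontier-model", 500_000, 100_000),
    ("acme", "classify",  "fast-model",   2_000_000,  50_000),
]
bill = bill_events(events)
# summarize: (1.00 + 0.80) * 1.40 = 2.52 USD
```

In production the interesting work is upstream of this function, in capturing millions of such events reliably and attributing them to the right feature, but the billing arithmetic itself stays this simple.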
Looking forward, foundation models will continue improving while becoming cheaper and more accessible. Training costs for frontier models are projected to reach one billion dollars or more by 2027, but those costs amortize across such massive user bases that per-token costs keep falling. The models are also becoming more modular and customizable through techniques like prompt caching that we discussed in Part 9, making them easier to adapt to specific use cases without true fine-tuning. For many applications, foundation models accessed via API will remain the optimal choice indefinitely because the economics and convenience are so compelling.
The Case for Vertical Fine-Tuning: Specialization as Strategy
Now let’s examine why, despite the advantages of foundation models, companies are increasingly investing in domain-specific fine-tuned models that are optimized for particular industries, workflows, or use cases. Vertical fine-tuning represents a fundamentally different economic and strategic approach where upfront investment replaces variable consumption costs, and competitive moat comes from model differentiation rather than application layer innovation.
The canonical example of vertical fine-tuning success is Harvey, the legal AI platform that has become one of the most valuable enterprise AI companies by focusing exclusively on legal and professional services. Harvey takes foundation models from OpenAI and extensively fine-tunes them on vast amounts of legal documents, case law, contracts, and specialized content. The result is a model that understands legal terminology, reasoning patterns, document structures, and workflow requirements in ways that generic foundation models, for all their breadth, do not.
The performance difference is substantial. According to research from various deployments, fine-tuned models deliver forty to sixty percent performance improvements on domain-specific tasks compared to general models. A seven billion parameter legal model fine-tuned on contracts can achieve ninety-four percent accuracy versus GPT-5’s eighty-seven percent on the same contract analysis task according to production deployment data. These aren’t marginal improvements. They represent the difference between AI that’s useful and AI that’s transformative for professional workflows.
The economics of Harvey illustrate the vertical fine-tuning business model clearly. As of 2025, Harvey reached one hundred ninety-five million dollars in annual recurring revenue, growing nearly fourfold from fifty million at the end of 2024. The company serves over one thousand enterprise customers across sixty countries, with pricing starting at twelve hundred dollars per lawyer per month with twelve-month commitments and roughly twenty-seat minimums. Customer data shows that median seat count doubles within twelve months of deployment, demonstrating strong expansion revenue.
This pricing model, twelve hundred dollars per seat monthly, is dramatically different from how foundation model APIs are priced. Harvey isn’t charging based on tokens consumed or queries processed. They’re charging based on the value delivered to professional knowledge workers, with pricing that reflects the economics of replacing human labor rather than computational resources. A lawyer billing five hundred dollars per hour who saves two to three hours weekly through Harvey creates one thousand to fifteen hundred dollars in weekly value, making twelve hundred dollars monthly seem cheap. The pricing is value-based, not cost-based.
But behind this value-based pricing is a substantial cost structure that differs fundamentally from API-based products. Harvey raised over one billion dollars across multiple funding rounds in 2025 alone, with much of that capital going toward model training, data acquisition, customer success, and the forward-deployed teams of ex-lawyers who drive implementation and adoption at law firms. The company dedicates roughly ten percent of its team to customer success roles specifically because vertical AI requires intensive change management to achieve deep workflow integration.
The upfront investment required for vertical fine-tuning is substantial even for companies not aiming to become Harvey. Data preparation alone can cost hundreds of thousands of dollars for smaller projects. You need domain experts to curate and annotate training data. You need machine learning engineers to manage the fine-tuning process, which is more complex than simply calling an API. You need validation processes to ensure the fine-tuned model actually performs better than foundation models on your specific tasks. And you need ongoing maintenance as language evolves, regulations change, or new edge cases emerge.
But the per-inference costs tell a different story. Once a model is fine-tuned and deployed, the marginal cost of running it is dramatically lower than calling foundation model APIs for the same capabilities. Production data shows that serving a seven billion parameter fine-tuned model costs ten to thirty times less than running a seventy to one hundred seventy-five billion parameter foundation model for comparable workloads on domain-specific tasks. Some enterprise deployments report ninety-nine point ninety-eight percent cost reduction by migrating from GPT-4 API usage at four point two million dollars annually to self-hosted fine-tuned models at under one thousand dollars annually.
These economics create a clear breakpoint. At low volumes, foundation model APIs are cheaper because you’re not bearing upfront training costs and you only pay for what you use. But at high volumes, the fixed costs of fine-tuning amortize across enough queries that the lower per-query inference costs make vertical models dramatically more economical. For Harvey’s use case, processing millions of legal queries monthly, the savings from running fine-tuned models versus calling GPT-5 for every query likely reach tens of millions of dollars annually.
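The breakpoint is straightforward to compute once you have estimates of the three inputs. A minimal sketch, using the hypothetical numbers from the healthcare scenario in the opening (four dollars per consultation via API, versus four hundred thousand dollars upfront plus forty cents per consultation for the custom model):

```python
def breakeven_volume(api_cost_per_query, ft_upfront, ft_cost_per_query):
    """Query volume above which the fine-tuned model is cheaper in total.

    Total API cost:        api_cost_per_query * volume
    Total fine-tuned cost: ft_upfront + ft_cost_per_query * volume
    Breakeven is where the two are equal.
    """
    savings_per_query = api_cost_per_query - ft_cost_per_query
    if savings_per_query <= 0:
        return float("inf")  # fine-tuning never pays back per-query
    return ft_upfront / savings_per_query

# $4.00/consultation via API vs $400k upfront + $0.40/consultation.
volume = breakeven_volume(4.00, 400_000, 0.40)
# ≈ 111,111 consultations before the custom model wins
```

Under those assumptions the custom model pays for itself after roughly one hundred eleven thousand consultations; below that volume, the API path is cheaper despite its higher per-query cost.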
The strategic advantages of vertical fine-tuning extend beyond just cost savings at scale. Custom models trained on proprietary data create defensible market positions that generic API wrappers can’t replicate. According to Boston Consulting Group research, companies that successfully customize foundation models report two to three times the ROI of those using generic implementations. The differentiation comes from the model understanding your specific domain, workflows, terminology, and customer needs in ways that competitors can’t easily copy.
The privacy and compliance advantages matter particularly in regulated industries. When you fine-tune your own model on data that never leaves your infrastructure, you can avoid the complexities of sending sensitive information to external API providers. Healthcare companies can fine-tune on patient data without HIPAA concerns. Financial institutions can train on confidential transactions without regulatory headaches. Government agencies can work with classified information without security risks. This control over data and processing is often non-negotiable for high-sensitivity use cases.
The current market momentum toward vertical fine-tuning is visible in investment flows. According to Menlo Ventures’ 2025 State of AI report, vertical AI companies targeting specific industries captured three point five billion dollars in enterprise spending in 2025, growing rapidly as companies realize that generic copilots don’t address their specialized needs. Legal AI, healthcare AI, financial AI, each vertical is seeing well-funded companies building specialized models rather than thin wrappers around foundation APIs.
But vertical fine-tuning isn’t without challenges. The technical complexity is significantly higher than using APIs. You need machine learning expertise that most product teams don’t have. The time to market is longer, measured in quarters rather than weeks. The risk is higher because your training investment might not produce the performance improvements you expected. And the ongoing maintenance burden is real, as models need retraining when underlying data distributions shift or when new foundation model versions provide better base capabilities to fine-tune from.
Looking forward, the sweet spot for vertical fine-tuning is emerging. It makes sense for companies with proprietary domain datasets, high query volumes where cost savings justify upfront investment, regulated industries where data control matters, and competitive contexts where technical differentiation creates defensible value. For companies outside this sweet spot, foundation models via APIs remain the better choice. And increasingly, we’re seeing hybrid approaches that use both in complementary ways.
The Hybrid Reality: Combining Foundation and Vertical Models
The emerging industry pattern isn’t a binary choice between foundation models and vertical fine-tuning. Instead, sophisticated companies are adopting hybrid architectures that use foundation models for breadth and experimentation while deploying vertical models for high-volume, mission-critical workflows where specialization and cost efficiency matter. Understanding this hybrid approach is essential for building billing infrastructure that needs to support both economic models simultaneously.
The hybrid strategy typically follows a maturation path that companies traverse as they scale. Early stage companies start exclusively with foundation model APIs because that’s the fastest path to market with minimal upfront investment. You can ship AI features in weeks by integrating OpenAI or Anthropic APIs, validating product-market fit before committing to heavy infrastructure investment. This makes sense when you’re still figuring out which AI capabilities customers actually value and how they’ll use them.
As usage grows and patterns stabilize, companies identify specific high-volume workflows where vertical fine-tuning would materially improve economics. These are typically routine, repetitive tasks that generate millions of queries monthly and where the task definition is clear enough that fine-tuning can be validated. A customer support platform might fine-tune a model specifically for classifying support tickets by urgency and routing them appropriately, while continuing to use foundation models for generating actual response text. The classification task is high-volume and constrained enough for effective fine-tuning, while response generation benefits from the breadth of foundation models.
At maturity, companies run multiple specialized models alongside foundation model access, routing each query to whichever infrastructure optimizes for the specific task’s requirements. Cursor exemplifies this with their Composer coding model deployed on their own infrastructure for core autocomplete and refactoring tasks, while also integrating OpenAI, Anthropic, and Google models via APIs for capabilities they don’t want to build themselves. This multi-model architecture lets them optimize each capability dimension independently: vertical models for cost and control, foundation models for breadth and flexibility.
The billing complexity this creates is substantial. Your infrastructure needs to track which queries went to fine-tuned models versus foundation model APIs. You need to amortize the upfront training costs of vertical models across their expected usage, incorporating depreciation into your cost accounting as the models age and need retraining. You need to track ongoing maintenance costs like data refresh, retraining cycles, and validation testing. And you need to attribute all of this to features and customers in ways that inform pricing decisions.
The cost attribution challenge is particularly acute for hybrid deployments because the cost structure differs so dramatically across model types. Foundation model costs are variable and directly tied to usage volume, making them easy to attribute per query. Vertical model costs are heavily weighted toward upfront training with minimal marginal costs, making per-query attribution depend on assumptions about total usage volume over the model’s lifetime. If you trained a model expecting it to serve ten million queries before needing replacement, but actual usage is only five million, your per-query costs are double what you budgeted.
Real companies managing hybrid deployments report that cost allocation methodologies become critical financial management tools. They typically use a combination of direct attribution for variable costs like API calls and allocation rules for fixed costs like model training. The allocation might be based on relative query volumes across features, or on estimates of which features benefit most from the specialized models, or on simple equal distribution. The key is having a consistent methodology that stakeholders understand and that provides reasonable approximations of economic reality.
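One such methodology, direct attribution for variable costs plus query-volume-proportional allocation of fixed costs, can be sketched in a few lines. The feature names and dollar figures are hypothetical:

```python
def allocate_costs(feature_queries, variable_costs, fixed_costs_total):
    """Per-feature cost: direct variable costs (e.g. API fees) plus fixed
    costs (e.g. an amortized training slice) allocated by query share.

    This is one common convention; allocation could instead weight by
    estimated benefit or split fixed costs equally across features.
    """
    total_queries = sum(feature_queries.values())
    allocated = {}
    for feature, queries in feature_queries.items():
        share = queries / total_queries
        allocated[feature] = variable_costs.get(feature, 0.0) \
            + fixed_costs_total * share
    return allocated

costs = allocate_costs(
    feature_queries={"drafting": 600_000, "review": 400_000},
    variable_costs={"drafting": 12_000.0, "review": 3_000.0},
    fixed_costs_total=50_000.0,  # this period's amortized training cost
)
# drafting: 12,000 + 50,000 * 0.6 = 42,000
# review:    3,000 + 50,000 * 0.4 = 23,000
```

The specific allocation rule matters less than applying it consistently, so that period-over-period margin comparisons remain meaningful.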
The pricing strategy for hybrid deployments needs to reflect the blended economics without exposing implementation complexity to customers. Most companies choose to maintain simplified customer-facing pricing based on seats, usage tiers, or outcomes, while using the detailed cost attribution internally to validate that pricing is sustainable. Harvey charges twelve hundred dollars per lawyer per month regardless of whether queries go to their fine-tuned legal models or to foundation models for edge cases. The unified pricing abstracts away the backend complexity.
But the internal analytics become crucial. Harvey’s finance team needs to know what percentage of queries are handled by fine-tuned models versus escalated to more expensive foundation models, because that ratio determines their actual unit economics and margin profile. If fine-tuned models handle ninety-five percent of queries, margins are healthy. If only seventy percent of queries can be handled by fine-tuned models because customer usage patterns differ from training assumptions, margins compress and pricing might need adjustment.
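The margin sensitivity to that handling ratio is easy to quantify. A minimal sketch with hypothetical per-query costs (two cents for the fine-tuned model, fifty cents for an escalated foundation model call), not Harvey's actual figures:

```python
def blended_cost_per_query(ft_share, ft_cost, api_cost):
    """Expected per-query cost given the fraction of traffic handled by
    the fine-tuned model versus escalated to a foundation model API."""
    return ft_share * ft_cost + (1 - ft_share) * api_cost

# Hypothetical: $0.02 per fine-tuned query, $0.50 per API escalation.
healthy  = blended_cost_per_query(0.95, 0.02, 0.50)  # 0.044 USD/query
stressed = blended_cost_per_query(0.70, 0.02, 0.50)  # 0.164 USD/query
```

With these assumed costs, dropping the fine-tuned handling rate from ninety-five to seventy percent nearly quadruples blended cost per query, which is exactly the kind of shift that forces a pricing conversation.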
The infrastructure investment required to support hybrid deployments is significant. You need the token-based metering and billing systems for foundation model APIs that we’ve discussed throughout this series. You need capacity-based tracking for self-hosted fine-tuned models to understand utilization and right-size your deployment. You need query routing logic that decides which model to use for each request based on task characteristics, cost constraints, and quality requirements. You need monitoring that validates fine-tuned models are actually performing better than foundation models for their specialized tasks, because if they’re not, the extra complexity isn’t justified.
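The routing layer itself can start very simply. A toy sketch of the decision rule, with entirely hypothetical task names; real routers typically also weigh latency, current load, and per-customer cost budgets:

```python
# Tasks the fine-tuned model was trained and validated on (hypothetical).
SPECIALIZED_TASKS = {"ticket-classification", "clause-extraction"}

def route(task_type, quality_sensitivity):
    """Toy routing rule: send well-characterized tasks to the cheap
    fine-tuned model; escalate novel or quality-critical requests to a
    foundation model API."""
    if task_type in SPECIALIZED_TASKS and quality_sensitivity != "critical":
        return "fine-tuned"
    return "foundation-api"

r1 = route("ticket-classification", "normal")   # "fine-tuned"
r2 = route("ticket-classification", "critical") # "foundation-api"
r3 = route("open-ended-drafting", "normal")     # "foundation-api"
```

Every routing decision should also be logged as a billing event, since the route taken determines the cost attributed to that query.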
Companies at the leading edge of hybrid deployment are building model management platforms that treat models as a portfolio of assets with different cost structures, capabilities, and suitable applications. The platform includes cost tracking for each model, performance benchmarks showing where each excels, routing rules that optimize for cost-quality tradeoffs, and analytics that inform decisions about when to fine-tune new vertical models versus expanding foundation model usage. This portfolio approach to model management is becoming a competitive differentiator as AI becomes core infrastructure.
Looking at the trajectory, hybrid deployments will become the norm for any company with significant AI spend and differentiation requirements. Pure foundation model strategies will remain viable for companies that don’t have enough volume to justify vertical investment or that operate in domains where general capabilities suffice. Pure vertical strategies will remain rare, limited to companies like Harvey that are deeply specialized in single domains. But the middle ground of selective vertical specialization layered on top of foundation model baseline is where most mature AI products will land. And the billing infrastructure needs to support this complexity.
The Billing Infrastructure Split: Tokens Versus Training
Now let’s address the specific billing infrastructure requirements that emerge from the foundation versus vertical model choice. The fundamental challenge is that these two approaches have completely different cost drivers that require different metering, allocation, and pricing approaches, yet customers often consume both through the same product and expect coherent, unified billing.
For foundation model consumption through APIs, the billing infrastructure mirrors what we’ve discussed throughout this series. You meter tokens consumed, you track which features and customers drove that consumption, you apply provider rate cards with your markup, you aggregate into invoices. The complexity is in volume, multi-provider orchestration, reasoning versus inference tiers, and feature-level attribution. But the fundamental unit, the token or API call, is well-defined and easy to explain to customers.
The challenges with foundation model billing are around handling the rapid price changes as providers compete and costs deflate. Your billing system needs rate cards that can be updated frequently as OpenAI, Anthropic, Google, and others adjust pricing. You need contract terms that give you flexibility to pass through price decreases or increases without requiring customer renegotiation. And you need customer communication strategies that explain why prices change, particularly when increases happen, which they occasionally do for premium capabilities even as baseline costs fall.
For vertical fine-tuned models, the billing requirements are fundamentally different because the cost structure is fundamentally different. The dominant costs are upfront training and ongoing maintenance, not per-query inference. Your billing infrastructure needs to support amortization methodologies that spread training costs across expected usage volumes and model lifetimes. This is closer to how traditional manufacturing companies depreciate capital equipment than to how software companies bill for cloud services.
The amortization calculation requires assumptions that have significant financial impact. If you spend four hundred thousand dollars training a specialized model and expect it to serve ten million queries over an eighteen-month lifetime before needing replacement, your amortized training cost per query is four cents. But if actual usage is twenty million queries, training cost per query drops to two cents, improving your margin. If usage is only five million queries because adoption was slower than expected, your cost per query doubles to eight cents, potentially making the model uneconomical.
This uncertainty creates risk that needs to be managed in how you price and contract for services powered by vertical models. Some companies handle this by charging upfront deployment fees that recover a significant portion of training costs immediately, then charging smaller recurring fees for ongoing usage. This shifts some risk to customers but aligns with the cost structure. Other companies absorb the training costs as product development expenses and recover them over time through subscription or usage pricing, accepting the risk that low adoption could make specific vertical models unprofitable.
The maintenance costs add another layer of complexity. Fine-tuned models don’t improve automatically the way foundation model APIs do when providers release new versions. You need to retrain periodically on updated data to maintain relevance. You need to validate that models aren’t degrading as edge cases accumulate. You need to fine-tune on new foundation model versions when they offer better base capabilities. These ongoing costs need to be tracked and allocated to features or customers benefiting from the vertical models.
For hybrid deployments running both foundation and vertical models, the billing infrastructure challenge is reconciling these different cost structures into coherent financial reporting and customer pricing. Your internal cost accounting needs to show total AI spend broken down by infrastructure type, foundation API costs versus vertical training and inference costs. You need feature-level P&L that includes allocated vertical model costs alongside direct foundation API costs. And you need margin analytics that show blended economics across hybrid stacks.
The customer-facing billing for hybrid deployments typically hides the complexity behind unified pricing models. Most companies use seat-based subscriptions, consumption-based credits, or outcome-based pricing that doesn’t distinguish between foundation and vertical models in what customers see. But behind the scenes, the billing system is tracking actual costs through different model types and validating that pricing covers blended costs with acceptable margins.
Some companies are experimenting with transparent pricing that reflects infrastructure differences. A product might offer different service tiers where basic tiers use foundation models via APIs with variable quality and cost, while premium tiers use fine-tuned vertical models with guaranteed performance and lower per-use costs at higher subscription fees. This transparency helps customers understand value tiers while aligning pricing with actual cost structure, but it requires customers to understand technical differences they may not care about.
The forecasting and budgeting implications differ dramatically between foundation and vertical approaches. Foundation model costs are relatively predictable based on usage projections and published rate cards, though provider price changes create some uncertainty. Vertical model costs have high upfront spikes for training followed by long periods of low incremental costs, making cash flow planning more complex. CFOs and finance teams need visibility into planned vertical model projects so they can budget the training costs appropriately rather than being surprised by large one-time expenses.
Looking at the vendor landscape, traditional billing platforms like Stripe, Zuora, and Chargebee aren’t designed for the hybrid model cost accounting that AI companies need. They can handle token-based metering and subscription management, but they don’t natively support amortizing training costs, allocating maintenance expenses, or reconciling blended foundation and vertical economics. Specialized tools are emerging, but most companies are still building custom analytics on top of general billing platforms to get the visibility they need.
The long-term infrastructure requirement is a model asset management system that tracks each AI model you operate as a financial asset with acquisition costs, maintenance schedules, depreciation timelines, and performance metrics. This system needs to integrate with your billing platform to feed cost data for margin calculations, with your product analytics to show usage patterns that inform training investments, and with your engineering systems to track technical performance. Building this model asset management capability is becoming a strategic priority for companies running hybrid AI stacks at scale.
Looking Forward: The Convergence Trajectory
As we close this examination of foundation versus vertical models, let’s project forward to understand how this landscape will evolve and what it means for billing infrastructure over the next several years. The trajectory suggests increasing sophistication in hybrid approaches alongside continued specialization in both foundation and vertical directions.
The first high-confidence prediction is the proliferation of domain-specific foundation models that sit between fully general models and narrow vertical fine-tuned models. We’re already seeing this with models like Harvey’s legal foundation model or specialized medical models like Med-PaLM. These domain models are trained on industry-specific corpora at scale, creating foundation-like breadth within a vertical. They can be used via API like foundation models but with performance closer to fine-tuned models on domain tasks.
This emergence of vertical foundations changes the build-versus-buy calculus. Instead of choosing between generic GPT-5 and fine-tuning your own model on proprietary data, you might choose a legal foundation model via API that gets you eighty percent of the way to specialized performance without any training investment. Then you can do lightweight fine-tuning on top of the vertical foundation for the remaining twenty percent that’s specific to your workflows. This layered approach optimizes development speed and cost while still capturing specialization benefits.
The billing for vertical foundation models will likely mirror foundation model APIs initially, charging based on tokens consumed but at potentially different rates than general models. Harvey might offer their legal foundation model via API at a premium to GPT-5 because it delivers better legal performance, capturing value from the specialized training they’ve done. Or they might price lower than GPT-5 to drive adoption while monetizing primarily through their full platform. The pricing strategy for vertical foundation APIs is still being discovered.
The second prediction is increasing automation of the fine-tuning process, making vertical models more accessible to smaller companies that couldn’t previously justify the investment. Tools are emerging that can automatically curate training data, select appropriate base models, configure fine-tuning parameters, and validate performance, reducing the machine learning expertise required. As fine-tuning becomes more productized, the break-even point where it makes economic sense shifts lower, from millions of queries monthly to perhaps hundreds of thousands.
This democratization of fine-tuning will create new pricing models around fine-tuning-as-a-service. Companies like OpenAI and Anthropic already offer fine-tuning capabilities on top of their foundation models with pricing based on training tokens and inference scaling. As this market matures, we’ll see more sophisticated pricing that accounts for data volume, model size, performance guarantees, and exclusive versus shared fine-tuning. The billing infrastructure needs to support hybrid models where you’re paying both for foundation model API usage and for fine-tuning services layered on top.
The third prediction is that regulatory pressure in certain industries will increasingly mandate vertical specialization for compliance reasons. We’re already seeing this in healthcare, where using general foundation models on patient data raises HIPAA concerns that specialized models designed for healthcare can address more cleanly. Regulated verticals such as financial services, legal, and government will likely see requirements for AI systems to demonstrate domain-specific training, validation, and oversight that generic foundation models can’t easily satisfy.
This regulatory driver will create market opportunities for companies building vertical models specifically designed for compliance. These models will be priced at a premium to foundation models, with pricing justified not just by better performance but by reduced compliance risk and simplified audit processes. The billing for compliance-focused vertical models might include certification fees, audit trail access charges, or regulatory reporting services bundled into the model pricing.
The fourth prediction is the emergence of model marketplaces where companies can buy, sell, or license fine-tuned models across organizational boundaries. Instead of every law firm fine-tuning their own contract analysis model, perhaps they license Harvey’s model or purchase a pre-trained model from a specialized provider. These marketplaces will create complex billing scenarios involving revenue sharing, usage tracking across licensees, and attribution when downstream users further fine-tune licensed models.
The marketplace billing infrastructure would need to handle multi-party revenue splits, track usage across different licensing tiers, manage exclusivity versus shared access, and potentially even automate pricing based on model performance benchmarks. This is similar to how software component marketplaces work, but with the added complexity of usage metering and performance validation for AI models. The vendors that build this marketplace infrastructure will capture significant value as intermediaries.
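To make the multi-party revenue split concrete, here is a minimal sketch. The party names and percentage shares are illustrative assumptions, not a description of any real marketplace’s terms:

```python
# Illustrative multi-party revenue split for a licensed fine-tuned model:
# the marketplace, the base-model provider, and the fine-tuner each take
# a share of metered usage revenue. All percentages are assumptions.

def split_revenue(usage_revenue: float, shares: dict) -> dict:
    """Allocate metered revenue across parties; shares must sum to 1."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 100%"
    return {party: round(usage_revenue * pct, 2) for party, pct in shares.items()}

# Hypothetical split for one month of metered usage revenue.
shares = {"marketplace": 0.15, "base_model_provider": 0.25, "fine_tuner": 0.60}
print(split_revenue(10_000.0, shares))
# {'marketplace': 1500.0, 'base_model_provider': 2500.0, 'fine_tuner': 6000.0}
```

In practice the shares themselves might vary by licensing tier or exclusivity level, which is exactly why this logic belongs in configuration rather than code.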
The fifth prediction ties everything together: the ultimate winner in the foundation versus vertical debate won’t be one approach or the other, but rather platforms that make it easy to use both in concert. Companies that can seamlessly blend foundation model APIs, vertical foundation models, company-specific fine-tuned models, and intelligent routing across this portfolio will have the most sustainable economics and best customer experience. The billing infrastructure that supports this blended approach becomes a strategic advantage rather than just a back-office capability.
Synthesis: Building Billing for Both Worlds
Let me close with concrete recommendations for how billing infrastructure should evolve to support both foundation model consumption and vertical fine-tuning as companies increasingly adopt hybrid approaches. These recommendations reflect what leading companies are implementing and what will become necessary as the market matures.
The first essential investment is in cost tracking that can handle both variable costs from foundation model APIs and fixed costs from vertical model training with appropriate allocation methodologies. Your financial systems need to distinguish between these cost types and report them separately while also providing blended views that show total AI spend. This requires a chart of accounts that segments foundation API costs, vertical training costs, vertical inference costs, and maintenance costs as distinct line items that can be aggregated for different reporting purposes.
The cost tracking should support what-if analysis where finance teams can model scenarios such as: If we fine-tune a vertical model for Feature X, how much would we need to save on foundation API costs to justify the training investment? At what usage volume does vertical become more economical than foundation for this use case? If we retrain our vertical model quarterly versus annually, how does that affect total cost of ownership? These scenario models inform strategic decisions about where to invest in vertical specialization versus continuing with foundation APIs.
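The core of this what-if analysis is a simple break-even calculation. Here is a sketch using the figures from the opening scenario; the amortization window and all dollar amounts are illustrative assumptions:

```python
# Hypothetical what-if model: at what monthly query volume does a
# fine-tuned vertical model become cheaper than foundation API calls?
# All figures are illustrative, not real pricing.

def breakeven_volume(api_cost_per_query: float,
                     vertical_cost_per_query: float,
                     training_cost: float,
                     amortization_months: int) -> float:
    """Monthly query volume at which total vertical cost equals API cost."""
    monthly_training = training_cost / amortization_months
    saving_per_query = api_cost_per_query - vertical_cost_per_query
    if saving_per_query <= 0:
        return float("inf")  # vertical never pays off on unit cost alone
    return monthly_training / saving_per_query

# Opening scenario: $4.00/query API vs $0.40/query vertical,
# $400k training cost amortized over an assumed 24 months.
volume = breakeven_volume(4.00, 0.40, 400_000, 24)
print(f"Break-even: {volume:,.0f} queries/month")  # ≈ 4,630
```

Above roughly 4,630 consultations per month, the vertical model is cheaper on a fully loaded basis; below that, the foundation API wins despite its higher unit cost.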
The second critical capability is amortization and depreciation systems that treat vertical model training as capital investments with defined useful lives and depreciation schedules. Your billing platform needs to support configurable amortization that can spread training costs over expected usage volumes or time periods with different schedules for different models. A model trained for a rapidly evolving domain might depreciate over twelve months, while a model for a stable domain might depreciate over thirty-six months.
The amortization should integrate with your cost attribution so that depreciation charges for vertical models flow into feature-level P&L just like foundation API costs do. This creates consistent cost visibility whether a feature uses foundation or vertical models, with all costs ultimately rolling up to feature profitability metrics that inform product decisions. The challenge is that depreciation based on usage requires tracking actual versus projected usage and potentially adjusting depreciation schedules as actuals diverge from projections.
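Usage-based depreciation of the kind described above is essentially units-of-production accounting applied to a trained model. A minimal sketch, with all figures as illustrative assumptions:

```python
# Units-of-production amortization for a trained model: depreciation
# follows actual query volume rather than the calendar. Figures are
# hypothetical.

def usage_depreciation(training_cost: float,
                       projected_lifetime_queries: int,
                       actual_queries_this_period: int) -> float:
    """Depreciation charge for one period, proportional to usage."""
    rate = training_cost / projected_lifetime_queries  # cost per query
    return rate * actual_queries_this_period

# A $400k model expected to serve 10M queries over its useful life;
# this month it handled 250k queries.
charge = usage_depreciation(400_000, 10_000_000, 250_000)
print(f"This period's depreciation charge: ${charge:,.2f}")  # $10,000.00
```

When actual lifetime usage diverges from the projection, the remaining book value would be re-spread over a revised lifetime estimate, which is the adjustment problem the paragraph above flags.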
The third essential investment is in model asset management systems that track all AI models you operate as a portfolio with associated costs, capabilities, performance metrics, and suitable applications. The model asset management system should include: a model catalog listing all foundation APIs you access, all vertical models you’ve trained, and their key characteristics; cost tracking covering training, maintenance, and inference for each model; performance benchmarks showing quality metrics on standard tasks; usage analytics showing query volumes and patterns for each model; and routing rules that determine which model handles which query types.
This model portfolio view enables strategic optimization where you can evaluate whether to add new vertical models, retire underperforming ones, or shift more usage to cost-efficient options. The portfolio analytics should show metrics like cost per query by model, quality per dollar spent, and capacity utilization for self-hosted vertical models. These metrics inform decisions about model infrastructure investment and pricing strategy.
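A portfolio view like this can be sketched as a small catalog with cost-per-query and quality-per-dollar rollups. The model names, costs, and quality scores below are hypothetical:

```python
# Illustrative model-portfolio view: a catalog of models with cost and
# quality metrics, ranked by quality per dollar. All numbers are assumed.
from dataclasses import dataclass

@dataclass
class ModelAsset:
    name: str
    monthly_cost: float      # amortized training + inference + maintenance
    monthly_queries: int
    quality_score: float     # benchmark score in [0, 1]

    @property
    def cost_per_query(self) -> float:
        return self.monthly_cost / self.monthly_queries

    @property
    def quality_per_dollar(self) -> float:
        return self.quality_score / self.cost_per_query

portfolio = [
    ModelAsset("foundation-api", 120_000, 2_000_000, 0.82),
    ModelAsset("vertical-clinical", 45_000, 1_500_000, 0.91),
]
for m in sorted(portfolio, key=lambda m: m.quality_per_dollar, reverse=True):
    print(f"{m.name}: ${m.cost_per_query:.3f}/query, "
          f"{m.quality_per_dollar:,.1f} quality per $")
```

Ranking by quality per dollar is one simple way to decide which models deserve more routed traffic and which are candidates for retirement.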
The fourth recommendation is implementing customer-facing pricing that abstracts complexity while maintaining sustainable margins across hybrid deployments. Most companies should adopt unified pricing models like seat-based subscriptions, consumption credits, or outcome-based charges that don’t expose backend infrastructure decisions to customers. But the pricing should be calibrated based on blended costs across foundation and vertical models, with margins that account for both variable API costs and amortized vertical training costs.
The pricing strategy should include guard rails that protect against adverse selection. If you’re charging flat rates but some customers drive much higher backend costs through usage patterns that trigger expensive foundation model calls rather than routing to cheap vertical models, you need mechanisms to identify and address this. This might mean usage caps, tiered pricing that charges more for high consumption, or active customer success engagement to help heavy users optimize their patterns.
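One simple guard-rail mechanism is a periodic margin check that flags flat-rate customers whose backend costs have crossed a floor. The field names and the 30 percent floor below are assumptions for illustration:

```python
# Hedged sketch of an adverse-selection check: flag flat-rate customers
# whose blended backend cost pushes gross margin below a floor.
# Thresholds and field names are illustrative assumptions.

def flag_low_margin_customers(customers, margin_floor=0.30):
    """Return (customer_id, margin) pairs where margin falls below the floor."""
    flagged = []
    for c in customers:
        margin = (c["monthly_revenue"] - c["backend_cost"]) / c["monthly_revenue"]
        if margin < margin_floor:
            flagged.append((c["id"], round(margin, 2)))
    return flagged

customers = [
    {"id": "acme", "monthly_revenue": 5_000, "backend_cost": 4_200},
    {"id": "globex", "monthly_revenue": 5_000, "backend_cost": 1_100},
]
print(flag_low_margin_customers(customers))  # [('acme', 0.16)]
```

Flagged accounts then feed whichever response the business chooses: a usage cap, a tier migration, or a customer success conversation about optimizing usage patterns.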
The fifth recommendation is building transparency and analytics tools that show customers their usage patterns and associated value in ways that build trust and encourage optimization. Even if you’re not exposing backend infrastructure complexity in pricing, you can show customers metrics like queries processed, time saved, outcomes achieved, and value delivered. This transparency helps justify pricing and makes renewals easier because customers can see concrete impact.
For customers with sophisticated operations teams, you might offer more detailed analytics showing which of their use cases consume more AI resources and recommendations for optimization. This consultative approach to customer cost management builds stronger relationships and can actually increase revenue through expansion as customers optimize low-value usage to free budget for high-value use cases that justify higher pricing.
The final recommendation is treating billing infrastructure as a continuous investment that evolves with your AI strategy rather than as a one-time implementation. The foundation versus vertical landscape is changing rapidly. New model types are emerging. Pricing from foundation providers is volatile. Regulatory requirements are evolving. Your billing infrastructure needs to be flexible enough to support new pricing models, new cost allocation methods, and new reporting requirements as the market and your business evolve.
This means avoiding tightly coupled billing logic hard-coded in application code. Instead, treat pricing rules, allocation methodologies, amortization schedules, and reporting dashboards as configuration that can be updated without engineering changes. Build APIs and integration points that make it easy to add new data sources as you adopt new model providers or fine-tune new vertical models. And invest in the analytics capabilities that let you understand your AI economics deeply enough to make informed strategic choices.
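Pricing-as-configuration can be as simple as keeping plan definitions in data that finance can edit without a deploy. In this sketch the config is an in-memory dict; in practice it would live in a database or config service, and all plan names and rates are hypothetical:

```python
# Sketch of pricing rules as configuration rather than hard-coded logic.
# Plans, fees, and rates are illustrative assumptions; in production this
# structure would be loaded from a config store, not defined inline.

PRICING_CONFIG = {
    "plans": {
        "starter": {"base_fee": 99.0, "included_credits": 1_000, "overage_rate": 0.05},
        "pro":     {"base_fee": 499.0, "included_credits": 10_000, "overage_rate": 0.03},
    }
}

def monthly_invoice(plan: str, credits_used: int, config=PRICING_CONFIG) -> float:
    """Base fee plus overage beyond the plan's included credits."""
    p = config["plans"][plan]
    overage = max(0, credits_used - p["included_credits"])
    return p["base_fee"] + overage * p["overage_rate"]

print(monthly_invoice("pro", 12_500))  # 499 + 2,500 * 0.03 = 574.0
```

The point is that changing a rate or adding a plan is a data change, reviewed by finance, rather than an engineering release, which is what keeps the billing layer flexible as model economics shift.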
The companies that navigate the foundation versus vertical choice successfully won’t be those that pick one approach and stick with it rigidly. They’ll be those that maintain optionality through hybrid architectures, make data-driven decisions about where to specialize based on cost-benefit analysis, and build billing infrastructure sophisticated enough to support both consumption and capital-intensive AI while presenting clean, understandable pricing to customers. The billing infrastructure investment required is substantial, but it enables the strategic flexibility that will separate winners from losers as AI becomes core to every software company’s value proposition.
About This Series
The Future Ahead is a series exploring where the AI industry is heading and how it will fundamentally transform billing workflows, billing infrastructure, and pricing models.
Read Previous Articles:
- Part 1: The AI Billing Infrastructure Crisis
- Part 2: The Outcome-Based Pricing Revolution
- Part 3: The Token Cost Deflation Paradox
- Part 4: The Agentic AI Pricing Challenge
- Part 5: Multimodal Monoliths vs. Orchestrated Specialists
- Part 6: Beyond Agentic AI - Autonomous Services
- Part 7: Reasoning vs Inference Models
- Part 8: The Infrastructure Fork
- Part 9: The Margin Crisis