What is it
Per-Request Pricing is a billing unit where customers are charged per request served — the generic meter for inference endpoints, search, scraping, and browser infrastructure. Whatever the product does — answer a search, scrape a page, run a model, write a memory — the request is the countable thing that crosses the API boundary, and the bill is request volume times a published rate.
The purest expression in the corpus was MultiOn, whose Agent API priced one web action as one request: $0.08 at public beta, cut to $0.04 (Basic) and $0.025 (Premium, behind a $50,000/year minimum) within weeks, with a $0.01 Retrieve rate for extraction-only calls. Linkup is the living equivalent — a web search API with no seats and no tiers, just $0.005 per standard search and $0.05 per deep search, plus an x402 mode that bills a flat $0.01 per request in USDC with no account at all.
Because the per-request price is almost always sub-cent, vendors quote per 1,000 to stay legible: You.com lists its Search API at $5.00 per 1,000 calls, Browserbase meters Search overage at $7 per 1,000 requests, Bright Data prices Web Unlocker and SERP requests from $1.50 down to $1.00 per 1,000 on committed tiers, and Cohere bills Rerank at $2 per 1,000 queries.
The recurring tension is that requests vary wildly in cost to serve. A cached lookup and an exhaustive research run are both “one request,” so nearly every mature per-request rate card layers something on top — effort modes, feature multipliers, request-size bands, or success-only metering. How those layers work is the real story of this unit.
How it works
The base formula is bill = requests × rate. The design work is in the levers vendors stack on it:
| Lever | What it does | Example from the corpus |
|---|---|---|
| Per-1k quoting | Makes sub-cent rates readable | You.com Search $5.00/1k calls; Twelve Labs search $4/1k queries |
| Effort / mode bands | Prices the depth of work behind one request | You.com Research $12 → $450/1k by effort; Linkup deep search 10x standard |
| Feature multipliers | One meter, scaled by request weight | ZenRows 5x JS rendering, 10x premium proxies, 25x both; ScraperAPI 1x–75x credits |
| Success-only metering | Bills delivered results, not attempts | SerpApi excludes blocked/CAPTCHA’d searches; Oxylabs doesn’t bill 5xx/6xx |
| Request quotas (no rate) | Tiers gated by monthly request bands | Mem0 10K → 500K add requests; Helicone 10K free requests/mo |
| Volume / commit discounts | Rate falls with committed spend | Bright Data $1.50 → $1.00/1k; SerpApi $7.50 → $2.75/1k reserved |
Worked example — agent search budget. An agent product on Linkup runs 100,000 standard searches a month at $0.005 each: $500. Upgrade every call to sourced-answer output at $0.006 and the same volume costs $600. Route 5% of traffic (5,000 calls) to deep search at $0.05 and that slice alone costs $250, half the original budget for one-twentieth of the volume; with the other 95,000 calls still at the standard rate ($475), the month totals $725. The formula never changed; the per-request rate did, by mode.
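The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch that uses only the three Linkup rates quoted above; the `RATES` table and the `monthly_bill` helper are illustrative names, not part of any vendor SDK.

```python
from decimal import Decimal

# Per-request rates quoted in the worked example (USD).
RATES = {
    "standard": Decimal("0.005"),
    "sourced_answer": Decimal("0.006"),
    "deep": Decimal("0.05"),
}


def monthly_bill(mix: dict[str, int]) -> Decimal:
    """Apply bill = requests x rate to each mode and sum the results."""
    return sum((RATES[mode] * count for mode, count in mix.items()), Decimal("0"))


all_standard = monthly_bill({"standard": 100_000})       # every call standard
all_sourced = monthly_bill({"sourced_answer": 100_000})  # every call sourced-answer
deep_slice = monthly_bill({"deep": 5_000})               # the 5% deep slice alone
mixed = monthly_bill({"standard": 95_000, "deep": 5_000})

for label, total in [
    ("all standard", all_standard),
    ("all sourced-answer", all_sourced),
    ("deep slice only", deep_slice),
    ("95/5 mix", mixed),
]:
    print(f"{label}: ${total:,.2f}")
```

`Decimal` is used instead of floats so that sub-cent rates multiply out exactly. The 95/5 mix comes to $725 rather than $750 because the 5,000 deep calls no longer incur the standard rate.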
Worked example — multiplier math. On ZenRows, the Universal Scraper API meters cost per 1,000 successful requests with multipliers: 5x for JavaScript rendering, 10x for premium proxies, 25x for both. A 50,000-request job against a protected, JS-heavy target therefore consumes the balance of a 1,250,000-plain-request job. Failed and retried calls don’t draw down the balance at all — and HTTP 404/410 responses count as successful completions, a detail worth reading twice before pointing a crawler at dead links.
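The multiplier math above can be sketched as a lookup table plus a billable-status check. The multipliers and the 404/410 rule come from the text; treating every 2xx response as "successful" is an assumption made for illustration, and the function names are hypothetical rather than ZenRows API calls.

```python
# Multipliers quoted above, keyed by (js_render, premium_proxy).
MULTIPLIERS = {
    (False, False): 1,
    (True, False): 5,
    (False, True): 10,
    (True, True): 25,
}


def plain_request_equivalents(successful: int, js_render: bool, premium_proxy: bool) -> int:
    """Convert successful requests into the plain-request units they consume."""
    return successful * MULTIPLIERS[(js_render, premium_proxy)]


def is_billable(status: int) -> bool:
    """Assumption: 2xx counts as success; the text adds 404 and 410 to that set."""
    return 200 <= status < 300 or status in (404, 410)


job_units = plain_request_equivalents(50_000, js_render=True, premium_proxy=True)

# A crawl that hits dead links still draws down the balance for 404/410.
statuses = [200, 404, 410, 500, 429]
billable_count = sum(is_billable(s) for s in statuses)

print(job_units, billable_count)
```

Running a sample of target URLs through a check like `is_billable` before a large crawl gives a rough estimate of how much of the balance dead links would consume.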
Worked example — quota ladder instead of a rate. Mem0 never publishes a per-request price. Instead, each tier carries two request quotas: add requests scale 10,000 → 50,000 → 200,000 → 500,000 and retrieval requests 1,000 → 5,000 → 20,000 → 50,000 across Hobby (free), Starter ($19), Growth ($79), and Pro ($249). Dividing each paid tier’s price by its add quota gives an implied unit price of about $0.0004 per add request on Starter and Growth and about $0.0005 on Pro, so on these figures the paid tiers do not get cheaper per request as you climb; the lever buyers actually pull is the tier, not the meter. Helicone (10,000 free requests/month, then usage-based overage) and Portkey ($49/month for 100,000 logged requests, then $9 per additional 100K) run the same quota-first pattern. For how these meters get counted in the first place, see the tracking and metering usage events guide.
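A quota ladder like this can be modeled as a table lookup: pick the first tier whose two quotas both cover the workload, then divide price by quota to get the implied unit price. The sketch below uses only the Mem0 tier figures quoted above; `cheapest_tier` and `implied_add_price` are hypothetical helper names, and attributing the whole tier price to add requests is a simplification.

```python
from decimal import Decimal

# (tier name, monthly price in USD, add-request quota, retrieval-request quota)
TIERS = [
    ("Hobby", 0, 10_000, 1_000),
    ("Starter", 19, 50_000, 5_000),
    ("Growth", 79, 200_000, 20_000),
    ("Pro", 249, 500_000, 50_000),
]


def cheapest_tier(adds: int, retrievals: int):
    """Return the first tier whose add AND retrieval quotas both fit, else None."""
    for name, _price, add_quota, retrieval_quota in TIERS:
        if adds <= add_quota and retrievals <= retrieval_quota:
            return name
    return None


def implied_add_price(tier_name: str) -> Decimal:
    """Tier price divided by its add quota (simplification: ignores retrievals)."""
    for name, price, add_quota, _retrieval_quota in TIERS:
        if name == tier_name:
            return Decimal(price) / Decimal(add_quota)
    raise KeyError(tier_name)


# 40,000 adds would fit Starter, but 12,000 retrievals force the jump to Growth.
print(cheapest_tier(40_000, 12_000))
for tier in ("Starter", "Growth", "Pro"):
    print(tier, implied_add_price(tier))
```

The first call illustrates why the two meters have to be estimated together: whichever quota is exhausted first decides the tier.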
Companies using this
37 in-corpus companies meter requests, making it one of the most widely shared billing units in the corpus. The cluster spans five categories — search APIs (SerpApi, Tavily, Linkup, You.com), web data and scraping (Bright Data, Oxylabs, ZenRows, ScraperAPI), browser and agent infrastructure (Browserbase, MultiOn), LLM observability and memory (Helicone, Portkey, PromptLayer, Mem0), and inference platforms where requests ride alongside tokens (Cohere, Baseten, Groq, Perplexity).
Patterns observed
- Per-1k quoting is the house style. You.com ($5.00/1k Search calls), Browserbase ($7/1k Search, $1/1k Fetch, $4–$7/1k Extract), Bright Data ($1.50/1k Web Unlocker), Cohere ($2/1k Rerank queries), and Twelve Labs ($4/1k search queries) all normalize sub-cent request prices into per-1,000 figures. The meter is per-request; only the display unit is scaled.
- Effort bands absorb cost variance without abandoning the unit. You.com prices Research requests at $12 / $50 / $100 / $450 per 1,000 by effort tier, Linkup charges 10x for deep search over standard, and SerpApi prices enterprise overage by speed mode ($7.50 / $15 / $30 per 1,000 on-demand). The request stays the unit; the band prices what’s behind it.
- Multiplier systems are the scraping category’s answer to the same problem. ZenRows (5x/10x/25x by rendering and proxy class) and ScraperAPI (1 credit standard, 10 with JS rendering or premium proxies, 25 premium+render, 75 ultra-premium+render) charge heavy requests more while keeping one balance. Qodo applies the identical idea to model choice: most LLM requests cost 1 credit, Claude Opus costs 5.
- Success-only metering is becoming table stakes where failure is common. SerpApi bills only fully successful searches, Oxylabs doesn’t charge for 5xx/6xx scraper attempts, and ZenRows draws balance only on successful results. In scraping, where block rates are a fact of life, charging per delivered result rather than per attempt is now a competitive requirement.
- Quota gates outnumber published rates. Many vendors use requests as tier boundaries rather than priced units: Mem0’s add/retrieval quotas, Helicone’s 10K free requests, Portkey’s 100K-log allotment, OpenRouter’s 1M free BYOK requests per month (then a 5% fee), and GitHub Copilot’s premium-request allowances that became AI Credits at $0.01 each. The request is the meter even when no per-request price appears on the page.
- Read and write requests are splitting into separate meters. Mem0 prices add requests and retrieval requests on independent quota ladders, Pinecone separates read units from write units entirely, and turbopuffer prices writes per GB and queries per data scanned. Once a platform’s read and write costs diverge, a single undifferentiated request meter stops working.
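To see why the success-only pattern above matters to a buyer, compare the cost per delivered result under the two billing rules. When every attempt is billed, the effective cost is the per-request rate divided by the success rate. The sketch below uses the $1.50 per 1,000 figure quoted for Bright Data purely as a sample rate; the 80% success rate is a hypothetical input, not a measured figure for any vendor.

```python
from fractions import Fraction


def cost_per_delivered_result(rate_per_1k: str, success_rate: Fraction, success_only: bool) -> Fraction:
    """Effective USD cost of one delivered result.

    success_only=True  -> only delivered results are billed, so cost equals the rate.
    success_only=False -> every attempt is billed, so failures inflate the cost.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    per_request = Fraction(rate_per_1k) / 1000
    return per_request if success_only else per_request / success_rate


hypothetical_success = Fraction(4, 5)  # assumption: 80% of attempts succeed

billed_on_success = cost_per_delivered_result("1.50", hypothetical_success, success_only=True)
billed_on_attempt = cost_per_delivered_result("1.50", hypothetical_success, success_only=False)

print(float(billed_on_success), float(billed_on_attempt))
```

`Fraction` keeps the comparison exact; at an 80% success rate the attempt-billed result costs 25% more than the same nominal rate billed on success only.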
Counterexamples & variants
The cautionary tale is MultiOn. Its Agent API defined one request as roughly one action taken on a webpage — an honest unit, but one whose unit economics never stabilized: the price halved from $0.08 to $0.04 within about five weeks of public beta, a Premium tier needed a $50,000/year minimum to make the lower $0.025 rate viable, and an extraction-only Retrieve endpoint had to be split out at $0.01 because lumping it with full agent actions overpriced it 4x. MultiOn pivoted to consumer in December 2024 and the API wound down — a reminder that per-request pricing only works when the vendor actually knows what a request costs to serve.
The second counterexample is the databases that outgrew the unit. Pinecone nominally lists requests as a meter, but its real units are read units and write units: a query consumes roughly 1 RU per GB of namespace size (minimum 0.25 RU), and a write consumes 1 WU per KB (minimum 5 WU per request). Two identical API calls against differently-sized indexes cost wildly different amounts — the “request” survives only as a minimum-charge floor. turbopuffer made the same move explicit by pricing queries per unit of data scanned, then cutting that base rate from $5 to $1 per petabyte in February 2026. When request cost scales with state size rather than call count, vendors migrate to request-size pricing and the flat per-request model quietly dies.
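The Pinecone figures in this paragraph translate into two small functions. This is a simplified sketch built only from the numbers quoted here (about 1 RU per GB of namespace with a 0.25 RU floor, 1 WU per KB with a 5 WU floor); it ignores any rounding or per-operation rules the vendor may apply, and the function names are illustrative.

```python
def read_units(namespace_gb: float) -> float:
    """Approximate RUs for one query: ~1 RU per GB of namespace, floor of 0.25 RU."""
    return max(0.25, namespace_gb)


def write_units(payload_kb: float) -> float:
    """Approximate WUs for one write: 1 WU per KB, floor of 5 WU per request."""
    return max(5.0, payload_kb)


# The "same" query against two namespaces of different size.
small_index_query = read_units(0.1)   # hits the minimum-charge floor
large_index_query = read_units(40.0)  # scales with namespace size

print(small_index_query, large_index_query, write_units(2.0), write_units(12.0))
```

Under these assumptions, one query against a 40 GB namespace consumes as many read units as 160 queries against a namespace small enough to hit the floor, which is the sense in which the request survives only as a minimum charge.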
The third group is the inference platforms — OpenAI, Groq, Fireworks AI, Google, Baseten, DeepInfra, Replicate — which record requests but run their economics on tokens or compute seconds; the request count is incidental, and those vendors belong to the token-based pricing story. The interesting hybrid is Perplexity AI’s Sonar API, which charges per-token rates plus a per-1,000-request search fee ($5–$12 depending on search-context depth) — the request fee prices the retrieval work that tokens can’t see. And Linkup’s x402 mode is the unit taken to its logical extreme: a flat $0.01 per request paid in USDC on Base, no account, no invoice — per-request pricing as a wire protocol.
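The hybrid shape described for Sonar, a token charge plus a per-1,000-request fee, can be sketched as two terms added together. Only the $5 to $12 request-fee range comes from the text; the token rates in the example call are hypothetical placeholders, and `hybrid_bill` is an illustrative helper rather than anything from a vendor SDK.

```python
from decimal import Decimal


def hybrid_bill(
    requests: int,
    input_tokens: int,
    output_tokens: int,
    *,
    input_per_million: Decimal,
    output_per_million: Decimal,
    request_fee_per_1k: Decimal,
) -> tuple[Decimal, Decimal]:
    """Return (token_cost, request_cost) in USD for a token-plus-request meter."""
    token_cost = (
        Decimal(input_tokens) / 1_000_000 * input_per_million
        + Decimal(output_tokens) / 1_000_000 * output_per_million
    )
    request_cost = Decimal(requests) / 1_000 * request_fee_per_1k
    return token_cost, request_cost


# Hypothetical token rates ($1 per million in and out); $5/1k is the low end
# of the request-fee range quoted above.
token_cost, request_cost = hybrid_bill(
    requests=10_000,
    input_tokens=5_000_000,
    output_tokens=2_000_000,
    input_per_million=Decimal("1"),
    output_per_million=Decimal("1"),
    request_fee_per_1k=Decimal("5"),
)
print(token_cost, request_cost, token_cost + request_cost)
```

Returning the two components separately makes it easy to see which meter dominates a given workload; with the placeholder rates here, the request fee is the larger term.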
What this means for buyers vs vendors
For buyers
Start by finding out what one request actually costs for your workload, not the headline rate. The published number is usually the cheapest request on the card: on ZenRows a protected JS-heavy target costs 25x the plain rate, on You.com an exhaustive Research call costs 37.5x a lite one ($450 versus $12 per 1,000), and on Pinecone the same query gets more expensive as your index grows. Ask three questions in procurement: which request types and multipliers will my traffic actually hit; are failed requests billed (on SerpApi and Oxylabs they aren’t; on most APIs they are); and is there a budget cap or alert, like Upstash’s hard monthly ceiling with alerts at 70% and 90%. Then model your real mix with the pricing calculator before committing to a tier: quota-gated plans like Mem0’s punish misestimating which of the two request meters (reads or writes) you’ll exhaust first.
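If a vendor offers no cap or alert, a buyer can approximate one on their own side. The sketch below is a generic client-side check using the 70% and 90% thresholds mentioned above as defaults; it is not a description of how Upstash implements its ceiling, and `budget_status` is a hypothetical helper.

```python
def budget_status(spend_cents: int, cap_cents: int, alert_percents=(70, 90)):
    """Return (alerts_fired, hard_cap_reached) for month-to-date spend.

    Amounts are integer cents so the threshold comparison stays exact.
    """
    if cap_cents <= 0:
        raise ValueError("cap_cents must be positive")
    # Compare spend * 100 against cap * percent to avoid float division.
    fired = [p for p in alert_percents if spend_cents * 100 >= cap_cents * p]
    return fired, spend_cents >= cap_cents


# Month-to-date spend against a $100.00 cap.
print(budget_status(6_999, 10_000))   # below every threshold
print(budget_status(7_500, 10_000))   # first alert fired
print(budget_status(10_000, 10_000))  # both alerts fired and cap reached
```

Feeding this check from the same request counter used for forecasting keeps the alert consistent with the bill the vendor will eventually send.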
For vendors
Per-request pricing earns you the fastest quote-to-forecast path in usage-based pricing — a developer can budget in seconds — but only if your cost to serve a request is roughly uniform. The corpus shows three proven ways to keep the unit when it isn’t: effort bands (You.com, Linkup), feature multipliers on a single balance (ZenRows, ScraperAPI), and read/write meter splits (Mem0, Pinecone). If you’re in a category with real failure rates, bill successful requests only — SerpApi and Oxylabs have made that the trust baseline in web data, and charging for blocked attempts now reads as a red flag. Price the unit before you launch it, not after: MultiOn repricing twice in five weeks is what discovering your serving cost in production looks like. The choosing the right usage metric guide covers when the request is the right value metric at all, and tracking and metering usage events covers the pipeline you’ll need to count it defensibly.
| Company | Product | Pricing model | Billing units | Free tier | Verified |
|---|---|---|---|---|---|
| Baseten | ML inference infrastructure: dedicated GPU deployments, Model APIs, and Truss framework | | | Yes | 2026-05-29 |
| Bright Data | Web data platform: proxy networks, scraping APIs, a managed scraping browser, SERP and unlocker APIs, ready-made datasets, and eCommerce insights | | | Yes | 2026-06-04 |
| Browserbase | Browser-agent infrastructure: headless browser sessions, web Search/Fetch APIs, agent identity, runtime, and a model gateway behind one API key | | | Yes | 2026-06-02 |
| Cartesia | Real-time voice AI platform (Sonic TTS, voice cloning, voice agents) | | | Yes | 2026-05-29 |
| Clipdrop | AI image-editing and generation tools (background removal, upscaling, text-to-image), now part of Jasper | | | Yes | 2026-06-05 |
| Cohere | Command, Embed, Rerank APIs | | | Yes | 2026-05-29 |
| DeepInfra | Serverless inference cloud: per-token LLM/embedding APIs, per-image and per-minute media models, per-hour on-demand GPU containers, and reserved DeepCluster GPU clusters | | | No | 2026-06-02 |
| Exa | AI web search API for agents: search, contents, deep research, and monitoring endpoints billed per request | | | Yes | 2026-06-01 |
| Fal | Generative-media inference platform: serverless per-output model APIs plus dedicated GPU compute | | | No | 2026-06-01 |
| Fireworks AI | Generative AI inference platform: serverless per-token, on-demand GPU, fine-tuning, batch API | | | Yes | 2026-05-30 |
| GitHub Copilot | AI pair programmer and coding agent embedded in GitHub, VS Code, and most major IDEs. | | | Yes | 2026-06-02 |
| Gladia | Speech-to-text & audio intelligence API | | | Yes | 2026-06-09 |
| Google | Gemini API & AI Studio | | | Yes | 2026-05-29 |
| Groq | GroqCloud: LPU-based ultra-low-latency inference API for Llama, GPT-OSS, Qwen, Whisper, and Mixtral | | | Yes | 2026-05-29 |
| Helicone | Open-source LLM observability & AI gateway | | | Yes | 2026-06-09 |
| Jina AI | Search Foundation API (Embeddings, Reranker, Reader, DeepSearch, Classifier) | | | Yes | 2026-06-03 |
| Linkup | Web search API for AI agents: Search, Fetch, and async Research endpoints with grounded, structured results | | | Yes | 2026-06-04 |
| Mem0 | Memory layer for AI agents and applications | | | Yes | 2026-06-10 |
| MultiOn | Autonomous web-browsing AI agent API (wound down) | | | No | 2026-06-10 |
| OpenAI | ChatGPT consumer subscriptions + GPT-5.x API with token-based usage billing | | | Yes | 2026-05-30 |
| OpenRouter | Multi-model LLM API routing marketplace | | | Yes | 2026-06-10 |
| Oxylabs | Web data collection: residential, datacenter, ISP & mobile proxies plus Web Scraper API and Web Unblocker | | | Yes | 2026-06-04 |
| Perplexity AI | AI-native answer engine with citations and multi-model search | | | Yes | 2026-05-29 |
| Phind | AI developer search engine and coding assistant (shut down January 2026) | | | Yes | 2026-06-08 |
| Pinecone | Managed vector database (serverless) | | | Yes | 2026-06-09 |
| Portkey | AI gateway & LLMOps governance platform | | | Yes | 2026-06-10 |
| PromptLayer | Prompt management, evaluation, and observability platform for LLM and AI-agent teams | | | Yes | 2026-06-04 |
| Qodo | Qodo (formerly Codium AI), an AI code integrity platform: Qodo Gen (IDE plugin), Qodo Merge (PR review agent), and Qodo Command (CLI / agentic quality workflows) | | | Yes | 2026-06-03 |
| Reka AI | Natively multimodal models (Spark, Edge, Flash, Core) + Research & Vision APIs | | | Yes | 2026-06-11 |
| Replicate | Cloud platform for running, fine-tuning, and deploying AI models via REST API | | | Yes | 2026-05-30 |
| ScraperAPI | Web scraping API that handles proxies, browsers, and CAPTCHAs behind a single endpoint | | | No | 2026-06-04 |
| SerpApi | Real-time search-results API (Google, Bing, and other engines) | | | Yes | 2026-06-04 |
| Tavily | Tavily Search API | | | Yes | 2026-06-03 |
| turbopuffer | Serverless vector and full-text search database on object storage | | | No | 2026-06-04 |
| Twelve Labs | Video understanding foundation models (Marengo for search/embeddings, Pegasus for analysis) delivered as a usage-metered API | | | Yes | 2026-06-02 |
| Upstash | Upstash (Redis, Vector, QStash, Search, Workflow) | | | Yes | 2026-06-03 |
| Vectara | Enterprise RAG-as-a-Service and agent platform for trusted, grounded, auditable AI | | | No | 2026-06-02 |
| Writesonic | GEO / AI-search-visibility and SEO platform that tracks brand mentions across AI answer engines and ships content/citation fixes | | | Yes | 2026-06-07 |
| You.com | Web search, contents, research, and finance-research APIs for AI systems | | | Yes | 2026-06-01 |
| ZenRows | Universal Scraper API, Scraping Browser, and Residential Proxies | | | Yes | 2026-06-04 |
FAQ
What is per-request pricing?
Per-request pricing is a billing unit where the customer is charged for each request served by the platform — an inference call, a search, a scrape, or a browser action. Rates are usually quoted per 1,000 requests because the per-request price is often a fraction of a cent.
How is per-request pricing different from token-based pricing?
Token pricing meters the volume of text processed inside a request, so two calls can cost very different amounts. Per-request pricing charges per call regardless of payload, which is easier to forecast but a looser fit to the vendor's serving cost — which is why many vendors add request-size bands or effort modes on top.
Why do vendors quote per 1,000 requests instead of per request?
Because single-request prices are usually sub-cent. Quoting '$7 per 1,000 requests' (Browserbase Search) or '$2 per 1,000 queries' (Cohere Rerank) keeps the rate card readable while the meter underneath stays strictly per-request.
Which companies use per-request pricing?
It spans search APIs (Linkup, Tavily, You.com, SerpApi), scraping and web data (Bright Data, Oxylabs, ZenRows, ScraperAPI), browser and agent infrastructure (Browserbase, MultiOn), AI memory and observability (Mem0, Helicone, Portkey, PromptLayer), and inference platforms (Cohere, Perplexity, Baseten). 37 in-corpus companies list requests as a billing unit.
Do vendors charge for failed requests?
Increasingly not, in categories where failure is common. SerpApi excludes blocked, errored, and CAPTCHA'd searches; ZenRows bills only successful results; Oxylabs' Web Scraper API doesn't charge for 5xx/6xx attempts. Most inference and search APIs, by contrast, bill every request that reaches the endpoint.
What are request multipliers?
A multiplier charges more request-units for heavier work while keeping one meter. ZenRows charges 5x for JavaScript rendering, 10x for premium proxies, and 25x for both; ScraperAPI's credits run 1x to 75x per request by feature; Qodo charges 1 credit for most LLM requests but 5 for Claude Opus.
Trivia
- MultiOn, the purest per-request business in the corpus, halved its Agent API price from $0.08 to $0.04 per request within about five weeks of public beta in 2024, then wound the API down entirely after pivoting to consumer in December 2024.
- You.com's Research API prices the same nominal "request" at $12 (lite), $50 (standard), $100 (deep), or $450 (exhaustive) per 1,000 calls, with a contact-sales Frontier tier listed above $2,000 per 1,000: a 160x+ spread on one endpoint.
- Mem0 renamed its billing meters from "memories" to "add requests" and "retrieval requests" in 2026, a read/write split where every tier carries two separate request quotas (e.g. Pro: 500,000 adds but only 50,000 retrievals per month).
Related billing units
- Credit-Based Billing: A billing unit where customers pre-purchase or are allocated a pool of credits that deplete as they use the product, often at variable rates per feature.
- Token-Based Pricing: A billing unit common in LLM and AI products, where customers are charged per input and output token processed.
- Per-Seat Pricing: A billing unit where the vendor charges a fixed fee per named user, regardless of how much each user consumes.
- Per-Resolution Pricing: A billing unit unique to AI customer-support products, where the vendor charges only when an AI agent resolves a customer issue without escalation.
- Bandwidth-Based Pricing: A billing unit where customers are charged per gigabyte of data transferred out of the platform.
- Per-Function-Invocation Pricing: A billing unit where customers are charged per serverless function invocation, often combined with a separate compute-time charge.
- CPU-Hour Pricing: A billing unit where customers are charged for the CPU time their workloads consume, typically measured in vCPU-seconds or vCPU-hours.
- GB-Hour Pricing: A billing unit where customers are charged for the memory their workloads consume over time, measured in gigabyte-hours.
- GPU-Hour Pricing: A billing unit where customers are charged for GPU time consumed, typically measured per-second or per-hour by GPU type.
- Per-API-Call Pricing: A billing unit where customers are charged per API request, regardless of payload size or processing time.
- Per-GB Storage Pricing: A billing unit where customers are charged per gigabyte of data stored on the platform per month.
- Media-Minute Pricing: A billing unit where customers are charged per minute of audio or video processed; used by speech, voice, and video AI vendors.
- Per-Event Pricing: A billing unit where customers are charged per event ingested; the native meter of observability and billing-infrastructure platforms.
- Vector Storage Pricing: A billing unit where customers are charged for vectors stored or indexed; the storage dimension of vector database pricing.
- Per-Character Pricing: A billing unit where customers are charged per character of text processed; the standard meter for text-to-speech and translation.
- Per-Document Pricing: A billing unit where customers are charged per document processed or generated; common in AI writing, SEO, and document-intelligence tools.
- Per-Page Pricing: A billing unit where customers are charged per page crawled, parsed, or rendered; the meter for web scraping and document parsing.
- Per-Transaction Pricing: A billing unit where customers are charged per financial or billing transaction processed; the meter of billing and accounting platforms.
- Active-User Pricing: A billing unit where customers are charged per monthly or daily active user rather than per provisioned seat.
- Per-Task Pricing: A billing unit where customers are charged per task an automation or agent executes; Zapier's historical unit, now spreading to AI agents.