Per-Request Pricing: Examples & Companies

What is it

Per-Request Pricing is a billing unit where customers are charged per request served — the generic meter for inference endpoints, search, scraping, and browser infrastructure. Whatever the product does — answer a search, scrape a page, run a model, write a memory — the request is the countable thing that crosses the API boundary, and the bill is request volume times a published rate.

The purest living expression is Linkup — a web search API with no seats and no tiers, just $0.005 per standard search and $0.05 per deep search, plus an x402 mode that bills a flat $0.01 per request in USDC on Base with no account at all. MultiOn once priced one web action as one request just as cleanly, before its unit economics unraveled (a cautionary tale below).

Because the per-request price is almost always sub-cent, vendors quote per 1,000 to stay legible: Exa lists its Search API at $7 per 1,000 requests (about $0.007 a call), You.com meters Search at $5.00 per 1,000 calls, Browserbase prices Search overage at $7 per 1,000 requests, and Cohere bills its Rerank API at $2 per 1,000 queries. The headline unit shrinks to a readable figure while the meter underneath stays strictly one-per-call.

The recurring tension is that requests vary wildly in cost to serve. A cached lookup and an exhaustive research run are both “one request,” so nearly every mature per-request rate card layers something on top — effort modes, feature multipliers, request-size bands, or success-only metering. How those layers work is the real story of this unit, and it’s the difference between a $7 headline and a bill several times larger.


How it works

The base formula is bill = requests × rate. The design work is in the levers vendors stack on it to keep one meter honest when requests aren’t uniform in cost:

Lever	What it does	Example from the corpus
Per-1k quoting	Makes sub-cent rates readable	Exa Search $7/1k requests; Twelve Labs search $4/1k queries
Effort / mode bands	Prices the depth of work behind one request	You.com Research $12 → $450/1k by effort; Linkup deep search 10x standard
Feature multipliers	One meter, scaled by request weight	ZenRows 5x JS rendering, 10x premium proxies, 25x both; ScraperAPI 1x–75x credits
Success-only metering	Bills delivered results, not attempts	SerpApi excludes blocked/CAPTCHA’d searches; Oxylabs doesn’t bill 5xx/6xx
Request quotas (no rate)	Tiers gated by monthly request bands	Mem0 10K → 500K add requests; Helicone 10K free requests/mo
Volume / commit discounts	Rate falls with committed spend	Bright Data $1.50 → $1.00/1k; SerpApi $7.50 → $2.75/1k reserved
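A minimal sketch of the base formula with per-1,000 quoting, in Python. The function names and the 250,000-request volume are illustrative assumptions; the $7 per 1,000 figure is the Exa Search rate quoted above.

```python
from decimal import Decimal


def per_call_rate(price_per_1k: str) -> Decimal:
    """Convert a quoted per-1,000 price into the underlying per-request rate."""
    return Decimal(price_per_1k) / 1000


def bill(requests: int, price_per_1k: str) -> Decimal:
    """bill = requests x rate, with the rate quoted per 1,000 requests."""
    return requests * per_call_rate(price_per_1k)


# Exa Search is quoted at $7 per 1,000 requests, i.e. $0.007 per call.
exa_rate = per_call_rate("7")

# Hypothetical monthly volume, chosen only to illustrate the arithmetic.
exa_bill = bill(250_000, "7")
```

Decimal is used rather than float so that sub-cent rates multiply out without rounding drift. Every lever in the table changes the rate fed into this formula, not the formula itself.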

Worked example — agent search budget. An agent product on Linkup runs 100,000 standard searches a month at $0.005 each: $500. Upgrade every call to sourced-answer output at $0.006 and the same volume costs $600. Route 5% of traffic (5,000 calls) to deep search at $0.05 and that slice alone costs $250, half the original budget for one-twentieth of the volume. The remaining 95,000 standard calls cost $475, so the blended bill is $725. The formula never changed; the per-request rate did, by mode.
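The same budget can be sketched as a lookup of per-mode rates. The rates are the ones quoted in this example; the mode labels and the function name are hypothetical and are not Linkup API parameters.

```python
from decimal import Decimal

# Per-request rates from the worked example (USD).
RATES = {
    "standard": Decimal("0.005"),
    "sourced_answer": Decimal("0.006"),
    "deep": Decimal("0.05"),
}


def monthly_bill(mix: dict[str, int]) -> Decimal:
    """Sum requests x rate across every mode in the traffic mix."""
    return sum((RATES[mode] * count for mode, count in mix.items()), Decimal("0"))


all_standard = monthly_bill({"standard": 100_000})        # equals $500
all_sourced = monthly_bill({"sourced_answer": 100_000})   # equals $600
deep_slice = monthly_bill({"deep": 5_000})                # equals $250
blended = monthly_bill({"standard": 95_000, "deep": 5_000})
```

Under these rates the blended month comes to $725: the 5,000 deep calls cost $250, and the other 95,000 calls cost $475 at the standard rate.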

Worked example — multiplier math. On ZenRows, the Universal Scraper API meters per 1,000 successful requests with multipliers of 5x for JavaScript rendering, 10x for premium proxies, and 25x for both. A 50,000-request job against a protected, JS-heavy target therefore consumes the balance of a 1,250,000-plain-request job. Failed and retried calls don’t draw down the balance at all — but HTTP 404/410 responses count as successful completions, a detail worth reading twice before pointing a crawler at dead links.
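A short sketch of the multiplier arithmetic. The 5x, 10x, and 25x weights come from the example above; the 1x weight for a plain request is implied rather than stated, and the function and flag names are hypothetical, not ZenRows parameters.

```python
# Balance weight per successful request, keyed by (js_render, premium_proxy).
MULTIPLIERS = {
    (False, False): 1,   # plain request (assumed 1x baseline)
    (True, False): 5,    # JavaScript rendering
    (False, True): 10,   # premium proxies
    (True, True): 25,    # both features together
}


def balance_draw(successful_requests: int,
                 js_render: bool = False,
                 premium_proxy: bool = False) -> int:
    """Plain-request equivalents consumed by a batch of successful requests."""
    return successful_requests * MULTIPLIERS[(js_render, premium_proxy)]


# The 50,000-request job against a protected, JS-heavy target.
protected_job = balance_draw(50_000, js_render=True, premium_proxy=True)
```

The combined weight is a table entry, not the product of the two flags: 5 times 10 would be 50, but the quoted rate for both is 25.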

Worked example — quota ladder instead of a rate. Mem0 never publishes a per-request price. Instead, each tier carries two request quotas: add requests scale 10,000 → 50,000 → 200,000 → 500,000 and retrieval requests 1,000 → 5,000 → 20,000 → 50,000 across Hobby (free), Starter ($19), Growth ($79), and Pro ($249). The lever buyers actually pull is the tier, not the meter. Helicone (10,000 free requests/month on the Hobby tier, then usage-based overage) and Portkey (Production at $49/month for 100,000 logged requests, then $9 per additional 100K up to 3M) run the same quota-first pattern. For how these meters get counted in the first place, see the tracking and metering usage events guide.
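One way to reason about a two-quota ladder is to find the cheapest tier that covers both meters at once. The tier names, prices, and quotas below are the Mem0 figures quoted above; the sketch assumes quotas behave as hard monthly caps with no overage, which the text does not confirm, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_price: int      # USD per month
    add_quota: int          # add requests per month
    retrieval_quota: int    # retrieval requests per month


# Ordered cheapest first.
TIERS = [
    Tier("Hobby", 0, 10_000, 1_000),
    Tier("Starter", 19, 50_000, 5_000),
    Tier("Growth", 79, 200_000, 20_000),
    Tier("Pro", 249, 500_000, 50_000),
]


def cheapest_tier(adds: int, retrievals: int) -> Optional[Tier]:
    """Return the first tier whose two quotas both cover the workload."""
    for tier in TIERS:
        if adds <= tier.add_quota and retrievals <= tier.retrieval_quota:
            return tier
    return None  # workload exceeds every published tier


# A read-heavy workload: adds fit the free tier, retrievals do not.
read_heavy = cheapest_tier(adds=8_000, retrievals=15_000)
```

Here the add volume fits Hobby, but 15,000 retrievals push the workload to Growth, which is the sense in which the tier, not the meter, is the lever.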

Companies using this

42 in-corpus companies meter requests, making it one of the most widely shared billing units in the corpus. The cluster spans five categories — search APIs (SerpApi, Tavily, Linkup, You.com, Exa), web data and scraping (Bright Data, Oxylabs, ZenRows, ScraperAPI), browser and agent infrastructure (Browserbase, MultiOn), LLM observability and memory (Helicone, Portkey, PromptLayer, Mem0), and inference or serving platforms where requests ride alongside tokens (Cohere, Baseten, Groq, Fireworks AI, DeepInfra, Replicate, Perplexity).

Patterns observed

Per-1k quoting is the house style. Exa ($7/1k Search requests), You.com ($5.00/1k Search calls), Browserbase ($7/1k Search, $1/1k Fetch, $4–$7/1k Extract), Bright Data ($1.50/1k Web Unlocker results), Cohere ($2/1k Rerank queries), and Twelve Labs ($4/1k search queries) all normalize sub-cent request prices into per-1,000 figures. The meter is per-request; only the display unit is scaled.
Effort bands absorb cost variance without abandoning the unit. You.com prices Research requests at $12 / $50 / $100 / $450 per 1,000 by effort tier (lite → exhaustive), Linkup charges 10x for deep search over standard, Exa splits Search ($7/1k) from Deep Search ($12–15/1k) and an Agent endpoint that runs $0.025 to $2.00 per run by fixed-effort mode, and SerpApi prices enterprise overage by speed mode ($7.50 / $15 / $30 per 1,000 on-demand). The request stays the unit; the band prices what’s behind it.
Multiplier systems are the scraping category’s answer to the same problem. ZenRows and ScraperAPI both charge heavy requests more while keeping one balance, scaling by rendering, proxy class, and target difficulty. Qodo applies the identical idea to model choice: most LLM requests cost 1 credit, Grok 4 costs 4, and Claude Opus costs 5 — the request stays the unit while the model behind it sets the weight.
Success-only metering is becoming table stakes where failure is common. SerpApi bills only fully successful searches — blocked, errored, and CAPTCHA’d responses are free — Oxylabs doesn’t charge for 5xx/6xx scraper attempts, ZenRows draws balance only on successful results, and Linkup deducts no credit when a request errors. In scraping and search, where block rates are a fact of life, charging per delivered result rather than per attempt is now a competitive requirement.
Quota gates outnumber published rates. Many vendors use requests as tier boundaries rather than priced units: Helicone’s 10K free requests, Portkey’s 100K-log allotment, OpenRouter’s 1M free BYOK requests per month (then a 5% fee), and GitHub Copilot’s premium-request allowances that became AI Credits at $0.01 each. The request is the meter even when no per-request price appears on the page.
Read and write requests are splitting into separate meters. Mem0 prices add requests and retrieval requests on independent quota ladders, and Pinecone separates read units from write units entirely. Once a platform’s read and write costs diverge, a single undifferentiated request meter stops working — the same pressure that pushes vendors from a flat request price toward request-size pricing.
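The success-only pattern above reduces to a filter on response status before the meter increments. This is a hedged sketch, not any vendor's actual rule set: it assumes 2xx responses are billable and, following the ZenRows detail mentioned earlier, treats 404 and 410 as billable completions too. The function name and default set are hypothetical.

```python
from typing import Iterable


def billable_requests(status_codes: Iterable[int],
                      also_billable: frozenset = frozenset({404, 410})) -> int:
    """Count the responses that draw down the balance under success-only metering."""
    return sum(
        1
        for code in status_codes
        if 200 <= code < 300 or code in also_billable
    )


# Seven attempts: two successes, two dead links, three failures.
attempts = [200, 200, 404, 500, 429, 410, 503]
charged = billable_requests(attempts)
```

In this sample four of the seven attempts are charged. Passing an empty set for also_billable models a stricter policy where only 2xx responses count.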

Counterexamples & variants

The cautionary tale is MultiOn. Its Agent API defined one request as roughly one action taken on a webpage — an honest unit, but one whose unit economics never stabilized: the price halved from $0.08 to $0.04 within about five weeks of public beta, a Premium tier needed a $50,000/year minimum to make the lower $0.025 rate viable, and an extraction-only Retrieve endpoint had to be split out at $0.01 because lumping it with full agent actions overpriced it. MultiOn pivoted to consumer in December 2024 (rebranded Please) and the API wound down — a reminder that per-request pricing only works when the vendor actually knows what a request costs to serve.

The second counterexample is the databases that outgrew the unit. Pinecone nominally lists requests as a meter, but its real units are read units and write units: a query consumes roughly 1 RU per GB of namespace size (minimum 0.25 RU), and a write consumes 1 WU per KB (minimum 5 WU per request). Two identical API calls against differently-sized indexes cost wildly different amounts — the “request” survives only as a minimum-charge floor. turbopuffer made the same move explicit by pricing queries per unit of data scanned, then cutting that base rate from $5 to $1 per petabyte in February 2026 — an up-to-94% reduction for large namespaces. When request cost scales with state size rather than call count, vendors migrate to request-size pricing and the flat per-request model quietly dies.
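The read-unit and write-unit rules described for Pinecone can be approximated as two floor functions. This sketch uses only the ratios and minimums stated above and assumes no rounding to whole units, which may differ from the vendor's actual metering; the function names are hypothetical.

```python
def read_units(namespace_gb: float) -> float:
    """Roughly 1 RU per GB of namespace size, with a 0.25 RU minimum per query."""
    return max(0.25, namespace_gb * 1.0)


def write_units(payload_kb: float) -> float:
    """1 WU per KB written, with a 5 WU minimum per request."""
    return max(5.0, payload_kb)


# The same query against a small and a large namespace.
small_index_query = read_units(0.1)    # hits the 0.25 RU floor
large_index_query = read_units(40.0)   # scales with namespace size

# A small write is charged the per-request minimum.
small_write = write_units(2.0)
```

Under these assumptions the identical query costs 160 times more against the 40 GB namespace than against the 0.1 GB one, which is why the call count alone stops predicting the bill.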

The third group is the inference platforms — OpenAI, Groq, Fireworks AI, Google Gemini, Baseten, DeepInfra, Replicate — which record requests but run their economics on tokens or compute seconds; the request count is incidental, and those vendors belong to the token-based pricing story. The interesting hybrid is Perplexity’s Sonar API, which charges per-token rates plus a per-request search fee ($5–$14 per 1,000 requests depending on search-context depth, e.g. $10 per 1,000 medium-context on Sonar Pro) — the request fee prices the retrieval work that tokens can’t see, the mirror image of a database charging by state size.
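A hybrid bill of this kind is the sum of a request fee and a token fee. In the sketch below, the $10 per 1,000 request fee is the medium-context Sonar Pro figure quoted above; the token prices, the per-million quoting convention, and the volumes are placeholder assumptions, not Perplexity's published rates.

```python
from decimal import Decimal


def hybrid_bill(requests: int,
                input_tokens: int,
                output_tokens: int,
                request_fee_per_1k: str,
                input_price_per_1m: str,
                output_price_per_1m: str) -> Decimal:
    """Per-request search fee plus per-token charges, all in USD."""
    request_fee = Decimal(requests) * Decimal(request_fee_per_1k) / 1000
    token_fee = (
        Decimal(input_tokens) * Decimal(input_price_per_1m)
        + Decimal(output_tokens) * Decimal(output_price_per_1m)
    ) / 1_000_000
    return request_fee + token_fee


total = hybrid_bill(
    requests=10_000,
    input_tokens=5_000_000,
    output_tokens=1_000_000,
    request_fee_per_1k="10",       # from the text: medium context on Sonar Pro
    input_price_per_1m="1.00",     # placeholder token price
    output_price_per_1m="2.00",    # placeholder token price
)
```

With these placeholder token prices the request fee ($100) is far larger than the token fee ($7), which shows how a per-request component can dominate a bill that looks token-priced.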

What this means for buyers vs vendors

For buyers

Start by finding out what one request actually costs for your workload, not the headline rate. The published number is usually the cheapest request on the card: on ZenRows a protected JS-heavy target costs 25x the plain rate, on You.com an exhaustive Research call costs 37.5x a lite one ($450 versus $12 per 1,000), on Exa returning more than 10 results adds a per-additional-result surcharge, and on Pinecone the same query gets more expensive as your index grows.

Ask three questions in procurement. First, which request types and multipliers will my traffic actually hit — the effort mode, the render flag, the proxy class? Second, are failed requests billed? On SerpApi, Oxylabs, and Linkup they aren’t; on most inference and search APIs they are, and retries at scale can quietly double a bill. Third, is there a budget cap or alert — like Upstash’s hard monthly ceiling that rate-limits pay-as-you-go databases with email alerts at 70% and 90% of budget — or does the meter run unbounded until you notice?
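The second and third questions can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 70% and 90% thresholds mirror the Upstash alerts mentioned above, while the function names, status labels, and sample volumes are hypothetical.

```python
def billed_calls(attempts: int, successes: int, failures_billed: bool) -> int:
    """Calls that reach the invoice, depending on the vendor's failure policy."""
    return attempts if failures_billed else successes


def budget_status(spend_cents: int, budget_cents: int) -> str:
    """Map month-to-date spend to an alert level, using integer math throughout."""
    if spend_cents >= budget_cents:
        return "capped"        # hard ceiling reached
    if spend_cents * 100 >= budget_cents * 90:
        return "alert_90"
    if spend_cents * 100 >= budget_cents * 70:
        return "alert_70"
    return "ok"


# An agent that needs two attempts per delivered result, on average.
charged_if_billed = billed_calls(200_000, 100_000, failures_billed=True)
charged_if_free = billed_calls(200_000, 100_000, failures_billed=False)
```

With one retry per delivered result, a vendor that bills failures charges for twice as many calls as one that bills on success, which is the doubling the second question is meant to surface.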

Then model your real request mix before committing to a tier. Quota-gated plans like Mem0’s punish misestimating which of the two request meters — reads or writes — you’ll exhaust first, and a plan that’s generous on adds can strangle you on retrievals. Our pricing calculator hub lets you sketch a monthly bill from request volume, effort mix, and included quotas before you sign, and the choosing the right usage metric guide covers when a raw request count is even the right thing to be paying for.

For vendors

Per-request pricing earns you the fastest quote-to-forecast path in usage-based pricing — a developer can budget in seconds and a request count is trivial to explain — but only if your cost to serve a request is roughly uniform. When it isn’t, the corpus shows three proven ways to keep the unit rather than abandon it: effort bands (You.com, Linkup, Exa), feature multipliers on a single balance (ZenRows, ScraperAPI), and read/write meter splits (Mem0, Pinecone).

If you’re in a category with real failure rates, bill successful requests only. Charge-on-success is now the trust baseline in web data, and billing for blocked attempts reads as a red flag to any buyer who has compared rate cards. In search and agent APIs the same logic holds — agents retry constantly, and surprise charges for their failures erode confidence fast.

Above all, price the unit before you launch it, not after: MultiOn’s scramble to reprice in production, which preceded its wind-down, is what discovering your serving cost too late looks like. The tracking and metering usage events guide covers the pipeline you’ll need to count requests defensibly, and the pricing calculator hub is a fast way to pressure-test whether your proposed rate survives a realistic request mix before it hits a customer’s invoice.

Company	Product	Pricing model	Billing units	Free tier	Verified
Baseten	ML inference infrastructure — dedicated GPU deployments, Model APIs, and Truss framework	pure-usage hybrid commitment	gpu-hours tokens requests	Yes	2026-07-21
Bright Data	Web data platform — proxy networks, scraping APIs, a managed scraping browser, SERP and unlocker APIs, ready-made datasets, and eCommerce insights	pure-usage hybrid commitment	bandwidth-gb requests records	Yes	2026-07-23
Browserbase	Browser-agent infrastructure: headless browser sessions, web Search/Fetch APIs, agent identity, runtime, and a model gateway behind one API key	freemium hybrid pure-usage	browser-hours api-calls requests	Yes	2026-06-02
Cartesia	Real-time voice AI platform (Sonic TTS, voice cloning, voice agents)	freemium subscription hybrid	credits media-minutes requests	Yes	2026-07-22
Clipdrop	AI image-editing and generation tools (background removal, upscaling, text-to-image), now part of Jasper	freemium subscription	requests credits api-calls	Yes	2026-06-05
Cohere	Command, Embed, Rerank APIs	pure-usage	tokens api-calls requests	Yes	2026-05-29
DeepInfra	Serverless inference cloud — per-token LLM/embedding APIs, per-image and per-minute media models, per-hour on-demand GPU containers, and reserved DeepCluster GPU clusters	pure-usage commitment	tokens gpu-hours requests	No	2026-07-21
Exa	AI web search API for agents — search, contents, deep research, and monitoring endpoints billed per request	pure-usage freemium	requests credits api-calls	Yes	2026-07-14
Fal	Generative-media inference platform — serverless per-output model APIs plus dedicated GPU compute	pure-usage	gpu-hours requests media-minutes	No	2026-07-23
Fireworks AI	Generative AI inference platform — serverless per-token, on-demand GPU, fine-tuning, batch API	pure-usage hybrid commitment	tokens gpu-hours requests	Yes	2026-07-22
Frase	Agentic SEO and GEO platform that researches, writes, optimizes, and tracks AI-search visibility for content teams.	subscription seat-based	seats documents pages-rendered	No	2026-06-24
GitHub Copilot	AI pair programmer and coding agent embedded in GitHub, VS Code, and most major IDEs.	hybrid seat-plus-usage freemium	seats credits requests	Yes	2026-07-22
Gladia	Speech-to-text & audio intelligence API	pure-usage freemium commitment	media-minutes requests credits	Yes	2026-07-22
Google	Gemini API & AI Studio	pure-usage freemium	tokens requests api-calls	Yes	2026-07-14
Groq	GroqCloud — LPU-based ultra-low-latency inference API for Llama, GPT-OSS, Qwen, Whisper transcription, and Orpheus text-to-speech	pure-usage hybrid commitment	tokens requests api-calls	Yes	2026-07-21
Helicone	Open-source LLM observability & AI gateway	hybrid freemium	requests logs storage-gb	Yes	2026-06-09
Jina AI	Search Foundation API (Embeddings, Reranker, Reader, DeepSearch, Classifier)	pure-usage freemium	tokens requests api-calls	Yes	2026-06-03
Linkup	Web search API for AI agents — Search, Fetch, and async Research endpoints with grounded, structured results	pure-usage freemium	requests credits api-calls	Yes	2026-07-14
Mem0	Memory layer for AI agents and applications	subscription freemium	requests	Yes	2026-07-21
MiniMax	Foundation models, Hailuo video & per-token API	pure-usage freemium	tokens seats credits	Yes	2026-07-23
MultiOn	Autonomous web-browsing AI agent API (wound down)	pure-usage commitment	requests	No	2026-07-21
Netlify	Web development & deployment platform (Agent Runners / AI)	freemium hybrid pure-usage	credits builds gb-hours	Yes	2026-07-14
OpenAI	ChatGPT consumer subscriptions + GPT-5.6 API with token-based usage billing	freemium subscription seat-based	tokens seats api-calls	Yes	2026-07-23
OpenRouter	Multi-model LLM API routing marketplace	pure-usage freemium	tokens credits requests	Yes	2026-07-14
Oxylabs	Web data collection: residential, datacenter, ISP & mobile proxies plus Web Scraper API and Web Unblocker	hybrid pure-usage freemium	bandwidth-gb ips records	Yes	2026-07-06
Perplexity AI	AI-native answer engine with citations and multi-model search	freemium subscription seat-based	seats tokens requests	Yes	2026-07-21
Phind	AI developer search engine and coding assistant (shut down January 2026)	freemium subscription seat-based	seats requests active-users	Yes	2026-06-08
Pinecone	Managed vector database (serverless)	pure-usage hybrid	requests storage-gb vectors-indexed	Yes	2026-07-23
Portkey	AI gateway & LLMOps governance platform	hybrid freemium	requests logs	Yes	2026-06-10
PromptLayer	Prompt management, evaluation, and observability platform for LLM and AI-agent teams	freemium hybrid	seats requests transactions	Yes	2026-07-22
Qodo	Qodo (formerly Codium AI) — AI code integrity platform: Qodo Gen (IDE plugin), Qodo Merge (PR review agent), and Qodo Command (CLI / agentic quality workflows)	pure-usage hybrid	credits requests	No	2026-07-23
Reka AI	Natively multimodal models (Edge, Flash, Core) + Research & Vision APIs	pure-usage freemium	tokens api-calls requests	Yes	2026-07-22
Replicate	Cloud platform for running, fine-tuning, and deploying AI models via REST API	pure-usage hybrid commitment	gpu-hours tokens requests	Yes	2026-05-30
RunPod	GPU cloud marketplace — Secure Cloud and Community Cloud Pods, Serverless endpoints, and persistent storage	pure-usage hybrid commitment	gpu-hours storage-gb requests	No	2026-07-22
ScraperAPI	Web scraping API that handles proxies, browsers, and CAPTCHAs behind a single endpoint	subscription pure-usage	credits requests api-calls	Yes	2026-07-22
SerpApi	Real-time search-results API (Google, Bing, and other engines)	subscription pure-usage	api-calls requests	Yes	2026-06-04
Sweep AI	AI coding assistant for JetBrains IDEs	freemium subscription seat-plus-usage	seats credits requests	Yes	2026-06-16
Tavily	Tavily Search API	pure-usage freemium	credits api-calls requests	Yes	2026-07-22
turbopuffer	Serverless vector and full-text search database on object storage	pure-usage commitment	storage-gb vectors-indexed gb-hours	No	2026-07-22
Twelve Labs	Video understanding foundation models (Marengo for search/embeddings, Pegasus for analysis) delivered as a usage-metered API	pure-usage freemium commitment	media-minutes tokens requests	Yes	2026-06-02
Upstash	Upstash (Redis, Vector, QStash, Workflow, Search, Box)	pure-usage freemium hybrid	requests api-calls vectors-indexed	Yes	2026-07-22
Vectara	Enterprise RAG-as-a-Service and agent platform for trusted, grounded, auditable AI	commitment subscription	credits requests storage-gb	No	2026-06-02
Writesonic	GEO / AI-search-visibility and SEO platform that tracks brand mentions across AI answer engines and ships content/citation fixes	subscription freemium	seats requests actions	Yes	2026-06-07
You.com	Web search, contents, research, and finance-research APIs for AI systems	pure-usage freemium	api-calls requests pages-rendered	Yes	2026-07-22
ZenRows	Universal Scraper API, Scraping Browser, and Residential Proxies	hybrid subscription pure-usage	requests api-calls bandwidth-gb	Yes	2026-06-04

FAQ

What is per-request pricing?

Per-request pricing is a billing unit where the customer is charged for each request served by the platform — an inference call, a search, a scrape, or a browser action. Rates are usually quoted per 1,000 requests because the per-request price is often a fraction of a cent.

How is per-request pricing different from token-based pricing?

Token pricing meters the volume of text processed inside a request, so two calls can cost very different amounts. Per-request pricing charges per call regardless of payload, which is easier to forecast but a looser fit to the vendor's serving cost — which is why many vendors add request-size bands or effort modes on top.

Why do vendors quote per 1,000 requests instead of per request?

Because single-request prices are usually sub-cent. Quoting '$7 per 1,000 requests' (Exa Search) or '$2 per 1,000 queries' (Cohere Rerank) keeps the rate card readable while the meter underneath stays strictly per-request.

Which companies use per-request pricing?

It spans search APIs (Linkup, Tavily, You.com, SerpApi, Exa), scraping and web data (Bright Data, Oxylabs, ZenRows, ScraperAPI), browser and agent infrastructure (Browserbase, MultiOn), AI memory and observability (Mem0, Helicone, Portkey, PromptLayer), and inference platforms (Cohere, Perplexity, Baseten, Groq). 42 in-corpus companies list requests as a billing unit.

Do vendors charge for failed requests?

Increasingly not, in categories where failure is common. SerpApi excludes blocked, errored, and CAPTCHA'd searches; ZenRows bills only successful results; Oxylabs' Web Scraper API doesn't charge for 5xx/6xx attempts; Linkup deducts no credit when a request errors. Most inference and search APIs, by contrast, bill every request that reaches the endpoint.

What are request multipliers?

A multiplier charges more request-units for heavier work while keeping one meter. ZenRows charges 5x for JavaScript rendering, 10x for premium proxies, and 25x for both; ScraperAPI's credits run 1x to 75x per request by feature; Qodo charges 1 credit for most LLM requests but 5 for Claude Opus.
