LLM Cost Calculator

How to Use the LLM Cost Calculator

Pick the model

Choose the model you plan to call — GPT-5, Claude, Gemini or Mistral. Each carries its own input and output rate, shown live beneath the dropdown.

Enter your token counts

Put in the input and output tokens for a typical request. Not sure of the input size? Measure it first with our Token Counter, then paste the number here.

Set your monthly volume

Enter how many requests you expect per month. The calculator multiplies the per-request cost by your volume to project a monthly bill.

Compare and budget

Switch models to compare costs instantly. Output tokens are usually priced higher than input, so a chatty model can cost far more than its headline input rate suggests.
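The comparison step can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own code: the model names and rates below are made-up placeholders, and `monthly_bill` and `cheapest` are hypothetical helpers. It shows how the cheaper model can change depending on whether a workload is input-heavy or output-heavy.

```python
# Hypothetical rates in USD per million tokens: (input, output). Not real prices.
RATES = {
    "model-a": (1.00, 12.00),  # low input rate, high output rate
    "model-b": (2.00, 6.00),   # higher input rate, lower output rate
}


def monthly_bill(model: str, input_tokens: int, output_tokens: int, requests: int) -> float:
    """Per-request cost (input side plus output side) times monthly volume."""
    input_rate, output_rate = RATES[model]
    per_request = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return per_request * requests


def cheapest(input_tokens: int, output_tokens: int, requests: int) -> str:
    """Name of the model with the lowest projected monthly bill for this workload."""
    return min(RATES, key=lambda m: monthly_bill(m, input_tokens, output_tokens, requests))


# Input-heavy workload: long prompt, short reply.
input_heavy_winner = cheapest(input_tokens=10_000, output_tokens=200, requests=50_000)

# Output-heavy workload: short prompt, long reply.
output_heavy_winner = cheapest(input_tokens=500, output_tokens=3_000, requests=50_000)

print(input_heavy_winner, output_heavy_winner)
```

With these placeholder rates the model with the lower input rate wins the input-heavy workload and loses the output-heavy one, which is why it pays to enter your real token mix before comparing.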

How LLM API Pricing Actually Works

Input and output tokens are priced separately

Every major LLM API charges per token, and almost always at two different rates: one for the input (the prompt and context you send) and a higher one for the output (the text the model generates). That split matters enormously. A summarisation task sends a huge input and returns a tiny output, so it's dominated by the input rate; a creative-writing or agentic task may send a short prompt but generate thousands of output tokens, where the output rate dominates. Estimating cost from a single "price per token" headline will mislead you — you have to model both sides, which is exactly what this calculator does.
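The two-rate arithmetic can be written out as a short Python sketch. It assumes rates are quoted per million tokens; the `ModelRates` class, the `request_cost` function and the example rates are all hypothetical and chosen only to make the numbers easy to follow.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class ModelRates:
    """List prices in USD per million tokens (placeholder values, not real quotes)."""
    input_per_million: float
    output_per_million: float


def request_cost(rates: ModelRates, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: each side is billed at its own rate, then summed."""
    input_cost = input_tokens / TOKENS_PER_MILLION * rates.input_per_million
    output_cost = output_tokens / TOKENS_PER_MILLION * rates.output_per_million
    return input_cost + output_cost


# Hypothetical model: $2 per million input tokens, $10 per million output tokens.
example_rates = ModelRates(input_per_million=2.00, output_per_million=10.00)

# Summarisation: large input, small output, so the input side dominates.
summary_cost = request_cost(example_rates, input_tokens=20_000, output_tokens=300)

# Creative writing: short prompt, long output, so the output side dominates.
writing_cost = request_cost(example_rates, input_tokens=300, output_tokens=4_000)

print(f"summary: ${summary_cost:.4f}  writing: ${writing_cost:.4f}")
```

In this example the two requests cost roughly the same, but for opposite reasons: almost all of the summary's cost comes from input, and almost all of the writing task's cost comes from output.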

Volume is the other half of the equation. A request that costs a fraction of a cent looks free until you multiply it by hundreds of thousands of calls a month. Multiplying per-request cost by realistic monthly volume is how a tiny per-call figure turns into a real line on your budget — and how you spot, early, that a slightly cheaper model or a tighter prompt could save serious money at scale.
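To see how volume changes the picture, here is a small Python sketch. The per-request costs are hypothetical figures picked for illustration, and `monthly_cost` is an assumed helper name rather than anything the calculator exposes.

```python
def monthly_cost(cost_per_request: float, requests_per_month: int) -> float:
    """Project a monthly bill from one request's cost and the monthly volume."""
    return cost_per_request * requests_per_month


# Hypothetical per-request costs in USD, for illustration only.
current_cost = 0.004   # four tenths of a cent per call
trimmed_cost = 0.003   # the same call after tightening the prompt

for volume in (1_000, 100_000, 1_000_000):
    bill = monthly_cost(current_cost, volume)
    saving = bill - monthly_cost(trimmed_cost, volume)
    print(f"{volume:>9,} requests/month: ${bill:,.2f} (saves ${saving:,.2f} if trimmed)")
```

A tenth of a cent per call is invisible at a thousand requests and hard to ignore at a million, which is the point of projecting the bill at realistic volume.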

"The cheapest model isn't always the cheapest bill. Output-heavy workloads can flip the maths entirely."

The levers that actually move your bill

Once you can see the numbers, the optimisations become obvious. Shorten the prompt — trim system messages and few-shot examples that aren't earning their tokens. Cap the output — set a max-tokens limit so a runaway generation can't blow your budget. Right-size the model — route simple tasks to a cheaper tier and reserve the flagship for work that needs it. Many providers also offer cached-input and batch discounts that this calculator deliberately leaves out, so the figures here are a conservative ceiling rather than a best case. Treat the result as a planning estimate, confirm the live rates with your provider, and you'll size your AI spend with far more confidence than a back-of-envelope guess.
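The three levers can be compared side by side with a rough Python sketch. Everything numeric here is an assumption made for illustration: the flagship and budget rates, the token counts, the 70% routing share, and the idea that an output cap lowers the average reply from 1,000 to 600 tokens. Your own workload will differ.

```python
def monthly_cost(input_tokens, output_tokens, input_rate, output_rate, requests):
    """Monthly bill in USD; rates are per million tokens."""
    per_request = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return per_request * requests


# Hypothetical (input, output) rates in USD per million tokens. Not real prices.
FLAGSHIP = (3.00, 15.00)
BUDGET = (0.50, 2.00)
REQUESTS = 200_000

# Baseline: every request goes to the flagship with a 4,000-token prompt.
baseline = monthly_cost(4_000, 1_000, *FLAGSHIP, REQUESTS)

# Lever 1: shorten the prompt (assumed trim from 4,000 to 2,500 input tokens).
shorter_prompt = monthly_cost(2_500, 1_000, *FLAGSHIP, REQUESTS)

# Lever 2: cap the output (assumed average reply drops from 1,000 to 600 tokens).
capped_output = monthly_cost(4_000, 600, *FLAGSHIP, REQUESTS)

# Lever 3: right-size (assume 70% of requests are simple enough for the budget tier).
simple_requests = REQUESTS * 70 // 100
routed = (
    monthly_cost(4_000, 1_000, *BUDGET, simple_requests)
    + monthly_cost(4_000, 1_000, *FLAGSHIP, REQUESTS - simple_requests)
)

for label, cost in [
    ("baseline", baseline),
    ("shorter prompt", shorter_prompt),
    ("capped output", capped_output),
    ("routed 70% to budget tier", routed),
]:
    print(f"{label:<26} ${cost:,.2f}")
```

Under these assumptions routing saves the most, but the ranking depends entirely on the rates and the share of traffic a cheaper model can handle, so it is worth rerunning with your own figures.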

10 Facts About LLM API Costs

01. LLM APIs bill per token, and input and output tokens are almost always priced differently.

02. Output tokens usually cost more than input tokens, often several times more on flagship models.

03. Prices are quoted per million tokens, which makes a single request look almost free until you scale up.

04. A summary task is input-heavy and a writing or agent task is output-heavy, so the two cost very differently.

05. Volume is the multiplier: a fraction of a cent times a million calls is a real bill.

06. Many providers offer cached-input and batch discounts that cut costs well below list price.

07. Google's Gemini 2.5 Pro uses context tiers: prompts over 200K tokens are billed at a higher rate.

08. Setting a max-tokens limit caps the output and protects you from a runaway generation.

09. Right-sizing, meaning routing easy tasks to a cheaper model, is often the single biggest cost saver.

10. This calculator runs entirely in your browser, and your numbers are never uploaded.

Frequently Asked Questions

How is the cost of an LLM API call calculated?

Cost is the input tokens times the model's input rate, plus the output tokens times its (usually higher) output rate, with both rates quoted per million tokens. Multiply that per-request cost by your monthly request volume to project a monthly bill. This calculator does exactly that.

Why do output tokens cost more than input tokens?

Generating tokens is more computationally expensive than reading them: output is produced one token at a time, each needing its own forward pass through the model, while the input can be processed in parallel. Providers reflect that in pricing, so output rates are typically several times the input rate.

How accurate is the estimate?

The arithmetic is exact, but it's only as current as the price list. Rates here were gathered on 16 June 2026 and providers change pricing without notice. The figures also exclude taxes and any batch or cached-token discounts, so treat the result as a conservative planning estimate and confirm live rates with the provider.

How do I find my token counts?

Measure a representative prompt with our Token Counter to get the input token count, and estimate output from the length of replies you expect. If you've already made API calls, the provider's usage dashboard reports actual input and output tokens per request.

How does Gemini 2.5 Pro's tiered pricing work?

Google prices Gemini 2.5 Pro in two tiers based on prompt size: prompts up to 200,000 input tokens use the standard rate, and prompts above that use a higher long-context rate. The calculator switches tiers automatically once your input tokens exceed 200,000.

Does the calculator include batch or cached-input discounts?

No. It deliberately uses standard list rates. Many providers offer cheaper batch processing or discounts for cached or repeated input, which can cut real costs significantly. Because those depend on your usage pattern, this tool gives you the conservative ceiling and leaves the discounts as upside.

How can I reduce my LLM API costs?

Shorten prompts by trimming system text and unused examples, cap output with a max-tokens limit, route simple tasks to a cheaper model and reserve the flagship for hard ones, and use batch or cached-input pricing where available. Compare models in this calculator to see the effect of switching tiers.

Is the model with the lowest rate always the cheapest?

Not necessarily. A model with a low input rate but a high output rate can cost more on output-heavy workloads, and a cheaper but weaker model may need longer prompts or more retries. Model your actual input/output mix and volume, because that's where a calculator beats a headline price.

Are my numbers sent to a server?

No. All calculation happens in your browser with plain JavaScript. Your token counts and volumes are never sent to any server or third party, and nothing is stored.

Is the calculator free to use?

Completely free, with no account or sign-up, and no limit on use. It runs in your browser and collects no data.
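The Gemini 2.5 Pro tier switch described above can be sketched as a simple threshold check. Only the 200,000-token threshold comes from this page; the rates are placeholders, and the sketch assumes that a prompt over the threshold moves the whole request (input and output) onto the long-context rates. Check Google's price list for the actual figures and billing rules.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens; the tier boundary stated on this page

# Hypothetical (input, output) rates in USD per million tokens. Not real prices.
STANDARD_RATES = (1.00, 8.00)
LONG_CONTEXT_RATES = (2.00, 12.00)


def tiered_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, picking the rate tier from the prompt size.

    Assumption: once the prompt exceeds the threshold, the entire request
    is billed at the long-context rates.
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        input_rate, output_rate = LONG_CONTEXT_RATES
    else:
        input_rate, output_rate = STANDARD_RATES
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


at_threshold = tiered_request_cost(200_000, 1_000)    # still on the standard tier
over_threshold = tiered_request_cost(200_001, 1_000)  # one token over: long-context tier

print(f"at threshold: ${at_threshold:.4f}  over threshold: ${over_threshold:.4f}")
```

Under these assumptions a single extra input token nearly doubles the request's cost, so prompts that hover around the boundary are worth trimming.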
