LLM Price Comparison

How to Use the LLM Price Comparison

Enter your token counts

Enter the input and output token counts for a typical request. Not sure? Measure the input with our Token Counter first.

Read the ranked table

Every model is priced for your exact request and sorted cheapest to priciest, so the lowest-cost option is always at the top.

Check the "relative" column

It shows how many times more expensive each model is than the cheapest — a fast way to see whether a premium model costs 2× or 20× more for your workload.

Adjust the input/output mix

Output tokens are almost always priced higher than input tokens. Raise the output figure to model a chatty, generation-heavy task and watch the ranking shift.

Comparing LLM Prices the Right Way

A headline price tells you almost nothing

Provider pricing pages quote a per-million-token input rate and a separate, higher output rate — but the number that matters is what your request costs, and that depends entirely on your input/output mix. A model with a cheap input rate but an expensive output rate can be the bargain for a summarisation task and the worst choice for a creative one. This tool removes the guesswork: you give it one realistic request, and it prices that exact request across every model, then ranks them. The cheapest option for a long-context, short-answer job is often a completely different model than the cheapest for a short-prompt, long-answer job.
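To make the arithmetic concrete, here is a minimal sketch of how one request could be priced and ranked. The model names and rates are hypothetical placeholders, not real provider prices, and the function names are ours for illustration rather than the tool's actual code.

```python
def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in dollars of one request; rates are dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical list rates (USD per million tokens), not real provider prices.
RATES = {
    "model_a": {"input": 0.50, "output": 6.00},  # cheap input, expensive output
    "model_b": {"input": 1.50, "output": 2.00},  # pricier input, cheap output
}

def rank(input_tokens, output_tokens, rates=RATES):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: request_cost(input_tokens, output_tokens, r["input"], r["output"])
        for name, r in rates.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])

# Long input, short answer (summarisation-style request).
summarise = rank(input_tokens=20_000, output_tokens=500)
# Short prompt, long answer (generation-heavy request).
generate = rank(input_tokens=500, output_tokens=4_000)

print(summarise[0][0])  # cheapest model for the summarisation-style request
print(generate[0][0])   # cheapest model for the generation-heavy request
```

With these made-up rates, model_a comes out cheapest for the long-input request and model_b for the long-output one, which is the kind of flip the ranking is designed to expose.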

The relative column is where the insight usually lands. Seeing that a flagship model is "14× the cheapest" for your workload reframes the decision: is the quality difference worth fourteen times the cost at your volume? Sometimes emphatically yes; often a mid-tier model does the job for a fraction of the price. Comparing on real numbers, rather than vibes or headline rates, is how teams avoid quietly overpaying.
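The relative figure is just each model's cost divided by the cheapest cost for the same request. A small sketch, assuming you already have per-request costs (the values below are hypothetical, chosen only to reproduce a 14× gap):

```python
def relative_column(costs):
    """Return (model, cost, multiple_of_cheapest) rows, cheapest first."""
    cheapest = min(costs.values())
    if cheapest <= 0:
        raise ValueError("costs must be positive to compute a ratio")
    ranked = sorted(costs.items(), key=lambda item: item[1])
    return [(name, cost, cost / cheapest) for name, cost in ranked]

# Hypothetical per-request costs in dollars, for illustration only.
costs = {"budget": 0.002, "mid_tier": 0.008, "flagship": 0.028}

for name, cost, multiple in relative_column(costs):
    print(f"{name:<10} ${cost:.4f}  {multiple:.2f}x")
```

Here the flagship row prints as 14.00x. Multiply that ratio by your request volume and the question of whether the quality gap is worth it becomes a concrete budget line.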

"Don't compare price tags — compare what your actual request costs on each. The ranking can flip the moment your output-to-input ratio changes."

What the comparison leaves out (on purpose)

These figures are standard list rates, and they deliberately exclude the discounts that vary by usage: cached-input pricing (cheap when you reuse a long shared prompt), batch processing (often around half price for non-urgent jobs), and any negotiated or volume tier. They also can't capture quality — a cheaper model that needs longer prompts or more retries may not be cheaper in practice. Use the ranking to shortlist, confirm the live rates on the provider's page, and weigh cost against the quality your task actually needs. For a single model's full monthly bill, pair this with our LLM Cost Calculator.
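If you want a rough feel for what those excluded discounts could do, the sketch below applies them to a single request. It is an estimate under stated assumptions: the 90% cache discount and 50% batch discount are the approximate figures mentioned on this page, the rates are hypothetical, and whether the two discounts stack depends on the provider.

```python
def discounted_cost(input_tokens, output_tokens, input_rate, output_rate,
                    cached_tokens=0, cache_discount=0.90,
                    batch=False, batch_discount=0.50):
    """Estimate one request's cost in dollars after optional discounts.

    Rates are dollars per million tokens. cached_tokens is the part of the
    input billed at the cached rate. Both discount values are assumptions.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    # Cached tokens are billed at a fraction of the normal input rate.
    input_cost = (fresh_tokens + cached_tokens * (1 - cache_discount)) * input_rate
    output_cost = output_tokens * output_rate
    total = (input_cost + output_cost) / 1_000_000
    if batch:
        # Assumption: the batch discount applies to the whole request.
        total *= 1 - batch_discount
    return total

# Hypothetical rates: $2.00 input and $8.00 output per million tokens.
list_price = discounted_cost(10_000, 1_000, 2.00, 8.00)
with_cache = discounted_cost(10_000, 1_000, 2.00, 8.00, cached_tokens=8_000)
with_batch = discounted_cost(10_000, 1_000, 2.00, 8.00, batch=True)

print(f"list: ${list_price:.4f}  cached: ${with_cache:.4f}  batch: ${with_batch:.4f}")
```

In this example a mostly cached prompt and a batch job each roughly halve the list cost, which is why a list-rate ranking is a shortlist rather than a final bill.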

10 Facts About LLM Pricing

01. Models charge a separate input and output rate, and output is almost always the pricier of the two.
02. The cheapest model for your job depends on your input/output mix, not the headline rate.
03. Flagship models can cost 10–50× more than budget models for the same request.
04. Prices are quoted per million tokens, which hides how a high-volume workload adds up.
05. Gemini 2.5 Pro charges a higher rate once a prompt exceeds 200,000 input tokens.
06. Cached-input pricing can cut the cost of a reused long prompt by up to ~90%.
07. Batch APIs typically offer around 50% off for jobs you don't need answered immediately.
08. A cheaper but weaker model isn't always cheaper: retries and longer prompts add up.
09. Model prices drift over time, so any comparison needs an "as at" date and periodic refresh.
10. This comparison runs entirely in your browser, and your numbers are never uploaded.
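Fact 04 is easy to underestimate, so here is a quick illustration of how a small per-request cost scales with volume. The rates, request size, and traffic figure are all hypothetical, and a 30-day month is assumed.

```python
def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in dollars of one request; rates are dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

def monthly_cost(cost_per_request, requests_per_day, days=30):
    """Scale a per-request cost to a month of traffic (30 days assumed)."""
    return cost_per_request * requests_per_day * days

# Hypothetical rates (USD per million tokens) and traffic, for illustration.
per_request = request_cost(2_000, 500, input_rate=3.00, output_rate=15.00)
per_month = monthly_cost(per_request, requests_per_day=100_000)

print(f"per request: ${per_request:.4f}")
print(f"per month:   ${per_month:,.2f}")
```

A request that costs just over a cent comes to roughly $40,500 a month at 100,000 requests a day, so small differences in the ranked table matter at scale.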

Frequently Asked Questions

Which LLM is the cheapest?
There's no single answer. It depends on your request. Enter your input and output tokens and the tool ranks every model for that exact workload. The cheapest for a long-input, short-output job is often different from the cheapest for a short-input, long-output job.

What does the "relative" column show?
It shows how many times more expensive each model is than the cheapest option for your request. A value of 5.00× means that model costs five times what the top-ranked model costs for the same input and output tokens.

Why do output tokens cost more than input tokens?
Generating tokens costs more than reading them: output tokens are produced one at a time, each needing its own forward pass through the model, while input tokens are processed together in parallel. That's why a generation-heavy task can be much more expensive than its input size suggests.

How current are the prices?
They're list rates gathered on the date shown under the table and refreshed periodically. Providers change pricing without notice, so confirm the live rate on the provider's page before committing to a budget.

Does the comparison include batch or cached-input discounts?
No. It uses standard list rates so the comparison is apples-to-apples. Batch processing and cached-input pricing can cut real costs significantly, but they depend on your usage pattern, so they're left out of the headline ranking.

How is this different from the LLM Cost Calculator?
This tool ranks the per-request cost across all models so you can choose one. The LLM Cost Calculator focuses on a single model and projects a full monthly bill from your request volume. Use this to pick, that to budget.

Should I always pick the cheapest model?
Not necessarily. The cheapest model may need longer prompts, produce weaker results, or require more retries, any of which can erase the saving. Use the ranking to shortlist, then weigh cost against the quality your task actually needs.

How is Gemini 2.5 Pro's long-context pricing handled?
Gemini 2.5 Pro has two price tiers based on prompt size. When your input exceeds 200,000 tokens, the tool automatically switches it to the higher long-context rate and labels the row, so its cost stays accurate for big prompts.

Are my token counts uploaded anywhere?
No. All calculation happens in your browser. Your token counts are never sent to any server or third party, and nothing is stored.

Is the tool free to use?
Completely free, with no account or sign-up, and no limit on use. It runs in your browser and collects no data.
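The two-tier pricing described above for Gemini 2.5 Pro can be sketched as a simple threshold check. The 200,000-token threshold comes from this page; the rates below are hypothetical placeholders, and applying the higher tier to both the input and output rate is an assumption of this sketch, not a statement of the tool's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TieredRates:
    threshold: int       # input-token count above which the higher tier applies
    standard: tuple      # (input_rate, output_rate) in dollars per million tokens
    long_context: tuple  # (input_rate, output_rate) for prompts over the threshold

def tiered_cost(input_tokens, output_tokens, rates):
    """Return (cost_in_dollars, used_long_context_tier) for one request."""
    long_context = input_tokens > rates.threshold
    input_rate, output_rate = rates.long_context if long_context else rates.standard
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return cost, long_context

# Hypothetical rates; only the 200,000-token threshold is taken from the text.
example = TieredRates(threshold=200_000,
                      standard=(1.00, 8.00),
                      long_context=(2.00, 12.00))

at_limit = tiered_cost(200_000, 1_000, example)    # still on the standard tier
over_limit = tiered_cost(200_001, 1_000, example)  # switches to the higher tier
print(at_limit, over_limit)
```

Because the whole request is repriced once the threshold is crossed, one extra input token can nearly double the cost in this example. That is why the row is labelled when the higher tier applies.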
