Embedding Cost Calculator

How to Use the Embedding Cost Calculator

Pick an embedding model

Choose from OpenAI, Google, Voyage, Cohere or Mistral embedding models. The selected model's per-million-token price appears under the dropdown.

Enter your corpus size

Enter the number of documents or chunks you'll embed and the average number of tokens in each. For RAG, a chunk is usually a few hundred tokens.

Read the total

The tool multiplies documents × average tokens to get total tokens, then applies the model's per-million-token rate to give the full embedding cost and a per-document figure.

Plan re-embeds

Remember you pay again to re-embed if you change models or chunking. Switch models here to compare the cost of an alternative before you commit.
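The arithmetic behind these steps can be sketched in a few lines of Python. This is an illustration, not the calculator's actual code: the function name `embedding_cost` and the corpus size are assumptions, and the two rates are the $0.02 and $0.13 per-million-token examples used elsewhere on this page.

```python
def embedding_cost(num_chunks: int, avg_tokens: float, rate_per_million: float) -> dict:
    """Return total tokens, total cost and per-chunk cost (dollars)."""
    total_tokens = num_chunks * avg_tokens
    # Rates are quoted per one million input tokens, so scale down first.
    total_cost = total_tokens / 1_000_000 * rate_per_million
    per_chunk_cost = total_cost / num_chunks if num_chunks else 0.0
    return {
        "total_tokens": total_tokens,
        "total_cost": total_cost,
        "per_chunk_cost": per_chunk_cost,
    }


# Hypothetical corpus: 500,000 chunks averaging 400 tokens (200M tokens).
for rate in (0.02, 0.13):
    result = embedding_cost(500_000, 400, rate)
    print(f"${rate} per 1M tokens: ${result['total_cost']:.2f} total")
```

For this assumed corpus the totals come to about $4 and $26, which is the kind of side-by-side comparison the model dropdown gives you.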

What It Costs to Embed a Corpus

Embeddings are cheap per token — but corpora are large

Embedding models turn text into vectors for semantic search and retrieval-augmented generation (RAG), and they are priced per input token like any other model, but usually far cheaper than chat models, often a few cents per million tokens. The catch is scale: a knowledge base, product catalogue or documentation site can run to hundreds of millions of tokens once chunked. At that size even a low per-token rate adds up: at 100 million tokens, a $0.02-per-million model costs about $2 to index the corpus and a $0.13-per-million model about $13, and both figures grow in step with the corpus. Knowing the number up front lets you choose a model deliberately rather than discover the bill after the import job runs.

The structure is simple: documents × average tokens per document = total tokens, and total tokens × the model's rate = your cost. The variable people forget is chunking. RAG pipelines split documents into overlapping chunks, and overlap means the same text is embedded more than once — so your real token count is higher than the raw corpus. Estimate the average chunk size and chunk count, not the original document count, for an accurate figure.
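To see how overlap inflates the token count, here is a minimal sketch of a sliding-window chunker that only tracks chunk lengths. It assumes fixed-size chunks with a fixed overlap; real chunkers often split on sentences or headings, so treat the helper `chunk_lengths` and its parameters as hypothetical.

```python
def chunk_lengths(doc_tokens: int, chunk_size: int, overlap: int) -> list[int]:
    """Token length of each chunk produced by a fixed-size sliding window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap  # how far the window advances each step
    lengths = []
    start = 0
    while start < doc_tokens:
        lengths.append(min(chunk_size, doc_tokens - start))
        if start + chunk_size >= doc_tokens:
            break  # this chunk already reaches the end of the document
        start += stride
    return lengths


# Hypothetical document: 1,000 tokens, 400-token chunks, 50-token overlap.
lengths = chunk_lengths(doc_tokens=1_000, chunk_size=400, overlap=50)
billed = sum(lengths)
print(len(lengths), "chunks,", billed, "billed tokens")
```

With these assumed settings the 1,000-token document becomes three chunks totalling 1,100 billed tokens, 10% more than the raw text. Feeding the chunk count and average chunk length into the cost formula captures that overhead.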

"Embedding is cheap per token and expensive per mistake. The costly part isn't the first run — it's re-embedding the whole corpus because you changed the model."

The cost you'll forget: re-embedding

The biggest hidden cost in a vector pipeline is that embeddings are model-specific. Vectors from one model can't be compared with vectors from another, so the day you switch embedding models (for better quality, a new release, or a cheaper option) you pay to re-embed the entire corpus from scratch. The same applies if you change your chunking strategy. That's why the per-document figure here matters: multiply it by the number of documents or chunks in your full corpus to see the true cost of a migration before you start one. Pair this with our Text Chunker to estimate realistic chunk counts, and the Token Counter to measure average chunk size.
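A rough way to budget for migrations is to list every full embedding run you expect and add them up. The sketch below assumes a hypothetical history (an initial index, a re-chunk, then a model switch); the `EmbedRun` class, chunk counts and chunk sizes are illustrative, and only the $0.02 and $0.13 rates come from this page.

```python
from dataclasses import dataclass


@dataclass
class EmbedRun:
    label: str
    num_chunks: int
    avg_tokens: float
    rate_per_million: float  # dollars per 1M input tokens

    @property
    def cost(self) -> float:
        # Every run re-embeds the whole corpus, so the full token count is billed.
        return self.num_chunks * self.avg_tokens / 1_000_000 * self.rate_per_million


# Hypothetical plan: each change forces a complete re-embed.
runs = [
    EmbedRun("initial index", 500_000, 400, 0.02),
    EmbedRun("re-chunk into smaller chunks", 1_000_000, 220, 0.02),
    EmbedRun("switch to a pricier model", 1_000_000, 220, 0.13),
]

running_total = 0.0
for run in runs:
    running_total += run.cost
    print(f"{run.label}: ${run.cost:.2f} (cumulative ${running_total:.2f})")
```

In this assumed scenario the first index is the cheapest run, and the two later re-embeds account for most of the cumulative spend.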

10 Facts About Embedding Costs

01

Embedding models are priced per input token, usually far cheaper than chat models.

02

Cost = documents × avg tokens × rate per token. Scale, not the per-token price, is what bites.

03

OpenAI's text-embedding-3-small is among the cheapest at about $0.02 per 1M tokens.

04

Embeddings are model-specific — you can't mix vectors from two different models.

05

Switching models means re-embedding the whole corpus — often the biggest hidden cost.

06

Chunk overlap embeds some text more than once, so real token counts exceed the raw corpus.

07

Only the input is charged — embeddings have no "output tokens" to pay for.

08

Many providers offer a batch tier at around half price for non-urgent embedding jobs.

09

Larger embedding models cost more per token but can improve retrieval quality.

10

This calculator runs entirely in your browser — your numbers are never uploaded.

Frequently Asked Questions

Multiply the number of documents or chunks by the average tokens in each to get total tokens, then divide by one million and multiply by the model's per-million-token rate. Embeddings only charge for input, so there's no output cost to add.
Count chunks. RAG pipelines split documents into smaller overlapping pieces before embedding, so the chunk count and average chunk size give a far more accurate token total than the original document count.
Overlap repeats some text across adjacent chunks so context isn't lost at the boundaries. That repeated text is embedded — and billed — more than once, so a corpus with overlap costs more than its raw token count suggests.
No. An embedding model returns a fixed-size vector, not generated text, so you only pay for the input tokens you send. That's one reason embeddings are much cheaper than chat completions.
Embeddings are model-specific — vectors from different models live in different spaces and can't be compared. So if you switch embedding models or change your chunking, you must re-embed the entire corpus, paying the full cost again. Use the per-document figure to size that migration in advance.
Among the listed options, OpenAI's text-embedding-3-small and Voyage's lite tier are the lowest per-token. But cheaper isn't always better — a higher-quality model can improve retrieval enough to be worth the extra cost, especially since embedding is a one-time-per-corpus expense.
This tool sizes the one-time cost of embedding your corpus. At query time you also embed each incoming search query, but those are tiny (a few dozen tokens) and usually negligible next to the corpus. Add them in if your query volume is very high.
They're list rates gathered on the date in the disclaimer and refreshed periodically. Providers retire and reprice embedding models over time, so confirm the live rate before budgeting a large indexing job.
No. All calculation happens in your browser. Your numbers are never sent to any server or third party, and nothing is stored.
Completely free, with no account or sign-up, and no limit on use. It runs in your browser and collects no data.
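If your query volume is high enough that the query-time embedding cost mentioned above matters, it can be estimated with the same formula. This is a sketch under assumed numbers: the function name `monthly_query_cost`, the one million queries per month and the 30-token average are all hypothetical.

```python
def monthly_query_cost(queries_per_month: int, avg_query_tokens: float,
                       rate_per_million: float) -> float:
    """Dollars per month spent embedding incoming search queries."""
    return queries_per_month * avg_query_tokens / 1_000_000 * rate_per_million


# Hypothetical load: 1M queries a month, about 30 tokens each, at $0.02 per 1M tokens.
cost = monthly_query_cost(1_000_000, 30, 0.02)
print(f"Query embedding: ${cost:.2f} per month")
```

Under these assumptions the queries add about $0.60 a month, which shows why query embedding is usually small next to the corpus itself.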
