Text Chunker for RAG & Embeddings

How to Use the Text Chunker

Paste your document

Drop in the text you want to split. It's tokenized with OpenAI's o200k_base tokenizer, so chunk boundaries land on real token edges rather than mid-token.

Set chunk size and overlap

Choose how many tokens each chunk holds and how many tokens adjacent chunks share. A common RAG starting point is 512 tokens with 64 tokens of overlap.
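The 512/64 default works out as follows. Each new chunk starts 448 tokens after the previous one, and the overlap is 12.5% of the chunk, inside the 10–15% band suggested on this page:

```latex
\begin{aligned}
\text{stride} &= \text{size} - \text{overlap} = 512 - 64 = 448 \text{ tokens} \\
\frac{\text{overlap}}{\text{size}} &= \frac{64}{512} = 12.5\% \\
(10\% \text{ to } 15\%) \times 512 &\approx 51 \text{ to } 77 \text{ tokens of overlap}
\end{aligned}
```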

Review the chunks

See the total tokens, chunk count and each chunk's text. Tune the size and overlap until the splits make sense for your content.

Copy as JSON

Export the chunks as a JSON array, ready to feed into an embedding pipeline or vector store. Everything stays in your browser.
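The export is an ordinary JSON array of strings, so any language can read it. Here is a minimal Python sketch of the consuming side. The sample chunk strings and the `{"id", "text"}` record shape are hypothetical, because each vector store defines its own loader format.

```python
import json

# Stand-in for what "Copy chunks as JSON" puts on the clipboard: a JSON array of strings.
payload = json.dumps(
    [
        "Retrieval-augmented generation embeds documents into vectors.",
        "At query time the most relevant chunks are retrieved for the model.",
    ],
    indent=2,
)

# Pipeline side: parse the array back into a list of passages.
chunks = json.loads(payload)

# Hypothetical record shape; adapt it to your vector store's loader.
records = [{"id": f"chunk-{i}", "text": text} for i, text in enumerate(chunks)]

for record in records:
    print(record["id"], len(record["text"]), "chars")
```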

Chunking: The Quiet Decision That Makes or Breaks RAG

Why documents get chunked at all

Retrieval-augmented generation (RAG) works by embedding your documents into vectors, storing them, and at query time retrieving the most relevant pieces to feed the model. But you can't embed a whole book as one vector — embedding models have input limits, and a single giant vector blurs everything together so retrieval becomes useless. So documents are split into chunks: passages small enough to embed cleanly and specific enough that a match means something. Chunk by tokens rather than characters or words, because tokens are the unit the embedding model and the downstream LLM both actually measure, so a token-sized chunk reliably fits the budget.

The two dials that matter are size and overlap. Smaller chunks give precise retrieval but can lose surrounding context. Larger chunks keep context but dilute relevance and cost more to embed. Overlap, which repeats a slice of tokens between neighbouring chunks, keeps an idea that straddles a boundary from being cut in half: any sentence no longer than the overlap still appears whole in at least one chunk. There's no universal best setting, because it depends on your content and your model. A common, sensible starting point is around 512 tokens per chunk with 10–15% overlap, then tune from there.
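Overlap's guarantee has a precise limit. Suppose windows advance by size minus overlap and the document spans more than one chunk. Then every run of up to overlap + 1 consecutive tokens lies wholly inside some chunk, but a longer run can be split so that no single chunk holds it whole. The brute-force check below confirms this for the 512/64 setting. It works on token positions rather than real tokens, and the function names are illustrative rather than part of the tool.

```python
def windows(n_tokens: int, size: int, overlap: int) -> list[tuple[int, int]]:
    """Half-open [start, end) token ranges of a sliding window that stops at the end."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap
    spans, start = [], 0
    while True:
        end = min(start + size, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            return spans
        start += stride


def fits_somewhere(a: int, length: int, spans: list[tuple[int, int]]) -> bool:
    """True if tokens [a, a + length) lie entirely inside at least one window."""
    return any(s <= a and a + length <= e for s, e in spans)


def longest_always_whole(n_tokens: int, size: int, overlap: int) -> int:
    """Largest L such that EVERY run of L consecutive tokens fits whole in some window."""
    spans = windows(n_tokens, size, overlap)
    best = 0
    for length in range(1, size + 1):
        if all(fits_somewhere(a, length, spans) for a in range(n_tokens - length + 1)):
            best = length
        else:
            break  # if length L can be split, any longer run can be too
    return best


print(longest_always_whole(2000, 512, 64))  # 65, i.e. overlap + 1
```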

"Retrieval can only return what chunking preserved. Split badly and the right answer is sitting in a chunk your search will never surface."

Why exact token boundaries matter

This tool chunks on real tiktoken (o200k_base) boundaries, so each chunk's token count is exact for that encoding and never overshoots your size limit. That matters when your embedding model has a hard input cap, or when you're packing retrieved chunks back into a context window and can't afford a surprise overflow. It uses the sliding-window approach (advance by size minus overlap) that underpins many RAG libraries, and lets you export the result as JSON to drop straight into a pipeline. Estimate the resulting cost with our Embedding Cost Calculator, and check how the retrieved chunks stack up against a model's limit with the Context Window Visualizer.
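Here is a minimal Python sketch of that sliding window, working on any sequence of token IDs. It assumes the window stops once a chunk reaches the end of the document, rather than emitting a short trailing chunk that the previous one already covers. The tool's own implementation may handle the tail differently. `chunk_text` takes encode and decode functions as parameters so the sketch runs without extra dependencies.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], size: int, overlap: int) -> list[list[T]]:
    """Split tokens into windows of at most `size`; neighbours share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap  # each window starts this many tokens after the previous one
    chunks: list[list[T]] = []
    start = 0
    while start < len(tokens):
        end = min(start + size, len(tokens))
        chunks.append(list(tokens[start:end]))
        if end == len(tokens):
            break  # this window reached the end; a further one would be fully overlapped
        start += stride
    return chunks


def chunk_text(
    text: str,
    encode: Callable[[str], Sequence[T]],
    decode: Callable[[list[T]], str],
    size: int = 512,
    overlap: int = 64,
) -> list[str]:
    """Encode the text, window the tokens, and decode each window back to text."""
    return [decode(window) for window in chunk_tokens(encode(text), size, overlap)]


# Demo with a whitespace "tokenizer" stand-in (NOT a real model tokenizer).
print(chunk_text("a b c d e f g h i j", str.split, " ".join, size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

With tiktoken installed, `enc = tiktoken.get_encoding("o200k_base")` followed by `chunk_text(document, enc.encode, enc.decode)` should give 512-token chunks on o200k boundaries. One caveat applies. A token edge can fall inside a multi-byte character, such as some emoji or CJK text. When that happens, decoding the window can produce a replacement character at its edge.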

10 Facts About Chunking

01

RAG splits documents into chunks so each can be embedded and retrieved precisely.

02

Chunking by tokens (not characters) matches the unit embedding models actually measure.

03

Smaller chunks retrieve precisely but can lose surrounding context.

04

Larger chunks keep context but dilute relevance and cost more to embed.

05

Overlap repeats tokens between chunks so an idea on a boundary isn't cut in half.

06

A common starting point is ~512 tokens per chunk with 10–15% overlap.

07

The sliding window advances by chunk size minus overlap each step.

08

Overlap means some text is embedded twice — raising the total token cost.

09

Retrieval can only return what chunking preserved — bad splits hide good answers.

10

This chunker runs entirely in your browser — your text is never uploaded.
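Facts 07 and 08 combine into a quick estimate of chunk count and embedding cost. Assume the window stops once a chunk reaches the end of the document. Then every adjacent pair of chunks shares exactly `overlap` tokens, so the total sent to the embedding model is the document length plus `overlap` for each chunk after the first. The helper names below are illustrative. A chunker that also emits a short trailing window would report one chunk more for some document lengths.

```python
import math


def chunk_count(n_tokens: int, size: int, overlap: int) -> int:
    """Number of windows produced by a sliding window that stops at the document end."""
    if n_tokens <= 0:
        return 0
    if n_tokens <= size:
        return 1
    stride = size - overlap
    return math.ceil((n_tokens - size) / stride) + 1


def embedded_tokens(n_tokens: int, size: int, overlap: int) -> int:
    """Total tokens sent for embedding: each chunk after the first repeats `overlap` tokens."""
    chunks = chunk_count(n_tokens, size, overlap)
    return n_tokens + max(chunks - 1, 0) * overlap


n = 10_000
c = chunk_count(n, 512, 64)
total = embedded_tokens(n, 512, 64)
print(c, total, f"{total / n - 1:.1%} overhead")  # 23 11408 14.1% overhead
```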

Frequently Asked Questions

What is text chunking?

It's splitting a document into smaller passages so each can be embedded into a vector and retrieved independently. Retrieval-augmented generation depends on it: well-sized chunks make search precise, while a whole document embedded as one vector retrieves poorly.

What chunk size and overlap should I use?

A common starting point is around 512 tokens per chunk with 10–15% overlap (about 50–75 tokens). There's no universal best: denser, more technical content often benefits from smaller chunks, while narrative content tolerates larger ones. Tune by testing retrieval quality.

Why chunk by tokens instead of characters or words?

Tokens are the unit embedding models and LLMs actually measure and limit. Chunking by tokens lets you size each chunk against the model's input budget directly, and exactly when the model uses the same encoding. Character or word counts only approximate the budget and can overshoot the real token limit.

What does overlap do?

Overlap repeats a slice of tokens between adjacent chunks so context that straddles a boundary isn't lost. Any sentence no longer than the overlap that falls across a boundary will still appear complete in at least one chunk. This improves retrieval at the cost of embedding a little text twice.

Which tokenizer does the chunker use?

OpenAI's tiktoken with the o200k_base encoding (used by GPT-5 and GPT-4o). That gives exact token boundaries for models that use o200k_base. Other models tokenise slightly differently (OpenAI's text-embedding-3 models, for example, use cl100k_base), so leave a little headroom below a hard input cap. Even so, o200k is a solid, widely used basis for chunk sizing across the board.

Can I export the chunks?

Yes. Click "Copy chunks as JSON" to get a JSON array of strings you can paste straight into an embedding script or vector-store loader. For very large documents the preview is capped, but the chunk count stays exact.

Does overlap increase embedding cost?

Yes, slightly. Overlapping tokens are embedded in more than one chunk, so total tokens exceed the raw document. It's usually a worthwhile trade for better retrieval. Use our Embedding Cost Calculator with the chunk count and size here to see the exact impact.

Would sentence- or paragraph-aware splitting be better?

It can help. This tool uses fixed-size token windows, which is the standard and most predictable approach. Sentence- or paragraph-aware splitting avoids mid-sentence cuts but produces uneven chunk sizes, and many pipelines combine both. Fixed-size with overlap is a strong, reliable default.

Is my text uploaded anywhere?

No. The tokenizer and chunking run entirely in your browser from a locally served library. Your text is never uploaded; the only network request is your browser fetching the tokenizer file from our own domain.

Is it free?

Completely free, with no account or sign-up, and no limit on use. It runs in your browser and collects no data.
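Some pipelines combine fixed-size windows with sentence-aware splitting. Below is a minimal Python sketch of the sentence-aware half. It splits on sentence-ending punctuation, then greedily packs whole sentences into chunks up to a token budget. This is a swapped-in illustration, not how this tool chunks. The regex splitter is deliberately naive: abbreviations such as "e.g." will trip it. Summing per-sentence counts also only approximates the token count of the joined chunk. `count_tokens` is a caller-supplied function, for example `lambda s: len(enc.encode(s))` with tiktoken. The demo uses a word count as a hypothetical stand-in.

```python
import re
from typing import Callable


def pack_sentences(text: str, count_tokens: Callable[[str], int], max_tokens: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens (approximately).

    A single sentence longer than max_tokens becomes its own oversized chunk;
    a real pipeline might fall back to fixed-size token windows for it.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))  # budget would overflow: close this chunk
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = ("RAG retrieves chunks. Chunk size matters. "
        "Overlap preserves context at boundaries. Test with real queries.")
for chunk in pack_sentences(text, lambda s: len(s.split()), max_tokens=8):
    print(chunk)
```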



Related Tools

Agent Skill Scaffolder

AI & LLM Glossary

AI Coding Rules Builder

AI Image Aspect Ratio Calculator
