Text Chunker for RAG & Embeddings
Split long text into token-sized chunks with overlap for RAG and embeddings, using exact GPT tokenization. Copy as JSON. Free, runs in your browser.
How to Use the Text Chunker
Paste your document
Drop in the text you want to split. It's tokenized with the exact GPT tokenizer so chunk boundaries land on real token edges, not mid-token.
Set chunk size and overlap
Choose how many tokens each chunk holds and how many tokens adjacent chunks share. A common RAG starting point is 512 tokens with 64 tokens of overlap.
Review the chunks
See the total tokens, chunk count and each chunk's text. Tune the size and overlap until the splits make sense for your content.
Copy as JSON
Export the chunks as a JSON array, ready to feed into an embedding pipeline or vector store. Everything stays in your browser.
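As a rough sketch of that last step, an exported JSON array of strings can be parsed and turned into records for an embedding pipeline. The sample payload, the `doc` identifier and the `id`/`text` field names below are illustrative assumptions, not something the tool itself produces.

```python
import json

# Illustrative payload: a JSON array of strings, one string per chunk.
exported = '["First chunk of text.", "Second chunk of text."]'

def to_records(payload: str, doc_id: str) -> list[dict]:
    """Parse exported chunks and attach a simple positional id to each one."""
    chunks = json.loads(payload)
    if not isinstance(chunks, list) or not all(isinstance(c, str) for c in chunks):
        raise ValueError("expected a JSON array of strings")
    return [
        {"id": f"{doc_id}-{i}", "text": chunk}
        for i, chunk in enumerate(chunks)
    ]

records = to_records(exported, "doc")
for record in records:
    print(record["id"], record["text"])
```

Keeping the chunk's position in the id makes it easy to fetch neighbouring chunks later if a retrieved passage needs more context.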
Chunking: The Quiet Decision That Makes or Breaks RAG
Why documents get chunked at all
Retrieval-augmented generation (RAG) works by embedding your documents into vectors, storing them, and at query time retrieving the most relevant pieces to feed the model. But you can't embed a whole book as one vector — embedding models have input limits, and a single giant vector blurs everything together so retrieval becomes useless. So documents are split into chunks: passages small enough to embed cleanly and specific enough that a match means something. Chunk by tokens rather than characters or words, because tokens are the unit the embedding model and the downstream LLM both actually measure, so a token-sized chunk reliably fits the budget.
The two dials that matter are size and overlap. Smaller chunks give precise retrieval but can lose surrounding context; larger chunks keep context but dilute relevance and cost more to embed. Overlap, which repeats a slice of tokens between neighbouring chunks, stops an idea that straddles a boundary from being cut in half: a sentence split across two chunks still appears whole in at least one, as long as it is no longer than the overlap. There's no universal best setting; it depends on your content and your model. A common, sensible starting point is around 512 tokens per chunk with 10–15% overlap (roughly 50 to 75 tokens), then tune from there.
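The arithmetic behind those two dials can be sketched in a few lines. The helper names `overlap_tokens` and `chunk_count` are hypothetical, and the sketch assumes each window starts chunk size minus overlap tokens after the previous one and that the last window may be shorter than the rest.

```python
import math

def overlap_tokens(chunk_size: int, overlap_pct: float) -> int:
    """Convert an overlap percentage into a whole number of tokens."""
    return round(chunk_size * overlap_pct / 100)

def chunk_count(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding windows needed to cover total_tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap  # how far each window advances
    return math.ceil((total_tokens - chunk_size) / stride) + 1

print(overlap_tokens(512, 12.5))      # 64
print(chunk_count(10_000, 512, 64))   # 23
```

So a 10,000-token document at 512 tokens with 64 tokens of overlap yields 23 chunks, the last one shorter than the others.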
"Retrieval can only return what chunking preserved. Split badly and the right answer is sitting in a chunk your search will never surface."
Why exact token boundaries matter
This tool chunks on real tiktoken boundaries, so each chunk's token count is exact and never overshoots your size limit. That matters when your embedding model has a hard input cap, or when you're packing retrieved chunks back into a context window and can't afford a surprise overflow. It uses the sliding-window approach found in many RAG libraries (each window starts chunk size minus overlap tokens after the previous one) and lets you export the result as JSON to drop straight into a pipeline. Estimate the resulting cost with our Embedding Cost Calculator, and check how the retrieved chunks stack up against a model's limit with the Context Window Visualizer.
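A minimal sketch of a sliding window over token IDs is shown below. It is not the tool's actual source: the function name `chunk_tokens` is hypothetical, and plain integers stand in for real token IDs so the example runs without a tokenizer.

```python
def chunk_tokens(tokens: list[int], size: int, overlap: int) -> list[list[int]]:
    """Split a token sequence into windows of at most `size` tokens,
    where each window repeats the last `overlap` tokens of the previous one."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    stride = size - overlap  # advance by size minus overlap each step
    chunks: list[list[int]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
        start += stride
    return chunks

# Stand-in token IDs; a real pipeline would get these from a tokenizer.
token_ids = list(range(10))
windows = chunk_tokens(token_ids, size=4, overlap=1)
print(windows)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With a real tokenizer, for instance tiktoken's `get_encoding("o200k_base")`, the IDs would come from `encode(text)` and each window would be turned back into text with `decode(window)`. Because slicing happens on the token list, no window can exceed `size` tokens.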
10 Facts About Chunking
RAG splits documents into chunks so each can be embedded and retrieved precisely.
Chunking by tokens (not characters) matches the unit embedding models actually measure.
Smaller chunks retrieve precisely but can lose surrounding context.
Larger chunks keep context but dilute relevance and cost more to embed.
Overlap repeats tokens between chunks so an idea on a boundary isn't cut in half.
A common starting point is ~512 tokens per chunk with 10–15% overlap.
The sliding window advances by chunk size minus overlap each step.
Overlap means some text is embedded twice — raising the total token cost.
Retrieval can only return what chunking preserved — bad splits hide good answers.
This chunker runs entirely in your browser — your text is never uploaded.
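To put a number on the overlap cost mentioned in the facts above, the sketch below adds up the length of every window. The function name `embedded_tokens` is hypothetical, and the sketch assumes windows advance by chunk size minus overlap with a shorter final window.

```python
def embedded_tokens(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Total tokens sent to the embedding model across all windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    stride = chunk_size - overlap
    total = 0
    start = 0
    while start < total_tokens:
        end = min(start + chunk_size, total_tokens)
        total += end - start  # length of this window
        if end == total_tokens:
            break
        start += stride
    return total

raw = 10_000
sent = embedded_tokens(raw, chunk_size=512, overlap=64)
print(sent)                                  # 11408
print(f"{(sent - raw) / raw:.1%} overhead")  # 14.1% overhead
```

Every window after the first repeats `overlap` tokens, so the extra cost is the overlap multiplied by the number of chunks minus one.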
Frequently Asked Questions
- Chunking means splitting a document into smaller passages so each can be embedded into a vector and retrieved independently. Retrieval-augmented generation depends on it: well-sized chunks make search precise, while a whole document embedded as one vector retrieves poorly.
- A common starting point is around 512 tokens per chunk with 10–15% overlap (about 50–75 tokens). There's no universal best — denser, more technical content often benefits from smaller chunks, while narrative content tolerates larger ones. Tune by testing retrieval quality.
- Chunk by tokens rather than characters because tokens are the unit embedding models and LLMs actually measure and limit. A count taken with the model's own tokenizer guarantees each chunk fits the input budget, whereas character or word counts only approximate it and can overshoot the real token limit.
- Overlap repeats a slice of tokens between adjacent chunks so context that straddles a boundary isn't lost. A sentence split across two chunks will still appear complete in at least one of them, provided it is no longer than the overlap, which improves retrieval at the cost of embedding a little text twice.
- The tokenizer is OpenAI's tiktoken with the o200k_base encoding (used by GPT-5 and GPT-4o), which gives exact token boundaries for models that use that encoding. Other models tokenise slightly differently, so treat the counts as a close estimate for them; o200k_base is still a solid, widely used basis for chunk sizing.
- Chunks can be exported. Click "Copy chunks as JSON" to get a JSON array of strings you can paste straight into an embedding script or vector-store loader. For very large documents the preview is capped, but the chunk count stays exact.
- Overlap does raise embedding cost slightly. Overlapping tokens are embedded in more than one chunk, so the total tokens embedded exceed the raw document's token count. It's usually a worthwhile trade for better retrieval; use our Embedding Cost Calculator with the chunk count and size here to see the exact impact.
- Splitting on sentences or paragraphs can help. This tool uses fixed-size token windows, a standard and very predictable approach. Sentence- or paragraph-aware splitting avoids mid-sentence cuts but produces uneven chunk sizes; many pipelines combine both. Fixed-size with overlap is a strong, reliable default.
- Your text never leaves your browser. The tokenizer and chunking run entirely in your browser, and the only network request is your browser fetching the tokenizer file from our own domain.
- The tool is completely free, with no account or sign-up and no limit on use. It runs in your browser and collects no data.
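For comparison with the fixed-size windows this tool uses, the sentence-aware splitting mentioned in the answers above might look roughly like this. It is a sketch under stated assumptions: `count_tokens` is a stand-in that counts whitespace-separated words rather than real tokens, the sentence splitter is a simple regular expression, and `pack_sentences` is a hypothetical name.

```python
import re

def count_tokens(text: str) -> int:
    """Stand-in token counter (whitespace words); swap in a real tokenizer."""
    return len(text.split())

def pack_sentences(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.
    A single sentence longer than max_tokens becomes its own oversized
    chunk; a real pipeline would fall back to fixed-size windows for it."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))  # flush the full chunk
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = "One two three. Four five. Six seven eight nine. Ten."
print(pack_sentences(sample, max_tokens=5))
# ['One two three. Four five.', 'Six seven eight nine. Ten.']
```

The chunks come out uneven in size, which is the trade-off described above: no mid-sentence cuts, but less predictable token counts per chunk.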