Temperature & Top-p Explainer (Interactive Sampling Demo) | RECATOOLS

How to Use the Temperature & Top-p Explainer

Start at the defaults

Temperature 1.0, top-p 1.0 and top-k 8 show the model's raw distribution over the example candidate tokens — nothing is being filtered yet. This is your baseline to compare against.

Slide the temperature

Drag temperature toward 0 and watch the bars collapse onto the single most likely token (greedy decoding). Push it toward 2 and the bars flatten out — rare tokens like “xylophone” suddenly get real probability.

Apply top-p and top-k truncation

Lower top-p to chop off the long tail — only the tokens that make up the top “nucleus” of probability survive. Lower top-k to cap the candidate count outright. Filtered tokens grey out at 0%.

Read the renormalised result

After filtering, the surviving tokens are renormalised so their probabilities add back up to 100%. The status line tells you how many tokens remain — that's the pool the model would actually sample from.

How Sampling Knobs Reshape an LLM's Next Word

From logits to a probability distribution

Every time a large language model predicts the next token, it doesn't pick a word directly — it produces a logit (a raw, unnormalised score) for every token in its vocabulary. Those logits are turned into a probability distribution by the softmax function, which exponentiates each score and divides by the total so everything sums to one. The example in this tool uses a small, fixed set of eight candidates so you can actually see the whole distribution; a real model does the same thing across tens of thousands of tokens at once. Once you have probabilities, the question becomes: how do you choose? Sampling parameters are the answer, and the three most important are temperature, top-p and top-k.
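To make the softmax step concrete, here is a minimal Python sketch. The eight tokens and their logits are invented for illustration (only “xylophone” is borrowed from the demo text above) and are not the values the tool actually uses.

```python
import math

# Hypothetical candidate tokens and logits, for illustration only.
TOKENS = ["the", "a", "cat", "dog", "sat", "ran", "blue", "xylophone"]
LOGITS = [3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0, -2.0]


def softmax(logits):
    """Convert raw logits into probabilities that sum to one."""
    # Subtracting the largest logit does not change the result, but it
    # keeps math.exp from overflowing when scores are large.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(LOGITS)
for token, p in zip(TOKENS, probs):
    print(f"{token:>10}  {p:6.1%}")
```

Every probability is strictly positive, so even the lowest-scoring token keeps a small chance of being picked until something filters it out.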

Temperature is the first dial, and it acts before the softmax by dividing every logit by the temperature value. At temperature 1.0 the logits pass through unchanged. When temperature is above 1 (toward 2.0), the gaps between logits shrink, the softmax output flattens, and unlikely tokens get a real shot at being chosen: the model feels adventurous, surprising, sometimes incoherent. When temperature is below 1 (toward 0.0), the gaps stretch, probability mass piles onto the single highest-scoring token, and output becomes repetitive but safe. In the limit of temperature 0 you get greedy decoding: the model always takes its single best guess. Dividing by zero is undefined, which is why this tool treats that case as a pure argmax rather than running the softmax. Temperature doesn't add or remove candidates; it only changes how peaked or flat the curve is.
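The effect of the temperature dial can be sketched in a few lines of Python. The function name `softmax_with_temperature` and the logits are hypothetical, chosen for illustration; the temperature 0 branch is an explicit special case because dividing by zero is not possible.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature, then apply softmax."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: all probability goes to the first
        # highest-scoring token, with no softmax involved.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracted for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for eight candidates, most likely first.
LOGITS = [3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0, -2.0]

for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(LOGITS, t)
    print(f"T={t}: top={probs[0]:.3f}  rarest={probs[-1]:.5f}")
```

As the temperature rises, the top token's share falls and the rarest token's share grows, yet at any temperature above 0 all eight candidates stay in play.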

“Temperature decides how flat the curve is. Top-p and top-k decide how much of the tail you're even allowed to touch. They solve different problems, and the best decoding setups use them together.”

Truncation: top-p (nucleus) and top-k sampling

Temperature alone has a weakness: even at a sensible setting, the long tail of very unlikely tokens still carries a sliver of probability, and occasionally the model rolls the dice and picks genuine nonsense. Top-k and top-p fix this by truncating the distribution before sampling. Top-k is the blunt version: sort the tokens by probability and keep only the k highest, discarding everything else. It's simple and predictable, but a fixed k can be too generous when the model is confident and too stingy when it's uncertain. Top-p, also called nucleus sampling, is the adaptive version: walk down the sorted tokens, accumulating probability, and stop as soon as the running total reaches p. On a confident step where one token holds 95% of the mass, top-p might keep just one or two tokens; on an ambiguous step where the mass is spread thin, it keeps many. The size of the surviving “nucleus” adjusts to how sure the model is, which is why nucleus sampling became a popular default for open-ended text generation.
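The difference between the two filters is easiest to see side by side. The sketch below is a minimal illustration, with hypothetical helper names (`top_k_keep`, `top_p_keep`) and two made-up distributions standing in for a confident step and an uncertain one.

```python
def top_k_keep(probs, k):
    """Return the indices of the k most probable tokens."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return set(ranked[:k])


def top_p_keep(probs, p):
    """Return the smallest set of top tokens whose cumulative mass reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, running = set(), 0.0
    for i in ranked:
        kept.add(i)
        running += probs[i]
        if running >= p:
            break  # the nucleus is complete; everything after it is cut
    return kept


# Two hypothetical distributions over eight candidates.
confident = [0.95, 0.02, 0.01, 0.01, 0.004, 0.003, 0.002, 0.001]
uncertain = [0.20, 0.17, 0.15, 0.13, 0.12, 0.10, 0.08, 0.05]

print("top-p 0.9:", len(top_p_keep(confident, 0.9)), len(top_p_keep(uncertain, 0.9)))
print("top-k 3:  ", len(top_k_keep(confident, 3)), len(top_k_keep(uncertain, 3)))
```

With these numbers, top-p at 0.9 keeps one token on the confident step and seven on the uncertain one, while top-k at 3 keeps three tokens in both cases regardless of how the mass is spread.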

After top-k or top-p chops off the losers, the kept tokens are renormalised so their probabilities sum back to 100%; the model never samples from a truncated set that adds up to less than one. In practice these knobs stack: you set a temperature for overall flavour, then a top-p (and sometimes a top-k cap) to fence off the garbage tail. The classic trade-off is creativity versus determinism. Crank temperature up and loosen top-p for brainstorming, fiction, or varied marketing copy; pull temperature down and tighten top-p for code, extraction, factual answers, or anything where you want the same input to give a reliable, repeatable output. Strictly deterministic output needs greedy decoding (temperature 0, or top-k 1); low settings only make repeats more likely. There's no universally correct setting: it depends on whether you want the model to surprise you or to behave. Drag the sliders above and you can watch every one of these effects happen in real time, on a distribution small enough to read at a glance.
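Stacked together, the knobs form a short pipeline. The sketch below is one possible arrangement, not necessarily the order this tool or any particular library uses: it assumes top-k runs before top-p, and that top-p accumulates the probabilities as they were before truncation. The function name `sampling_pool`, the tokens and the logits are all hypothetical.

```python
import math
import random


def sampling_pool(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Return renormalised probabilities; filtered tokens are set to 0.0."""
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    if temperature <= 0:
        # Greedy decoding: only the first highest-scoring token survives.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]

    # 1. Temperature-scaled softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Top-k: keep the k most probable tokens (assumed to run first).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]

    # 3. Top-p: walk down the ranking until the running mass reaches top_p.
    kept, running = [], 0.0
    for i in ranked:
        kept.append(i)
        running += probs[i]
        if running >= top_p:
            break

    # 4. Renormalise the survivors so they sum to one again.
    mass = sum(probs[i] for i in kept)
    pool = [0.0] * len(probs)
    for i in kept:
        pool[i] = probs[i] / mass
    return pool


# Hypothetical tokens and logits, for illustration only.
TOKENS = ["the", "a", "cat", "dog", "sat", "ran", "blue", "xylophone"]
LOGITS = [3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0, -2.0]

pool = sampling_pool(LOGITS, temperature=0.7, top_k=5, top_p=0.9)
rng = random.Random(0)  # seeded so the draw is repeatable
choice = rng.choices(TOKENS, weights=pool, k=1)[0]
print([round(p, 3) for p in pool], choice)
```

With these settings, top-k trims the list to five candidates and top-p then narrows it to four, so the draw can only land on one of the four highest-scoring tokens.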

10 Facts About Sampling

01

Models output raw logits for every token; softmax turns them into probabilities that sum to one.

02

Temperature divides the logits before softmax — it flattens or sharpens the curve without adding or removing tokens.

03

Temperature 0 is greedy decoding: the model always takes its single highest-scoring token (argmax).

04

High temperature gives rare tokens real probability — more creative, but also more incoherent.

05

Top-k keeps only the k most likely tokens and discards the rest before sampling.

06

Top-p (nucleus sampling) keeps the smallest set of tokens whose probabilities add up to at least p.

07

Top-p is adaptive: it keeps few tokens when the model is confident and many when it's unsure.

08

After truncation the surviving tokens are renormalised so their probabilities sum back to 100%.

09

Low temperature + tight top-p = focused, repeatable output for code and extraction; only greedy decoding (temperature 0) is strictly deterministic.

10

This explainer is fully deterministic and runs in your browser — no model is ever called.

Frequently Asked Questions

What is temperature in LLM sampling?

Temperature is a number (usually 0 to 2) that divides the model's logits before the softmax. Low temperature sharpens the distribution toward the most likely token, making output focused and repeatable; high temperature flattens it, giving rare tokens more probability and making output more varied and creative.

What is top-p (nucleus) sampling?

Top-p keeps the smallest set of the most probable tokens whose cumulative probability reaches at least p, then discards the rest. Because it adapts to how peaked the distribution is, it keeps few tokens when the model is confident and many when it isn't, which is why it's a common default for open-ended generation.

What is top-k sampling?

Top-k sorts the tokens by probability and keeps only the k highest, discarding everything else before sampling. It's simple and predictable, but a fixed k can be too loose when the model is confident and too tight when it's uncertain, which is the gap top-p was designed to close.

What is the difference between top-k and top-p?

Top-k caps the number of candidate tokens at a fixed count; top-p caps them by cumulative probability instead. Top-k is a fixed-size window, top-p is an adaptive one. Many decoding setups apply both: a top-k as a hard ceiling and a top-p to trim the tail dynamically.

What does temperature 0 do?

Temperature 0 is greedy decoding, the limiting case as temperature approaches zero: the model always picks the single highest-scoring token (the argmax) and ignores all others. This makes output deterministic and repeatable, which is why it's common for code generation, classification and data extraction.

What is softmax?

Softmax is the function that converts the model's raw logits into a probability distribution. It exponentiates each logit and divides by the sum of all the exponentials, guaranteeing every value is between 0 and 1 and that they all add up to one, which is the form needed before sampling.

Should I adjust temperature or top-p?

They do different jobs, so it's common to set both. Use temperature for overall flavour (how flat or peaked the curve is) and use top-p to fence off the unlikely tail so the model can't pick nonsense. A typical creative setup raises temperature and loosens top-p; a typical factual setup lowers temperature and tightens top-p.

Why are some bars greyed out?

Greyed-out bars are candidate tokens that top-k or top-p filtered away. They're shown at 0% so you can see exactly what got cut. The remaining coloured bars are renormalised so their probabilities add back up to 100%, which is the pool the model would actually sample from.

Do the example tokens come from a real model?

The candidate words and base logits here are a fixed teaching example, not output from any specific model. The maths (temperature, softmax, top-k and top-p) is exactly what real models use; we just apply it to a small, readable distribution so the effects are easy to see.

Does this tool call a model or send my data anywhere?

No. Every calculation runs in your browser with plain JavaScript and is fully deterministic: there's no randomness, no model call, and no network request. Nothing you do is sent to a server or stored anywhere.
