Sampling Parameters Cheat Sheet


Searchable reference of LLM sampling parameters: temperature, top-p, top-k, penalties, seed, min-p and more, with ranges and effects. Free, in-browser.



How to Use the Sampling Parameters Cheat Sheet

Browse the cards

Each card covers one generation parameter — its other names, a plain-English description, the typical value range, what a low versus a high value does, and a practical "when to use" tip.

Search to narrow down

Type into the search box to filter cards instantly by name, alias or description — for example "nucleus" finds top-p, and "penalty" surfaces the three penalty parameters at once.

Compare low vs high

Read the low and high rows side by side to understand the trade-off each knob makes — usually focus and determinism on one end, diversity and creativity on the other.

Apply the tip and reset

Use the "when to use" tip as a starting point in your API call or playground, then press Reset to clear the search and explore the rest. Everything runs in your browser.


Sampling Parameters Are the Steering Wheel of an LLM

How a language model turns probabilities into words

A large language model does not "decide" what to say in one step. At every position it produces a probability distribution over its entire vocabulary — tens of thousands of possible next tokens, each with a score. Sampling parameters (also called decoding or generation parameters) are the controls that turn that raw distribution into a single chosen token, over and over, until the response is complete. They do not change what the model knows; they change how it picks from what it knows. That is why two requests with the identical prompt can produce a dry, deterministic answer or a wildly inventive one — the difference is entirely in the sampling settings.
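As a rough illustration of that token-by-token choice, the sketch below turns a handful of raw scores into probabilities with a softmax and then draws tokens at random in proportion to them. The four-token vocabulary, the scores and the function names are all invented for this example; a real model scores its whole vocabulary and recomputes the scores after every token.

```python
import math
import random

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, rng):
    """Pick one token at random, weighted by its softmax probability."""
    probs = softmax(logits)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical vocabulary and scores, for illustration only.
vocab = ["cat", "dog", "fish", "xylophone"]
logits = [2.0, 1.5, 0.5, -2.0]

probs = softmax(logits)
rng = random.Random(0)
# Simplification: the same scores are reused for each draw, whereas a
# real model would produce fresh scores at every position.
generated = [sample_token(vocab, logits, rng) for _ in range(5)]
print(probs)
print(generated)
```

Every sampling parameter discussed on this page can be read as a modification to either the scores going into `softmax` or the weights handed to the random draw.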

The most familiar control is temperature, which sharpens or flattens the distribution. Low temperature sharpens it so the top token wins almost every time, giving focused, repeatable output ideal for code and extraction. High temperature flattens it, letting unlikely tokens surface, which is what you want for brainstorming and fiction. Sitting alongside temperature are the truncation methods: top-p (nucleus sampling) keeps the smallest set of most-likely tokens whose probabilities add up to at least p, while top-k keeps a fixed number of the most likely candidates. Min-p is a newer, adaptive cousin that scales its cutoff to the model's confidence, keeping only tokens whose probability is at least a set fraction of the top token's. These all answer the same question (which tokens are even allowed to be picked) but with different shapes of cutoff, which is why setting several of them aggressively at once can make them work against each other.
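The controls in the paragraph above can be sketched in a few lines each. This is a minimal illustration under simplifying assumptions, not any particular library's implementation: the function names and the four example scores are hypothetical, temperature is applied by dividing the scores before the softmax, and each filter zeroes out the excluded tokens and renormalises what is left.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before softmax; lower values sharpen."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy argmax")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1 again."""
    total = sum(probs)
    return [p / total for p in probs]

def top_k_filter(probs, k):
    """Keep the k most likely tokens and zero out the rest."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize([q if i in keep else 0.0 for i, q in enumerate(probs)])

def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top probability."""
    threshold = min_p * max(probs)
    return renormalize([q if q >= threshold else 0.0 for q in probs])

# Hypothetical scores for a four-token vocabulary.
logits = [2.0, 1.5, 0.5, -2.0]
sharp = softmax_with_temperature(logits, 0.5)  # top token dominates more
base = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 2.0)   # probabilities move closer together
print(top_k_filter(base, 2))
print(top_p_filter(base, 0.9))
print(min_p_filter(base, 0.1))
```

Note how the min-p threshold depends on the top probability: when the model is confident the cutoff is high and few tokens survive, and when it is unsure the cutoff drops and more candidates stay in play.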

"Temperature, top-p and top-k all shape the same distribution. Reach for one as your main dial, and treat the rest as fine-tuning — not three knobs to crank at once."

Penalties, stops, seeds and the rest of the family

Beyond choosing tokens, a second group of parameters manages repetition and control. The frequency penalty scales down a token the more often it has already appeared; the presence penalty applies a flat nudge to anything used even once, pushing the model toward new topics; and the multiplicative repetition penalty, common in open-source stacks, discourages reusing recent tokens. Used gently these cure loops and verbatim echoing; pushed too hard they wreck grammar by forcing awkward synonyms. Stop sequences end generation cleanly when a chosen string appears — invaluable for keeping structured or turn-based output from bleeding into the next role label or code fence — and max tokens caps the length of the answer, bounding both cost and latency.
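To make the difference between the three penalties concrete, here is a small sketch using formulations that are commonly described for them: an additive adjustment for the frequency and presence penalties, and a divide-or-multiply adjustment for the repetition penalty. Exact formulas differ between providers and libraries, so treat these as assumptions; the function names, scores and token history are hypothetical.

```python
from collections import Counter

def apply_additive_penalties(logits, generated_ids,
                             frequency_penalty=0.0, presence_penalty=0.0):
    """Subtract penalties from the scores of tokens that already appeared."""
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        adjusted[token_id] -= count * frequency_penalty  # grows with each repeat
        adjusted[token_id] -= presence_penalty           # flat, applied once
    return adjusted

def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Multiplicative form: shrink positive scores, push negative ones lower."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty
        else:
            adjusted[token_id] *= penalty
    return adjusted

# Hypothetical scores for four token ids; token 0 was used three times
# and token 2 once.
logits = [2.0, 1.0, -1.0, 0.5]
history = [0, 0, 0, 2]

freq_only = apply_additive_penalties(logits, history, frequency_penalty=0.5)
pres_only = apply_additive_penalties(logits, history, presence_penalty=0.5)
rep = apply_repetition_penalty(logits, history, penalty=2.0)
print(freq_only, pres_only, rep)
```

With the frequency penalty, token 0 loses three times as much as token 2 because it appeared three times; with the presence penalty both lose the same amount. Tokens that never appeared are left untouched in every case.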

Rounding out the family are the reproducibility and steering controls. A fixed seed pins the random number generator so the same inputs reproduce the same output (some hosted APIs treat this as best-effort rather than guaranteed), which is essential when you want prompt differences, not randomness, to explain a change in results. Logit bias lets you nudge specific tokens up or down, even banning a word outright or restricting output to a fixed vocabulary. And n asks the API for several independent completions at once, so you can generate a handful of drafts and pick the best. None of these settings is universally "correct": the right combination depends on whether you value precision or creativity, speed or thoroughness, repeatability or variety. The cheat sheet above lays each one out so you can pick deliberately instead of leaving the defaults to chance. Because every parameter interacts with the others, understanding them as a family is what separates guesswork from genuine control over a model's output.
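The seed, logit bias and n controls can be illustrated together in a toy sampler. This is a sketch under loud assumptions: the vocabulary, scores and function name are hypothetical, biases are keyed by token string here whereas real APIs typically key them by token id, and n draws several independent next tokens instead of several full completions.

```python
import math
import random

def sample_with_bias(vocab, logits, seed, logit_bias=None, n=1):
    """Draw n tokens with a seeded RNG after adding per-token logit biases."""
    logit_bias = logit_bias or {}
    # Hypothetical simplification: biases are looked up by token string.
    biased = [x + logit_bias.get(tok, 0.0) for tok, x in zip(vocab, logits)]
    m = max(biased)
    weights = [math.exp(x - m) for x in biased]  # unnormalised softmax weights
    rng = random.Random(seed)  # a fixed seed makes the draws repeatable
    return rng.choices(vocab, weights=weights, k=n)

vocab = ["cat", "dog", "fish", "xylophone"]
logits = [2.0, 1.5, 0.5, -2.0]

run_a = sample_with_bias(vocab, logits, seed=42, n=5)
run_b = sample_with_bias(vocab, logits, seed=42, n=5)  # same seed, same draws
banned = sample_with_bias(vocab, logits, seed=42,
                          logit_bias={"dog": -100.0}, n=50)
forced = sample_with_bias(vocab, logits, seed=42,
                          logit_bias={"xylophone": 100.0}, n=5)
print(run_a, run_b)
print(forced)
```

A bias of -100 leaves "dog" with a weight so small that it is never drawn in this sketch, while +100 makes "xylophone" overwhelm every other candidate, which matches the ban-or-force behaviour described above.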

10 Facts About Sampling Parameters

01

Sampling parameters change how a model picks tokens, not what it knows — same weights, very different output.

02

Temperature 0 is effectively greedy decoding: the highest-probability token is chosen at every step, although some hosted APIs still show small run-to-run differences.

03

Top-p cuts by cumulative probability mass; top-k cuts by a fixed count — different shapes of the same idea.

04

Common API guidance is to adjust temperature or top-p, not both at once, because the two interact and can work against each other.

05

Frequency penalty scales with repeats; presence penalty is a flat one-time nudge toward new topics.

06

Repetition penalty above ~1.2 can break grammar by forcing the model into awkward synonyms.

07

A fixed seed makes a generation repeatable for the same prompt and settings (best-effort on some hosted APIs), which is ideal for testing prompt changes in isolation.

08

On APIs that accept biases from -100 to 100, a logit bias of -100 effectively bans a token, and a large positive bias can force one to appear.

09

Min-p scales its cutoff to the model's confidence, so it pairs nicely with higher temperatures.

10

Setting n > 1 returns several drafts in one call — combine it with higher temperature for real variety.

Frequently Asked Questions

  • Sampling parameters are the controls that decide how a language model turns its probability distribution over possible next tokens into the actual words it generates. Temperature, top-p, top-k, penalties, seed and others all shape this token-by-token choice without changing what the model knows.
  • Temperature reshapes the whole distribution: low values sharpen it toward the most likely token, high values flatten it. Top-p instead truncates the distribution, keeping only the smallest set of most-likely tokens whose probabilities sum to at least p. Most guidance is to tune one as your main dial, not both at once.
  • Top-k keeps a fixed number of candidate tokens; top-p keeps a probability-mass slice that grows and shrinks with the model's confidence. Top-p adapts better to context, which is why many modern stacks favour it, but both are valid and some APIs let you combine them.
  • As a rough guide for temperature: 0–0.3 for code, extraction and factual answers where you want consistency; 0.7–1.0 for general chat; and 1.0–1.3 for brainstorming or creative writing. Start in the middle and adjust based on whether output feels too rigid or too random.
  • The frequency penalty grows with the number of times a token has already appeared, fighting word-level repetition. The presence penalty applies a single flat penalty to any token used at least once, nudging the model toward entirely new topics regardless of how often a word recurred.
  • A seed fixes the random number generator that drives sampling, so the same prompt and parameters reproduce the same output. It is invaluable for testing and debugging, because any change in the result then comes from your edits rather than from random variation. Omit it in production for natural variety.
  • A stop sequence is a string that halts generation the moment it appears, with the stop text itself excluded from the output. Stop sequences keep structured or turn-based responses from running on, for example by stopping at a new role label, a blank line, or a closing code fence.
  • Across models and providers the core ideas are shared, but names and availability vary. Some APIs expose temperature and top-p but not top-k; open-source stacks often add repetition penalty and min-p; and value ranges differ between providers. Always check the specific API's documentation before assuming a default.
  • No data leaves your browser. The reference data lives in the page and the search filters the cards entirely in your browser with plain JavaScript. Nothing you type is sent to any server, model or third party, and nothing is stored.
  • The tool is completely free, with no account or sign-up and no usage limit. It runs in your browser and collects no data.
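The stop-sequence behaviour described in the FAQ above (generation ends where the stop string appears, and the stop text is excluded) can be mimicked after the fact with a short helper. This is only an illustrative sketch: the function name and sample text are hypothetical, and a real API stops while generating instead of trimming a finished string.

```python
def truncate_at_stop(text, stop_sequences):
    """Return text cut at the earliest stop sequence, excluding the stop text."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings, which would match at position 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Hypothetical raw output that runs on into the next speaker's turn.
raw = "Assistant: Paris is the capital of France.\nUser: And Spain?"
clean = truncate_at_stop(raw, ["\nUser:", "```"])
print(clean)
```

Using the earliest match across all stop strings keeps the result the same regardless of the order in which the stop sequences are listed.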
