Logits
Latest update: 26/04/30
Definition
Logits are the raw, unnormalized scores an AI model produces for every possible next token before deciding what to generate – the internal “vote tallies” that get converted into probabilities and then into actual words.
What Are Logits?
Every time an AI model is about to generate the next token, it produces a score for every token in its vocabulary – tens of thousands of them. These raw scores are logits. They represent how strongly the model “favors” each possible next token given everything it’s read so far.
Logits aren’t probabilities yet. They’re raw values – some high, some low, some negative – before any normalization happens. They get converted into probabilities (via a function called softmax), and then the final token gets selected from those probabilities.
The word “logit” comes from statistics – it’s shorthand for log-odds, a way of expressing probabilities on an unbounded scale. In everyday AI use, you’ll encounter the term most often in API documentation, research papers, and developer tools.
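If you're curious about the statistics, the log-odds relationship is easy to see in a few lines of Python – a toy illustration of the term's origin, not something a language model computes explicitly:

```python
import math

def logit(p: float) -> float:
    """Convert a probability to log-odds (the statistical 'logit')."""
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    """Inverse of logit: map an unbounded score back to a probability."""
    return 1 / (1 + math.exp(-x))

print(logit(0.5))           # 0.0  -- even odds
print(logit(0.9))           # ~2.2 -- strong preference
print(sigmoid(logit(0.9)))  # 0.9  -- round trip
```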
💡 How Does It Work?
After processing your prompt, the model's final output layer produces one logit value for every token in its vocabulary. A vocabulary might contain 50,000–100,000 tokens.
Those logit values get passed through a softmax function, which converts them into a proper probability distribution – all values between 0 and 1, summing to 1. Temperature is applied to the logits before this conversion, scaling how peaked or flat the distribution comes out; Top-P filtering is applied to the probabilities after it. Then the final token is sampled.
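Here's a minimal Python sketch of that conversion, using made-up logit values for a four-token toy vocabulary (a real model scores tens of thousands of tokens at once):

```python
import math

# Toy vocabulary with made-up logit scores (a real model has 50k-100k entries)
logits = {"cat": 4.2, "dog": 3.9, "car": 1.1, "the": -0.5}

def softmax(scores, temperature=1.0):
    """Convert raw logits into a probability distribution.

    Temperature divides the logits *before* normalization:
    <1.0 sharpens the distribution, >1.0 flattens it.
    """
    scaled = {tok: s / temperature for tok, s in scores.items()}
    max_s = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

print(softmax(logits))                   # "cat" and "dog" dominate
print(softmax(logits, temperature=0.5))  # sharper: "cat" pulls further ahead
print(softmax(logits, temperature=2.0))  # flatter: more randomness possible
```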
Think of it like a panel of judges scoring every contestant at once. Each contestant (token) gets a raw score. Those raw scores get normalized into probabilities. Then a winner gets selected – with some randomness introduced by temperature and Top-P to keep things from always going to the same winner.
Logits are the raw scores before any of that normalization and selection happens.
Why It Matters for Your Prompts
Most users never interact with logits directly – they’re an internal step in generation. But they matter in a few practical contexts.
For developers, API access to the model's per-token probabilities – exposed as “logprobs” in some APIs, and derived directly from the logits – tells you how confident the model was about each token it generated. A token chosen with very high probability (its logit far above all alternatives) is a strong, unambiguous prediction. A token chosen from a nearly flat distribution (many tokens with similar logits) indicates more uncertainty – and is more likely to vary across runs.
This matters for reliability. If you’re building applications where consistency is important – classification tasks, structured output, yes/no decisions – checking the top logit probabilities tells you whether the model is sure of its answer or just guessing from a crowded field.
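As a sketch of what that looks like in practice, here's a request using the OpenAI Python SDK's logprobs options – the model name is a placeholder, and field names reflect the SDK at the time of writing:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model you're targeting
    messages=[{
        "role": "user",
        "content": "Classify this review as Positive, Negative, or Neutral: "
                   "'Great screen, terrible battery.'",
    }],
    logprobs=True,    # return the log probability of each generated token
    top_logprobs=3,   # also return the 3 closest runner-up tokens
    max_tokens=1,     # a single-token label is enough for classification
)

first_token = response.choices[0].logprobs.content[0]
print(first_token.token, first_token.logprob)
for alt in first_token.top_logprobs:
    print("  candidate:", alt.token, alt.logprob)
```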
For prompt writers, the practical implication is indirect: the cleaner and more specific your prompt, the more the relevant tokens dominate the logit distribution – and the more consistent and reliable the output becomes.
🌐 Real-World Example
A developer is building a sentiment classifier. The model reads customer reviews and outputs “Positive,” “Negative,” or “Neutral.” She enables logprob logging in her API calls to check confidence.
Most reviews return “Positive” with a probability above 0.95 – the model is very sure. But a batch of mixed reviews – ones that mention both good and bad aspects – returns “Neutral” at only 0.55 probability, with “Positive” close behind at 0.38.
She flags those low-confidence outputs for human review rather than trusting the classification automatically. The logits didn’t change the model’s output – they told her which outputs to trust and which to check.
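Her flagging rule might look something like this sketch – the threshold is illustrative, and the logprob values are examples of what the API returns:

```python
import math

CONFIDENCE_THRESHOLD = 0.80  # tune per task; illustrative only

def classify_with_confidence(label: str, logprob: float) -> dict:
    """Turn a label and its logprob into a routing decision.

    Logprobs are natural logs, so exp() recovers the probability.
    """
    probability = math.exp(logprob)
    return {
        "label": label,
        "probability": probability,
        "needs_human_review": probability < CONFIDENCE_THRESHOLD,
    }

print(classify_with_confidence("Positive", -0.02))  # p ~0.98 -> trust it
print(classify_with_confidence("Neutral", -0.60))   # p ~0.55 -> flag it
```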
Related Terms
- Temperature – Temperature scales the logits before they're converted to probabilities, making the resulting distribution flatter or steeper.
- Top-P (Nucleus Sampling) – Top-P filtering is applied after logits are converted to probabilities, cutting off low-probability tokens before sampling (see the sketch after this list).
- Inference – Logits are produced during inference at the final step before each token is selected.
- Token – The model produces one logit score per token in its vocabulary at each generation step.
- Hallucination – Hallucinations often correlate with low logit confidence – the model is generating from a flat, uncertain distribution rather than a confident, peaked one.
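To make that ordering concrete, here's a sketch of Top-P filtering applied to an already-normalized distribution like the one the softmax sketch above produces:

```python
def top_p_filter(probs: dict, p: float = 0.9) -> dict:
    """Keep the smallest set of top tokens whose probabilities sum to >= p,
    then renormalize. Applied after softmax, before sampling."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}

# A toy, already-normalized distribution (sums to 1.0)
probs = {"cat": 0.52, "dog": 0.39, "car": 0.06, "the": 0.03}
print(top_p_filter(probs, p=0.9))  # "car" and "the" are cut; rest renormalized
```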
Frequently Asked Questions
Do I need to understand logits to use AI effectively?
For everyday prompting, no. Logits are an internal mechanism you don’t interact with directly. They become relevant if you’re building AI applications that need confidence scoring, or if you’re reading research papers and API documentation where the term appears. Knowing what they are prevents confusion – but it won’t change how you write most prompts.
What are “logprobs” in AI APIs?
Logprobs – short for log probabilities – are a feature in some APIs (including OpenAI’s) that returns the top token probabilities for each position in a generation. They’re derived from logits and give you a window into the model’s confidence at each step. Useful for classification tasks, anomaly detection, and any scenario where you need to know how certain the model was about what it said.
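A common pattern is to average the token logprobs across a whole generation as a rough confidence signal – a sketch, assuming per-token logprobs like those the API returns:

```python
import math

def mean_confidence(token_logprobs: list[float]) -> float:
    """Average per-token probability across a generation.

    A low average suggests the model was sampling from flat,
    uncertain distributions for much of the output.
    """
    return math.exp(sum(token_logprobs) / len(token_logprobs))

confident_run = [-0.01, -0.05, -0.02]  # near-certain at every step
uncertain_run = [-0.9, -1.4, -1.1]     # crowded fields throughout

print(mean_confidence(confident_run))  # ~0.97
print(mean_confidence(uncertain_run))  # ~0.32
```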
Can I tell if an AI is uncertain by looking at its output?
Usually not from the text alone. A model can generate a confident-sounding sentence from a very uncertain logit distribution – the text gives no indication of the underlying probability spread. Logprobs are the only reliable way to check this programmatically. Otherwise, prompting the model to express its own uncertainty (“how confident are you in this?”) gives a rough signal, though it’s imperfect.
Are logits specific to language models?
No – logits appear in many types of neural networks, including image classifiers and other models. Any neural network that produces a score for each possible output category before a final classification step is producing logits. The term just describes the pre-normalization scores at the output layer, regardless of the model type or task.
References
- OpenAI Developers: Using logprobs
- Jurafsky, D. & Martin, J.H. – Speech and Language Processing
Further Reading
- Temperature
- Top-P (Nucleus Sampling)
- Inference
- Architecture & Technical Category
- M. Fedzechkina: What do your logits know? (The answer may surprise you)
Author Daniel: AI prompt specialist with over 5 years of experience in generative AI, LLM optimization, and prompt chain design. Daniel has helped hundreds of creators improve output quality through structured prompting techniques. At our AI Prompting Encyclopedia, he breaks down complex prompting strategies into clear, actionable guides.

