Token
Latest update: 26/04/27
Definition
A token is the small chunk of text – roughly a word or part of a word – that AI language models use to read and generate language. Everything you type and everything the AI writes gets broken into tokens first.
What Is a Token?
A token is the basic unit an AI language model works with. Not letters, not words – tokens. Most tokens are common words or word fragments. “Dog” is one token. “Running” might be one token. “Unbelievable” could be split into two or three. Numbers, punctuation, and spaces all count too.
You never see this happening – the AI handles it automatically behind the scenes. But tokens are what the model actually reads and writes, one at a time. They’re the alphabet of AI language processing.
Why not just use whole words? Because breaking language into sub-word pieces lets the model handle rare words, names, technical terms, and other languages without needing a separate vocabulary entry for every word that could ever appear.
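If you're curious, you can watch this splitting happen with OpenAI's open-source tiktoken library (pip install tiktoken). A minimal sketch – the exact splits vary from tokenizer to tokenizer, so your output may differ:

```python
import tiktoken

# "cl100k_base" is one of the encodings used by OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

for word in ["dog", "running", "unbelievable", "tokenization"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r} -> {len(ids)} token(s): {pieces}")
```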
💡 How Does It Work?
Before any processing starts, your prompt gets passed through a tokenizer – a converter that splits text into tokens and maps each one to a number. The model then works with those numbers, not the original text.
Think of it like Morse code. Before you can transmit a message, you convert each letter into dots and dashes. The receiver decodes it back at the other end. Tokenization works the same way – text goes in, numbers come out, the model works with the numbers, then converts back to readable text when it responds.
Common words like “the” or “and” are usually a single token. Long or rare words get split: “tokenization” might become “token” + “ization.” Spaces, line breaks, and punctuation each consume tokens too.
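Here's what that Morse-code roundtrip looks like in practice – a small sketch with tiktoken, where the actual ID numbers depend entirely on which tokenizer you use:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization maps text to numbers."
ids = enc.encode(text)   # text in, numbers out
print(ids)               # a list of integer token IDs
print(enc.decode(ids))   # numbers back in, original text out
```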
Why It Matters for Your Prompts
Tokens are the currency of AI. They set limits on what you can send and receive, and they determine cost when you’re using a paid API.
Every model has a context window – the maximum number of tokens it can process at once. That includes your prompt, any documents you paste in, the conversation history, and the AI’s response. When you hit that limit, the model can’t see earlier parts of the conversation anymore. Long threads can cause the AI to “forget” instructions you gave at the start.
For everyday users, this mostly shows up as unexpected truncation or responses that seem to ignore earlier context. For developers paying per token, it shows up directly in billing.
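If you're building on an API, a rough pre-flight check is easy to script. The sketch below assumes an 8,000-token window and a 1,000-token reserve for the response – both are illustrative numbers, not the real limits of any particular model:

```python
import tiktoken

CONTEXT_WINDOW = 8_000    # assumed example limit; check your model's docs
RESPONSE_BUDGET = 1_000   # tokens reserved for the model's answer

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_window(messages: list[str]) -> bool:
    """True if the conversation still leaves room for a response."""
    used = sum(len(enc.encode(m)) for m in messages)
    return used + RESPONSE_BUDGET <= CONTEXT_WINDOW

history = ["You are a helpful assistant.", "Summarize this contract: ..."]
print(fits_in_window(history))
```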
Knowing tokens exist also helps you write tighter prompts. Vague filler text – long preambles, repeated instructions, unnecessary pleasantries – all eat into your context window without improving the output. Concise prompts tend to outperform wordy ones, and now you know part of the reason why.
🌐 Real-World Example
A user pastes a 20-page contract into an AI tool and asks it to summarize the key clauses. The summary comes back oddly incomplete – it covers the first half well but glosses over the final sections.
The problem: the contract plus the conversation history pushed close to the model’s token limit. The AI processed as much as it could but ran out of context space before reaching the end of the document.
The fix: split the document into sections and summarize each one separately. Same AI, same contract – just managed in token-sized pieces instead of all at once.
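That fix is scriptable, too. One approach: split by token count rather than by pages or characters, so every chunk is guaranteed to fit. The 3,000-token chunk size here is an assumption, and summarize_chunk is a hypothetical stand-in for whatever AI call you'd actually make:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_by_tokens(text: str, size: int = 3_000) -> list[str]:
    """Split text into pieces of at most `size` tokens each."""
    ids = enc.encode(text)
    return [enc.decode(ids[i:i + size]) for i in range(0, len(ids), size)]

# Usage (summarize_chunk is hypothetical):
# contract = open("contract.txt").read()
# summaries = [summarize_chunk(c) for c in chunk_by_tokens(contract)]
```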
Related Terms
- Context Window – The total number of tokens an AI can hold in memory at once; tokens fill this window.
- Large Language Model (LLM) – The type of AI that processes everything through tokens.
- Prompt – Everything you type gets converted into tokens before the model reads it.
- Inference – The process of generating output tokens in response to input tokens.
- Temperature – Controls how the model chooses which token to generate next.
Frequently Asked Questions
How many tokens is a typical prompt?
A rough rule of thumb: 1 token ≈ 0.75 words in English, so 100 words works out to roughly 133 tokens. A short prompt of a few sentences might use 50–100 tokens; a pasted article could use thousands. Most AI tools show token counts somewhere in their settings or API dashboards.
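If you want a quick back-of-the-envelope version in code, the rule of thumb is one line – just remember it's an approximation, not an exact count:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~1 token per 0.75 English words."""
    return round(len(text.split()) / 0.75)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # ~12
```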
Does using more tokens always mean better results?
Not at all. Longer prompts aren’t better prompts. Padding your prompt with redundant context, over-explaining, or repeating instructions can actually dilute the signal and push useful context out of the window. Clarity beats length almost every time.
Why does AI handle some languages worse than others?
Tokenization is part of the reason. Most LLMs were trained primarily on English, so English words map efficiently to tokens – often one token per common word. Languages with different scripts or more complex word structures can require far more tokens to express the same idea. That means less fits in the context window, and training data for those languages was usually thinner to begin with.
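You can see the gap for yourself by counting tokens for the same short sentence in two languages. The counts depend on the tokenizer, so treat this as an illustration rather than a benchmark:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "How are you today?",
    "Chinese": "你今天怎么样？",
}
for lang, text in samples.items():
    print(lang, "->", len(enc.encode(text)), "tokens")
```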
Are tokens the same thing as words?
No, though they’re close enough that you can use words as a rough proxy. The difference matters most with long documents, non-English text, or when you’re managing costs. In English prose, tokens and words are often within 25–30% of each other – but code, punctuation-heavy text, or languages like Chinese or Arabic can throw that ratio off significantly.
References
- OpenAI – Tokenizer Tool – Interactive tool to see exactly how text gets broken into tokens.
- Hugging Face – “Summary of the Tokenizers” (huggingface.co/docs)
Further Reading
- Context Window
- Large Language Model (LLM)
- Inference
- Fundamentals Category
- Karpathy, A. – “Let’s build the GPT Tokenizer” (YouTube) – A hands-on walkthrough of how tokenization actually works under the hood.
Author Daniel: AI prompt specialist with over 5 years of experience in generative AI, LLM optimization, and prompt chain design. Daniel has helped hundreds of creators improve output quality through structured prompting techniques. At our AI Prompting Encyclopedia, he breaks down complex prompting strategies into clear, actionable guides.

