When you interact with an LLM, you're not actually working with words; you're working with tokens. Understanding tokens and context windows is essential for writing effective prompts and understanding the model's limitations.
What is a token?
A token is the basic unit that LLMs process. It's not exactly a word, but it's close:
- Short words = 1 token: "cat", "the", "is"
- Longer words = 2-3 tokens: "understanding", "artificial"
- Punctuation = 1 token: ".", ",", "!"
- Spaces often attach to words
Rule of thumb: 1 token ≈ 0.75 words (or 4 characters)
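This rule of thumb is easy to turn into a quick estimator. The sketch below is an approximation only; real tokenizers (e.g. tiktoken) give exact counts.

```python
# Rough token estimators based on the rule of thumb above.
# These are approximations, not real tokenization.
def estimate_tokens_by_chars(text: str) -> int:
    """~4 characters per token."""
    return max(1, round(len(text) / 4))

def estimate_tokens_by_words(text: str) -> int:
    """~0.75 words per token, i.e. tokens ≈ words / 0.75."""
    return max(1, round(len(text.split()) / 0.75))

sentence = "Understanding tokens helps you write better prompts"
print(estimate_tokens_by_chars(sentence), estimate_tokens_by_words(sentence))  # → 13 9
```

The two heuristics disagree slightly; either is close enough for budgeting a prompt.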
Examples
| Text | Tokens | Count |
|---|---|---|
| "Hello world" | ["Hello", " world"] | 2 |
| "Understanding" | ["Under", "standing"] | 2 |
| "ChatGPT is amazing!" | ["Chat", "G", "PT", " is", " amazing", "!"] | 6 |
Why not just use words?
Vocabulary size:
- English has ~170,000 words
- But add technical terms, names, foreign words... millions of possibilities
- Plus: "run", "runs", "running", "ran" are related but different words
Tokenization solution:
- Break words into subword units
- "understanding" → "under" + "standing"
- "ChatGPT" → "Chat" + "G" + "PT"
- Model learns patterns between these smaller pieces
This allows the model to:
- Handle rare or new words
- Understand relationships between word forms
- Work across different languages
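A toy greedy longest-match splitter illustrates the idea. The vocabulary here is invented for illustration; real tokenizers (e.g. BPE) learn their vocabulary from data.

```python
# Toy subword tokenizer: greedy longest-match against a tiny, made-up vocabulary.
VOCAB = {"under", "standing", "chat", "pt", "run", "ning"}

def subword_split(word: str) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces, i = [], 0
    word = word.lower()
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a match.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary piece matched: fall back to a single character.
            pieces.append(word[i])
            i += 1
    return pieces

print(subword_split("understanding"))  # → ['under', 'standing']
print(subword_split("ChatGPT"))        # → ['chat', 'g', 'pt']
```

Because unknown words fall back to smaller pieces (ultimately single characters), the tokenizer never fails on a word it hasn't seen.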
Context windows: the model's working memory
A context window is how much text (measured in tokens) the model can consider at once when generating a response.
Think of it like the model's short-term memory. It can only "remember" what's within the context window.
Context window sizes
| Model tier | Typical context window | Approximate pages |
|---|---|---|
| Smaller / faster models | 4K – 32K tokens | ~6 – 50 pages |
| Standard models | 128K – 200K tokens | ~200 – 300 pages |
| Large-context models | 500K – 1M+ tokens | ~750 – 1,500+ pages |
What counts toward the context window?
Everything in the current conversation:
- Your initial prompt
- Previous questions you've asked
- Previous answers the model gave
- Any instructions or system prompts
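Everything above can be tallied with a rough budget check. The message structure and counts below are illustrative, using the 4-characters-per-token heuristic; real APIs report exact usage.

```python
# Sketch: estimate whether a conversation still fits in the context window.
CONTEXT_WINDOW = 8_000  # tokens, for illustration

def estimate_tokens(text: str) -> int:
    """~4 characters per token (rough heuristic)."""
    return max(1, round(len(text) / 4))

def conversation_tokens(messages: list[dict]) -> int:
    """Sum estimated tokens over every message: system, user, and assistant."""
    return sum(estimate_tokens(m["content"]) for m in messages)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Analyze this report..."},
    {"role": "assistant", "content": "Here's my analysis..."},
]
used = conversation_tokens(messages)
print(f"used ≈ {used} tokens, ≈ {CONTEXT_WINDOW - used} remaining")
```

Note that the system prompt and the model's own previous answers consume budget just like your questions do.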
Context Window (8,000 tokens):
┌─────────────────────────────────────────────────┐
│ System: You are a helpful assistant │
│ │
│ User: Analyze this report... │
│ [5,000 tokens of report text] │
│ │
│ Assistant: Here's my analysis... │
│ [1,000 tokens of response] │
│ │
│ User: Can you elaborate on point 3? │
│ │
│ Assistant: Certainly... │
│ [available space: ~1,000 tokens] │
└─────────────────────────────────────────────────┘
Why context windows matter
Too small = lost information
If your context exceeds the window, the model forgets the oldest parts:
Context limit: 4,000 tokens
[Old information]   [Your question]   [Model thinking...]
        ↑                  ↑
  Gets forgotten    Model focuses here

This is why:
- Long conversations can lose coherence
- The model might contradict itself
- Important details from earlier get "forgotten"
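One common mitigation is a sliding window: drop the oldest non-system messages once the conversation no longer fits. A minimal sketch, using the same rough 4-characters-per-token estimate:

```python
# Sketch of the "forgetting" behaviour: keep the system prompt, then as many
# of the most recent messages as fit in the token budget.
def estimate_tokens(text: str) -> int:
    """~4 characters per token (rough heuristic)."""
    return max(1, round(len(text) / 4))

def truncate_to_window(messages: list[dict], limit: int) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = limit - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for m in reversed(rest):            # newest first
        cost = estimate_tokens(m["content"])
        if cost > budget:
            break                       # everything older is "forgotten"
        kept.append(m)
        budget -= cost
    return system + kept[::-1]          # restore chronological order

history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "x" * 400},   # ~100 tokens of old context
    {"role": "user", "content": "Latest question?"},
]
print(truncate_to_window(history, limit=20))
```

With a 20-token limit, only the system prompt and the latest question survive; the old context is dropped, which is exactly the "lost coherence" failure mode described above.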
Strategies for limited context
1. Be concise
❌ Bad: "I need you to analyze this document and provide
a comprehensive summary of the main points..."
✅ Good: "Summarize the key findings in 3 bullet points"
2. Structure your prompts
Put the most important information last (closest to where the model generates):
Background info... ← Model might forget this
[Your actual request] ← Model focuses here
3. Use summaries for long documents
Instead of pasting a 50-page document, paste a 1-page summary + relevant sections.
4. Break complex tasks into steps
Step 1: "Analyze section A"
Step 2: "Now analyze section B"
Step 3: "Combine the analyses"
Counting tokens
Most AI platforms show you token usage:
OpenAI Tokenizer: https://platform.openai.com/tokenizer
Tiktoken library: Python library for counting tokens
API responses: Include token counts in headers
Response headers:
X-Request-Token-Usage: 150
X-Response-Token-Usage: 420
Cost implications
Tokens = money. Pricing is typically per million tokens, and varies widely:
| Model tier | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
|---|---|---|
| Small / fast models | $0.10 – $1.00 | $0.30 – $4.00 |
| Flagship models | $2.00 – $15.00 | $8.00 – $60.00 |
A single request with 10K input tokens and 2K output tokens on a flagship model might cost a few cents. Multiply by thousands of daily requests and costs add up quickly.
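That arithmetic is easy to sketch. The prices below are assumptions for illustration, not any provider's actual rates:

```python
# Rough cost estimate for one request, at assumed per-million-token prices.
INPUT_PRICE_PER_M = 3.00    # dollars per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 15.00  # dollars per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: input and output are priced separately."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)

cost = request_cost(10_000, 2_000)
print(f"${cost:.3f} per request, ${cost * 1000:.2f} per 1,000 requests")
```

At these assumed rates the example request costs about $0.06, and a thousand such requests about $60, which is why output tokens (usually several times the input price) dominate bills for verbose responses.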
Practical tips
Check your token count
- Paste text into OpenAI's tokenizer to see token count
- Monitor API usage dashboards
- Estimate: 750 words ≈ 1,000 tokens
Optimize for tokens
- Remove filler words
- Use abbreviations where clear
- Structure with bullet points
- Attach files instead of pasting long text
When to use large context models
Use large context models (128K+ tokens) when:
- Analyzing long documents
- Maintaining context across long conversations
- Working with codebases
- Legal document review
Standard models are fine for:
- Short Q&A
- Simple writing tasks
- Code snippets
- Most everyday use
Understanding tokens and context windows helps you work within the model's constraints. Think of the context window as the model's working memory: keep your most important instructions and recent context within view, and be strategic about what you include.