When you interact with an LLM, you're not actually working with words; you're working with tokens. Understanding tokens and context windows is essential for writing effective prompts and understanding the model's limitations.
What is a token?
A token is the basic unit that LLMs process. It's not exactly a word, but it's close:
- Short words = 1 token: "cat", "the", "is"
- Longer words = 2-3 tokens: "understanding", "artificial"
- Punctuation = 1 token: ".", ",", "!"
- Spaces often attach to words
Rule of thumb: 1 token ≈ 0.75 words (or 4 characters)
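This rule of thumb is easy to turn into a quick estimator. The sketch below is an approximation only; real tokenizers (e.g. tiktoken) give exact counts.

```python
# Rough token estimators based on the rule of thumb above.
# These are approximations, not real tokenization.
def estimate_tokens_by_chars(text: str) -> int:
    """~4 characters per token."""
    return max(1, round(len(text) / 4))

def estimate_tokens_by_words(text: str) -> int:
    """~0.75 words per token, i.e. tokens ≈ words / 0.75."""
    return max(1, round(len(text.split()) / 0.75))

sentence = "Understanding tokens helps you write better prompts"
print(estimate_tokens_by_chars(sentence), estimate_tokens_by_words(sentence))  # → 13 9
```

The two heuristics disagree slightly; either is close enough for budgeting a prompt.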
Examples
| Text | Tokens | Count |
|---|---|---|
| "Hello world" | ["Hello", " world"] | 2 |
| "Understanding" | ["Under", "standing"] | 2 |
| "ChatGPT is amazing!" | ["Chat", "G", "PT", " is", " amazing", "!"] | 6 |
Why not just use words?
Vocabulary size:
- English has ~170,000 words
- But add technical terms, names, foreign words... millions of possibilities
- Plus: "run", "runs", "running", "ran" are related but different words
Tokenization solution:
- Break words into subword units
- "understanding" → "under" + "standing"
- "ChatGPT" → "Chat" + "G" + "PT"
- Model learns patterns between these smaller pieces
This allows the model to:
- Handle rare or new words
- Understand relationships between word forms
- Work across different languages
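A toy greedy longest-match splitter illustrates the idea. The vocabulary here is invented for illustration; real tokenizers (e.g. BPE) learn their vocabulary from data.

```python
# Toy subword tokenizer: greedy longest-match against a tiny, made-up vocabulary.
VOCAB = {"under", "standing", "chat", "pt", "run", "ning"}

def subword_split(word: str) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces, i = [], 0
    word = word.lower()
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a match.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary piece matched: fall back to a single character.
            pieces.append(word[i])
            i += 1
    return pieces

print(subword_split("understanding"))  # → ['under', 'standing']
print(subword_split("ChatGPT"))        # → ['chat', 'g', 'pt']
```

Because unknown words fall back to smaller pieces (ultimately single characters), the tokenizer never fails on a word it hasn't seen.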
Context windows: the model's working memory
A context window is how much text (measured in tokens) the model can consider at once when generating a response.
Think of it like the model's short-term memory. It can only "remember" what's within the context window.
Context window sizes
| Model tier | Typical context window | Approximate pages |
|---|---|---|
| Smaller / faster models | 4K – 32K tokens | ~6 – 50 pages |
| Standard models | 128K – 200K tokens | ~200 – 300 pages |
| Large-context models | 500K – 1M+ tokens | ~750 – 1,500+ pages |
What counts toward the context window?
Everything in the current conversation:
- Your initial prompt
- Previous questions you've asked
- Previous answers the model gave
- Any instructions or system prompts
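Everything above can be tallied with a rough budget check. The message structure and counts below are illustrative, using the 4-characters-per-token heuristic; real APIs report exact usage.

```python
# Sketch: estimate whether a conversation still fits in the context window.
CONTEXT_WINDOW = 8_000  # tokens, for illustration

def estimate_tokens(text: str) -> int:
    """~4 characters per token (rough heuristic)."""
    return max(1, round(len(text) / 4))

def conversation_tokens(messages: list[dict]) -> int:
    """Sum estimated tokens over every message: system, user, and assistant."""
    return sum(estimate_tokens(m["content"]) for m in messages)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Analyze this report..."},
    {"role": "assistant", "content": "Here's my analysis..."},
]
used = conversation_tokens(messages)
print(f"used ≈ {used} tokens, ≈ {CONTEXT_WINDOW - used} remaining")
```

Note that the system prompt and the model's own previous answers consume budget just like your questions do.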
Context Window (8,000 tokens):
┌─────────────────────────────────────────────────┐
│ System: You are a helpful assistant │
│ │
│ User: Analyze this report... │
│ [5,000 tokens of report text] │
│ │
│ Assistant: Here's my analysis... │
│ [1,000 tokens of response] │
│ │
│ User: Can you elaborate on point 3? │
│ │
│ Assistant: Certainly... │
│ [available space: ~1,000 tokens] │
└─────────────────────────────────────────────────┘
Why context windows matter
Too small = lost information
If your context exceeds the window, the model forgets the oldest parts:
Context limit: 4,000 tokens
[Old information]   [Your question]   [Model thinking...]
        ↑                  ↑
  Gets forgotten    Model focuses here

This is why:
- Long conversations can lose coherence
- The model might contradict itself
- Important details from earlier get "forgotten"
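One common mitigation is a sliding window: drop the oldest non-system messages once the conversation no longer fits. A minimal sketch, using the same rough 4-characters-per-token estimate:

```python
# Sketch of the "forgetting" behaviour: keep the system prompt, then as many
# of the most recent messages as fit in the token budget.
def estimate_tokens(text: str) -> int:
    """~4 characters per token (rough heuristic)."""
    return max(1, round(len(text) / 4))

def truncate_to_window(messages: list[dict], limit: int) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = limit - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for m in reversed(rest):            # newest first
        cost = estimate_tokens(m["content"])
        if cost > budget:
            break                       # everything older is "forgotten"
        kept.append(m)
        budget -= cost
    return system + kept[::-1]          # restore chronological order

history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "x" * 400},   # ~100 tokens of old context
    {"role": "user", "content": "Latest question?"},
]
print(truncate_to_window(history, limit=20))
```

With a 20-token limit, only the system prompt and the latest question survive; the old context is dropped, which is exactly the "lost coherence" failure mode described above.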
Strategies for limited context
1. Be concise
❌ Bad: "I need you to analyze this document and provide
a comprehensive summary of the main points..."
✅ Good: "Summarize the key findings in 3 bullet points"
2. Structure your prompts
Put the most important information last (closest to where the model generates):
Background info... ← Model might forget this
[Your actual request] ← Model focuses here
3. Use summaries for long documents
Instead of pasting a 50-page document, paste a 1-page summary + relevant sections.
4. Break complex tasks into steps
Step 1: "Analyze section A"
Step 2: "Now analyze section B"
Step 3: "Combine the analyses"
Counting tokens
Most AI platforms show you token usage:
OpenAI Tokenizer: https://platform.openai.com/tokenizer
Tiktoken library: Python library for counting tokens
API responses: Include token counts in headers
Response headers:
X-Request-Token-Usage: 150
X-Response-Token-Usage: 420
Cost implications
Tokens = money. Pricing is typically per million tokens, and varies widely:
| Model tier | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
|---|---|---|
| Small / fast models | $0.10 – $1.00 | $0.30 – $4.00 |
| Flagship models | $2.00 – $15.00 | $8.00 – $60.00 |
A single request with 10K input tokens and 2K output tokens on a flagship model might cost a few cents. Multiply by thousands of daily requests and costs add up quickly.
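That arithmetic is easy to sketch. The prices below are assumptions for illustration, not any provider's actual rates:

```python
# Rough cost estimate for one request, at assumed per-million-token prices.
INPUT_PRICE_PER_M = 3.00    # dollars per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 15.00  # dollars per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: input and output are priced separately."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)

cost = request_cost(10_000, 2_000)
print(f"${cost:.3f} per request, ${cost * 1000:.2f} per 1,000 requests")
```

At these assumed rates the example request costs about $0.06, and a thousand such requests about $60, which is why output tokens (usually several times the input price) dominate bills for verbose responses.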
Practical tips
Check your token count
- Paste text into OpenAI's tokenizer to see token count
- Monitor API usage dashboards
- Estimate: 750 words ≈ 1,000 tokens
Optimize for tokens
- Remove filler words
- Use abbreviations where clear
- Structure with bullet points
- Attach files instead of pasting long text
When to use large context models
Use large context models (128K+ tokens) when:
- Analyzing long documents
- Maintaining context across long conversations
- Working with codebases
- Legal document review
Standard models are fine for:
- Short Q&A
- Simple writing tasks
- Code snippets
- Most everyday use
Understanding tokens and context windows helps you work within the model's constraints. Think of the context window as the model's working memory: keep your most important instructions and recent context within view, and be strategic about what you include.