Course: AI & Tools Literacy
Lesson

When you interact with an LLM, you're not actually working with words; you're working with tokens. Understanding tokens and context windows is essential for writing effective prompts and understanding the model's limitations.

What is a token?

A token is the basic unit that LLMs process. It's not exactly a word, but it's close:

  • Short words = 1 token: "cat", "the", "is"
  • Longer words = 2-3 tokens: "understanding", "artificial"
  • Punctuation = 1 token: ".", ",", "!"
  • Spaces often attach to words

Rule of thumb: 1 token ≈ 0.75 words (or 4 characters)
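The rule of thumb can be turned into a quick estimator. This is a rough sketch, not a real tokenizer, and `estimate_tokens` is a made-up helper name:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))

estimate_tokens("Hello world")  # about 3 (a real tokenizer says 2)
```

Estimates like this are fine for budgeting; for exact counts, use the model's own tokenizer.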

Examples

Text                      Tokens                                          Count
"Hello world"             ["Hello", " world"]                             2
"Understanding"           ["Under", "standing"]                           2
"ChatGPT is amazing!"     ["Chat", "G", "PT", " is", " amazing", "!"]     6

Why not just use words?

Vocabulary size:

  • English has ~170,000 words
  • But add technical terms, names, foreign words... millions of possibilities
  • Plus: "run", "runs", "running", "ran" are related but different words

Tokenization solution:

  • Break words into subword units
  • "understanding" → "under" + "standing"
  • "ChatGPT" → "Chat" + "G" + "PT"
  • Model learns patterns between these smaller pieces

This allows the model to:

  • Handle rare or new words
  • Understand relationships between word forms
  • Work across different languages
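The subword idea above can be sketched with a toy greedy tokenizer. This is illustrative only: real tokenizers (e.g. BPE) learn their vocabulary from data, and the tiny vocabulary here is made up:

```python
def subword_tokenize(word, vocab):
    """Greedily match the longest vocabulary piece, falling back to single characters."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # single characters are always allowed
                tokens.append(piece)
                i = j
                break
    return tokens

vocab = {"under", "standing", "run", "ning"}
subword_tokenize("understanding", vocab)  # -> ["under", "standing"]
subword_tokenize("running", vocab)        # -> ["run", "ning"]
```

Because "run", "running", and "runs" all share the piece "run", the model can relate the word forms even if it has never seen one of them whole.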

Different tokenizers
Each LLM uses a slightly different tokenization scheme, with vocabulary sizes typically ranging from 50K to 200K unique tokens. This is why the same text can count as a different number of tokens in different systems.

Context windows: the model's working memory

A context window is how much text (measured in tokens) the model can consider at once when generating a response.

Think of it like the model's short-term memory. It can only "remember" what's within the context window.

Context window sizes

Model tier                Typical context window    Approximate pages
Smaller / faster models   4K – 32K tokens           ~6 – 50 pages
Standard models           128K – 200K tokens        ~200 – 300 pages
Large-context models      500K – 1M+ tokens         ~750 – 1,500+ pages
Context windows grow with every generation. Check each provider's documentation for the latest numbers.

What counts toward the context window?

Everything in the current conversation:

  • Your initial prompt
  • Previous questions you've asked
  • Previous answers the model gave
  • Any instructions or system prompts

Context Window (8,000 tokens):
┌─────────────────────────────────────────────────┐
│ System: You are a helpful assistant             │
│                                                 │
│ User: Analyze this report...                    │
│ [5,000 tokens of report text]                   │
│                                                 │
│ Assistant: Here's my analysis...                │
│ [1,000 tokens of response]                      │
│                                                 │
│ User: Can you elaborate on point 3?             │
│                                                 │
│ Assistant: Certainly...                         │
│ [available space: ~1,000 tokens]                │
└─────────────────────────────────────────────────┘
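When the conversation grows past the window, clients typically drop the oldest turns first. A minimal sketch of that idea, using the rough 4-characters-per-token estimate (the function name and the estimate are assumptions, not any provider's real client code):

```python
def fit_to_window(messages, token_limit):
    """Keep the most recent messages whose estimated token count fits the limit."""
    kept, total = [], 0
    for msg in reversed(messages):       # walk from newest to oldest
        cost = max(1, len(msg) // 4)     # rough token estimate per message
        if total + cost > token_limit:
            break                        # everything older gets dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))          # restore chronological order
```

Real chat clients do something similar (often summarizing instead of dropping), which is why very old turns quietly stop influencing answers.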

Why context windows matter

Too small = lost information

If your context exceeds the window, the model forgets the oldest parts:

Context limit: 4,000 tokens

[Old information] [Your question] [Model thinking...]
     ↑                      ↑
 Gets forgotten      Model focuses here

This is why:

  • Long conversations can lose coherence
  • The model might contradict itself
  • Important details from earlier get "forgotten"

Strategies for limited context

1. Be concise

❌ Bad: "I need you to analyze this document and provide
        a comprehensive summary of the main points..."
        
✅ Good: "Summarize the key findings in 3 bullet points"

2. Structure your prompts
Put the most important information last (closest to where the model generates):

Background info...      ← Model might forget this

[Your actual request]  ← Model focuses here

3. Use summaries for long documents
Instead of pasting a 50-page document, paste a 1-page summary + relevant sections.

4. Break complex tasks into steps

Step 1: "Analyze section A"
Step 2: "Now analyze section B"
Step 3: "Combine the analyses"
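The step-by-step pattern above can be scripted around whatever chat function your provider offers. Here `chat` is a placeholder parameter for any LLM call, not a real API:

```python
def analyze_in_steps(sections, chat):
    """Analyze each section in its own small call, then combine the results."""
    partials = [chat(f"Analyze section {name}:\n{text}") for name, text in sections]
    return chat("Combine these analyses:\n\n" + "\n\n".join(partials))
```

Each call only needs to fit one section plus its answer in the window, instead of the whole document at once.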


Counting tokens

Most AI platforms show you token usage:

OpenAI Tokenizer: https://platform.openai.com/tokenizer
Tiktoken library: Python library for counting tokens
API responses: include token counts in the response (often a usage object)

Example usage object in an API response:
"usage": {
  "prompt_tokens": 150,
  "completion_tokens": 420,
  "total_tokens": 570
}

Cost implications

Tokens = money. Pricing is typically per million tokens, and varies widely:

Model tier            Input cost (per 1M tokens)    Output cost (per 1M tokens)
Small / fast models   $0.10 – $1.00                 $0.30 – $4.00
Flagship models       $2.00 – $15.00                $8.00 – $60.00

A single request with 10K input tokens and 2K output tokens on a flagship model might cost a few cents. Multiply by thousands of daily requests and costs add up quickly.
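The arithmetic behind that estimate is simple per-token multiplication. The prices below are illustrative values taken from the flagship range above, not any provider's actual rates:

```python
def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Cost in dollars for one request, given per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

request_cost(10_000, 2_000, 2.00, 8.00)  # -> 0.036, about 4 cents
```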

Prices drop regularly as competition increases. Always check the provider's pricing page for current rates.

Practical tips

Check your token count

  • Paste text into OpenAI's tokenizer to see token count
  • Monitor API usage dashboards
  • Estimate: 750 words ≈ 1,000 tokens

Optimize for tokens

  • Remove filler words
  • Use abbreviations where clear
  • Structure with bullet points
  • Attach files instead of pasting long text

When to use large context models

Use large context models (128K+ tokens) when:

  • Analyzing long documents
  • Maintaining context across long conversations
  • Working with codebases
  • Legal document review

Standard models are fine for:

  • Short Q&A
  • Simple writing tasks
  • Code snippets
  • Most everyday use


Understanding tokens and context windows helps you work within the model's constraints. Think of the context window as the model's working memory: keep your most important instructions and recent context within view, and be strategic about what you include.