Course: AI & Tools Literacy
Lesson

You've interacted with ChatGPT, Claude, or Copilot. These tools seem almost magical: they write essays, debug code, answer questions, and even crack jokes. But what are they, really? And how do they work?

Understanding LLMs (Large Language Models) demystifies the magic and helps you use them more effectively.

Autocomplete on steroids

At its core, an LLM is an incredibly sophisticated version of the autocomplete on your phone.

When you type "The weather today is" on your phone, it might suggest "sunny" or "rainy" based on what you've typed before. An LLM does the same thing, but with patterns learned from reading billions of sentences across the Internet.

Instead of just suggesting one word, an LLM can generate entire paragraphs, one word at a time, by repeatedly asking: "Given what I've written so far, what's the most likely next word?"
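This "repeatedly predict the next word" loop can be sketched in a few lines. The probability table below is hardcoded and invented for illustration; a real LLM computes these probabilities with a neural network rather than looking them up.

```python
# Toy illustration: an LLM as repeated next-word prediction.
# The probability table is made up for this example; a real model
# derives probabilities from billions of learned parameters.
NEXT_WORD = {
    "the":     {"weather": 0.6, "cat": 0.4},
    "weather": {"today": 0.7, "is": 0.3},
    "today":   {"is": 0.9, "was": 0.1},
    "is":      {"sunny": 0.5, "rainy": 0.5},
}

def generate(prompt: str, max_words: int = 4) -> str:
    words = prompt.lower().split()
    for _ in range(max_words):
        options = NEXT_WORD.get(words[-1])
        if not options:
            break  # no learned continuation for this word
        # Greedy choice: append the most probable next word
        words.append(max(options, key=options.get))
    return " ".join(words)

print(generate("The weather"))  # the weather today is sunny
```

Each generated word becomes part of the context for the next prediction, which is exactly how an LLM extends a prompt into paragraphs.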


Breaking down the name

Large Language Model has three parts:

Large: Trained on massive datasets (trillions of words)

  • Frontier models are trained on data from books, websites, code repositories, and more
  • Training requires thousands of powerful computers running for months
  • The "knowledge" comes from patterns in this training data

Language: Works with human language

  • Understands and generates text in many languages
  • Can switch between formal, casual, technical, or creative styles
  • Applies the same statistical approach across different languages

Model: A mathematical representation

  • Neural network with billions (or trillions) of parameters
  • These parameters store "knowledge" as numerical patterns
  • Not a database of facts, but a system for generating plausible text


How LLMs are created: the training process

Creating an LLM happens in three major phases:

Phase 1: Pre-training (the learning phase)

The model reads vast amounts of text from the Internet:

  • Books and academic papers
  • Websites and articles
  • Code repositories (GitHub)
  • Wikipedia and encyclopedias
  • Conversations and forums

During this phase, the model learns:

  • Grammar and syntax of language
  • Facts about the world (up to its training cutoff date)
  • Reasoning patterns
  • Different writing styles and tones

The model doesn't memorize text word-for-word. Instead, it learns statistical patterns: "When people talk about weather, they often mention temperature," or "Questions about programming often include code examples."
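The kind of pattern-learning described above can be imitated, at a vastly smaller scale, by simply counting which word follows which in a corpus. The two-sentence corpus here is a stand-in; real pre-training uses trillions of words and a neural network rather than raw counts.

```python
from collections import Counter, defaultdict

# Tiny stand-in corpus; real pre-training uses trillions of words.
corpus = [
    "the weather is hot and the temperature is rising",
    "the weather is cold and the temperature is falling",
]

# Count which word tends to follow which: the crude version of the
# statistical patterns a model picks up at enormous scale.
pair_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        pair_counts[prev][nxt] += 1

print(pair_counts["weather"].most_common(1))      # [('is', 2)]
print(pair_counts["temperature"].most_common(1))  # [('is', 2)]
```

Even this toy counter "knows" that "weather" and "temperature" are usually followed by "is" — not because it understands weather, but because the pattern appears in its data.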

Training cost
Training a frontier model can cost hundreds of millions of dollars. This includes electricity for thousands of GPUs running for months, data collection, and engineering salaries. This is why only large companies can train frontier models.

Phase 2: Fine-tuning (specialization)

The pre-trained model is good at generating text, but it needs to learn to be helpful and harmless. Fine-tuning involves:

  • Instruction tuning: Training the model to follow instructions
  • Safety training: Teaching it to refuse harmful requests
  • Preference learning: Human reviewers rate responses, and the model learns what humans prefer

Phase 3: Reinforcement Learning from Human Feedback (RLHF)

Humans compare different AI responses and rank them. The model learns to generate responses that humans prefer:

  • More helpful answers
  • More accurate information
  • Better formatting and structure
  • Appropriate tone and style
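The core idea behind learning from rankings can be sketched with a Bradley-Terry preference model: a reward model scores each response, and the modeled probability that humans prefer response A over response B grows with the score gap. The scores and formulation here are illustrative, not any particular lab's implementation.

```python
import math

# Toy sketch of preference learning. A reward model assigns a score
# to each response; training pushes preferred responses' scores up.
def preference_probability(score_preferred: float, score_other: float) -> float:
    """P(humans prefer A over B) under a Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(-(score_preferred - score_other)))

# If the reward model scores the helpful answer higher,
# the modeled preference probability exceeds 0.5.
p = preference_probability(2.0, 0.5)
print(round(p, 3))  # 0.818
```

Training nudges the reward model so that responses humans ranked higher get higher scores, and the LLM is then tuned to produce high-scoring responses.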


The key insight: prediction, not understanding

Here's the crucial mental model: LLMs don't "understand" text the way you do.

When you read "The cat sat on the mat," you visualize a cat, understand what "sat" means, and can answer questions about it because you have a mental model of the world.

An LLM processes "The cat sat on the mat" by:

  1. Breaking it into tokens
  2. Running mathematical operations on those tokens
  3. Generating a probability distribution for what comes next

It has no mental image of a cat. It has statistical patterns learned from seeing the word "cat" in millions of contexts.

This means:

  • It can generate plausible-sounding nonsense
  • It lacks true reasoning (though it mimics it well)
  • It can't verify facts against reality, only predict what sounds true


Why they seem so smart

If LLMs are just predicting next words, why do they seem so intelligent?

Pattern matching at scale

The training data contains countless examples of:

  • Problem-solving approaches
  • Logical arguments
  • Explanations of complex topics
  • Creative writing
  • Code solutions

When you ask a question, the model recognizes patterns in your prompt and generates text that follows similar patterns from its training.

Emergent abilities

As models get larger, they develop capabilities that weren't explicitly programmed:

  • Following complex instructions
  • Translating between languages
  • Writing code in multiple programming languages
  • Understanding context and nuance

These abilities "emerge" from the sheer scale of training; they're not specifically taught.

Capability                Small Model    Large Model
Basic grammar             Yes            Yes
Simple Q&A                Yes            Yes
Complex reasoning         Limited        Yes
Code generation           Poor           Excellent
Multi-step instructions   Limited        Yes
Creative writing          Basic          Advanced

What LLMs actually do: token by token

A token is the smallest unit of text an LLM processes, typically around three-quarters of a word. Let's see the process in action:

Input: "The capital of France is"

Step 1: Tokenize the input

["The", " capital", " of", " France", " is"]

Step 2: Model processes tokens

  • Runs them through billions of calculations
  • Activates patterns learned during training
  • Generates probabilities for next token

Step 3: Generate next token

"Paris": 99.2% probability
"Lyon": 0.3% probability
"a": 0.2% probability
...

Step 4: Select token (usually highest probability)

Output: "The capital of France is Paris"

This process repeats for every single token generated.
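The four steps above fit in a minimal loop. The distribution below uses the illustrative probabilities from the example, not real model output.

```python
# The four steps above as a minimal loop. The probabilities are the
# illustrative numbers from the example, not real model output.
def next_token(context: str) -> str:
    # Steps 2-3: a real model computes this distribution; here it's fixed.
    distribution = {" Paris": 0.992, " Lyon": 0.003, " a": 0.002}
    # Step 4: greedy selection takes the highest-probability token.
    return max(distribution, key=distribution.get)

text = "The capital of France is"
text += next_token(text)   # each generated token is appended...
print(text)                # ...and the loop repeats from the new context.
# The capital of France is Paris
```

In practice models often sample from the distribution instead of always taking the top token, which is why the same prompt can produce different answers.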


Key limitations to remember

No memory between sessions

Unless specifically configured, each conversation starts fresh. The model doesn't remember you from yesterday.

Knowledge cutoff

Models have a training date beyond which they don't know what happened. For example, a model trained on data up to early 2025 won't know about events from late 2025. Always check the model's documentation for its cutoff date.

No internet access (usually)

Most LLMs can't browse the web in real-time. They only know what was in their training data.

Confident but wrong

LLMs can generate completely false information with complete confidence. This is called "hallucination": fabricated facts, invented citations, or code methods that don't exist.


LLMs are prediction engines, not oracles. They're incredibly useful tools that can help with writing, coding, analysis, and learning, but they require human judgment to use effectively. Understanding how they work helps you know when to trust them and when to verify.