You've interacted with ChatGPT, Claude, or Copilot. These tools can seem almost magical: they write essays, debug code, answer questions, and even crack jokes. But what are they, really? And how do they work?
Understanding LLMs (Large Language Models) demystifies the magic and helps you use them more effectively.
Autocomplete on steroids
At its core, an LLM is an incredibly sophisticated version of the autocomplete on your phone.
When you type "The weather today is" on your phone, it might suggest "sunny" or "rainy" based on what you've typed before. An LLM does the same thing, but with an understanding built from reading billions of sentences across the entire Internet.
Instead of just suggesting one word, an LLM can generate entire paragraphs, one word at a time, by repeatedly asking: "Given what I've written so far, what's the most likely next word?"
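That loop can be sketched with a toy word-frequency table. This is only an illustration of the loop's shape: a real LLM replaces the hand-written counts below with a neural network, and the counts themselves are invented.

```python
# Toy next-word predictor. The counts are invented for illustration;
# a real LLM computes next-word probabilities with a neural network.
next_word_counts = {
    "the": {"weather": 4, "cat": 3},
    "weather": {"today": 5, "is": 2},
    "today": {"is": 6},
    "is": {"sunny": 3, "rainy": 2},
}

def predict_next(word):
    """Pick the most frequent follower of `word` in the table."""
    candidates = next_word_counts.get(word, {})
    return max(candidates, key=candidates.get) if candidates else None

def generate(start, max_words=5):
    """Repeat the 'what comes next?' question until we run out."""
    words = [start]
    while len(words) < max_words:
        nxt = predict_next(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

print(generate("the"))  # the weather today is sunny
```

The model never plans the whole sentence; each word is chosen by looking only at what has been generated so far.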
Breaking down the name
Large Language Model has three parts:
Large: Trained on massive datasets (trillions of words)
- Frontier models are trained on data from books, websites, code repositories, and more
- Training requires thousands of powerful computers running for months
- The "knowledge" comes from patterns in this training data
Language: Works with human language
- Understands and generates text in many languages
- Can switch between formal, casual, technical, or creative styles
- Processes different languages with the same underlying machinery
Model: A mathematical representation
- Neural network with billions (or trillions) of parameters
- These parameters store "knowledge" as numerical patterns
- Not a database of facts, but a system for generating plausible text
How LLMs are created: the training process
Creating an LLM happens in three major phases:
Phase 1: Pre-training (the learning phase)
The model reads vast amounts of text from the Internet:
- Books and academic papers
- Websites and articles
- Code repositories (GitHub)
- Wikipedia and encyclopedias
- Conversations and forums
During this phase, the model learns:
- Grammar and syntax of language
- Facts about the world (up to its training cutoff date)
- Reasoning patterns
- Different writing styles and tones
The model doesn't memorize text word-for-word. Instead, it learns statistical patterns: "When people talk about weather, they often mention temperature," or "Questions about programming often include code examples."
Phase 2: Fine-tuning (specialization)
The pre-trained model is good at generating text, but it needs to learn to be helpful and harmless. Fine-tuning involves:
- Instruction tuning: Training the model to follow instructions
- Safety training: Teaching it to refuse harmful requests
- Preference learning: Human reviewers rate responses, and the model learns what humans prefer
Phase 3: Reinforcement Learning from Human Feedback (RLHF)
Humans compare different AI responses and rank them. The model learns to generate responses that humans prefer:
- More helpful answers
- More accurate information
- Better formatting and structure
- Appropriate tone and style
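The raw material for this phase is often just pairs of responses with a human's choice attached. The record below is hypothetical; the exact format varies between labs, but the core idea is "same prompt, two responses, one human preference":

```python
# A hypothetical preference record for RLHF-style training.
# The field names are illustrative, not any lab's actual schema.
preference_example = {
    "prompt": "Explain recursion in one sentence.",
    "response_a": "Recursion is when a function calls itself to solve "
                  "smaller versions of the same problem.",
    "response_b": "Recursion is a thing in programming.",
    "human_prefers": "response_a",  # clearer and more informative
}

# Many thousands of records like this teach the model which kinds of
# answers people actually prefer.
print(preference_example["human_prefers"])
```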
The key insight: prediction, not understanding
Here's the crucial mental model: LLMs don't "understand" text the way you do.
When you read "The cat sat on the mat," you visualize a cat, understand what "sat" means, and can answer questions about it because you have a mental model of the world.
An LLM processes "The cat sat on the mat" by:
- Breaking it into tokens
- Running mathematical operations on those tokens
- Generating a probability distribution for what comes next
It has no mental image of a cat. It has statistical patterns learned from seeing the word "cat" in millions of contexts.
This means:
- It can generate plausible-sounding nonsense
- It lacks true reasoning (though it mimics it well)
- It can't verify facts against reality, only predict what sounds true
Why they seem so smart
If LLMs are just predicting next words, why do they seem so intelligent?
Pattern matching at scale
The training data contains countless examples of:
- Problem-solving approaches
- Logical arguments
- Explanations of complex topics
- Creative writing
- Code solutions
When you ask a question, the model recognizes patterns in your prompt and generates text that follows similar patterns from its training.
Emergent abilities
As models get larger, they develop capabilities that weren't explicitly programmed:
- Following complex instructions
- Translating between languages
- Writing code in multiple programming languages
- Understanding context and nuance
These "emerge" from the sheer scale of training; they're not explicitly taught.
| Capability | Small Model | Large Model |
|---|---|---|
| Basic grammar | Yes | Yes |
| Simple Q&A | Yes | Yes |
| Complex reasoning | Limited | Yes |
| Code generation | Poor | Excellent |
| Multi-step instructions | Limited | Yes |
| Creative writing | Basic | Advanced |
What LLMs actually do: token by token
Let's see the process in action:
Input: "The capital of France is"
Step 1: Tokenize the input
["The", " capital", " of", " France", " is"]
Step 2: Model processes tokens
- Runs them through billions of calculations
- Activates patterns learned during training
- Generates probabilities for next token
Step 3: Generate next token
"Paris": 99.2% probability
"Lyon": 0.3% probability
"a": 0.2% probability
...
Step 4: Select token (usually highest probability)
Output: "The capital of France is Paris"
This process repeats for every single token generated.
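The four steps above can be mimicked with a hand-written probability table. The probabilities are the illustrative ones from Step 3, not real model output, and the greedy "pick the top token" rule is the simplest of several selection strategies:

```python
# Hypothetical next-token probabilities, keyed by the text so far.
# A real model computes these on the fly; here they are hard-coded.
next_token_probs = {
    "The capital of France is": {" Paris": 0.992, " Lyon": 0.003, " a": 0.002},
}

def generate(text, steps=1):
    """Append the highest-probability token, one step at a time."""
    for _ in range(steps):
        options = next_token_probs.get(text)
        if not options:
            break  # the toy table has nothing for this context
        best = max(options, key=options.get)  # greedy selection
        text += best
    return text

print(generate("The capital of France is"))
# The capital of France is Paris
```

Real systems don't always pick the single most likely token; sampling with some randomness (controlled by a "temperature" setting) produces more varied output.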
Key limitations to remember
No memory between sessions
Unless specifically configured, each conversation starts fresh. The model doesn't remember you from yesterday.
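This is why chat applications resend the entire conversation with every request. A sketch of the idea, using a role/content message format that is a common convention rather than any specific API, with a faked model reply:

```python
# The model itself is stateless: each request must carry the full
# conversation history, or the model has no idea what came before.
history = []

def ask(question):
    history.append({"role": "user", "content": question})
    # A real application would send `history` to the model here;
    # we fake the reply to keep the sketch self-contained.
    reply = f"(model reply to: {question})"
    history.append({"role": "assistant", "content": reply})
    return reply

ask("What's an LLM?")
ask("Can you give an example?")
# Both turns live in `history`; drop that list, and the "memory" is gone.
print(len(history))  # 4
```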
Knowledge cutoff
Models have a training date beyond which they don't know what happened. For example, a model trained on data up to early 2025 won't know about events from late 2025. Always check the model's documentation for its cutoff date.
No internet access (usually)
Most LLMs can't browse the web in real-time. They only know what was in their training data.
Confident but wrong
LLMs can generate completely false information with total confidence. This is called "hallucination."
LLMs are prediction engines, not oracles. They're incredibly useful tools that can help with writing, coding, analysis, and learning, but they require human judgment to use effectively. Understanding how they work helps you know when to trust them and when to verify.