LLMs aren't just black boxes that output text. They have settings you can adjust to control their behavior. Understanding these settings helps you get better, more reliable results.
Temperature: creativity vs predictability
Temperature is the most important setting. It controls how "creative" or "random" the model's responses are.
How it works
Remember: the model generates text by choosing the next token based on probability.
Temperature modifies those probabilities:
- Low temperature (0-0.3): sharpens the distribution, so the highest-probability tokens dominate (at 0, the model effectively always picks the top token)
- High temperature (0.7-1.0+): flattens the distribution, so lower-probability tokens get sampled more often
Next token probabilities for "The cat":
"sat": 60% "jumped": 25% "meowed": 10% "flew": 5%
Temperature = 0.1 (low):
→ Almost always picks "sat"
Temperature = 0.8 (high):
→ Might pick "jumped" or even "meowed"
Temperature = 1.5 (very high):
→ Might pick "flew" (nonsense but creative!)
When to use different temperatures
| Temperature | Best For | Why |
|---|---|---|
| 0.0-0.2 | Coding, math, fact-checking | Consistent, predictable answers |
| 0.3-0.5 | Technical writing, analysis | Balanced accuracy and variety |
| 0.6-0.8 | General conversation, brainstorming | Natural, varied responses |
| 0.9-1.2 | Creative writing, fiction | Unexpected, imaginative outputs |
| 1.3+ | Artistic experimentation | Often incoherent but surprising |
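The reshaping temperature performs can be sketched in a few lines of Python. This is a toy model of the math, not any provider's actual implementation:

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a next-token distribution by temperature.

    Converts probabilities to logits, divides by temperature, and
    re-applies softmax. Low temperature sharpens the distribution;
    high temperature flattens it.
    """
    logits = [math.log(p) / temperature for p in probs.values()]
    z = sum(math.exp(l) for l in logits)
    return {tok: math.exp(l) / z for tok, l in zip(probs, logits)}

# The "The cat" example from above
probs = {"sat": 0.60, "jumped": 0.25, "meowed": 0.10, "flew": 0.05}

cold = apply_temperature(probs, 0.1)  # "sat" gets nearly all the mass
hot = apply_temperature(probs, 1.5)   # mass spreads toward "flew"
```

At temperature 0.1, "sat" ends up with well over 99% of the probability; at 1.5, "flew" becomes noticeably more likely than its original 5%.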
Examples
Prompt: "Write a haiku about programming"
Temperature 0.2:
Code runs through the night
Functions call and logic flows
Program executes
Temperature 0.8:
Semicolons dance
Through loops of midnight coffee
Bugs become features
Temperature 1.5:
Quantum bytes sing songs
Neon dragons debug stars
Syntax tastes purple
Top-p (Nucleus Sampling)
Top-p is an alternative to temperature. Instead of adjusting probabilities, it restricts which tokens can be chosen.
How it works
- Keep the smallest set of top tokens whose cumulative probability reaches p
- Then sample from those tokens
Top-p = 0.9:
Token probabilities:
"sat": 50% (cumulative: 50%) ✓
"jumped": 30% (cumulative: 80%) ✓
"meowed": 15% (cumulative: 95%) ✓
"flew": 5% (cumulative: 100%) ✗
Only consider "sat", "jumped", "meowed" (95% of probability mass)
Top-p vs Temperature
Top-p advantages:
- More intuitive ("use the top 90% of likely words")
- Automatically adapts to the confidence of the model
- Often produces more coherent results
Common combination:
- Temperature: 0.7
- Top-p: 0.9
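The nucleus-selection step described above can be sketched directly (a simplified illustration, not a production sampler):

```python
def nucleus(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize so the kept set sums to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

# The "The cat" example from above, with top_p = 0.9
probs = {"sat": 0.50, "jumped": 0.30, "meowed": 0.15, "flew": 0.05}
print(nucleus(probs, 0.9))  # "flew" is cut; the rest are renormalized
```

Note how the cutoff adapts: if the model were very confident ("sat" at 95%), the nucleus would contain just one token, which is why top-p tracks model confidence automatically.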
Max tokens (max length)
Max tokens limits how long the response can be.
Why use it?
- Cost control: You're charged per token
- Prevent runaway: Some prompts can trigger infinite loops
- Force conciseness: "Explain in max 100 tokens"
- API limits: Some endpoints have maximum response sizes
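Since billing and limits are counted in tokens, a rough budget check helps before sending a request. This uses the ~3/4-word-per-token rule of thumb mentioned earlier; for exact counts you'd use the provider's tokenizer (e.g. OpenAI's tiktoken library):

```python
def estimate_tokens(text):
    """Ballpark token estimate: 1 token is roughly 0.75 words.
    Only an approximation -- use the provider's tokenizer for
    exact counts before relying on this for billing."""
    return round(len(text.split()) / 0.75)

prompt = "Summarize this article in three short bullet points."
print(estimate_tokens(prompt))  # 8 words -> ~11 tokens
```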
Examples
Prompt: "Summarize this article"
Max tokens: 150
→ Forces a brief summary
Prompt: "Explain quantum computing"
Max tokens: 2000
→ Allows comprehensive explanation
Penalty settings
Two settings discourage certain behaviors:
Frequency penalty
Reduces repetition of the same words/phrases.
- 0.0: No penalty (can repeat freely)
- 0.5-1.0: Moderate penalty (reduces repetition)
- 2.0+: Strong penalty (forces variety)
Use case: Long-form writing where you don't want the model to keep using the same phrases.
Presence penalty
Encourages the model to talk about new topics.
- 0.0: No penalty (can stay on topic)
- 0.5-1.0: Encourages topic shifts
- 2.0+: Forces constant topic changes
Use case: Brainstorming sessions where you want diverse ideas.
Difference between them
| Penalty | What it punishes | Effect |
|---|---|---|
| Frequency | Scales with how OFTEN a token has already appeared | Strongly discourages "very very very"-style repetition |
| Presence | Flat, one-time penalty once a token has appeared at ALL | Nudges the model toward new words and topics |
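The difference is easiest to see as a logit adjustment. This is a simplified sketch of the formula OpenAI documents for its penalties (logit minus count-scaled frequency term minus one-time presence term):

```python
def penalize(logit, count, frequency_penalty, presence_penalty):
    """Adjust a token's logit based on its appearances so far.

    - frequency_penalty scales with the COUNT of prior appearances
    - presence_penalty is a flat, one-time cost once count > 0
    """
    return (logit
            - count * frequency_penalty
            - (1 if count > 0 else 0) * presence_penalty)

# A token that already appeared 3 times is hit much harder by the
# frequency term than one that appeared once:
print(penalize(2.0, 3, 0.5, 0.3))  # 2.0 - 1.5 - 0.3 = 0.2
print(penalize(2.0, 1, 0.5, 0.3))  # 2.0 - 0.5 - 0.3 = 1.2
```

An unseen token (count 0) is untouched by both penalties, which is why presence penalty only matters once a word has shown up at all.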
Recommended settings by task
Code generation
Temperature: 0.1-0.2
Top-p: 0.95
Max tokens: 2000
Frequency penalty: 0
Presence penalty: 0
- Low temp for consistency
- High top-p to allow necessary syntax variations
Technical documentation
Temperature: 0.3-0.4
Top-p: 0.9
Max tokens: 1500
Frequency penalty: 0.2
Presence penalty: 0.1
- Balanced accuracy and readability
- Slight penalties to avoid repetitive language
Creative writing
Temperature: 0.8-1.0
Top-p: 0.9
Max tokens: 2000
Frequency penalty: 0.5
Presence penalty: 0.3
- Higher temp for creativity
- Penalties to maintain varied language
Brainstorming
Temperature: 0.9-1.1
Top-p: 0.95
Max tokens: 1000
Frequency penalty: 0.3
Presence penalty: 0.6
- High temp and presence penalty for diverse ideas
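The task presets above can be collected into a small lookup table so request code picks settings by task name. Values are taken from the recommendations above (using the midpoint where a range is given); tune them for your model:

```python
PRESETS = {
    "code": {"temperature": 0.2, "top_p": 0.95, "max_tokens": 2000,
             "frequency_penalty": 0.0, "presence_penalty": 0.0},
    "docs": {"temperature": 0.35, "top_p": 0.9, "max_tokens": 1500,
             "frequency_penalty": 0.2, "presence_penalty": 0.1},
    "creative": {"temperature": 0.9, "top_p": 0.9, "max_tokens": 2000,
                 "frequency_penalty": 0.5, "presence_penalty": 0.3},
    "brainstorm": {"temperature": 1.0, "top_p": 0.95, "max_tokens": 1000,
                   "frequency_penalty": 0.3, "presence_penalty": 0.6},
}

def settings_for(task):
    """Return a copy of the preset, ready to splat into an API call."""
    return dict(PRESETS[task])

# e.g.: client.chat.completions.create(..., **settings_for("code"))
```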
Where to set these
OpenAI API:
response = client.chat.completions.create(
model="your-model-id", # check docs for current models
messages=[{"role": "user", "content": "Explain Python"}],
temperature=0.3,
max_tokens=500,
top_p=0.9,
frequency_penalty=0.2,
presence_penalty=0.1
)
Claude API:
response = client.messages.create(
model="your-model-id", # e.g. "claude-sonnet-4-6", check docs for current models
messages=[{"role": "user", "content": "Explain Python"}],
temperature=0.3,
max_tokens=500
)
ChatGPT web interface:
- Click your profile
- "Customize ChatGPT"
- Limited options available
These settings give you fine-grained control over LLM behavior. Start with the defaults, then experiment. For most tasks, a temperature of 0.3-0.7 and a top-p of 0.9-0.95 work well. Adjust based on whether you need precision (lower) or creativity (higher).