Course: AI & Tools Literacy / Lesson

LLMs aren't just black boxes that output text. They have settings you can adjust to control their behavior. Understanding these settings helps you get better, more reliable results.

Temperature: creativity vs. predictability

Temperature is the most important setting. It controls how "creative" or "random" the model's responses are.

How it works

Remember: the model generates text by choosing the next token based on probability. (A token is the smallest unit of text an LLM processes, roughly three-quarters of a word; API pricing is based on token counts.)

Temperature modifies those probabilities:

  • Low temperature (0-0.3): the model almost always picks the highest-probability token (at 0 it is effectively deterministic)
  • High temperature (0.7-1.0+): the model samples from a wider range of probable tokens

Next token probabilities for "The cat":

"sat": 60%    "jumped": 25%    "meowed": 10%    "flew": 5%

Temperature = 0.1 (low):
→ Almost always picks "sat"

Temperature = 0.8 (high):
→ Might pick "jumped" or even "meowed"

Temperature = 1.5 (very high):
→ Might pick "flew" (nonsense but creative!)

When to use different temperatures

Temperature | Best for                            | Why
0.0-0.2     | Coding, math, fact-checking         | Consistent, predictable answers
0.3-0.5     | Technical writing, analysis         | Balanced accuracy and variety
0.6-0.8     | General conversation, brainstorming | Natural, varied responses
0.9-1.2     | Creative writing, fiction           | Unexpected, imaginative outputs
1.3+        | Artistic experimentation            | Often incoherent but surprising

Examples

Prompt: "Write a haiku about programming"

Temperature 0.2:

Code runs through the night
Functions call and logic flows
Program executes

Temperature 0.8:

Semicolons dance
Through loops of midnight coffee
Bugs become features

Temperature 1.5:

Quantum bytes sing songs
Neon dragons debug stars
Syntax tastes purple

Technical detail

Temperature divides the logits (the raw scores) before the softmax function converts them to probabilities: each logit z becomes z / T. Temperature 1 leaves the distribution unchanged. Temperature 0.5 sharpens it, making high probabilities even higher. Temperature 2 flattens it toward uniform.
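That arithmetic can be sketched in a few lines of Python using the standard softmax-with-temperature formulation (each logit divided by T before exponentiating). The logits below are invented for illustration, not taken from a real model:

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by T, then softmax them into probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented logits for "sat", "jumped", "meowed", "flew"
logits = [3.0, 2.2, 1.3, 0.6]

for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])
```

At T = 0.5 the top token dominates almost completely; at T = 2.0 the probability mass spreads out across all four tokens.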

Top-p (Nucleus Sampling)

Top-p is an alternative to temperature. Instead of adjusting probabilities, it restricts which tokens can be chosen.

How it works

  • Sort tokens by probability and keep the smallest set whose cumulative probability reaches p
  • Sample only from that set

Top-p = 0.9:

Token probabilities:
"sat": 50% (cumulative: 50%)
"jumped": 30% (cumulative: 80%)
"meowed": 15% (cumulative: 95%)
"flew": 5% (cumulative: 100%) ✗

Only consider "sat", "jumped", "meowed" (95% of probability mass)
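A minimal sketch of that selection step, using the toy probabilities above:

```python
def nucleus(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return kept

probs = {"sat": 0.50, "jumped": 0.30, "meowed": 0.15, "flew": 0.05}
print(nucleus(probs, p=0.9))  # ['sat', 'jumped', 'meowed']
```

In a real sampler the kept probabilities are then renormalized and sampled from; this sketch only shows which tokens survive the cut.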

Top-p vs Temperature

Top-p advantages:

  • More intuitive ("use the top 90% of likely words")
  • Automatically adapts to the confidence of the model
  • Often produces more coherent results

Common combination:

  • Temperature: 0.7
  • Top-p: 0.9


Max tokens (max length)

Max tokens limits how long the response can be.

Why use it?

  1. Cost control: you're charged per token
  2. Prevent runaway output: some prompts trigger repetitive, degenerate text that would otherwise run on until cut off
  3. Force conciseness: combine with a prompt like "Explain in max 100 tokens" (the limit alone truncates the response; it doesn't make the model write more briefly)
  4. API limits: some endpoints cap the maximum response size (an API is the set of rules one program uses to talk to another, usually over the internet)

Examples

Prompt: "Summarize this article"
Max tokens: 150
→ Forces a brief summary

Prompt: "Explain quantum computing"
Max tokens: 2000
→ Allows comprehensive explanation
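Using the rule of thumb that one token is roughly three-quarters of a word, you can make a back-of-envelope cost estimate before sending a request. The price used here is a made-up placeholder; check your provider's actual pricing:

```python
def estimate_tokens(text):
    """Very rough estimate: ~0.75 words per token."""
    return round(len(text.split()) / 0.75)

def estimate_cost(text, price_per_1k_tokens):
    """Back-of-envelope cost; price_per_1k_tokens is a placeholder value."""
    return estimate_tokens(text) / 1000 * price_per_1k_tokens

article = "word " * 600  # stand-in for a 600-word article
print(estimate_tokens(article))       # 800
print(estimate_cost(article, 0.01))   # cost at a hypothetical $0.01 per 1k tokens
```

For real token counts, providers ship tokenizer tools; this estimate is only for quick mental math.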

Penalty settings

Two settings discourage certain behaviors:

Frequency penalty

Reduces repetition of the same words/phrases.

  • 0.0: No penalty (can repeat freely)
  • 0.5-1.0: Moderate penalty (reduces repetition)
  • 2.0+: Strong penalty (forces variety)

Use case: Long-form writing where you don't want the model to keep using the same phrases.

Presence penalty

Encourages the model to talk about new topics.

  • 0.0: No penalty (can stay on topic)
  • 0.5-1.0: Encourages topic shifts
  • 2.0+: Forces constant topic changes

Use case: Brainstorming sessions where you want diverse ideas.

Difference between them

Penalty   | What it punishes                                | Effect
Frequency | Repeating the SAME word multiple times          | "Very very very" → "Quite extremely remarkably"
Presence  | Any word that has ALREADY appeared (a flat, one-time cost) | Avoids reusing concepts from earlier in the text
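Both penalties can be pictured as subtractions from a token's logit before sampling. This sketch mirrors the adjustment described in OpenAI's API documentation: the frequency term scales with the repeat count, while the presence term is a flat one-time cost once the token has appeared at all.

```python
def penalized_logit(logit, count, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower a token's score based on how often it has already appeared."""
    frequency_term = count * frequency_penalty            # grows with every repeat
    presence_term = presence_penalty if count > 0 else 0  # flat once the token has appeared
    return logit - frequency_term - presence_term

# "very" has already been generated 3 times in the output so far
print(penalized_logit(2.0, count=3, frequency_penalty=0.5, presence_penalty=0.3))  # 0.2
```

A token that keeps repeating gets pushed down further each time by the frequency term, while the presence term only discourages the first reuse.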

Recommended settings by task

Code generation

Temperature: 0.1-0.2
Top-p: 0.95
Max tokens: 2000
Frequency penalty: 0
Presence penalty: 0
  • Low temp for consistency
  • High top-p to allow necessary syntax variations

Technical documentation

Temperature: 0.3-0.4
Top-p: 0.9
Max tokens: 1500
Frequency penalty: 0.2
Presence penalty: 0.1
  • Balanced accuracy and readability
  • Slight penalties to avoid repetitive language

Creative writing

Temperature: 0.8-1.0
Top-p: 0.9
Max tokens: 2000
Frequency penalty: 0.5
Presence penalty: 0.3
  • Higher temp for creativity
  • Penalties to maintain varied language

Brainstorming

Temperature: 0.9-1.1
Top-p: 0.95
Max tokens: 1000
Frequency penalty: 0.3
Presence penalty: 0.6
  • High temp and presence penalty for diverse ideas
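If you call an API with these settings regularly, it can help to bundle the presets above into a small lookup. The preset names and the helper function here are invented for illustration, and the values are one reasonable reading of the ranges above:

```python
PRESETS = {
    "code":       {"temperature": 0.2,  "top_p": 0.95, "max_tokens": 2000,
                   "frequency_penalty": 0.0, "presence_penalty": 0.0},
    "docs":       {"temperature": 0.35, "top_p": 0.90, "max_tokens": 1500,
                   "frequency_penalty": 0.2, "presence_penalty": 0.1},
    "creative":   {"temperature": 0.9,  "top_p": 0.90, "max_tokens": 2000,
                   "frequency_penalty": 0.5, "presence_penalty": 0.3},
    "brainstorm": {"temperature": 1.0,  "top_p": 0.95, "max_tokens": 1000,
                   "frequency_penalty": 0.3, "presence_penalty": 0.6},
}

def settings_for(task):
    """Return a copy so callers can tweak values without mutating the preset."""
    return dict(PRESETS[task])

print(settings_for("code")["temperature"])  # 0.2
```

The returned dict can be splatted straight into an API call, e.g. `client.chat.completions.create(model=..., messages=..., **settings_for("code"))`.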

Where to set these

OpenAI API:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="your-model-id",  # check docs for current models
    messages=[{"role": "user", "content": "Explain Python"}],
    temperature=0.3,
    max_tokens=500,
    top_p=0.9,
    frequency_penalty=0.2,
    presence_penalty=0.1
)

Claude API:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="your-model-id",  # e.g. "claude-sonnet-4-6", check docs for current models
    max_tokens=500,  # required in this API
    messages=[{"role": "user", "content": "Explain Python"}],
    temperature=0.3
)

ChatGPT web interface:

  • Click your profile
  • "Customize ChatGPT"
  • Only limited options are available here; sampling settings like temperature and top-p are exposed through the API, not the chat interface


These settings give you fine-grained control over LLM behavior. Start with defaults, then experiment. For most tasks, a temperature of 0.3-0.7 and a top-p of 0.9-0.95 work well. Adjust based on whether you need precision (lower) or creativity (higher).