What is Chain of Thought (CoT)?
A prompting technique that asks an LLM to 'think step by step' before answering — significantly boosts accuracy on hard problems.
Chain of Thought (CoT) is a prompting technique that asks an LLM to write out its reasoning steps before reaching a conclusion, instead of jumping straight to the answer. It noticeably improves accuracy on logic, math, and planning problems.
Example
Plain prompt:
Roger has 5 tennis balls. Today he buys 2 cans, each containing 3 balls. How many balls does Roger have?
LLM (no CoT): “11 balls.” Correct here, but a model that leaps straight to the answer often gets harder problems wrong.
CoT prompt:
Roger has 5 tennis balls. Today he buys 2 cans, each containing 3 balls. How many balls does Roger have? Let’s think step by step.
LLM:
Step 1: Roger starts with 5 balls. Step 2: 2 cans × 3 balls = 6 new balls. Step 3: 5 + 6 = 11 balls. Answer: 11 balls.
By “exposing” its reasoning, the model makes fewer arithmetic and logic mistakes.
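One practical side effect: a CoT response needs a final-answer line you can parse. A minimal Python sketch; `extract_answer` is a hypothetical helper name, and the response string below is the example transcript hard-coded in place of a real model call:

```python
import re

def extract_answer(response: str) -> str:
    # Pull the final answer out of a step-by-step transcript.
    # Assumes the model ends its reasoning with "Answer: <value>".
    match = re.search(r"Answer:\s*(.+)", response)
    return match.group(1).strip() if match else response.strip()

response = ("Step 1: Roger starts with 5 balls. "
            "Step 2: 2 cans × 3 balls = 6 new balls. "
            "Step 3: 5 + 6 = 11 balls. Answer: 11 balls.")
print(extract_answer(response))  # 11 balls.
```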
Why does CoT work?
The intuition: an LLM predicts one token at a time. When you ask it to write out steps, each new token is conditioned on the steps already written, so there are fewer chances to skip ahead and get it wrong.
The original paper (Wei et al., 2022) showed CoT lifting accuracy on the GSM8K math benchmark from roughly 18% to 57% with a 540B-parameter model.
Variants
Zero-shot CoT
Just appending "Let's think step by step" to the end of the prompt is enough to trigger reasoning.
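In code, zero-shot CoT is literally string concatenation. A minimal sketch (`make_cot_prompt` is a hypothetical helper name):

```python
def make_cot_prompt(question: str) -> str:
    # Zero-shot CoT: append the trigger phrase to the plain question.
    return f"{question} Let's think step by step."

print(make_cot_prompt("How many balls does Roger have?"))
# How many balls does Roger have? Let's think step by step.
```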
Few-shot CoT
Provide 2-3 step-by-step examples before asking the real question:
Q: 23 + 47 = ?
A: 23 + 47. 23 = 20+3, 47 = 40+7. 20+40 = 60, 3+7 = 10, 60+10 = 70. Answer: 70.
Q: 89 - 34 = ?
A: ...
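Assembling a few-shot CoT prompt is again just string formatting: worked Q/A examples first, then the real question with an open "A:" for the model to complete. A sketch using the worked example from the text; `few_shot_prompt` is a hypothetical helper:

```python
# (question, step-by-step answer) pairs shown to the model as examples
EXAMPLES = [
    ("23 + 47 = ?",
     "23 + 47. 23 = 20+3, 47 = 40+7. 20+40 = 60, 3+7 = 10, 60+10 = 70. Answer: 70."),
]

def few_shot_prompt(question: str) -> str:
    # Render each example in the Q:/A: format, then append the target
    # question with a dangling "A:" for the model to continue.
    parts = [f"Q: {q}\nA: {a}" for q, a in EXAMPLES]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

print(few_shot_prompt("89 - 34 = ?"))
```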
Tree of Thoughts (ToT)
Branching reasoning — try multiple paths, evaluate them, pick the best one.
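The branch-evaluate-select loop can be sketched as a toy beam search. Here `propose` and `score` are stubs standing in for LLM calls (in a real system each would query the model), and the scoring heuristic is invented purely for illustration:

```python
def propose(path):
    # Stub: in a real system, ask the model for candidate next thoughts.
    return [path + [c] for c in "abc"]

def score(path):
    # Stub: in a real system, ask the model to rate the partial reasoning.
    # Toy heuristic: penalize every 'c' step.
    return -sum(1 for c in path if c == "c")

def tree_of_thoughts(depth=3, beam=2):
    frontier = [[]]                       # start from an empty reasoning path
    for _ in range(depth):
        # Branch: expand every surviving path into several candidates.
        candidates = [p for path in frontier for p in propose(path)]
        # Evaluate and select: keep only the best `beam` paths.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]
    return frontier[0]

print(tree_of_thoughts())  # ['a', 'a', 'a']
```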
Self-consistency
Generate many different chains of thought (with high temperature) and majority-vote the most common answer.
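The voting step is simple to sketch; the sampled answers below are hard-coded stand-ins for the final answers extracted from real high-temperature model outputs:

```python
from collections import Counter

def majority_vote(answers):
    # Return the most common final answer across sampled chains.
    return Counter(answers).most_common(1)[0][0]

# One extracted answer per sampled chain of thought (stand-in data).
sampled_answers = ["11", "11", "12", "11", "10"]
print(majority_vote(sampled_answers))  # 11
```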
Reasoning models (2025-26)
The newest models (OpenAI o1/o3, Claude 3.7 Sonnet with extended thinking, Gemini 2.5) perform CoT internally before answering, with no special prompting needed.
You’ll see “thinking…” for a few seconds to a few minutes before the result appears. More expensive, but far more accurate on hard tasks.
When you DON’T need CoT
- Simple questions (definitions, facts)
- Open-ended creative work (poetry, fiction)
- When you need a fast response and latency matters