AI & LLMs | Updated May 6, 2026

Chain of thought (CoT)

Definition

Chain of thought is a prompting technique that improves a model's reasoning by encouraging it to work through a problem step by step before giving a final answer. Making intermediate reasoning explicit helps models handle multi-step math, logic, and planning tasks more reliably. Once a hand-written prompting trick, chain-of-thought reasoning is now built directly into reasoning models that think before they respond.

How it works

Chain-of-thought prompting asks a model to spell out its intermediate steps rather than jumping straight to an answer. A simple instruction like "think step by step" or a few worked examples can prompt the model to generate a reasoning trace before concluding.
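
As a minimal sketch of the two prompt styles described above, the Python below builds one prompt that ends with a reasoning instruction and one that prepends worked examples. The function names, the Q:/A: layout, and the sample questions are illustrative assumptions rather than a standard format, and the resulting strings would still need to be sent to a model of your choice.

```python
from typing import List, Tuple

# Illustrative trigger phrase; any similar instruction could be used instead.
COT_INSTRUCTION = "Let's think step by step."


def build_instruction_prompt(question: str) -> str:
    """Ask for step-by-step reasoning with a short instruction and no examples."""
    return f"Q: {question}\nA: {COT_INSTRUCTION}"


def build_worked_example_prompt(
    question: str, examples: List[Tuple[str, str, str]]
) -> str:
    """Prepend (question, reasoning, answer) examples so the model imitates them."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    # Leave the final answer slot open for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


examples = [
    (
        "A shop sells pens at 3 for $2. How much do 12 pens cost?",
        "12 pens is 12 / 3 = 4 groups of three. Each group costs $2, so 4 * 2 = 8.",
        "$8",
    )
]

print(build_instruction_prompt("A train covers 60 km in 45 minutes. What is its speed in km/h?"))
print()
print(build_worked_example_prompt(
    "A shop sells pens at 3 for $2. How much do 21 pens cost?", examples
))
```

The worked example ends with a fixed phrase ("The answer is ...") so that the model's own completion is likely to follow the same pattern, which makes the final answer easier to locate in the output.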

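The technique works because a language model generates one token at a time and conditions each new token on everything written so far. Writing out intermediate steps therefore gives the model more computation to draw on before it commits to an answer. Each step conditions the next, so the model can break a hard problem into smaller, more tractable pieces, much like showing your work on a math problem.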

Variants extend the core idea:

Zero-shot CoT triggers reasoning with a short instruction and no examples.
Few-shot CoT supplies example problems with worked solutions.
Self-consistency samples several reasoning paths and takes the majority answer.
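
The self-consistency variant can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `sample` is a hypothetical callable standing in for a model call with sampling enabled, the canned traces are made up for the demo, and the "The answer is N" convention is just one way to make answers extractable.

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(trace: str) -> Optional[str]:
    """Pull the final integer out of a trace that contains 'The answer is N'."""
    match = re.search(r"The answer is (-?\d+)", trace)
    return match.group(1) if match else None


def self_consistency(
    question: str, sample: Callable[[str], str], n_samples: int = 5
) -> Optional[str]:
    """Sample several reasoning paths and return the most common final answer."""
    answers = []
    for _ in range(n_samples):
        answer = extract_answer(sample(question))
        if answer is not None:  # skip traces with no parsable answer
            answers.append(answer)
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Canned traces standing in for five sampled completions (hypothetical data).
CANNED_TRACES = iter([
    "17 + 25 = 42, and 42 * 2 = 84. The answer is 84.",
    "17 + 25 = 32, and 32 * 2 = 64. The answer is 64.",
    "Doubling each term: 34 + 50 = 84. The answer is 84.",
    "17 + 25 = 42, doubled is 84. The answer is 84.",
    "I am not sure how to proceed.",
])


def canned_sample(question: str) -> str:
    return next(CANNED_TRACES)


print(self_consistency("What is (17 + 25) * 2?", canned_sample, n_samples=5))  # prints 84
```

One path makes an arithmetic slip and one produces no answer at all, yet the vote still lands on 84 because the other three paths agree.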

In current reasoning models, chain of thought is no longer just a prompting technique. The model is trained to produce extended internal reasoning automatically, often hidden from the user, and to spend more computation thinking before it answers.

Why it matters

Chain of thought was a key discovery in making language models more reliable on complex tasks. Problems that models failed when forced to answer immediately became solvable once the models were allowed to reason in steps. This unlocked stronger performance on math, coding, logic, and planning.

Chain of thought can also improve transparency. A visible reasoning trace lets users and developers see how a model appears to have reached a conclusion, which makes errors easier to spot, although reasoning traces do not always faithfully reflect the model's true internal process.

For AI search and grounding, step-by-step reasoning helps a system decide what to retrieve, how to weigh sources, and how to synthesize them into an accurate answer — reducing the leaps that lead to hallucinated or poorly supported claims.

Frequently asked questions

What is chain-of-thought prompting?

It is a technique where you prompt a model to reason step by step before answering, often with an instruction like "think step by step" or with worked examples. Generating intermediate steps improves accuracy on multi-step math, logic, and planning problems.

How is chain of thought related to reasoning models?

Reasoning models bake chain-of-thought reasoning into the model itself. Instead of relying on a prompt, they are trained to produce extended internal reasoning automatically and to spend extra computation thinking before responding.

Does chain of thought always improve accuracy?

It usually helps on complex, multi-step tasks but adds little to simple lookups, and it always increases latency and token cost because the model generates more text. Reasoning traces can also look convincing while being wrong, so output should still be verified for high-stakes use.

Is the model's reasoning trace its real thinking?

Not necessarily. A chain-of-thought trace is text the model generates on the way to its answer. It can read as a plausible explanation without faithfully reflecting the model's underlying computation. Treat it as a useful aid, not a guaranteed record of how the answer was derived.

Reasoning models

Reasoning models are language models trained to solve complex problems by thinking step by step before answering, spending extra computation at inference time to work through a problem rather than responding immediately. Examples include OpenAI's o-series, DeepSeek-R1, and the reasoning modes of Gemini and Claude. The approach trades latency and cost for stronger performance on math, coding, science, and multi-step planning.

Test-time compute

Test-time compute is the practice of allocating extra computation during inference — when a model is answering — so it can effectively think longer before responding. Instead of relying only on a model's size, systems spend more compute per query through longer reasoning, multiple sampled attempts, or search over candidate answers. This improves reasoning quality on hard problems and underpins modern reasoning models.
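
One of the strategies named above, drawing multiple attempts and keeping the one a checker scores highest, is often called best-of-n. The sketch below is a toy illustration under stated assumptions: the fixed list of guesses stands in for sampled model outputs, and `score` is a hypothetical checker for the equation 3x + 7 = 52. A real system would use a model for generation and a verifier or reward model for scoring.

```python
import itertools
from typing import Callable, Iterable


def best_of_n(candidates: Iterable[int], score: Callable[[int], float], n: int) -> int:
    """Score the first n candidates and return the highest-scoring one."""
    pool = list(itertools.islice(candidates, n))
    if not pool:
        raise ValueError("need at least one candidate")
    return max(pool, key=score)


def score(x: int) -> float:
    """Toy checker: 0 when 3x + 7 equals 52, more negative the further off."""
    return -abs(3 * x + 7 - 52)


# Stand-in for a stream of sampled answers (hypothetical data).
GUESSES = [12, 18, 14, 16, 15, 11]

for n in (1, 3, 5):
    print(n, best_of_n(GUESSES, score, n))
# prints:
# 1 12
# 3 14
# 5 15
```

Spending more compute, here a larger n, lets the checker see more candidates, and the selected answer moves from 12 to 14 to the exact solution 15. The cost grows linearly with n, which is the latency and cost trade-off this entry describes.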

Prompt engineering

Prompt engineering is the practice of designing and refining the inputs given to an AI model to produce precise, high-quality, and reliable outputs. It covers wording, structure, examples, context, and constraints — shaping how a model interprets a request without changing the model itself. Effective prompting is often the cheapest and fastest way to improve results.

Few-shot learning

Few-shot learning is the ability of a model to learn a new task from just a handful of examples, typically two to ten, provided directly in the prompt rather than through retraining. By showing the model a few input-output pairs, you steer it toward the desired format and behavior. It is a core technique in prompt engineering with modern language models.

Zero-shot learning

Zero-shot learning is a model's ability to perform a task it was never explicitly trained on and has been given no examples of, relying on its general knowledge and reasoning to handle a novel request. You simply describe the task in the prompt, and the model attempts it without any demonstrations. It reflects the broad, transferable capability of modern large language models.

Large language model (LLM)

A large language model is an AI system trained on vast amounts of text to understand and generate human language. Built on the transformer architecture and containing billions of parameters, LLMs predict the next token in a sequence, which enables them to answer questions, write, summarize, and reason. They power modern chat assistants, AI search, and autonomous agents.
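
To make next-token prediction concrete, the toy sketch below counts which word follows which in a tiny corpus and then generates text by repeatedly picking the most frequent follower. This is only an illustration of the generation loop: a bigram count table is an assumption standing in for a transformer with billions of parameters, and the corpus and function names are made up for the example.

```python
from collections import Counter, defaultdict
from typing import Dict


def train_bigram(text: str) -> Dict[str, Counter]:
    """Count, for each token, how often each other token follows it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts: Dict[str, Counter], start: str, max_tokens: int = 5) -> str:
    """Greedy decoding: append the most likely next token, one at a time."""
    out = [start]
    for _ in range(max_tokens):
        followers = counts.get(out[-1])
        if not followers:  # no known continuation for this token
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)


corpus = "a model predicts the next token and the next token follows"
table = train_bigram(corpus)
print(generate(table, "a"))  # prints: a model predicts the next token
```

Each generated token becomes part of the context for the next prediction. A real LLM runs the same kind of loop, but conditions on the whole preceding sequence rather than only the previous word.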