AI & LLMs

Updated May 6, 2026

Reasoning models

Definition

Reasoning models are language models trained to work through complex problems step by step before answering, spending extra computation at inference time instead of responding immediately. Examples include OpenAI's o-series, DeepSeek-R1, and the reasoning or extended-thinking modes of Gemini and Claude. The approach trades latency and cost for stronger performance on math, coding, science, and multi-step planning.

How it works

Reasoning models build chain-of-thought reasoning into the model itself. Where a standard model begins emitting its answer right away, a reasoning model first generates an extended reasoning trace (exploring approaches, checking steps, and correcting itself) before committing to a final response.

Two things enable this: training methods, often including reinforcement learning, that reward correct reasoning, and a larger test-time compute budget that lets the model "think longer" on hard problems. The reasoning trace is frequently hidden from the user or summarized, with only the conclusion shown.
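As a rough illustration of how a trace can be kept apart from the conclusion, the sketch below splits raw model text into reasoning and answer. It assumes the reasoning is wrapped in `<think>...</think>` tags, a convention some open models such as DeepSeek-R1 use in their raw output; hosted APIs may return reasoning in a separate field or not at all. The function name `split_reasoning` is hypothetical.

```python
import re

# Assumption: the raw output wraps its reasoning trace in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no tagged trace is found."""
    match = THINK_BLOCK.search(raw_output)
    if match is None:
        return "", raw_output.strip()
    reasoning = match.group(1).strip()
    # Everything outside the tagged block is treated as the user-facing answer.
    answer = (raw_output[:match.start()] + raw_output[match.end():]).strip()
    return reasoning, answer

raw = "<think>2 + 2 is 4. Double-check: yes.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```

An application could log `reasoning` for debugging while showing only `answer` to the user, which mirrors the hidden-trace behavior described above.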

The result is markedly better performance on tasks with verifiable, multi-step solutions: competition math, complex coding, scientific reasoning, and planning. The cost is higher latency and token usage, because generating the reasoning consumes additional computation.

Why it matters

Reasoning models raised the ceiling imposed by answering immediately. By letting models deliberate, they made performance more dependable on problems that previously tripped up even strong models, expanding the range of analytical work where AI can be trusted.

They also reshaped how the field thinks about scaling. Rather than only making models bigger, providers found that spending more computation at inference time — letting a model think longer — delivers large quality gains on hard tasks. This made test-time compute a central lever.

The trade-off is practical: reasoning models are slower and costlier, so they are best reserved for genuinely hard problems while faster models handle routine work. In agentic and grounded settings, their stronger planning and self-correction also improve how systems decide what to retrieve and how to synthesize sources into accurate answers.

Frequently asked questions

How are reasoning models different from standard LLMs?

Standard models begin answering immediately, without a separate deliberation phase. Reasoning models first generate extended internal reasoning (exploring, checking, and correcting) before answering, and they spend more inference-time compute doing so. This improves accuracy on complex, multi-step problems at the cost of speed and tokens.

What are examples of reasoning models?

Examples include OpenAI's o-series, DeepSeek-R1, and reasoning or extended-thinking modes offered by Gemini and Claude. Many providers now ship a reasoning tier alongside their faster standard models.

When should I use a reasoning model?

Use them for genuinely hard tasks — competition-level math, complex coding, scientific analysis, and multi-step planning — where accuracy matters more than speed. For simple lookups, classification, or formatting, a faster standard model is usually a better fit.
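One way to act on this guidance is a small router that sends only hard tasks to the reasoning tier. The sketch below is illustrative: the task labels, the model names `reasoning-model` and `fast-model`, and the 10-second latency threshold are all assumptions, not values from any provider.

```python
# Hypothetical task labels and model names, used only for illustration.
HARD_TASKS = {
    "competition_math",
    "complex_coding",
    "scientific_analysis",
    "multi_step_planning",
}

def choose_model(task_type: str, max_latency_s: float) -> str:
    """Pick a model tier from the task type and the caller's latency budget."""
    # Assumed threshold: below this budget there is no room for a long reasoning trace.
    min_reasoning_latency_s = 10.0
    if task_type in HARD_TASKS and max_latency_s >= min_reasoning_latency_s:
        return "reasoning-model"
    return "fast-model"

print(choose_model("competition_math", 60.0))  # reasoning-model
print(choose_model("classification", 60.0))    # fast-model
print(choose_model("complex_coding", 2.0))     # fast-model
```

In practice the task type would itself have to be inferred, for example by a cheap classifier, which is outside this sketch.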

Why do reasoning models cost more?

They generate a long internal reasoning trace before answering, which consumes extra computation and tokens at inference time. That added test-time compute is what improves accuracy, but it raises both latency and cost per request.
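A back-of-the-envelope calculation shows how the reasoning trace drives cost. The sketch assumes reasoning tokens are billed at the output-token rate, and the token counts and per-million-token prices are made-up numbers chosen for easy arithmetic, not real pricing.

```python
def request_cost(
    input_tokens: int,
    reasoning_tokens: int,
    answer_tokens: int,
    price_in_per_million: float,
    price_out_per_million: float,
) -> float:
    """Estimate the dollar cost of one request.

    Assumption: reasoning tokens are billed like ordinary output tokens.
    """
    billed_output = reasoning_tokens + answer_tokens
    total = input_tokens * price_in_per_million + billed_output * price_out_per_million
    return total / 1_000_000

# Hypothetical prices: $1 per million input tokens, $4 per million output tokens.
standard = request_cost(500, 0, 300, 1.0, 4.0)
reasoning = request_cost(500, 4000, 300, 1.0, 4.0)

print(f"standard:  ${standard:.4f}")   # standard:  $0.0017
print(f"reasoning: ${reasoning:.4f}")  # reasoning: $0.0177
print(f"ratio: {reasoning / standard:.1f}x")  # ratio: 10.4x
```

With these illustrative numbers the same prompt and the same visible answer cost roughly ten times more, purely because of the 4,000 hidden reasoning tokens.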

Chain of thought (CoT)

Chain of thought is a prompting technique that improves a model's reasoning by encouraging it to work through a problem step by step before giving a final answer. Making intermediate reasoning explicit helps models handle multi-step math, logic, and planning tasks more reliably. Once a hand-written prompting trick, chain-of-thought reasoning is now built directly into reasoning models that think before they respond.

Test-time compute

Test-time compute is the practice of allocating extra computation during inference — when a model is answering — so it can effectively think longer before responding. Instead of relying only on a model's size, systems spend more compute per query through longer reasoning, multiple sampled attempts, or search over candidate answers. This improves reasoning quality on hard problems and underpins modern reasoning models.
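The "multiple sampled attempts" option can be sketched as majority voting over independent samples, often called self-consistency. In the sketch below, `noisy_attempt` is a hypothetical stand-in for one sampled model answer (correct 60% of the time by assumption); a real system would call a model there instead.

```python
import random
from collections import Counter

def noisy_attempt(rng: random.Random, correct_answer: int = 42, p_correct: float = 0.6) -> int:
    """Hypothetical stand-in for one sampled model attempt at a problem."""
    if rng.random() < p_correct:
        return correct_answer
    # Wrong answers are spread across several nearby values.
    return rng.choice([40, 41, 43, 44])

def majority_vote(sample_fn, n_samples: int):
    """Draw n_samples answers and return (most common answer, share of votes)."""
    answers = [sample_fn() for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples

rng = random.Random(0)
answer, agreement = majority_vote(lambda: noisy_attempt(rng), n_samples=25)
print(answer, agreement)
```

Each extra sample costs another full generation, so `n_samples` is the knob that trades compute for reliability. Voting only helps when answers can be compared exactly, as with a final number or a multiple-choice label.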

Large language model (LLM)

A large language model is an AI system trained on vast amounts of text to understand and generate human language. Built on the transformer architecture and typically containing billions of parameters, LLMs predict the next token in a sequence, which lets them answer questions, write, summarize, and reason. They power modern chat assistants, AI search, and autonomous agents.
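Next-token prediction can be illustrated with a toy generation loop. The lookup table `NEXT_TOKEN_PROBS` below is a hand-written stand-in for a model: a real LLM computes these probabilities with a neural network conditioned on the whole preceding sequence, not just the last token, and usually samples instead of always taking the most likely token.

```python
# Toy stand-in for a model: probability of the next token given the last one.
NEXT_TOKEN_PROBS = {
    "the": {"model": 0.6, "answer": 0.4},
    "model": {"predicts": 0.7, "stops": 0.3},
    "predicts": {"tokens": 0.9, "<eos>": 0.1},
    "tokens": {"<eos>": 1.0},
}

def generate(start: str, max_tokens: int = 10) -> list[str]:
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = [start]
    while len(tokens) < max_tokens:
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:
            break  # the toy table has no continuation for this token
        next_token = max(dist, key=dist.get)
        if next_token == "<eos>":
            break  # end-of-sequence marker stops generation
        tokens.append(next_token)
    return tokens

print(generate("the"))  # ['the', 'model', 'predicts', 'tokens']
```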

DeepSeek

DeepSeek is a Chinese AI lab known for the DeepSeek V3 model and the R1 reasoning model. Its models use a mixture-of-experts architecture, are released under permissive MIT licensing, and are positioned as competitive with frontier systems while emphasizing strong performance at lower training and inference cost.

Agentic workflows

Agentic workflows are AI architectures in which a model autonomously plans, calls tools, browses the web, executes code, and completes multi-step tasks with limited human input. Rather than producing a single answer, the system loops — observing results, revising its plan, and acting again — marking the shift from AI chat to AI work that carries out goals on a user's behalf.
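The observe, revise, and act loop can be sketched in a few lines. Everything here is illustrative: `scripted_policy` is a hard-coded stand-in for the model's planning step, the two arithmetic tools are hypothetical, and the action dictionary format is an assumption, not any framework's API.

```python
def run_agent(policy, tools, goal, max_steps=5):
    """Loop until the policy finishes or the step budget runs out."""
    history = []  # observations so far: (tool name, args, result)
    for _ in range(max_steps):
        action = policy(goal, history)  # plan, or revise the plan, from observations
        if action["type"] == "finish":
            return action["answer"], history
        result = tools[action["tool"]](*action["args"])  # act by calling a tool
        history.append((action["tool"], action["args"], result))  # observe the result
    return None, history  # budget exhausted without an answer

# Hypothetical tools the agent may call.
TOOLS = {
    "multiply": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}

def scripted_policy(goal, history):
    """Stand-in for a model: computes 3 * 4 + 5 in two tool calls."""
    if len(history) == 0:
        return {"type": "tool", "tool": "multiply", "args": (3, 4)}
    if len(history) == 1:
        # The second step depends on the observed result of the first.
        return {"type": "tool", "tool": "add", "args": (history[-1][2], 5)}
    return {"type": "finish", "answer": history[-1][2]}

answer, history = run_agent(scripted_policy, TOOLS, goal="compute 3 * 4 + 5")
print(answer)  # 17
```

The `max_steps` cap is a design choice worth keeping in any real loop, since a model-driven policy has no guarantee of ever choosing to finish.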

AI inference

AI inference is the runtime step where a trained AI model takes a prompt and produces an output — the tokens you see streaming back from ChatGPT, Claude, Gemini, or Perplexity. Inference is what costs money in production: every prompt and every generated token consumes GPU time, and the economics of any AI product live in this loop.