Few-shot learning
Definition
Few-shot learning is the ability of a model to learn a new task from just a handful of examples, typically two to ten, provided directly in the prompt rather than through retraining. By showing the model a few input-output pairs, you steer it toward the desired format and behavior. It is a core technique in prompt engineering with modern language models.
How it works
In few-shot learning, you include several worked examples in the prompt before the actual request. Each example shows an input and the desired output, demonstrating the pattern you want the model to follow. The model uses these examples as a guide and applies the same pattern to your new input.
This works through what is called in-context learning. The model is not retrained or updated; instead, it conditions on the examples in the prompt at inference time, recognizing the task from the demonstrations. Because no weights change, the effect lasts only for that single request.
The number and quality of examples matter. A few well-chosen, representative examples often improve accuracy and consistency substantially, especially for tasks with a specific output format or edge cases that a plain instruction would not convey.
Why it matters
Few-shot learning lets teams adapt a general model to a specific task instantly, without the data, time, or cost of fine-tuning. For many production use cases, a handful of examples in the prompt is enough to get reliable, well-formatted results.
It also makes models far more flexible. The same model can be pointed at many different tasks just by changing the examples, which is central to how practitioners build features quickly. When examples alone are not enough, few-shot prompting is often the step before deciding whether full fine-tuning is worthwhile.
Frequently asked questions
How is few-shot learning different from fine-tuning?
Few-shot learning provides examples in the prompt at inference time and changes no model weights, so its effect is temporary and per-request. Fine-tuning updates the model's weights on a dataset, producing a lasting change. Few-shot is faster and cheaper; fine-tuning is more durable.
How many examples does few-shot learning need?
Typically two to ten examples, though the right number depends on the task. More examples can help complex or format-sensitive tasks, but they consume context window and cost, so practitioners balance coverage against prompt length.
What is in-context learning?
In-context learning is the underlying capability that makes few-shot prompting work. The model recognizes and performs a task from examples placed in its context at inference time, without any change to its parameters.
When should I use few-shot instead of zero-shot?
Use few-shot when a task has a specific output format, subtle rules, or edge cases that a plain instruction cannot fully convey. If a clear instruction alone produces good results, zero-shot is simpler and uses fewer tokens.
Zero-shot learning
Zero-shot learning is when a model performs a task it was never explicitly trained or given examples for, relying on its general knowledge and reasoning to handle a novel request. You simply describe the task in the prompt, and the model attempts it without any demonstrations. It reflects the broad, transferable capability of modern large language models.
Prompt engineering
Prompt engineering is the practice of designing and refining the inputs given to an AI model to produce precise, high-quality, and reliable outputs. It covers wording, structure, examples, context, and constraints — shaping how a model interprets a request without changing the model itself. Effective prompting is often the cheapest and fastest way to improve results.
Large language model (LLM)
A large language model is an AI system trained on vast amounts of text to understand and generate human language. Built on transformer architecture and containing billions of parameters, LLMs predict the next token in a sequence, enabling them to answer questions, write, summarize, and reason. They power modern chat assistants, AI search, and autonomous agents.
Context window
A context window is the maximum amount of text, measured in tokens, that a language model can consider in a single interaction — including the prompt, retrieved documents, conversation history, and the model's own output. Frontier models in early 2026 reach context windows of roughly a million tokens, enabling long documents and rich grounding.
AI fine-tuning
AI fine-tuning is the process of taking a pre-trained model and training it further on a smaller, specialized dataset so it adapts to a specific task, domain, tone, or format. It adjusts the model's existing weights rather than training from scratch, producing outputs that better match a brand's requirements or a narrow use case at lower cost than full training.
AI inference
AI inference is the runtime step where a trained AI model takes a prompt and produces an output — the tokens you see streaming back from ChatGPT, Claude, Gemini, or Perplexity. Inference is what costs money in production: every prompt and every generated token consumes GPU time, and the economics of any AI product live in this loop.