Indexly
AI & LLMs
Updated May 6, 2026

Foundation models

Definition

Foundation models are large-scale AI models trained on broad, diverse data that serve as a general-purpose base adapted for many downstream applications. Rather than building a model per task, organizations fine-tune or prompt a single foundation model for translation, summarization, coding, search, and more. Large language models and multimodal models are common examples.

How it works

A foundation model is pretrained once on a very large and diverse dataset, learning general representations of language, images, or other modalities. This pretraining is expensive and compute-heavy, but it produces a versatile base that captures broad knowledge and patterns.

Organizations then adapt this base for specific needs through fine-tuning, prompt engineering, retrieval-augmented generation, or tool integration. Because the foundation model already encodes general capability, adaptation requires far less data and compute than training a model from scratch.
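As one illustration of adaptation without retraining, the sketch below shows the prompt-assembly half of retrieval-augmented generation. It is a minimal sketch under stated assumptions: the documents, the word-overlap scoring, and the function names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical, and real systems typically use embedding-based search and then send the prompt to a model.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question.

    Word overlap is a stand-in for the embedding search a real system would use.
    """
    q_tokens = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_tokens & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Place the retrieved context ahead of the question in a single prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Use the context to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical knowledge base, invented for this example.
DOCS = [
    "Our refund window is 30 days from the delivery date.",
    "Support is available by email on weekdays.",
    "Shipping to Canada takes five to seven business days.",
]

prompt = build_prompt("How long is the refund window?", DOCS)
print(prompt)
```

The foundation model itself is untouched here. Only the input changes, which is why this kind of adaptation needs no training compute.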

The term spans modalities. Large language models are text foundation models, while multimodal foundation models handle images, audio, and video alongside text. The defining trait is generality: the same model underpins many distinct applications.

Why it matters

Foundation models reshaped how AI is built. Instead of training a bespoke model for every problem, teams now start from a capable base and specialize it, dramatically lowering the cost and time to ship AI features.

Reliance on a small number of base models also has consequences. A handful of foundation models power most consumer AI products, so their training data, biases, and retrieval behavior shape what billions of users see. For anyone optimizing for AI search, understanding the foundation model behind an answer engine is increasingly strategic.

Frequently asked questions

Are foundation models the same as large language models?

Not exactly. Large language models are a type of foundation model focused on text. "Foundation model" is the broader category, which also includes multimodal models that handle images, audio, and video. Every LLM is a foundation model, but not every foundation model is an LLM.

How are foundation models adapted to specific tasks?

They are adapted through fine-tuning on task data, prompt engineering, few-shot examples, retrieval-augmented generation, or connecting tools and functions. Each method specializes the general base for a domain without retraining the model from scratch.
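Few-shot prompting is the simplest of these methods to show in code. The sketch below only builds the prompt text. The sentiment task, the example reviews, and the `few_shot_prompt` helper are hypothetical, and the call to an actual model is left out.

```python
def few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Build a prompt that shows worked examples before the new input."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    # End on an open "Sentiment:" slot so the model completes the pattern.
    lines += [f"Review: {query}", "Sentiment:"]
    return "\n".join(lines)


# Hypothetical labeled examples, invented for this illustration.
EXAMPLES = [
    ("Arrived quickly and works perfectly.", "positive"),
    ("Broke after two days.", "negative"),
]

prompt = few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    EXAMPLES,
    "The battery lasts all week.",
)
print(prompt)
```

The examples steer the model toward the desired task and output format without changing any of its weights.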

Why are foundation models expensive to build?

Pretraining requires massive datasets and thousands of GPUs running for extended periods. The compute, data curation, and engineering involved put frontier foundation models out of reach for most organizations, which is why adapting existing ones is the norm.
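A rough sense of scale comes from a widely used rule of thumb for dense transformer models: training compute is approximately 6 floating-point operations per parameter per training token. The sketch below applies that approximation. Every number in it (model size, token count, sustained GPU throughput, cluster size) is an illustrative assumption, not a figure for any real model.

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6.0 * num_params * num_tokens


def gpu_hours(total_flops: float, sustained_flops_per_gpu: float) -> float:
    """Convert total compute into GPU-hours at an assumed sustained throughput."""
    return total_flops / sustained_flops_per_gpu / 3600.0


# Illustrative assumptions only.
NUM_PARAMS = 70e9          # 70 billion parameters
NUM_TOKENS = 2e12          # 2 trillion training tokens
SUSTAINED_FLOPS = 4e14     # assumed sustained FLOP/s per GPU
NUM_GPUS = 2048            # assumed cluster size

total = training_flops(NUM_PARAMS, NUM_TOKENS)
hours = gpu_hours(total, SUSTAINED_FLOPS)
days_on_cluster = hours / NUM_GPUS / 24.0

print(f"Total compute: {total:.2e} FLOPs")
print(f"GPU-hours:     {hours:,.0f}")
print(f"Wall-clock:    {days_on_cluster:.1f} days on {NUM_GPUS} GPUs")
```

Under these assumptions a single run occupies a cluster of thousands of GPUs for well over a week, before counting failed runs, experiments, and data preparation.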

Can foundation models be open source?

Yes. Several developers release their foundation models' weights publicly, letting organizations self-host, customize, and inspect them. Open foundation models often trade some frontier capability for control, data privacy, and potentially lower long-term cost.

Large language model (LLM)

A large language model is an AI system trained on vast amounts of text to understand and generate human language. Built on the transformer architecture and typically containing billions of parameters, LLMs predict the next token in a sequence, enabling them to answer questions, write, summarize, and reason. They power modern chat assistants, AI search, and autonomous agents.
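Next-token prediction can be illustrated with a toy model. The sketch below is not a transformer. It swaps in a bigram counter that predicts the word most often seen after the current one, which is enough to show the predict-append-repeat loop that LLMs also follow at far greater scale. The corpus and function names are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, Counter]:
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], token: str):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts: dict[str, Counter], start: str, max_tokens: int = 4) -> str:
    """Generate text by repeatedly appending the predicted next token."""
    out = [start]
    for _ in range(max_tokens):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


# Tiny made-up corpus, whitespace-tokenized.
CORPUS = "the cat sat on the mat . the cat sat on the sofa ."
model = train_bigram(CORPUS)

print(predict_next(model, "the"))   # prints "cat"
print(generate(model, "the"))       # prints "the cat sat on the"
```

An LLM replaces the frequency table with a neural network that scores every token in its vocabulary given the full preceding context, but generation proceeds one token at a time in the same way.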

Multimodal AI

Multimodal AI refers to models that process and understand multiple types of input, such as text, images, audio, and video, within a single system. Instead of handling one modality at a time, a multimodal model can read a chart, describe a photo, transcribe speech, and reason across them together, enabling richer interactions and search experiences.

Small language models (SLMs)

Small language models are compact AI models, typically ranging from about one to ten billion parameters, designed for on-device deployment, low latency, and cost efficiency while retaining useful capability. By trading some breadth for a smaller footprint, SLMs run on phones, laptops, and edge hardware, enabling private, fast, and inexpensive language tasks.
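The footprint argument is simple arithmetic: the memory needed to hold a model's weights is roughly the parameter count times the bytes used per parameter. The sketch below works that out for a hypothetical 3-billion-parameter model at three common precisions. It counts weights only and ignores activations, caches, and runtime overhead.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Hypothetical 3-billion-parameter small language model.
NUM_PARAMS = 3e9

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(NUM_PARAMS, precision):.1f} GB")
```

At 16-bit precision the weights take about 6 GB, and 4-bit quantization brings that to about 1.5 GB, which is the range where phones and laptops become practical targets.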

Open source LLMs

Open source LLMs are large language models whose weights are publicly available for download, allowing anyone to self-host, fine-tune, and inspect them. Families such as Llama, Mistral, Qwen, and DeepSeek give organizations control over deployment, customization, and data privacy, in contrast to closed models accessible only through a provider's API.

AI fine-tuning

AI fine-tuning is the process of taking a pre-trained model and training it further on a smaller, specialized dataset so it adapts to a specific task, domain, tone, or format. It adjusts the model's existing weights rather than training from scratch, producing outputs that better match a brand's requirements or a narrow use case at lower cost than full training.
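The idea of adjusting existing weights rather than starting over can be shown on a deliberately tiny model. In the sketch below, a two-parameter linear model stands in for a pre-trained network. Its starting weights, the small "domain" dataset, and the learning rate are all assumptions chosen for illustration, and real fine-tuning applies the same gradient-descent loop to far more parameters.

```python
def mse(w: float, b: float, data: list[tuple[float, float]]) -> float:
    """Mean squared error of the model y = w * x + b on the dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(w: float, b: float, data: list[tuple[float, float]],
              lr: float = 0.05, steps: int = 500) -> tuple[float, float]:
    """Continue gradient descent from the given weights on a small dataset."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Stand-in for weights learned during pretraining (assumed values).
pretrained_w, pretrained_b = 2.0, 0.0

# Small specialized dataset that follows y = 2.5 * x + 1.
domain_data = [(0.0, 1.0), (1.0, 3.5), (2.0, 6.0), (3.0, 8.5)]

loss_before = mse(pretrained_w, pretrained_b, domain_data)
tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, domain_data)
loss_after = mse(tuned_w, tuned_b, domain_data)

print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.6f}")
print(f"weights moved from ({pretrained_w}, {pretrained_b}) to ({tuned_w:.2f}, {tuned_b:.2f})")
```

Because training starts from weights that are already close to useful, only a small dataset and a short run are needed to fit the new domain.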

AI training data

AI training data is the corpus of text, code, images, and other content used to train large language models. Frontier models like GPT-4o, Claude Sonnet 4, Gemini 2.5, and Llama 4 are trained on trillions of tokens drawn from web crawls, books, code repositories, and licensed datasets. The composition of that data shapes what the model knows, who it cites, and how it represents brands.