Indexly
AI & LLMs
Updated May 6, 2026

Machine learning

Definition

Machine learning is the subset of AI in which systems learn patterns from data to make predictions or decisions, rather than following explicitly programmed rules. By training on examples, models improve at tasks like ranking, classification, recommendation, and language understanding. It is the foundation beneath modern AI, including the large language models that power AI search.

How it works

Instead of hand-coding rules, machine learning fits a model to data. During training, an algorithm adjusts the model's internal parameters to minimize the gap between its predictions and the correct answers in the training set. Once trained, the model is applied to new, unseen inputs; how well it generalizes depends on how closely those inputs resemble the data it was trained on.
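The training loop described above can be sketched in a few lines. This is a minimal illustration, not a production method: it assumes a one-variable linear model, mean squared error as the "gap", and plain gradient descent. The function name `fit_line`, the learning rate, and the toy data are all hypothetical choices made for the example.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # internal parameters, adjusted during training
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to shrink the prediction gap.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training set (assumed for illustration): generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
prediction = w * 10.0 + b  # apply the trained model to an unseen input
```

The model never sees x = 10 during training, yet its prediction lands close to 21 because the learned parameters capture the underlying pattern rather than memorizing the five examples.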

There are three broad approaches. Supervised learning trains on labeled examples to predict outcomes. Unsupervised learning finds structure in unlabeled data, such as clusters or embeddings. Reinforcement learning trains an agent through feedback from its actions, and is central to aligning modern language models.
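The contrast between the first two approaches can be shown with toy data. The sketch below assumes one-dimensional inputs, uses a nearest-neighbor rule as a stand-in for supervised learning, and uses a two-cluster k-means loop as a stand-in for unsupervised learning. The function names and data are hypothetical, and reinforcement learning is left out because it needs an environment to interact with.

```python
def nearest_neighbor_predict(train, x):
    """Supervised: return the label of the closest labeled example."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]


def two_means(points, iters=10):
    """Unsupervised: split 1-D points into two clusters (k-means with k=2).

    Assumes the points contain at least two distinct values.
    """
    c0, c1 = min(points), max(points)  # simple initialization
    for _ in range(iters):
        # Assign each point to its nearer center, then recompute the centers.
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0 = sum(g0) / len(g0)
        c1 = sum(g1) / len(g1)
    return c0, c1


# Labeled examples: (value, label). The labels are what make this supervised.
labeled = [(1.0, "short"), (1.2, "short"), (8.0, "long"), (9.1, "long")]
label = nearest_neighbor_predict(labeled, 7.5)

# Unlabeled values: the algorithm has to discover the two groups on its own.
unlabeled = [1.0, 1.2, 0.8, 8.0, 9.1, 8.5]
centers = two_means(unlabeled)
```

The supervised function can only answer in terms of labels it was given, while the clustering function returns group centers without ever being told what the groups mean.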

Deep learning, which uses many-layered neural networks, is the branch of machine learning behind today's frontier models. Large language models are deep learning systems trained on vast amounts of text, applying these same principles at enormous scale.

Why it matters

Machine learning quietly powers most of the digital systems people use daily, including search ranking, content understanding, recommendations, fraud detection, and spam filtering. Its ability to learn patterns that are hard to specify by hand is what makes these systems work at scale.

It is also the foundation of the current AI wave. Generative AI and large language models are advanced machine learning applications, so understanding the basics (how models learn from data and where they fail) clarifies both the capabilities and the limits of the AI shaping search and discovery today.

Frequently asked questions

What is the difference between AI and machine learning?

Artificial intelligence is the broad goal of building systems that perform tasks requiring intelligence. Machine learning is a subset of AI focused on systems that learn from data rather than following fixed rules. Most modern AI is built on machine learning.

What are the main types of machine learning?

The three primary types are supervised learning, which uses labeled data; unsupervised learning, which finds patterns in unlabeled data; and reinforcement learning, which learns from feedback on actions. Many real systems combine elements of more than one.

How does machine learning relate to large language models?

Large language models are a product of machine learning, specifically deep learning, trained on huge text datasets. The same core idea of learning patterns from data applies, scaled up with neural networks and specialized training techniques.

Does machine learning require huge amounts of data?

It depends on the task and method. Frontier models train on massive datasets, but many practical machine learning applications work well with modest, well-curated data. Data quality often matters as much as quantity.

Large language model (LLM)

A large language model is an AI system trained on vast amounts of text to understand and generate human language. Built on the transformer architecture and containing billions of parameters, LLMs predict the next token in a sequence, which enables them to answer questions, write, summarize, and reason. They power modern chat assistants, AI search, and autonomous agents.
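Next-token prediction can be illustrated with a deliberately tiny stand-in. The sketch below is not a transformer: it assumes a bigram count model, which predicts the next word purely from how often each word followed the previous one in a toy corpus. The names `train_bigram` and `predict_next` and the corpus itself are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    following = counts.get(token)
    if not following:
        return None
    return following.most_common(1)[0][0]


# Toy corpus, split on whitespace as a crude form of tokenization.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

next_token = predict_next(model, "the")   # "cat" follows "the" most often
unknown = predict_next(model, "slept")    # nothing ever followed "slept"
```

A real LLM replaces the count table with a neural network that conditions on the whole preceding context, but the training objective is the same in spirit: given what came before, predict what comes next.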

Foundation models

Foundation models are large-scale AI models trained on broad, diverse data that serve as a general-purpose base adapted for many downstream applications. Rather than building a model per task, organizations fine-tune or prompt a single foundation model for translation, summarization, coding, search, and more. Large language models and multimodal models are common examples.

Natural language processing (NLP)

Natural language processing is the AI discipline that enables computers to understand, interpret, and generate human language. It spans tasks such as translation, summarization, sentiment analysis, entity recognition, and question answering. Once driven by hand-built rules and statistical models, NLP is now dominated by large language models built on the transformer architecture.

RLHF (reinforcement learning from human feedback)

RLHF (reinforcement learning from human feedback) is a training method that aligns a language model with human preferences. Human evaluators rank model outputs, those rankings train a reward model, and the language model is then optimized to produce responses the reward model scores highly. RLHF is a key reason modern chat models feel helpful, follow instructions, and avoid many unsafe outputs.
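The reward-model step can be sketched on toy data. This is an illustration under strong assumptions, not how any particular lab implements RLHF: responses are reduced to two hand-made features, the reward model is a linear score, and it is trained with a pairwise logistic (Bradley-Terry style) loss on preferred-versus-rejected pairs. The final step, optimizing the language model against the reward model, is omitted. All names and data are hypothetical.

```python
import math


def reward(weights, features):
    """Linear reward model: a higher score means a more preferred response."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit weights so that preferred responses outscore rejected ones.

    pairs: list of (preferred_features, rejected_features).
    Minimizes -log(sigmoid(reward(preferred) - reward(rejected))).
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for good, bad in pairs:
            margin = reward(weights, good) - reward(weights, bad)
            # Gradient scale: 1 - sigmoid(margin); shrinks as the pair is learned.
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(dim):
                weights[i] += lr * scale * (good[i] - bad[i])
    return weights


# Hypothetical features per response: [answers_the_question, is_rude].
# Each pair encodes one human judgment: the first response was ranked higher.
pairs = [
    ([1.0, 0.0], [0.0, 0.0]),  # answering beats not answering
    ([1.0, 0.0], [1.0, 1.0]),  # polite answer beats rude answer
    ([0.0, 0.0], [0.0, 1.0]),  # polite non-answer beats rude non-answer
]

weights = train_reward_model(pairs, dim=2)
polite_score = reward(weights, [1.0, 0.0])
rude_score = reward(weights, [1.0, 1.0])
```

After training, the reward model scores new responses it has never seen, which is what lets it stand in for human evaluators during the optimization phase.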

Embeddings

Embeddings are numerical vector representations of text, images, or other data that capture semantic meaning. By mapping content into a high-dimensional space where similar items sit close together, embeddings let AI systems compare meaning mathematically, powering similarity search, retrieval, clustering, and recommendation.
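Comparing meaning mathematically usually comes down to a distance or similarity measure between vectors. The sketch below assumes cosine similarity, one common choice, and uses hand-made three-dimensional vectors purely for illustration; real embeddings come from a trained model and typically have hundreds or thousands of dimensions. The function names and vector values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings):
    """Return the key whose vector is closest to the query's vector."""
    candidates = (key for key in embeddings if key != query)
    return max(
        candidates,
        key=lambda key: cosine_similarity(embeddings[query], embeddings[key]),
    )


# Hand-made toy vectors (assumed values, not output from a real model).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "invoice": [0.1, 0.2, 0.95],
}

nearest = most_similar("cat", embeddings)
cat_vs_invoice = cosine_similarity(embeddings["cat"], embeddings["invoice"])
```

Similarity search and retrieval apply the same comparison across a large collection of stored vectors, usually with an index that avoids scoring every item.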

AI training data

AI training data is the corpus of text, code, images, and other content used to train large language models. Frontier models like GPT-4o, Claude Sonnet 4, Gemini 2.5, and Llama 4 are trained on trillions of tokens drawn from web crawls, books, code repositories, and licensed datasets. The composition of that data shapes what the model knows, whom it cites, and how it represents brands.