What is a Foundation Model?

A large AI model trained on broad, general-purpose data — used as a base to fine-tune for many different tasks.

Updated: May 5, 2026 · 2 min read

A Foundation Model is a large AI model trained on BROAD and DIVERSE data, serving as a “foundation” to fine-tune or adapt for many different use cases — instead of training a separate model from scratch for each task.

The term was coined by Stanford CRFM in 2021. Every major LLM (GPT, Claude, Gemini, Llama) is a foundation model.

Before vs after foundation models

Before (the old way): each task → train a separate small model

  • Spam detection: train model A
  • Sentiment analysis: train model B
  • Translation: train model C
  • → Costs data, compute, and expertise per task

After (foundation model): train one massive model on general text → adapt for any task

  • Spam? Prompt: “Is this spam? Answer yes/no”
  • Sentiment? Prompt: “What’s the sentiment of this sentence?”
  • Translation? Prompt: “Translate to Vietnamese”
  • → One model handles thousands of tasks
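
For example, all three tasks above can run against one hosted foundation model, steered only by the prompt. A minimal sketch using the OpenAI Python SDK (the model name is illustrative; any hosted foundation model works the same way):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    # One general-purpose model, steered per task by the prompt alone.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; swap in whichever model your provider exposes
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("Is this spam? Answer yes/no: 'You won $1,000,000!'"))
print(ask("What's the sentiment of this sentence? 'The food was great.'"))
print(ask("Translate to Vietnamese: 'Good morning'"))
```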

Common characteristics

  1. Scale: billions to trillions of parameters
  2. Diverse data: web, books, code, images, video
  3. Self-supervised pre-training: no labeled data required; the training signal comes from the raw data itself (see the sketch after this list)
  4. Emergent abilities: at sufficient scale, capabilities appear that weren’t trained for directly (e.g., math, reasoning)
  5. Transferable: handles unseen tasks via prompting
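
To illustrate point 3: in self-supervised pre-training, the "labels" are generated from the data itself, with no human annotation. A toy sketch of how raw text becomes next-token prediction pairs:

```python
# Self-supervised pre-training needs no human labels: the "label" for each
# position is simply the next token in the raw text.
text = "Foundation models learn from raw text."
tokens = text.split()  # toy whitespace tokenizer; real models use subword tokenizers

# Build (context, target) training pairs directly from the data itself.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs[:3]:
    print(context, "->", target)
```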

Notable foundation models (2026)

Text (LLM)

  • GPT-5 / GPT-5 Pro (OpenAI)
  • Claude 4.7 Sonnet / Opus 4.5 (Anthropic)
  • Gemini 2.5 Pro / 3 Ultra (Google)
  • Llama 4 (Meta — open source)
  • Qwen 3 (Alibaba — open source)
  • DeepSeek V4 (China — open source)

Multimodal

  • GPT-5o (text + image + audio + video)
  • Gemini 2.5 (native multimodal)
  • Claude 4.7 (text + image)

Image

  • DALL-E 4, Imagen 4, Midjourney v7, FLUX.1

Video

  • Sora, Veo 3, Kling 2

Audio / Speech

  • Whisper (OpenAI), GPT-4o voice, ElevenLabs models

Code

  • Codex (legacy), Claude Code-tuned, DeepSeek-Coder

Closed vs open foundation models

|                     | Closed (GPT, Claude, Gemini) | Open (Llama, Qwen, DeepSeek)      |
|---------------------|------------------------------|-----------------------------------|
| Access              | API only                     | Download and self-host            |
| Capability          | First place                  | Second place (6-12 months behind) |
| Privacy             | Data flows through provider  | You stay in control               |
| Deep customization  | Limited                      | Full control                      |
| Cost                | Per token                    | Hardware + ops                    |

→ Rule of thumb: consumer products typically use closed models (the strongest available); privacy-sensitive enterprises operating at scale should consider open models.
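
To make "download and self-host" concrete, here is a minimal sketch using the Hugging Face transformers library (the model ID is an example of an open-weights model; some, like Llama, require accepting a license before download):

```python
# Self-hosting an open foundation model: the weights run on your own
# hardware, so no data leaves your infrastructure.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # example open-weights model ID
)

result = generator("Translate to Vietnamese: 'Good morning'", max_new_tokens=50)
print(result[0]["generated_text"])
```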

What this means for users

You rarely need to “build a foundation model.” Practical ways to use one:

  1. Prompting: cheapest and fastest
  2. RAG: inject your own knowledge into the prompt (see the sketch after this list)
  3. Fine-tuning: adapt an existing foundation model to your domain (e.g., fine-tune Llama)
  4. Training from scratch: only if you're a major lab
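
A minimal sketch of the RAG idea from point 2, with a toy keyword retriever standing in for a real embedding search (the documents and function names are illustrative):

```python
# Minimal RAG sketch: retrieve relevant snippets, then inject them into
# the prompt so the foundation model answers with your own knowledge.
documents = [
    "Our refund policy allows returns within 30 days.",
    "Support hours are 9am-6pm, Monday to Friday.",
    "Premium plans include priority support.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy keyword-overlap retriever; real systems use embedding search.
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What are your support hours?"))
```
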
Tags
#foundation-model #llm