What is a Foundation Model?
In short: a large AI model trained on broad, general-purpose data, used as a base to fine-tune for many different tasks.
A foundation model is a large AI model trained on BROAD and DIVERSE data that serves as a "foundation" to fine-tune or adapt for many different use cases — instead of training a separate model from scratch for each task.
The term was coined in 2021 by Stanford's Center for Research on Foundation Models (CRFM). Every major LLM (GPT, Claude, Gemini, Llama) is a foundation model.
Before vs after foundation models
Before (the old way): each task → train a separate small model
- Spam detection: train model A
- Sentiment analysis: train model B
- Translation: train model C
- → Costs data, compute, and expertise per task
After (foundation model): train one massive model on general text → adapt for any task
- Spam? Prompt: “Is this spam? Answer yes/no”
- Sentiment? Prompt: “What’s the sentiment of this sentence?”
- Translation? Prompt: “Translate to Vietnamese”
- → One model handles thousands of tasks
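The "one model, many tasks" pattern above can be sketched in code. Here `call_llm` is a hypothetical placeholder for any chat-completion API, not a real library call — only the prompts change per task:

```python
# One foundation model, many tasks — each "task" is just a different prompt.
# `call_llm` is a hypothetical stand-in for any provider's completion API.
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a model provider here.
    return f"<model response to: {prompt!r}>"

def classify_spam(email: str) -> str:
    return call_llm(f"Is this spam? Answer yes/no.\n\n{email}")

def sentiment(sentence: str) -> str:
    return call_llm(f"What's the sentiment of this sentence?\n\n{sentence}")

def translate(text: str, target: str = "Vietnamese") -> str:
    return call_llm(f"Translate to {target}:\n\n{text}")
```

No per-task training data, no per-task model — the task definition lives entirely in the prompt.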
Common characteristics
- Scale: billions to trillions of parameters
- Diverse data: web, books, code, images, video
- Self-supervised pre-training — no labeled data required
- Emergent abilities: at sufficient scale, capabilities appear that weren’t trained for directly (e.g., math, reasoning)
- Transferable: handles unseen tasks via prompting
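The "self-supervised" point deserves a concrete illustration: no one labels the training data, because raw text itself supplies (context, next-token) pairs. A toy word-level sketch:

```python
# Self-supervised pre-training: the text is its own label source.
# Each prefix of the sequence predicts the token that follows it.
text = "to be or not to be"
tokens = text.split()  # toy word-level "tokenization"

# Build (context, next-token) training pairs directly from raw text.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
# e.g. (['to'], 'be'), (['to', 'be'], 'or'), ...
```

Real models do this at web scale with subword tokens, but the principle is the same: the supervision signal comes for free from the data.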
Notable foundation models (2026)
Text (LLM)
- GPT-5 / GPT-5 Pro (OpenAI)
- Claude 4.7 Sonnet / Opus 4.5 (Anthropic)
- Gemini 2.5 Pro / 3 Ultra (Google)
- Llama 4 (Meta — open source)
- Qwen 3 (Alibaba — open source)
- DeepSeek V4 (DeepSeek — open source)
Multimodal
- GPT-5o (text + image + audio + video)
- Gemini 2.5 (native multimodal)
- Claude 4.7 (text + image)
Image
- DALL-E 4, Imagen 4, Midjourney v7, FLUX.1
Video
- Sora, Veo 3, Kling 2
Audio / Speech
- Whisper (OpenAI), GPT-4o voice, ElevenLabs models
Code
- Codex (legacy), Claude Code-tuned, DeepSeek-Coder
Closed vs open foundation models
| | Closed (GPT, Claude, Gemini) | Open (Llama, Qwen, DeepSeek) |
|---|---|---|
| Access | API only | Download and self-host |
| Capability | State of the art | Typically 6-12 months behind |
| Privacy | Data flows through the provider | You stay in control |
| Deep customization | Limited | Full control |
| Cost | Per token | Hardware + ops |
→ Consumer products typically use closed models (the strongest available). Privacy-sensitive enterprises operating at large scale should consider open models.
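The "per token vs. hardware + ops" cost row can be made tangible with a back-of-envelope calculation. All figures below are hypothetical and only illustrate the shape of the trade-off:

```python
# Back-of-envelope: closed model billed per token vs. open model self-hosted.
# Every number here is an assumption for illustration, not a real price.
def api_cost(tokens_per_month: float, usd_per_1k_tokens: float) -> float:
    """Monthly cost of a pay-per-token API."""
    return tokens_per_month / 1000 * usd_per_1k_tokens

def self_host_cost(gpu_hours: float, usd_per_gpu_hour: float, ops_usd: float) -> float:
    """Monthly cost of renting GPUs plus fixed operations overhead."""
    return gpu_hours * usd_per_gpu_hour + ops_usd

monthly_tokens = 2_000_000_000                 # hypothetical high-volume usage
closed = api_cost(monthly_tokens, 0.01)        # assumed $0.01 per 1K tokens
open_ = self_host_cost(720 * 8, 2.0, 5_000)    # assumed 8 GPUs 24/7 + ops
```

At this (assumed) volume self-hosting comes out cheaper; at low volume the fixed hardware and ops costs dominate and the API wins.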
What this means for users
You rarely need to “build a foundation model.” Practical ways to use one:
- Prompting: cheapest and fastest
- RAG: inject your own knowledge into the prompt
- Fine-tuning an existing foundation model (e.g., fine-tune Llama for your domain)
- Train from scratch: only if you’re a major lab
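Of the options above, RAG is worth a concrete sketch: retrieve your own documents, then inject them into the prompt. The keyword-overlap retriever and `call_llm` stub below are hypothetical stand-ins (real systems use vector search and a provider API):

```python
# Minimal RAG sketch: retrieve relevant snippets, inject them into the prompt.
# `call_llm` is a hypothetical placeholder; keyword overlap stands in for
# a real embedding-based vector search.
def call_llm(prompt: str) -> str:
    return f"<model response to: {prompt!r}>"  # placeholder

DOCS = [
    "Our refund window is 30 days from purchase.",
    "Support hours are 9am-5pm, Monday to Friday.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank docs by word overlap with the question; return the top k."""
    q = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question, DOCS))
    return call_llm(f"Answer using only this context:\n{context}\n\nQ: {question}")
```

The key idea: the model's weights never change — your knowledge enters through the prompt at query time, which is why RAG is cheaper than fine-tuning.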