What is a Foundation Model?

A large AI model trained on broad, general-purpose data — used as a base to fine-tune for many different tasks.

Updated: May 5, 2026 · 2 min read

A Foundation Model is a large AI model trained on BROAD and DIVERSE data, serving as a “foundation” to fine-tune or adapt for many different use cases — instead of training a separate model from scratch for each task.

The term was coined by Stanford CRFM in 2021. Every major LLM (GPT, Claude, Gemini, Llama) is a foundation model.

Before vs after foundation models

Before (the old way): each task → train a separate small model

  • Spam detection: train model A
  • Sentiment analysis: train model B
  • Translation: train model C
  • → Costs data, compute, and expertise per task

After (foundation model): train one massive model on general text → adapt for any task

  • Spam? Prompt: “Is this spam? Answer yes/no”
  • Sentiment? Prompt: “What’s the sentiment of this sentence?”
  • Translation? Prompt: “Translate to Vietnamese”
  • → One model handles thousands of tasks
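
For example, all three tasks above can run against one hosted foundation model, steered only by the prompt. A minimal sketch using the OpenAI Python SDK (the model name is illustrative; any hosted foundation model works the same way):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    # One general-purpose model, steered per task by the prompt alone.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; swap in whichever model your provider exposes
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("Is this spam? Answer yes/no: 'You won $1,000,000!'"))
print(ask("What's the sentiment of this sentence? 'The food was great.'"))
print(ask("Translate to Vietnamese: 'Good morning'"))
```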

Common characteristics

  1. Scale: billions to trillions of parameters
  2. Diverse data: web, books, code, images, video
  3. Self-supervised pre-training: no labeled data required; the training signal comes from the raw data itself (see the sketch after this list)
  4. Emergent abilities: at sufficient scale, capabilities appear that weren’t trained for directly (e.g., math, reasoning)
  5. Transferable: handles unseen tasks via prompting
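
To illustrate point 3: in self-supervised pre-training, the "labels" are generated from the data itself, with no human annotation. A toy sketch of how raw text becomes next-token prediction pairs:

```python
# Self-supervised pre-training needs no human labels: the "label" for each
# position is simply the next token in the raw text.
text = "Foundation models learn from raw text."
tokens = text.split()  # toy whitespace tokenizer; real models use subword tokenizers

# Build (context, target) training pairs directly from the data itself.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs[:3]:
    print(context, "->", target)
```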

Notable foundation models (2026)

Text (LLM)

  • GPT-5 / GPT-5 Pro (OpenAI)
  • Claude 4.7 Sonnet / Opus 4.5 (Anthropic)
  • Gemini 2.5 Pro / 3 Ultra (Google)
  • Llama 4 (Meta — open source)
  • Qwen 3 (Alibaba — open source)
  • DeepSeek V4 (China — open source)

Multimodal

  • GPT-5o (text + image + audio + video)
  • Gemini 2.5 (native multimodal)
  • Claude 4.7 (text + image)

Image

  • DALL-E 4, Imagen 4, Midjourney v7, FLUX.1

Video

  • Sora, Veo 3, Kling 2

Audio / Speech

  • Whisper (OpenAI), GPT-4o voice, ElevenLabs models

Code

  • Codex (legacy), Claude Code-tuned, DeepSeek-Coder

Closed vs open foundation models

|                     | Closed (GPT, Claude, Gemini) | Open (Llama, Qwen, DeepSeek)      |
|---------------------|------------------------------|-----------------------------------|
| Access              | API only                     | Download and self-host            |
| Capability          | First place                  | Second place (6-12 months behind) |
| Privacy             | Data flows through provider  | You stay in control               |
| Deep customization  | Limited                      | Full control                      |
| Cost                | Per token                    | Hardware + ops                    |

→ Rule of thumb: consumer products typically use closed models (the strongest available); privacy-sensitive enterprises operating at scale should consider open models.
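
To make "download and self-host" concrete, here is a minimal sketch using the Hugging Face transformers library (the model ID is an example of an open-weights model; some, like Llama, require accepting a license before download):

```python
# Self-hosting an open foundation model: the weights run on your own
# hardware, so no data leaves your infrastructure.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # example open-weights model ID
)

result = generator("Translate to Vietnamese: 'Good morning'", max_new_tokens=50)
print(result[0]["generated_text"])
```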

What this means for users

You rarely need to “build a foundation model.” Practical ways to use one:

  1. Prompting: cheapest and fastest
  2. RAG: inject your own knowledge into the prompt (see the sketch after this list)
  3. Fine-tuning: adapt an existing foundation model to your domain (e.g., fine-tune Llama)
  4. Training from scratch: only if you're a major lab
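
A minimal sketch of the RAG idea from point 2, with a toy keyword retriever standing in for a real embedding search (the documents and function names are illustrative):

```python
# Minimal RAG sketch: retrieve relevant snippets, then inject them into
# the prompt so the foundation model answers with your own knowledge.
documents = [
    "Our refund policy allows returns within 30 days.",
    "Support hours are 9am-6pm, Monday to Friday.",
    "Premium plans include priority support.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy keyword-overlap retriever; real systems use embedding search.
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What are your support hours?"))
```
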
Tags
#foundation-model #llm