What is Training?
The process of teaching an AI model by showing it millions or billions of examples and adjusting its internal parameters.
Training is the process of teaching an AI model to extract patterns from data — you show it millions or billions of examples, and every time it guesses wrong, you nudge its internal parameters slightly, repeating until it gets most things right.
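The “nudge” is gradient descent. A minimal sketch in plain Python (one weight, made-up data following y = 3x; real models run this same loop over billions of parameters):

```python
# Toy training loop: guess, measure the error, nudge the parameter
# against the gradient, repeat. Made-up data following y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # (input, target) pairs

w = 0.0    # the model's single "parameter"
lr = 0.01  # learning rate: how big each nudge is

for epoch in range(200):
    for x, y in data:
        pred = w * x          # the model's guess
        error = pred - y      # how wrong the guess was
        grad = 2 * error * x  # gradient of squared error w.r.t. w
        w -= lr * grad        # nudge the parameter slightly

print(w)  # ~3.0: the pattern has been learned
```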
The 3 stages of training an LLM
1. Pre-training
The model reads TRILLIONS of tokens (essentially the entire high-quality public web + books + code).
- Goal: predict the next token (sketched below)
- The most expensive part: a few months × thousands of GPUs = $10-100M
- Result: a model that “knows the language” and has broad general knowledge, but isn’t useful as an assistant yet
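What “predict the next token” looks like as code, sketched with PyTorch (the two-layer `model` is a stand-in for a real transformer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for a transformer: embedding + linear head over a toy vocabulary.
vocab = 1000
model = nn.Sequential(nn.Embedding(vocab, 32), nn.Linear(32, vocab))

tokens = torch.randint(0, vocab, (1, 64))        # one toy sequence of token IDs
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # labels = input shifted by one

logits = model(inputs)  # (batch, seq_len, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
loss.backward()         # gradients nudge every parameter
```

Pre-training is essentially this loop, run over trillions of tokens on thousands of GPUs.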
2. Supervised Fine-Tuning (SFT)
The model is shown high-quality (prompt → sample answer) pairs; one is sketched below.
- Goal: teach the model to respond like an “assistant”
- Data: tens to hundreds of thousands of human-written pairs
- Cost: $100k-$1M
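A sketch of a single SFT example as it might be rendered for training (the chat-template tokens below are hypothetical; every lab defines its own format):

```python
# One human-written (prompt -> answer) pair, rendered into a chat template.
# The <|user|>/<|assistant|>/<|end|> tokens here are illustrative only.
example = {
    "prompt": "Explain what a GPU is in one sentence.",
    "answer": "A GPU is a chip with thousands of small cores built for parallel math.",
}

text = (
    "<|user|>\n" + example["prompt"] + "\n"
    "<|assistant|>\n" + example["answer"] + "<|end|>"
)

# Typical SFT detail: the loss is computed only on the answer tokens,
# so the model learns to write answers rather than to repeat prompts.
```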
3. RLHF (or DPO)
Further refinement using human feedback on which of two answers is better (the DPO variant is sketched below).
- Goal: teach the model to better match human preferences — helpful, safe, not sycophantic
- See RLHF for details
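For intuition, the heart of the DPO variant fits in a few lines (the sequence log-probabilities are assumed precomputed; a real pipeline also keeps a frozen reference model):

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: reward the model for preferring the human-chosen answer over the
    rejected one MORE than a frozen reference model does."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy log-probs: the policy already leans toward the chosen answer.
loss = dpo_loss(torch.tensor([-4.0]), torch.tensor([-6.0]),
                torch.tensor([-5.0]), torch.tensor([-5.5]))
print(loss)  # ~0.62; driving this down shifts probability toward preferred answers
```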
Parameters
During training, the model learns by adjusting its “weights” — these are the parameters.
- GPT-2: 1.5 billion parameters
- GPT-4: ~1.7 trillion (rumored; OpenAI has never confirmed the figure)
- Llama 3.3 70B: 70 billion
More parameters → “smarter,” but more memory and compute required.
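A quick back-of-envelope for the memory side (assuming fp16/bf16 weights, i.e., 2 bytes per parameter; serving also needs room for activations and KV cache):

```python
# Memory just to HOLD the weights = parameter count x bytes per parameter.
# Training needs several times more (gradients + optimizer state).
models = [("GPT-2", 1.5e9), ("Llama 3.3 70B", 70e9), ("GPT-4 (rumored)", 1.7e12)]

for name, params in models:
    gb = params * 2 / 1e9  # fp16/bf16 weights, 2 bytes each
    print(f"{name}: {gb:,.0f} GB of weights")

# GPT-2:           3 GB     -> fits on a laptop GPU
# Llama 3.3 70B:   140 GB   -> needs multiple datacenter GPUs (e.g., 2x 80 GB H100)
# GPT-4 (rumored): 3,400 GB -> weights alone need dozens of GPUs
```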
What training costs
Resources
- GPU/TPU cluster: thousands to tens of thousands of chips
- Data: TB to PB of text
- Electricity: training GPT-4 is estimated at ~50 GWh (≈ a year’s consumption for ~5,000 homes)
- Money: $10M-$1B+ for a frontier model (see the back-of-envelope estimate below)
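The money line can be sanity-checked with the standard rule of thumb that training compute ≈ 6 × N parameters × D tokens (the GPU throughput, utilization, and hourly price below are assumptions):

```python
# Back-of-envelope pre-training cost via the ~6*N*D FLOPs rule of thumb.
N = 70e9   # parameters (a Llama-70B-class model)
D = 15e12  # training tokens
flops = 6 * N * D  # ~6.3e24 FLOPs total

peak = 1e15  # ~1e15 FLOP/s per H100 in bf16 (rounded)
mfu = 0.35   # realistic hardware utilization (assumption)
gpu_hours = flops / (peak * mfu) / 3600  # ~5 million GPU-hours
cost = gpu_hours * 2.0                   # ~$2 per GPU-hour (assumed rate)

print(f"~{gpu_hours/1e6:.0f}M GPU-hours, ~${cost/1e6:.0f}M")  # ~5M GPU-hours, ~$10M
```

On a 10,000-GPU cluster, 5M GPU-hours is roughly three weeks of wall-clock time; frontier-scale models push every one of these numbers up.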
Time
- Pre-training: 2-6 months
- Fine-tuning: 1-4 weeks
- RLHF: 2-8 weeks
→ This is why only a handful of companies (OpenAI, Anthropic, Google, Meta, xAI) can train frontier models.
Do you need to train your own model?
99% of the time, NO. Reasons:
- Far too expensive
- Requires deep expertise
- Most use cases can be solved with prompting + RAG
- When customization is needed → fine-tune an existing model (see Fine-tuning)
Only train from scratch if:
- You’re a big lab with the budget
- You need a model for a niche industry/language that doesn’t exist yet
- You require full ownership and control of the model (e.g., military applications)
Related
- Inference — running a trained model
- Fine-tuning
- GPU