What is LoRA?
Low-Rank Adaptation — a technique for fine-tuning large models at a small fraction of the cost of full fine-tuning.
LoRA (Low-Rank Adaptation) is a technique for fine-tuning large models (LLMs, diffusion models) by learning only a small set of new parameters while freezing the original weights. It cuts training RAM needs by roughly 3-10× and the size of each saved version by 100-1000×.
The problem LoRA solves
Fully fine-tuning Llama 70B requires:
- ~280GB of GPU RAM just to hold the weights in fp32 (far more once gradients and optimizer state are included)
- A full-size checkpoint (140-280GB, depending on precision) per version
- Hard to share, hard to deploy multiple versions
→ Too expensive for individuals and startups.
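The headline numbers follow from simple arithmetic. A quick sanity check in pure Python (bytes-per-parameter figures assume standard fp32/fp16 storage):

```python
# Back-of-the-envelope memory for Llama 70B's weights (illustrative).
params = 70e9                      # ~70 billion parameters
fp32_gb = params * 4 / 1e9         # 4 bytes per parameter in fp32
fp16_gb = params * 2 / 1e9         # 2 bytes per parameter in fp16

print(f"fp32 weights: {fp32_gb:.0f} GB")  # 280 GB
print(f"fp16 weights: {fp16_gb:.0f} GB")  # 140 GB
```

And this is weights only: training adds gradients and optimizer state on top.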
The LoRA idea
Instead of editing the entire weight matrix W, LoRA learns two small matrices A and B such that:
W_new = W + A × B, where A (d×r) and B (r×k) are far smaller than the d×k matrix W because the rank r is small (typically 4-64)
A × B is a “delta” added to the original weights. Because A and B are low-rank, the number of trainable parameters drops by 100-1000×.
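To see where a reduction of that magnitude comes from, count the parameters for one typical weight matrix (the 4096×4096 shape and the rank r = 8 are illustrative assumptions):

```python
d, k = 4096, 4096        # shape of one attention weight matrix (illustrative)
r = 8                    # LoRA rank (assumption; common values are 4-64)

full = d * k             # trainable params when updating W directly
lora = d * r + r * k     # A is d×r, B is r×k

print(full, lora, full // lora)  # 16777216 65536 256
```

That is ~256× fewer parameters for a single matrix; applied across a whole model, with only some layers adapted, the overall reduction lands in the quoted 100-1000× range.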
At inference time: add the delta to the original weights → you get a fine-tuned model. Want a different version? Load a different delta.
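A minimal sketch of the merge step, using tiny hand-sized matrices (pure Python, rank r = 1, scaling factor omitted for clarity):

```python
def matmul(X, Y):
    # naive matrix product, enough for a toy demo
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

W = [[1.0, 0.0],
     [0.0, 1.0]]          # frozen 2x2 base weight (illustrative)
A = [[1.0], [0.0]]        # d×r with r = 1
B = [[0.0, 2.0]]          # r×k

delta = matmul(A, B)      # low-rank update A × B
W_new = [[w + dv for w, dv in zip(wr, dr)] for wr, dr in zip(W, delta)]

print(W_new)              # [[1.0, 2.0], [0.0, 1.0]]
```

Swapping versions just means recomputing W_new with a different A and B; the base W is never modified.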
Benefits
| Factor | Full fine-tune | LoRA |
|---|---|---|
| GPU RAM during training | 280GB+ | 30-80GB |
| Output file size | 280GB | 50-500MB |
| Training time | Days to weeks | Hours to days |
| Keeping many versions | Impractical (a full checkpoint each) | Easy (a few hundred MB each) |
Real-world use cases
Personalizing an LLM
Lightly fine-tune Llama on your writing style → 100MB LoRA file used as a “personal assistant.”
Domain specialization
Fine-tune for medicine/law/finance from a Llama base → one LoRA per domain.
Style for Stable Diffusion
Train a LoRA for a specific art style (anime, oil painting, a particular photo style) → 50-200MB files. The community shares thousands of LoRAs on Civitai.
Character LoRA
Train a LoRA on a specific person/character’s face → every generated image features that character.
QLoRA — the popular variant
QLoRA = Quantized LoRA:
- Quantize the frozen base model to 4-bit
- Train LoRA adapters on top in higher precision
- → Fine-tune Llama 70B on a single ~48GB GPU (smaller models fit on 24GB cards)
This is the most popular technique for hobbyists fine-tuning large models.
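The single-GPU claim rests on the 4-bit quantization arithmetic (illustrative; this ignores activation memory, the LoRA adapters themselves, and CUDA overhead):

```python
params = 70e9
fp16_gb = params * 2 / 1e9      # 140 GB (needs multiple GPUs)
int4_gb = params * 0.5 / 1e9    # 35 GB (fits on one 48GB card)

print(f"{fp16_gb:.0f} GB vs {int4_gb:.0f} GB")  # 140 GB vs 35 GB
```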
Limitations
- Cannot drastically change core behavior — only nudges it
- May catastrophically forget prior knowledge if the fine-tune data is too skewed
- Benchmark quality typically lands somewhat below a full fine-tune (the gap varies with task, data, and rank)
→ Good enough for 90% of use cases. Full fine-tune is only needed when pushing SOTA.
Tools
- HuggingFace PEFT — the main package for LoRA
- Axolotl — easy-to-use wrapper
- Unsloth — optimized kernels for 2-5× faster training
- Kohya_ss — Stable Diffusion LoRA training UI