TopDev

What is LoRA?

Low-Rank Adaptation — a fine-tuning technique for large models that cuts trainable parameters and storage by 100-1000× compared with a full fine-tune.

Updated: May 5, 2026 · 2 min read

LoRA (Low-Rank Adaptation) is a technique for fine-tuning large models (LLMs, diffusion models) by training only a small set of new parameters while keeping the original weights frozen. It cuts GPU memory needs by roughly 3-10× and the on-disk size of each fine-tuned version by 100-1000×.

The problem LoRA solves

Fully fine-tuning Llama 70B requires:

  • ~280GB of GPU memory for the weights alone (70B parameters × 4 bytes in fp32); optimizer states and gradients add several times more
  • A 280GB output file per version
  • Hard to share, hard to deploy multiple versions

→ Too expensive for individuals and startups.

The LoRA idea

Instead of editing the entire weight matrix W, LoRA learns two small matrices A and B such that:

W_new = W + A × B    (A is d×r, B is r×k, with rank r much smaller than d and k)

A × B is a “delta” added to the original weight. Because r is small (typically 4-64), the number of trainable parameters drops by 100-1000×.

At inference time: add the delta to the original weights → you get a fine-tuned model. Want a different version? Load a different delta.
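The delta idea fits in a few lines of NumPy (shapes and the rank are illustrative, not taken from any real model):

```python
import numpy as np

d, k, r = 1024, 1024, 8           # illustrative layer size and LoRA rank

W = np.random.randn(d, k)         # frozen pretrained weight (never updated)
A = np.random.randn(d, r) * 0.01  # trainable low-rank factor
B = np.zeros((r, k))              # zero-init so the delta starts at 0

W_new = W + A @ B                 # merged weight used at inference

full_params = W.size              # 1,048,576 params in a full fine-tune
lora_params = A.size + B.size     # 16,384 trainable params with LoRA
print(f"trainable params cut {full_params // lora_params}x")  # → trainable params cut 64x
```

Initializing B to zero means A × B is zero at the start of training, so the adapted model begins exactly at the pretrained behavior and only gradually drifts away from it.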

Benefits

Factor                    | Full fine-tune | LoRA
GPU RAM during training   | 280GB+         | 30-80GB
Output file size          | 280GB          | 50-500MB
Training time             | Weeks          | Hours
Keeping many versions     | Hard           | Easy (a few hundred MB each)
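A back-of-the-envelope check on the file-size row (the layer count, hidden size, and which matrices get LoRA are rough assumptions for a Llama-70B-like architecture, not exact figures):

```python
# Rough Llama-70B-like shapes: 80 layers, hidden size 8192,
# LoRA (rank 16) applied to the q and v projections of every layer.
layers, d, r = 80, 8192, 16
per_matrix = r * (d + d)               # A (d x r) + B (r x d) factors
lora_params = layers * 2 * per_matrix  # two adapted matrices per layer
size_mb = lora_params * 2 / 1e6        # fp16 = 2 bytes per parameter
print(f"{lora_params / 1e6:.0f}M params, ~{size_mb:.0f} MB")  # → 42M params, ~84 MB
```

~84MB for an adapter over a 70B model sits comfortably in the 50-500MB range above, versus 280GB for a full copy of the weights.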

Real-world use cases

Personalizing an LLM

Lightly fine-tune Llama on your writing style → 100MB LoRA file used as a “personal assistant.”

Domain specialization

Fine-tune for medicine/law/finance from a Llama base → one LoRA per domain.

Style for Stable Diffusion

Train a LoRA for a specific art style (anime, oil painting, a particular photo style) → 50-200MB files. The community shares thousands of LoRAs on Civitai.

Character LoRA

Train a LoRA on a specific person/character’s face → every generated image features that character.

QLoRA

QLoRA (Quantized LoRA) combines quantization with LoRA:

  • Quantize the base model to 4-bit
  • Apply LoRA on top
  • → Fine-tune Llama 70B on a single 48GB GPU (smaller models fit in 24GB)

This is the most popular technique for hobbyists fine-tuning large models.
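The combination can be sketched with a naive absmax quantizer (real QLoRA uses the NF4 data type with per-block scales; this is only a simplified illustration of the memory-saving idea):

```python
import numpy as np

def quantize_4bit(w):
    # Absmax quantization to the 4-bit integer range (-8..7),
    # with a single scale per tensor. Naive stand-in for NF4.
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

d, k, r = 512, 512, 8
W = np.random.randn(d, k).astype(np.float32)
q, scale = quantize_4bit(W)        # frozen base weights stored in 4-bit

A = np.random.randn(d, r).astype(np.float32) * 0.01  # LoRA factors stay
B = np.zeros((r, k), dtype=np.float32)               # in full precision

# Forward pass: dequantize the frozen base, then add the trainable delta
W_eff = dequantize(q, scale) + A @ B
```

Only the tiny A and B matrices receive gradients, so the 4-bit base never needs to be stored in full precision during training.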

Limitations

  • Cannot drastically change core behavior — only nudges it
  • May catastrophically forget prior knowledge if the fine-tune data is too skewed
  • Quality is ~5-10% below full fine-tune on benchmarks

→ Good enough for 90% of use cases. Full fine-tune is only needed when pushing SOTA.

Tools

  • HuggingFace PEFT — the main package for LoRA
  • Axolotl — easy-to-use wrapper
  • Unsloth — optimized kernels for 2-5× faster training
  • Kohya_ss — Stable Diffusion LoRA training UI
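To make the tooling concrete, a minimal PEFT setup looks roughly like this (the hyperparameters and target module names are illustrative and vary by model architecture; running it downloads the base model):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative hyperparameters -- tune for your model and task
config = LoraConfig(
    r=8,                  # rank of the A/B factors
    lora_alpha=16,        # scaling applied to the delta
    target_modules=["q_proj", "v_proj"],  # which layers get LoRA (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of total params
```

The wrapped model trains with any standard HuggingFace training loop; only the LoRA parameters are saved when you call `model.save_pretrained(...)`.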
Tags
#lora #fine-tuning #optimization