What is LoRA?
Low-Rank Adaptation — a technique for fine-tuning large models at a small fraction of the cost of full fine-tuning.
LoRA (Low-Rank Adaptation) is a technique for fine-tuning large models (LLMs, diffusion models) by learning only a small set of new parameters while freezing the original weights. It cuts training RAM needs by roughly 3-10× and the size of each saved version by 100-1000×.
The problem LoRA solves
Fully fine-tuning Llama 70B requires:
- ~280GB of GPU RAM just to hold the weights in fp32 (far more once gradients and optimizer state are included)
- A full-size checkpoint (140-280GB, depending on precision) per version
- Hard to share, hard to deploy multiple versions
→ Too expensive for individuals and startups.
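The headline numbers follow from simple arithmetic. A quick sanity check in pure Python (bytes-per-parameter figures assume standard fp32/fp16 storage):

```python
# Back-of-the-envelope memory for Llama 70B's weights (illustrative).
params = 70e9                      # ~70 billion parameters
fp32_gb = params * 4 / 1e9         # 4 bytes per parameter in fp32
fp16_gb = params * 2 / 1e9         # 2 bytes per parameter in fp16

print(f"fp32 weights: {fp32_gb:.0f} GB")  # 280 GB
print(f"fp16 weights: {fp16_gb:.0f} GB")  # 140 GB
```

And this is weights only: training adds gradients and optimizer state on top.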
The LoRA idea
Instead of editing the entire weight matrix W, LoRA learns two small matrices A and B such that:
W_new = W + A × B, where A (d×r) and B (r×k) are far smaller than the d×k matrix W because the rank r is small (typically 4-64)
A × B is a “delta” added to the original weights. Because A and B are low-rank, the number of trainable parameters drops by 100-1000×.
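To see where a reduction of that magnitude comes from, count the parameters for one typical weight matrix (the 4096×4096 shape and the rank r = 8 are illustrative assumptions):

```python
d, k = 4096, 4096        # shape of one attention weight matrix (illustrative)
r = 8                    # LoRA rank (assumption; common values are 4-64)

full = d * k             # trainable params when updating W directly
lora = d * r + r * k     # A is d×r, B is r×k

print(full, lora, full // lora)  # 16777216 65536 256
```

That is ~256× fewer parameters for a single matrix; applied across a whole model, with only some layers adapted, the overall reduction lands in the quoted 100-1000× range.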
At inference time: add the delta to the original weights → you get a fine-tuned model. Want a different version? Load a different delta.
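A minimal sketch of the merge step, using tiny hand-sized matrices (pure Python, rank r = 1, scaling factor omitted for clarity):

```python
def matmul(X, Y):
    # naive matrix product, enough for a toy demo
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

W = [[1.0, 0.0],
     [0.0, 1.0]]          # frozen 2x2 base weight (illustrative)
A = [[1.0], [0.0]]        # d×r with r = 1
B = [[0.0, 2.0]]          # r×k

delta = matmul(A, B)      # low-rank update A × B
W_new = [[w + dv for w, dv in zip(wr, dr)] for wr, dr in zip(W, delta)]

print(W_new)              # [[1.0, 2.0], [0.0, 1.0]]
```

Swapping versions just means recomputing W_new with a different A and B; the base W is never modified.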
Benefits
| Factor | Full fine-tune | LoRA |
|---|---|---|
| GPU RAM during training | 280GB+ | 30-80GB |
| Output file size | 280GB | 50-500MB |
| Training time | Days to weeks | Hours to days |
| Keeping many versions | Impractical (a full checkpoint each) | Easy (a few hundred MB each) |
Real-world use cases
Personalizing an LLM
Lightly fine-tune Llama on your writing style → 100MB LoRA file used as a “personal assistant.”
Domain specialization
Fine-tune for medicine/law/finance from a Llama base → one LoRA per domain.
Style for Stable Diffusion
Train a LoRA for a specific art style (anime, oil painting, a particular photo style) → 50-200MB files. The community shares thousands of LoRAs on Civitai.
Character LoRA
Train a LoRA on a specific person/character’s face → every generated image features that character.
QLoRA — the popular variant
QLoRA = Quantized LoRA:
- Quantize the frozen base model to 4-bit
- Train LoRA adapters on top in higher precision
- → Fine-tune Llama 70B on a single ~48GB GPU (smaller models fit on 24GB cards)
This is the most popular technique for hobbyists fine-tuning large models.
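The single-GPU claim rests on the 4-bit quantization arithmetic (illustrative; this ignores activation memory, the LoRA adapters themselves, and CUDA overhead):

```python
params = 70e9
fp16_gb = params * 2 / 1e9      # 140 GB (needs multiple GPUs)
int4_gb = params * 0.5 / 1e9    # 35 GB (fits on one 48GB card)

print(f"{fp16_gb:.0f} GB vs {int4_gb:.0f} GB")  # 140 GB vs 35 GB
```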
Limitations
- Cannot drastically change core behavior — only nudges it
- May catastrophically forget prior knowledge if the fine-tune data is too skewed
- Benchmark quality typically lands somewhat below a full fine-tune (the gap varies with task, data, and rank)
→ Good enough for 90% of use cases. Full fine-tune is only needed when pushing SOTA.
Tools
- HuggingFace PEFT — the main package for LoRA
- Axolotl — easy-to-use wrapper
- Unsloth — optimized kernels for 2-5× faster training
- Kohya_ss — Stable Diffusion LoRA training UI