Technical · Beginner
What is a Context Window?
The maximum amount of text an LLM can 'remember' in a single pass. It determines how much information you can fit into a prompt.
Updated: May 5, 2026 · 2 min read
The context window is the maximum number of tokens an LLM can process in a single call — including both your prompt and the model’s response.
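Everything here is counted in tokens, not characters or words. Below is a minimal sketch of checking whether a prompt fits before sending it, using the tiktoken library (recent versions know GPT-4o's encoding); the 128k window and 4k response budget are assumptions matching GPT-4o, not universal constants:

```python
import tiktoken  # pip install tiktoken

CONTEXT_WINDOW = 128_000  # assumed: GPT-4o's published window
RESPONSE_BUDGET = 4_000   # assumed: tokens reserved for the model's answer

def fits_in_context(prompt: str, model: str = "gpt-4o") -> bool:
    """True if the prompt still leaves room for the response."""
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + RESPONSE_BUDGET <= CONTEXT_WINDOW

print(fits_in_context("Summarize this report: ..."))  # True for a short prompt
```

Remember that the response counts against the window too: a prompt that "fits" with zero tokens to spare leaves the model no room to answer.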
Comparing models (2026)
| Model | Context window (tokens) | Approximate length |
|---|---|---|
| GPT-3.5 | 16k | ~30 A4 pages |
| GPT-4o | 128k | ~250 pages |
| Claude 4.7 Sonnet | 200k - 1M | ~400 - 2000 pages |
| Gemini 2.5 Pro | 2M | ~4000 pages (several thick books) |
| Llama 3.3 | 128k | ~250 pages |
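The page estimates follow from a rule of thumb of roughly 500 tokens per A4 page (about 375 English words at ~0.75 words per token). A quick sanity check; the constant is an assumption, not an exact figure:

```python
TOKENS_PER_PAGE = 500  # assumption: ~375 words per A4 page, ~0.75 words/token

windows = {"GPT-3.5": 16_000, "GPT-4o": 128_000, "Gemini 2.5 Pro": 2_000_000}
for model, tokens in windows.items():
    print(f"{model}: ~{tokens // TOKENS_PER_PAGE} pages")
# GPT-3.5: ~32 pages, GPT-4o: ~256 pages, Gemini 2.5 Pro: ~4000 pages
```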
Why does the context window matter?
Upsides of a large context
- Drop a whole document into the prompt without setting up a complex RAG pipeline
- Long conversations: more of the chat history stays in view before earlier turns fall out of the window
- Analyze an entire codebase or full book in one shot
Downsides
- Expensive: pricing is per token, and a whole book in every call adds up fast (see the worked example after this list)
- Slow: the longer the context, the longer the model takes to respond
- Diluted: the model can miss information buried in the middle of a long context (the “lost in the middle” effect)
- Harder to keep reliable: when the source material is huge, RAG, which retrieves only the relevant passages, still tends to give more accurate answers
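To make the cost point concrete, here is a back-of-the-envelope calculation. The $3 per million input tokens is an illustrative price only; check your provider's current rates:

```python
TOKENS_PER_PAGE = 500   # same rule of thumb as in the table above
PRICE_PER_MTOK = 3.00   # assumed input price, USD per million tokens

book_tokens = 400 * TOKENS_PER_PAGE                 # a 400-page book is ~200k tokens
cost_per_call = book_tokens / 1_000_000 * PRICE_PER_MTOK
print(f"${cost_per_call:.2f} per call")                 # $0.60
print(f"${cost_per_call * 1_000:.0f} per 1,000 calls")  # $600
```

Sixty cents per question sounds small until a chatbot answers a thousand of them.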
Practical rules
| Situation | Approach |
|---|---|
| < 50 pages of docs | Drop straight into the prompt |
| 50 - 500 pages | Consider a large context (Claude 1M, Gemini 2M) |
| > 500 pages | Use RAG, don’t brute-force it |
| Long conversations | Use prompt caching to save money (see the sketch below) |
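For the prompt-caching row, here is a minimal sketch using Anthropic's Messages API: a cache_control marker tells the API to cache the long document, so repeated calls with the same prefix read it at a reduced price instead of paying full input cost each time. The model name and file path are placeholders:

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
big_document = open("handbook.txt").read()  # placeholder: the context you reuse

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder: use a current model id
    max_tokens=1024,
    system=[
        {"type": "text", "text": "Answer questions about the attached handbook."},
        # Everything up to and including this block gets cached; later calls
        # with the same prefix read it at a fraction of the input price.
        {
            "type": "text",
            "text": big_document,
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "What does chapter 3 cover?"}],
)
print(response.content[0].text)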
Tips for using context windows well
- Put the IMPORTANT question or instruction at the BEGINNING and the END of the prompt, so it does not get buried in the middle (see the sketch after this list)
- Structure the prompt clearly with XML tags (Claude) or markdown headings
- Use prompt caching if you reuse the same context many times (saves up to 90%)
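Putting the first two tips together, a long prompt might be laid out like this: instruction stated up front, document fenced in XML tags, instruction repeated at the end. The tag name and file are illustrative:

```python
document = open("contract.txt").read()  # placeholder: a long document

prompt = f"""Summarize the contract below, focusing on termination clauses.

<contract>
{document}
</contract>

Reminder: summarize the contract above, focusing on termination clauses."""
```

Repeating the instruction after the document costs only a few tokens and is a cheap hedge against the “lost in the middle” effect described above.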
Tags
#context #llm #token