TopDev
Technical · Beginner

What is a Context Window?

The maximum amount of text an LLM can "remember" in a single pass. It determines how much information you can fit into a prompt.

Updated: May 5, 2026 · 2 min read

The context window is the maximum number of tokens an LLM can process in a single call — including both your prompt and the model’s response.
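A quick way to sanity-check whether a prompt fits: a common rule of thumb for English text is roughly 4 characters per token (the exact count depends on the model's tokenizer, e.g. tiktoken for GPT models). A minimal sketch assuming that heuristic and a hypothetical 128k-token window:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.
    A real count requires the model's own tokenizer."""
    return len(text) // 4

def fits_in_window(prompt: str, max_response_tokens: int, window: int = 128_000) -> bool:
    """The window must hold BOTH the prompt and the reserved response budget."""
    return estimate_tokens(prompt) + max_response_tokens <= window

prompt = "Summarize this report: " + "word " * 50_000  # ~250k characters
print(estimate_tokens(prompt))                    # ~62,500 tokens by this heuristic
print(fits_in_window(prompt, max_response_tokens=4_000))
```

Note that the response budget counts against the same window, so a prompt that "fits" can still fail if you ask for a very long answer.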

Comparing models (2026)

| Model | Context Window | Roughly equal to |
| --- | --- | --- |
| GPT-3.5 | 16k | ~30 A4 pages |
| GPT-4o | 128k | ~250 pages |
| Claude 4.7 Sonnet | 200k–1M | 400–2,000 pages |
| Gemini 2.5 Pro | 2M | ~4,000 pages (an entire thick book) |
| Llama 3.3 | 128k | ~250 pages |
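The "pages" column is just arithmetic from two rules of thumb: one token is roughly 0.75 English words, and a dense A4 page holds roughly 400 words. A sketch of that conversion (both ratios are approximations, not tokenizer-exact, so the results only roughly match the table):

```python
WORDS_PER_TOKEN = 0.75  # rough average for English text
WORDS_PER_PAGE = 400    # rough average for a dense A4 page

def tokens_to_pages(tokens: int) -> int:
    """Convert a token budget into an approximate A4 page count."""
    return round(tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE)

for name, window in [("GPT-3.5", 16_000), ("GPT-4o", 128_000), ("Gemini 2.5 Pro", 2_000_000)]:
    print(f"{name}: {window:,} tokens ≈ {tokens_to_pages(window)} pages")
```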

Why does context window matter?

Upsides of a large context

  • Drop a whole document into the prompt without setting up a complex RAG pipeline
  • Keep long conversations coherent, since earlier turns stay in the model's context
  • Analyze an entire codebase or full book in one shot

Downsides

  • Expensive — pricing is per token. A whole book = a big bill
  • Slow — the longer the context, the longer the model takes to respond
  • Diluted — the model can miss information buried in the middle of a long context (the “lost in the middle” effect)
  • Reliability gets harder — RAG is still better when documents are huge
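The cost point is easy to quantify: pricing is per input token, so resending a whole book on every call adds up fast. A sketch using a hypothetical rate of $3 per million input tokens (real prices vary by model and provider):

```python
PRICE_PER_MILLION_INPUT = 3.00  # hypothetical USD rate; check your provider's pricing

def input_cost(tokens: int) -> float:
    """Cost in USD of sending `tokens` input tokens at the rate above."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT

book_tokens = 200_000  # roughly a 400-page book
per_call = input_cost(book_tokens)
print(f"${per_call:.2f} per call")              # $0.60 per call
print(f"${per_call * 100:.2f} for 100 calls")   # $60.00 if the book is resent every time
```

This is exactly the bill that prompt caching (mentioned below) is designed to shrink.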

Practical rules

| Situation | Approach |
| --- | --- |
| < 50 pages of docs | Drop straight into the prompt |
| 50–500 pages | Consider a large-context model (Claude 1M, Gemini 2M) |
| > 500 pages | Use RAG; don't brute-force it |
| Long conversations | Use prompt caching to save money |
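The table above is effectively a routing decision. A minimal sketch, with page thresholds taken from the table (the approach names are just illustrative labels):

```python
def choose_approach(pages: int, long_conversation: bool = False) -> str:
    """Pick a context strategy from a rough document size, per the rules above."""
    if long_conversation:
        return "prompt caching"
    if pages < 50:
        return "direct prompt"
    if pages <= 500:
        return "large-context model"
    return "RAG"

print(choose_approach(10))    # direct prompt
print(choose_approach(300))   # large-context model
print(choose_approach(2000))  # RAG
```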

Tips for using context windows well

  • Put the IMPORTANT question/instruction at the BEGINNING and the END, so it doesn't get buried in the middle
  • Structure the prompt clearly with XML tags (Claude) or markdown headings
  • Use prompt caching if you reuse the same context many times (saves up to 90%)
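The first two tips combine naturally: repeat the key instruction at both ends of the prompt and delimit the long document with explicit tags. A sketch of such a template (the tag name is illustrative):

```python
def build_prompt(instruction: str, document: str) -> str:
    """Place the instruction at the start AND the end, so it isn't lost in the
    middle, and wrap the long document in explicit tags."""
    return (
        f"{instruction}\n\n"
        f"<document>\n{document}\n</document>\n\n"
        f"Reminder: {instruction}"
    )

print(build_prompt("List the three main risks in this contract.",
                   "...long contract text..."))
```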
Tags: #context #llm #token