LLMs with the Largest Context Windows

Context window determines how much text a model can see at once — conversation history, documents, code, and your prompt all count against it. A larger window is not always necessary, but when it is, hitting the limit is painful.
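To see how quickly that budget fills, here is a minimal sketch using the common rough heuristic of ~4 characters per token for English text (real tokenizers vary by model, so treat the numbers as estimates; all names and sample strings are illustrative):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.
    Real tokenizers (BPE variants) differ by model; this is a ballpark."""
    return max(1, len(text) // 4)

# Everything in the request counts against the same window:
prompt = "Summarize the attached report in three bullet points."
history = "user: status?\nassistant: on track.\n" * 50     # chat history
document = "Quarterly revenue grew 4% over the prior...\n" * 2000

total = sum(map(estimate_tokens, (prompt, history, document)))
print(f"~{total} tokens of the window already used")
```

Even a modest document repeated across a long session dominates the count, which is why conversation history is usually the first thing to get truncated.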

Largest context window: Llama 4 Scout · Meta · 10M tokens · by a wide margin

Context window tiers

10M+: Entire codebases, book-length documents, massive multi-turn sessions. Nothing practical requires more than this today.
1M–5M: Very long documents, large repos, extended research sessions. More than sufficient for almost any real-world task.
400K–1M: Long documents and large codebases. Handles most enterprise document processing workloads.
128K–400K: Standard frontier range. Handles most tasks: long PDFs, full files, extended conversations.
Under 128K: Sufficient for most chat and coding tasks. Noticeable limits on very large documents.
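The tiers above amount to a simple range lookup. A minimal sketch, assuming lower-bound-inclusive boundaries (the table itself leaves boundary handling ambiguous):

```python
from bisect import bisect_right

# Lower bounds (in tokens) for each tier from the table above.
# Boundary treatment (e.g. exactly 400K) is a simplifying assumption.
TIER_BOUNDS = [0, 128_000, 400_000, 1_000_000, 10_000_000]
TIER_NAMES = ["Under 128K", "128K-400K", "400K-1M", "1M-5M", "10M+"]

def context_tier(window_tokens: int) -> str:
    """Map a context window size to its tier label."""
    return TIER_NAMES[bisect_right(TIER_BOUNDS, window_tokens) - 1]

print(context_tier(200_000))    # 128K-400K
print(context_tier(2_000_000))  # 1M-5M
```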
1. Llama 4 Scout · 10M tokens · quality 7.5 · Intelligence: AA 38.5 (est.) · API: $0.11/1M blended
2. 2M tokens · quality 8.0 · Intelligence: AA 41.4 · API: $6.00/1M blended
3. 1M tokens · quality 8.8 · Intelligence: AA 48.4 · API: $4.50/1M blended
4. 1M tokens · quality 7.3 · Intelligence: AA 35.0 · API: $1.13/1M blended
5. 1M tokens · quality 4.4 · Intelligence: AA 18.0 · API: $0.44/1M blended
6. GPT-5.2 (OpenAI) · 400K tokens · quality 8.3 · Intelligence: AA 46.6 · API: $4.81/1M blended
7. 400K tokens · quality 7.3 · Intelligence: AA 39.0 · API: $0.69/1M blended
8. 256K tokens · quality 4.6 · Intelligence: AA 23.0 · API: $0.75/1M blended
9. 200K tokens · quality 8.0 · Intelligence: AA 44.3 · API: $6.00/1M blended
10. 200K tokens · quality 7.5 · Intelligence: AA 46.0 · API: $10.00/1M blended
11. 128K tokens · quality 6.3 · Intelligence: AA 41.6 · API: $0.48/1M blended

Things to know about context windows

Performance degrades at the edges. Most models are less reliable at retrieving information buried deep in a very long context (the "lost in the middle" effect). A 200K window filled to 190K is not the same as a 200K window filled to 20K.

Long context costs more. You pay per token. A 1M context filled to capacity costs significantly more than a 128K context. Gemini 3 Pro charges 2× for prompts over 200K tokens.
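A minimal sketch of how a tiered surcharge plays out, assuming the higher rate applies to the whole prompt once it crosses the threshold (the function name and example prices are illustrative, not any provider's actual API):

```python
def prompt_cost(tokens: int, base_price_per_m: float,
                threshold: int = 200_000, multiplier: float = 2.0) -> float:
    """Dollar cost of a prompt under tiered pricing.

    Assumption: once `tokens` exceeds `threshold`, the multiplied rate
    applies to the entire prompt. Some providers may instead surcharge
    only the overage; check the actual pricing page.
    """
    rate = base_price_per_m * (multiplier if tokens > threshold else 1.0)
    return tokens / 1_000_000 * rate

# At an illustrative $2.00/1M base rate:
print(f"${prompt_cost(150_000, 2.0):.2f}")  # $0.30
print(f"${prompt_cost(500_000, 2.0):.2f}")  # $2.00
```

Note the jump: the 500K prompt costs more than 6× the 150K one despite being only ~3.3× longer, because it both has more tokens and pays the doubled rate.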

Most tasks don't need more than 128K. Unless you are processing entire books, full codebases, or very long conversation histories, any model on this list will handle your workload. Context window is a tiebreaker — not the primary selection criterion for most users.