All LLM Reviews
11 models reviewed and rated. Sorted by overall score. Updated monthly.
Google's frontier model and the best value at the top tier. At $2/$12 per 1M tokens via the API, Gemini 3 Pro undercuts both Claude and GPT-5.2 while matching them on most benchmarks. The 1M token context window and Google Workspace integration are hard to beat.
OpenAI's current flagship. GPT-5.2 significantly outpaces GPT-4o: a 400K token context window, a hallucination rate down to 6.2%, and perfect scores on the AIME 2025 math benchmark. It's the model most ChatGPT users are now running.
Anthropic's mid-tier model and the practical daily-driver recommendation. Sonnet 4.6 sits just below Opus in raw intelligence but costs 80% less. It's the best model for writing, analysis, and long-document work for anyone who isn't running enterprise-scale inference.
xAI's Grok 4.1 has two things nobody else offers: real-time access to X (Twitter) data and a 2 million token context window. Access comes bundled with X Premium — so if you're already paying for X, Grok is effectively included.
Meta's open-source flagship has a 10 million token context window, the largest of any model available by a wide margin. The weights are free to download under Meta's Llama 4 license, but running the model yourself takes serious compute. Via Groq it's among the cheapest hosted options at ~$0.11/1M tokens.
Anthropic's most powerful model and the top-ranked non-reasoning LLM on the Artificial Analysis Intelligence Index as of February 2026 (AA Index 46). Opus 4.6 is the model you reach for when quality matters more than cost: complex multi-step analysis, high-stakes creative work, and agentic workflows where a small output quality difference has real downstream consequences. The price — $5/$25 per 1M tokens — reflects that positioning. Unrestricted consumer access requires the Claude Max plan ($100/month).
OpenAI's budget reasoning model and one of the most interesting value plays in the current field. GPT-5 mini runs in medium-effort reasoning mode by default and scores 39 on the Artificial Analysis Intelligence Index — higher than several premium-priced non-reasoning models — at $0.25/$2.00 per 1M tokens. That combination makes it smarter per dollar than most alternatives in its price tier. The 400K context window and multimodal input support round out a genuinely capable package for developers who need better-than-baseline quality without flagship pricing.
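The "smarter per dollar" claim can be made concrete with simple division. A minimal sketch, using only the AA Index scores and per-token prices quoted in these reviews; the 3:1 input-to-output token mix used for blending is an assumption, not a provider-published figure:

```python
# Value comparison: AA Index points per blended dollar.
# Scores and list prices come from the reviews on this page; the 3:1
# input-to-output token mix is an assumption.
models = {
    # name: (AA Index score, $/1M input tokens, $/1M output tokens)
    "GPT-5 mini":      (39, 0.25, 2.00),
    "Gemini 3 Flash":  (35, 0.50, 3.00),
    "Claude Opus 4.6": (46, 5.00, 25.00),
}

for name, (score, p_in, p_out) in models.items():
    blended = (3 * p_in + p_out) / 4  # weighted $/1M tokens at a 3:1 mix
    print(f"{name}: {score / blended:.1f} index points per blended dollar")
```

By this rough measure GPT-5 mini delivers roughly ten times the intelligence per dollar of Opus 4.6, which is exactly the trade the two models' positioning implies: one optimizes for value, the other for peak quality.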
Google's speed-optimized model that closes surprising ground on intelligence. Released December 2025, Gemini 3 Flash scores 35 on the Artificial Analysis Intelligence Index — higher than several models that cost five to ten times more per token — while running at 170 tokens per second. At $0.50/$3.00 per 1M, it's genuinely cheap for high-volume API use. The 1M token context window and native video/audio/image input make it the practical go-to for multimodal pipelines that need throughput without paying Gemini 3 Pro prices.
DeepSeek's latest model continues to shock with its price-to-performance ratio. V3.2 introduces 'Fine-Grained Sparse Attention' for 50% better compute efficiency. Input costs drop to $0.07/1M tokens with cache hits. The web interface at chat.deepseek.com appears to be free with no hard usage cap.
Mistral's December 2025 flagship and the most commercially permissive large model in this comparison. Mistral Large 3 is released under Apache 2.0 — genuinely open for commercial use without royalties or usage restrictions. At 675B total parameters with 41B active per token (mixture-of-experts), it scores 23 on the Artificial Analysis Intelligence Index at $0.50/$1.50 per 1M tokens. For enterprise teams that need open-weight licensing terms, the math is straightforward: comparable capability to other open-weight models, completely unrestricted commercial use, and a 256K context window that covers most document workflows.
Meta's midweight open-source model in the Llama 4 family, larger than Scout, with 402B total parameters (17B active via mixture-of-experts), a 1M token context window, and notably fast inference at 124.6 t/s. The Artificial Analysis Intelligence Index scores it at 18, below frontier models, but Maverick is not designed to compete on raw reasoning. It exists for workloads where open weights, massive context, and low API cost matter more than cutting-edge benchmark performance. At $0.44/1M blended via Together AI, it's one of the cheapest options for large-context production API use.
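A "blended" rate like Maverick's $0.44/1M is a weighted average of separate input and output token prices. A minimal sketch of that arithmetic, assuming a 3:1 input-to-output token mix (the ratio is an assumption; hosted providers publish their own blend):

```python
def blended_price(input_per_m: float, output_per_m: float,
                  mix: float = 3.0) -> float:
    """$/1M tokens, averaging input and output prices at a mix:1
    input-to-output token ratio."""
    return (mix * input_per_m + output_per_m) / (mix + 1)

# Example using GPT-5 mini's quoted rates of $0.25 in / $2.00 out:
print(blended_price(0.25, 2.00))  # 0.6875
```

The same formula explains why a model with cheap input but much pricier output still lands at a low blended rate: input-heavy workloads dominate the average.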
Ratings based on real-world task testing and independent benchmark data. Pricing verified from provider docs — check official sites for current rates.