Best Multimodal AI Models
Models that accept both text and images as input, ranked by overall quality score. Useful for document analysis, screenshot debugging, visual Q&A, and mixed-media workflows. Text-only models are excluded. Rankings use the same composite quality score as the overall rankings; price is not a factor.
top multimodal model
Gemini 3.1 Pro
Google · Quality 9.0/10 · AA Index 57
Google's reasoning-optimized flagship, released February 19, 2026, and currently ranked #1 of 114 models on the Artificial Analysis Intelligence Index (score: 57). Gemini 3.1 Pro is a direct upgrade to Gemini 3 Pro — same 1M token context window and same $2/$12 pricing — but with dramatically improved reasoning. Its ARC-AGI-2 abstract reasoning score more than doubled from 31.1% to 77.1%, and it nearly doubled its APEX-Agents agentic task score (18.4% → 33.5%). It leads on scientific knowledge (GPQA Diamond 94.3%), competitive coding (LiveCodeBench Pro Elo 2887), and multi-step agentic search (BrowseComp 85.9%). A dedicated custom-tools API endpoint is available for agentic pipeline use. Currently in preview — generally available soon.
Google's frontier model and the best value at the top tier. At $2/$12 per 1M tokens via the API, Gemini 3 Pro undercuts both Claude and GPT-5.2 while matching them on most benchmarks. The 1M token context window and Google Workspace integration are hard to beat.
OpenAI's current flagship. GPT-5.2 significantly outpaces GPT-4o — it has a 400K token context window, a hallucination rate down to 6.2%, and a perfect score on the AIME 2025 math benchmark. It's the model most people using ChatGPT are now running on.
xAI's Grok 4.1 has two things nobody else offers: real-time access to X (Twitter) data and a 2 million token context window. Access comes bundled with X Premium — so if you're already paying for X, Grok is effectively included.
Anthropic's mid-tier model and the practical daily-driver recommendation. Sonnet 4.6 sits just below Opus in raw intelligence but costs 80% less. It's the best model for writing, analysis, and long-document work for anyone who isn't running enterprise-scale inference.
Meta's open-weight Llama 4 Scout has a 10 million token context window — by a wide margin the largest of any model available. The weights are free to download under Meta's Llama 4 license, but running it costs compute. Via Groq it's among the cheapest options at ~$0.11/1M tokens.
Google's speed-optimized model that closes surprising ground on intelligence. Released December 2025, Gemini 3 Flash scores 35 on the Artificial Analysis Intelligence Index — higher than several models that cost five to ten times more per token — while running at 170 tokens per second. At $0.50/$3.00 per 1M, it's genuinely cheap for high-volume API use. The 1M token context window and native video/audio/image input make it the practical go-to for multimodal pipelines that need throughput without paying Gemini 3 Pro prices.
Anthropic's most powerful model and the top-ranked non-reasoning LLM on the Artificial Analysis Intelligence Index as of February 2026 (AA Index 46). Opus 4.6 is the model you reach for when quality matters more than cost: complex multi-step analysis, high-stakes creative work, and agentic workflows where a small output quality difference has real downstream consequences. The price — $5/$25 per 1M tokens — reflects that positioning. Unrestricted consumer access requires the Claude Max plan ($100/month).
OpenAI's budget reasoning model and one of the most interesting value plays in the current field. GPT-5 mini runs in medium-effort reasoning mode by default and scores 39 on the Artificial Analysis Intelligence Index — higher than several premium-priced non-reasoning models — at $0.25/$2.00 per 1M tokens. That combination makes it smarter per dollar than most alternatives in its price tier. The 400K context window and multimodal input support round out a genuinely capable package for developers who need better-than-baseline quality without flagship pricing.
Anthropic's fastest and most affordable model in the Claude 4 generation, released October 2025. Claude Haiku 4.5 runs at 108.8 tokens/second — fast enough for real-time streaming — at $1/$5 per 1M tokens. Despite the low price, it scores an AA Intelligence Index of 31, placing it #13 of 60 proprietary models. It outperforms Claude Sonnet 4 on computer-use benchmarks (50.7% vs 42.2%) at a third of the cost. Supports extended thinking mode (thinking tokens billed at the $5/1M output rate), image input, and the full 200K context window shared across the Claude 4 generation.
Mistral's December 2025 flagship and the most commercially permissive large model in this comparison. Mistral Large 3 is released under Apache 2.0 — genuinely open for commercial use without royalties or usage restrictions. At 675B total parameters with 41B active per token (mixture-of-experts), it scores 23 on the Artificial Analysis Intelligence Index at $0.50/$1.50 per 1M tokens. For enterprise teams that need open-weight licensing terms, the math is straightforward: comparable capability to other open-weight models, completely unrestricted commercial use, and a 256K context window that covers most document workflows.
Meta's midweight open-source model in the Llama 4 family — larger than Scout (402B total parameters, 17B active via mixture-of-experts) with a 1M token context window and notably fast inference at 124.6 t/s. The Artificial Analysis Intelligence Index scores it at 18, below frontier models, but Maverick is not designed to compete on raw reasoning. It exists for workloads where open weights, massive context, and low API cost matter more than cutting-edge benchmark performance. At $0.44/1M blended via Together AI, it's one of the cheapest options for large-context production API use.
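The $X/$Y prices quoted throughout this list are separate input and output rates, while a "blended" figure like Maverick's $0.44/1M is a single weighted average of the two. A minimal sketch of that arithmetic, using prices quoted above and assuming a 3:1 input-to-output token ratio (an illustrative convention, not a figure from these rankings):

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_share: float = 0.75) -> float:
    """Weighted average of input/output rates, in $ per 1M tokens.

    input_share is the assumed fraction of traffic that is input
    tokens (0.75 = a 3:1 input:output ratio).
    """
    return input_per_m * input_share + output_per_m * (1 - input_share)

# Input/output rates quoted in the rankings above ($ per 1M tokens):
models = {
    "Gemini 3 Pro":    (2.00, 12.00),
    "Gemini 3 Flash":  (0.50, 3.00),
    "GPT-5 mini":      (0.25, 2.00),
    "Claude Opus 4.6": (5.00, 25.00),
}

for name, (inp, out) in models.items():
    print(f"{name}: ${blended_price(inp, out):.2f}/1M blended")
```

Note how the blend shifts with workload: summarization jobs (huge inputs, short outputs) sit near the input rate, while generation-heavy jobs drift toward the much higher output rate, so the same model can have very different effective costs across pipelines.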
What counts as multimodal?
These models accept at least image + text input. Several also support audio, video frames, or document uploads. “Multimodal” does not mean they generate images — for image generation see Image Generators. Want to browse without rank order? Browse all multimodal models →
Last updated February 2026. Intelligence scores from Artificial Analysis. See how we rate for full methodology.