Best LLMs for Coding

AI coding assistants have gone from novelty to essential for most developers. But which LLM should you actually use for coding? The answer depends on whether you're working in an IDE, via API, or in a chat interface — and how much correctness matters.

Top pick for coding

Gemini 3.1 Pro

Google · Quality 9.0/10 · AA Index 57

Gemini 3.1 Pro · Google · top pick

Gemini 3.1 Pro is the new top-ranked model on the Artificial Analysis Intelligence Index (score: 57) and leads or ties on the coding benchmarks that matter most to developers. It scores 80.6% on SWE-bench Verified (essentially tied with Claude Opus 4.6's 80.8%), 68.5% on Terminal-Bench 2.0, and 2887 Elo on LiveCodeBench Pro, which puts it far ahead of the competition on competitive and algorithmic coding. The dedicated gemini-3.1-pro-preview-customtools API endpoint is purpose-built for agentic pipelines that call bash, view_file, or search_code tools. At $2/$12 per 1M input/output tokens (the same as Gemini 3 Pro), its price-to-coding-capability ratio is unmatched at the frontier tier.

💰 API at $2/$12 per 1M tokens via Google AI Studio. Free developer tier available (rate-limited). Google One AI Premium ($20/month) for Gemini App access.

Quality 9.0/10 · SWE-bench: 80.6% · AA 57 · 108.6 t/s
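To make the agentic-pipeline idea concrete, here is a minimal local sketch of the three tools the entry above names (bash, view_file, search_code) plus a dispatcher. Only the tool names come from the article. The function signatures, the JSON call shape `{"name": ..., "args": {...}}`, and the `dispatch` helper are assumptions for illustration, not the endpoint's actual schema, and no model API is called.

```python
import json
import subprocess
from pathlib import Path


def bash(command: str, timeout: int = 30) -> str:
    """Run a shell command and return stdout followed by stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr


def view_file(path: str, max_chars: int = 4000) -> str:
    """Return the start of a text file, truncated to max_chars characters."""
    return Path(path).read_text(encoding="utf-8", errors="replace")[:max_chars]


def search_code(pattern: str, root: str = ".") -> str:
    """Return 'path:line_no: line' for each .py line under root containing pattern."""
    hits = []
    for file in sorted(Path(root).rglob("*.py")):
        text = file.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if pattern in line:
                hits.append(f"{file}:{line_no}: {line.strip()}")
    return "\n".join(hits)


# Hypothetical registry: maps a tool name requested by the model to a local function.
TOOLS = {"bash": bash, "view_file": view_file, "search_code": search_code}


def dispatch(tool_call_json: str) -> str:
    """Execute one tool call given as JSON in the assumed {"name", "args"} shape."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"error: unknown tool {call['name']!r}"
    return tool(**call.get("args", {}))
```

In a real pipeline the string returned by `dispatch` would be sent back to the model as the tool result. Running model-issued shell commands should happen inside a sandbox, which this sketch does not provide.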

Claude Opus 4.6 · Anthropic · top pick

Claude Opus 4.6 is the most reliable coding model for complex, multi-step tasks: large refactors, features that span multiple files, and subtle logic bugs. It tracks context across long conversations better than most models, which matters when a task spans hundreds of lines of code. It scores 80.8% on SWE-bench, a hair above Gemini 3.1 Pro's 80.6%. It remains the best choice when you need a model that knows when to ask for clarification instead of guessing.

💰 API at $5/$25 per 1M tokens. Claude Max plan for consumer access ($100/month).

Quality 6.6/10 · SWE-bench: 80.8% · AA 46 · 67 t/s

GPT-5.2 · OpenAI · top pick

GPT-5.2 is the most tested coding LLM in the world and has the deepest ecosystem: GitHub Copilot, Cursor, and most IDE integrations are built around it. It is strong on all standard coding tasks: generation, debugging, and code explanation. It is the go-to choice if you want the widest range of tools and integrations.

💰 Free tier at chatgpt.com. For IDE use, Cursor ($20/month) or GitHub Copilot ($10/month).

Quality 7.4/10 · SWE-bench: 80% · AA 46.58 · 65 t/s

GPT-5 mini · OpenAI · best value

GPT-5 mini is the best budget coding option. Its reasoning model architecture handles multi-step problems — debugging, algorithm design, SQL — more reliably than non-reasoning models at the same price. At $0.25/$2.00 per 1M tokens, it's the smartest-per-dollar API choice for developers building coding assistants.

💰 Available on ChatGPT free tier with limits. API at $0.25/$2.00 per 1M tokens.

Quality 6.5/10 · SWE-bench: 74.9%* · AA 39 · 73 t/s

Claude Sonnet 4.6 · Anthropic

Claude Sonnet 4.6's 200K context window holds large codebases in context, and it's exceptional at following detailed code style and architecture instructions. Many developers prefer it for refactoring work and longer coding sessions where context retention matters more than raw speed.

💰 Free tier at claude.ai. API at $3/$15 per 1M tokens.

Quality 7.1/10 · SWE-bench: 79.6% · AA 44.33 · 85 t/s

DeepSeek V3.2 · DeepSeek · open weights

DeepSeek V3.2 is the sleeper pick for cost-sensitive coding API use. It matches GPT-5.2 on several coding benchmarks at a fraction of the price. If you're building AI-powered coding tools and can't send sensitive code to Chinese-hosted infrastructure, the open weights let you self-host it instead.

💰 $0.27/$1.10 per 1M tokens via DeepSeek API. Cached input drops to $0.07/1M.

Quality 5.5/10 · SWE-bench: n/a · AA 41.61 · 45 t/s

GPT OSS 120B · OpenAI · open weights · open source

GPT OSS 120B is OpenAI's first open-weight language model since GPT-2 and the top-ranked open-weight reasoning model on Artificial Analysis (Intelligence Index 33). At $0.15/$0.60 per 1M tokens and 336 t/s, it's the fastest and cheapest path to a frontier-lab open model for coding pipelines. You can also download the weights from Hugging Face and self-host: there is no per-token fee, but you need H100-class GPU hardware. Best for developers who want OpenAI-quality coding in their own infrastructure.

💰 $0.15/$0.60 per 1M tokens via W&B Inference / OpenRouter. Self-host from Hugging Face for free (requires H100-class GPU).

Quality 5.4/10 · SWE-bench: n/a · AA 33 · 336 t/s
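The per-1M-token prices above are easier to compare once they are applied to a workload. The sketch below does that arithmetic for the API prices listed in the entries above, reading each pair as input/output price. The monthly workload of 50M input and 10M output tokens is an assumption chosen for illustration, and GPT-5.2 is left out because no API price is listed for it here.

```python
# USD per 1M tokens as (input, output), copied from the entries above.
PRICES = {
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-5 mini": (0.25, 2.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V3.2": (0.27, 1.10),
    "GPT OSS 120B": (0.15, 0.60),
}


def api_cost(input_tokens: int, output_tokens: int, price: tuple) -> float:
    """Return the USD cost of a workload given (input, output) prices per 1M tokens."""
    in_price, out_price = price
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: api_cost(input_tokens, output_tokens, price)
        for name, price in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical monthly workload: 50M input tokens, 10M output tokens.
    for name, cost in rank_by_cost(50_000_000, 10_000_000):
        print(f"{name:<18} ${cost:,.2f}")
```

Under that assumed workload, GPT OSS 120B comes out cheapest at about $13.50 and Claude Opus 4.6 most expensive at about $500, with Gemini 3.1 Pro at about $220. The ordering can shift with the input-to-output ratio, and cached-input discounts such as DeepSeek's are not modeled.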

How to pick the right coding LLM

For agentic coding pipelines and API-integrated tools, Gemini 3.1 Pro is the best-value frontier option — same price as Gemini 3 Pro with dramatically better reasoning and a dedicated custom-tools endpoint. For serious daily development inside an IDE, Cursor powered by Claude Opus or GPT-5.2 is worth $20/month. For coding chat without an IDE integration, the Claude Sonnet free tier is excellent. For API-integrated coding tools where cost matters, GPT-5 mini gives the best reasoning per dollar. For open-source coding infrastructure, GPT OSS 120B (self-hosted) or DeepSeek V3.2 (API) are the strongest budget options.
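The guidance above can be written down as a small lookup, sketched here under stated assumptions. The recommendations are the ones in the paragraph, but the `Need` fields and the precedence between rules (open weights checked first, then interface, then cost) are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Need:
    interface: str                 # "api", "ide", or "chat"
    cost_sensitive: bool = False   # API use where cost matters
    open_weights: bool = False     # must be an open-weight model
    self_hosted: bool = False      # will run on your own hardware


def recommend(need: Need) -> str:
    """Map a developer's situation to the article's suggested option."""
    if need.open_weights:
        if need.self_hosted:
            return "GPT OSS 120B (self-hosted)"
        return "DeepSeek V3.2 (API)"
    if need.interface == "ide":
        return "Cursor with Claude Opus or GPT-5.2"
    if need.interface == "chat":
        return "Claude Sonnet free tier"
    if need.interface == "api":
        return "GPT-5 mini" if need.cost_sensitive else "Gemini 3.1 Pro"
    raise ValueError(f"unknown interface: {need.interface!r}")


if __name__ == "__main__":
    print(recommend(Need(interface="api")))
    print(recommend(Need(interface="api", cost_sensitive=True)))
```

Treat this as a starting point: real choices usually weigh several of these factors at once, not in a fixed order.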

Last updated 2026-02-24 · Benchmark sources: Artificial Analysis, SWE-bench.com