

Best Budget LLM API in 2026 (Under $1/1M Tokens)

If you are building something that processes millions of tokens — a chatbot, a document pipeline, an auto-responder, a coding assistant — API cost compounds fast. The models below all come in under $1 per million input tokens and still deliver genuine capability. Here is what to use and when.

Updated February 2026

What actually matters for a budget API

Before we get to the pick — the criteria that separate good from bad here:

Blended cost at your ratio: API pricing is quoted separately for input and output tokens. Your actual cost depends on your input:output ratio. A chatbot might be 1:3; a summarization pipeline might be 10:1. Always calculate blended cost at your real usage pattern (a worked example follows these criteria).

Quality floor: Cheap is meaningless if the model can't do the task. The question isn't 'what's cheapest'; it's 'what's cheapest that's still good enough for my use case.' Quality-per-dollar is the right metric.

Rate limits and throughput: Budget models often have tight rate limits (requests per minute, tokens per day). At scale, a rate limit that triggers constantly is more expensive than a model that costs slightly more but never throttles you.

Latency: Cheaper models are often slower. For batch processing jobs running overnight, latency doesn't matter. For real-time user-facing applications, a 5-second response time kills the product regardless of price.
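To make the first two criteria concrete, here is a minimal Python sketch of the blended-cost and quality-per-dollar math, using the prices and index scores quoted in this guide. The 1:3 input:output ratio is purely illustrative; substitute your own measured ratio.

```python
# Blended cost per 1M tokens at a given input:output ratio, plus a crude
# quality-per-dollar figure (intelligence index divided by blended cost).

def blended_cost(input_price: float, output_price: float, input_share: float) -> float:
    """Prices are $ per 1M tokens; input_share is the fraction of all tokens that are input."""
    return input_price * input_share + output_price * (1 - input_share)

# Prices and AA Intelligence Index values as quoted in this guide.
MODELS = {
    "GPT OSS 120B":  (0.15, 0.60, 33),
    "DeepSeek V3.2": (0.27, 1.10, None),  # index not quoted above
    "Kimi K2":       (0.39, 1.90, 31),
    "Qwen 3 235B":   (0.20, 0.88, 17),
}

INPUT_SHARE = 0.25  # assumption: a 1:3 input:output chatbot; use your measured ratio

for name, (inp, out, index) in MODELS.items():
    cost = blended_cost(inp, out, INPUT_SHARE)
    qpd = f"{index / cost:5.1f}" if index is not None else "  n/a"
    print(f"{name:14s} blended ${cost:.3f}/1M   quality-per-dollar {qpd}")
```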

Our pick

GPT OSS 120B · OpenAI
5.4/10

GPT OSS 120B is the most capable model under $1/1M input, with an AA Intelligence Index of 33 — #1 among all open-weight reasoning models. At $0.15/$0.60 per 1M tokens and 336 t/s output speed, it is both the cheapest and fastest option in this tier. Open weights mean you can self-host to reduce costs to near-zero. The tradeoff: knowledge cutoff of May 2024, 131K context window, and no official consumer interface.

Pricing: Via W&B Inference / OpenRouter: $0.15/$0.60 per 1M tokens. Self-host from Hugging Face for free (requires H100-class hardware).
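On the hosted route, GPT OSS 120B is served through OpenAI-compatible endpoints, so a standard client works with only the base URL swapped. A minimal sketch against OpenRouter, assuming the model ID openai/gpt-oss-120b and an OPENROUTER_API_KEY environment variable; confirm the exact ID against the provider's catalog before relying on it.

```python
import os
from openai import OpenAI  # pip install openai

# OpenRouter exposes an OpenAI-compatible API, so only the base URL and key change.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed model ID; confirm against the provider's catalog
    messages=[{"role": "user", "content": "Summarize this support ticket in two sentences: ..."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```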

Also consider

DeepSeek V3.2 · DeepSeek
5.5/10

DeepSeek V3.2 at $0.27/$1.10 per 1M tokens is the best budget option for general-purpose pipelines. It matches or beats frontier models on coding and reasoning benchmarks while costing a fraction of the price. Prompt caching drops input to $0.07/1M. The main limitation is Chinese data jurisdiction — not appropriate for sensitive enterprise workloads.

$0.27/$1.10 per 1M tokens. Cache hits: $0.07/1M input. Free web chat at chat.deepseek.com.

Full review →
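Prompt caching changes the effective input price, so it pays to estimate how much of your traffic is a repeated prefix. A rough sketch using the cache-hit and cache-miss prices quoted above; the hit rates are illustrative assumptions you would replace with measurements from your own workload.

```python
# Effective DeepSeek V3.2 input price as a function of cache-hit rate, using the
# $0.27 (cache miss) and $0.07 (cache hit) per-1M-token figures quoted above.

def effective_input_price(hit_rate: float, miss_price: float = 0.27, hit_price: float = 0.07) -> float:
    return hit_price * hit_rate + miss_price * (1 - hit_rate)

for hit_rate in (0.0, 0.5, 0.9):  # illustrative hit rates; measure your own
    print(f"cache-hit rate {hit_rate:.0%}: ${effective_input_price(hit_rate):.3f} per 1M input tokens")
```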
Kimi K2 · Moonshot AI
5.0/10

Kimi K2 at $0.39/$1.90 per 1M tokens gives you a 1 trillion parameter open-weight model — the largest architecture in this price range — with a 262K context window. Strong AA Intelligence Index of 31, solid agentic performance. Good fit for pipelines that need large-context processing at low cost. Chinese data jurisdiction applies.

$0.39/$1.90 per 1M tokens via Moonshot API. Also available on Together AI and OpenRouter. Open weights available for self-hosting.

Full review →
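Whether the 262K window is enough depends on your documents, and it is cheaper to check before sending than to get a context-length error back. A rough pre-flight check, assuming the common ~4 characters-per-token heuristic (real tokenizer counts differ, so leave headroom):

```python
# Rough pre-flight check: will this document, plus room for the answer,
# fit in Kimi K2's 262K-token context window?

CONTEXT_WINDOW = 262_000
CHARS_PER_TOKEN = 4  # crude heuristic; actual tokenization varies by language and content

def fits_in_context(document: str, reserved_for_output: int = 4_000) -> bool:
    estimated_tokens = len(document) // CHARS_PER_TOKEN
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

with open("contract.txt", encoding="utf-8") as f:  # hypothetical input file
    doc = f.read()
print("fits" if fits_in_context(doc) else "needs chunking")
```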
Qwen 3 235B · Alibaba
3.3/10

Qwen 3 235B at $0.20/$0.88 per 1M tokens is the cheapest managed API for a model of this scale. MIT license means you can also self-host for free. Best suited for non-sensitive workloads where you want large-model capability at minimal cost. AA Index of 17 reflects its April 2025 release — newer models have since surpassed it on quality.

$0.20/$0.88 per 1M tokens on Alibaba Cloud. MIT license — self-host for free. Also on Together AI, Fireworks, OpenRouter.

Full review →
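Because the license permits it, self-hosting is worth considering once a workload is steady enough to keep GPUs busy. A minimal vLLM sketch; the Hugging Face repo ID and GPU count are assumptions to verify against the model card before relying on this.

```python
from vllm import LLM, SamplingParams  # pip install vllm

# Load the open weights and shard them across 8 GPUs with tensor parallelism.
# Repo ID and GPU count are assumptions; check the model card for real requirements.
llm = LLM(model="Qwen/Qwen3-235B-A22B", tensor_parallel_size=8)

params = SamplingParams(max_tokens=256, temperature=0.2)
outputs = llm.generate(["Classify the sentiment of: 'Great product, slow shipping.'"], params)
print(outputs[0].outputs[0].text)
```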

Bottom line

If you need the best raw quality per dollar and can handle open-source infrastructure: GPT OSS 120B (self-hosted) or DeepSeek V3.2 (API). If you need a large context window on a budget: Kimi K2 (262K, $0.39/1M) or Gemini 3 Flash (1M context, $0.10/1M). If you need a Western-jurisdiction budget option: Gemini 3 Flash, GPT-5 mini, or Mistral Large 3. Rule of thumb: if cost matters, the only way to know which model is best for your specific task is to benchmark on your own data.
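Acting on that rule of thumb does not require a framework; a short script against any OpenAI-compatible endpoint will do. A minimal sketch that runs each candidate over your own prompts, times the responses, and derives cost from the returned usage counts; the model IDs and base URL are placeholders for whichever providers you are comparing.

```python
import os
import time
from openai import OpenAI  # pip install openai

# Candidate models and their ($ per 1M input, $ per 1M output) prices from this guide.
# The IDs are placeholders; use the exact names your provider lists.
CANDIDATES = {
    "openai/gpt-oss-120b": (0.15, 0.60),
    "deepseek/deepseek-chat": (0.27, 1.10),
}

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

# Prompts drawn from your real workload, one per line.
prompts = [line.strip() for line in open("my_eval_prompts.txt") if line.strip()]

for model, (in_price, out_price) in CANDIDATES.items():
    total_cost = 0.0
    total_latency = 0.0
    for prompt in prompts:
        start = time.time()
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=512,
        )
        total_latency += time.time() - start
        usage = resp.usage
        total_cost += (usage.prompt_tokens * in_price + usage.completion_tokens * out_price) / 1e6
    print(f"{model}: ${total_cost:.4f} total, {total_latency / len(prompts):.2f}s average latency")
# Quality still has to be judged against your own acceptance criteria, by hand or with a grader.
```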

Updated February 2026 · How we choose →