Open-Weight LLM Comparison 2026: Llama 4, DeepSeek V3.2, Gemma 4, MiniMax M2.5, Mistral Small 4, Qwen3.5 — self-hostable models compared on SWE-bench, licensing, and cost
Open-weight models dramatically narrowed the gap with proprietary models in 2026. MiniMax M2.5 hits 80.2% on SWE-bench Verified — on par with Claude Opus 4.6 — at $0.30/$1.20 per 1M tokens. Meta's Llama 4 Scout offers a 10M-token context window on a single H100. Gemma 4 31B is the cleanest Apache 2.0 model on the market: no MAU caps, no EU restrictions, and it runs on a single H100. DeepSeek V3.2 remains the cheapest frontier API at $0.28/$0.42 per 1M tokens under an MIT license.
Open-Weight Frontier Models
Self-hostable models with public weights, evaluated on licensing, cost, benchmarks, and deployment requirements as of April 2026.
| Dimension | Llama 4 Maverick | Llama 4 Scout | DeepSeek V3.2 | Gemma 4 31B | MiniMax M2.5 | Mistral Small 4 | Qwen3.5-397B |
|---|---|---|---|---|---|---|---|
| Context window | 1M tokens | 10M tokens — largest of any production model | 163,840 tokens | Up to 262K tokens | 1M input + 1M output tokens | 256K tokens | 262K native (extensible to ~1M tokens) |
| License | Meta Llama 4 Community — EU entities prohibited; >700M MAU needs Meta agreement | Meta Llama 4 Community — EU entities prohibited; >700M MAU needs Meta agreement | MIT — unrestricted commercial use | Apache 2.0 — no MAU caps, no EU restriction | Modified MIT — attribution clause for large-scale commercial | Apache 2.0 — unrestricted commercial use | Apache 2.0 — unrestricted commercial use |
| API input price (per 1M tokens) | ~$0.17 (third-party providers) | ~$0.08 (third-party providers) | $0.28 (DeepSeek API) | $0.13 (direct API) | $0.30 (MiniMax API) | $0.15 (Mistral API) | Varies by provider |
| API output price (per 1M tokens) | ~$0.60 (third-party providers) | ~$0.30 (third-party providers) | $0.42 (DeepSeek API) | $0.38 (direct API) | $1.20 (MiniMax API) | $0.60 (Mistral API) | Varies by provider |
| SWE-bench Verified | Not publicly confirmed | Not publicly confirmed | 67.8% (V3.2-Speciale) | Not confirmed in available benchmarks | 80.2% — highest confirmed score among open-weight models | Not confirmed | 72.4% (27B dense variant, multilingual benchmark) |
| Min. self-hosting GPU | 4× H100 80GB (INT4); 8× H100 for production | 1× H100 80GB (INT4) — single-GPU deployable | 8× H100 80GB (FP8); ~5–6× H100 (INT4) | 1× H100 80GB (BF16, 8K ctx); B200 for full 262K | 2× B200 or 4× H100 minimum | 1× H100 estimated (6B active params) | 8× GPU via vLLM (tensor parallel for full 262K context) |
| Multimodal support | Yes — text + image input | Yes — text + image input | No — text/code only (DeepSeek-VL2 is a separate model) | Yes — text + image + audio + video (all model sizes) | No — text/code only | Yes — text + image (vision-capable) | Yes — native vision-language across all sizes |
| EU deployment safe | No — Meta license explicitly prohibits EU-domiciled entities | No — Meta license explicitly prohibits EU-domiciled entities | Yes — MIT license; note: Chinese lab (data handling considerations) | Yes — Apache 2.0; Google (US-based) | Yes — Modified MIT; note: Chinese lab | Yes — Apache 2.0; French company, EU-native | Yes — Apache 2.0; note: Alibaba / Chinese lab |
| Best for | Non-EU teams wanting top open MoE at 1M context with multimodal | Non-EU teams needing 10M-token context on a single H100 at minimal API cost | Cheapest frontier API; MIT license; high-volume coding and reasoning | Cleanest license (no restrictions); single H100; multimodal incl. video; EU-safe | Highest SWE-bench score (80.2%); 1M in/out context; coding-heavy workloads | EU data residency; Apache 2.0; efficient small model; multimodal; low cost | 201 languages; ~1M context; Apache 2.0; adaptive reasoning; multilingual |
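The API prices in the table translate directly into workload cost. As a rough sketch, the helper below computes monthly spend from the listed per-1M-token rates (Qwen3.5 is omitted because its pricing varies by provider); treat the numbers as illustrative, since third-party provider pricing fluctuates.

```python
# Rough per-workload API cost comparison using the prices from the table above.
# Prices are USD per 1M tokens as (input, output); illustrative only.
PRICES = {
    "Llama 4 Maverick": (0.17, 0.60),
    "Llama 4 Scout": (0.08, 0.30),
    "DeepSeek V3.2": (0.28, 0.42),
    "Gemma 4 31B": (0.13, 0.38),
    "MiniMax M2.5": (0.30, 1.20),
    "Mistral Small 4": (0.15, 0.60),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given token volume at the listed per-1M rates."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example: 10M input + 2M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 10_000_000, 2_000_000):.2f}")
```

At that volume, Llama 4 Scout via third-party providers ($1.40) undercuts even DeepSeek V3.2's direct API ($3.64), though third-party pricing is the least stable input here.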
When to choose each
Llama 4 Maverick
- Best open MoE model at 1M context — strong on multimodal and reasoning
- Non-EU teams wanting Meta's strongest open release at $0.17/$0.60
- Multimodal applications requiring open-weight models (text + image)
- Teams comfortable with 4+ H100 self-hosting requirements
Llama 4 Scout
- Ultra-long context tasks — 10M tokens on a single H100 (INT4)
- Non-EU teams needing the cheapest third-party API ($0.08/$0.30)
- Privacy-sensitive workloads requiring single-GPU on-premise deployment
- RAG systems with massive document sets exceeding 1M-token limits
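Whether a document set fits in Scout's 10M-token window without RAG can be estimated with a back-of-the-envelope check. The ~4 characters per token figure below is a common heuristic for English text, not a tokenizer-accurate count:

```python
# Back-of-the-envelope check: does a document set fit in a model's context
# window without RAG? Assumes ~4 characters per token for English text,
# which is a rough heuristic, not a tokenizer-accurate count.
def estimated_tokens(total_chars: int, chars_per_token: float = 4.0) -> int:
    return int(total_chars / chars_per_token)

def fits_in_context(total_chars: int, window_tokens: int) -> bool:
    # Leave ~10% headroom for the prompt and the model's response.
    return estimated_tokens(total_chars) <= window_tokens * 0.9

# A ~30MB plain-text corpus (~7.5M tokens) fits a 10M-token window
# but far exceeds a 1M-token window.
print(fits_in_context(30_000_000, 10_000_000))  # True
print(fits_in_context(30_000_000, 1_000_000))   # False
```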
DeepSeek V3.2
- Cheapest frontier API at $0.28/$0.42 per 1M tokens — ~17× cheaper than Claude
- MIT license for maximum commercial flexibility
- High-volume coding or reasoning pipelines on a tight budget
- Self-hosted deployments (5–6× H100 INT4 or 8× H100 FP8)
Gemma 4 31B
- The only frontier open-weight model with Apache 2.0 and no usage restrictions
- EU deployment with no data-residency concerns from a US-based lab
- Single H100 80GB self-hosting for teams with limited GPU budget
- Multimodal tasks including video and audio — all supported natively
MiniMax M2.5
- Highest SWE-bench Verified score of any open-weight model (80.2%)
- 1M input + 1M output context — unique among open models
- Coding-heavy workloads where benchmark quality is the priority
- Teams willing to use 2× B200 or 4× H100 for the performance ceiling
Mistral Small 4
- EU data residency — French company with EU-native legal structure
- Apache 2.0 with a small resource footprint (~1× H100 estimated)
- Multimodal applications on a tight budget ($0.15/$0.60 per 1M)
- Multilingual European applications across major EU languages
Qwen3.5-397B
- Multilingual applications spanning 201 languages
- ~1M-token context extensibility under an Apache 2.0 license
- Adaptive reasoning modes (Thinking, Fast, Auto) in a single model
- Asia-Pacific teams integrated with Alibaba / NVIDIA NIM infrastructure
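The recommendations above can be condensed into a minimal decision sketch. The rule order (EU licensing first, then context length, then benchmark score) is one reasonable priority, not the only one, and the thresholds are simplifications of the table:

```python
# Minimal decision sketch encoding the recommendations above. Rule order
# (EU licensing first, then context, then coding benchmarks) is one
# reasonable priority among several.
def pick_model(eu_entity: bool, max_context_tokens: int, coding_heavy: bool) -> str:
    if eu_entity:
        # Meta's Llama 4 license excludes EU-domiciled entities.
        if max_context_tokens > 262_000:
            return "Qwen3.5-397B"   # Apache 2.0, context extensible to ~1M
        return "Mistral Small 4"    # EU-native company, Apache 2.0
    if max_context_tokens > 1_000_000:
        return "Llama 4 Scout"      # only open model with a 10M-token window
    if coding_heavy:
        return "MiniMax M2.5"       # highest SWE-bench Verified (80.2%)
    return "DeepSeek V3.2"          # cheapest frontier API

print(pick_model(eu_entity=False, max_context_tokens=200_000, coding_heavy=True))
```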
Our verdict
For the highest SWE-bench score among open models, MiniMax M2.5 (80.2%) is unmatched — on par with Claude Opus 4.6 at a fraction of the cost. For the cleanest license, Gemma 4 31B (Apache 2.0, single H100) is the standout. For the longest context window, Llama 4 Scout (10M tokens, single H100) wins — but Meta's license bans EU entities. For the cheapest API, DeepSeek V3.2 at $0.28/$0.42 per 1M wins. For EU data residency with Apache 2.0, Mistral Small 4 is the strongest pick from a Western lab. Note: DeepSeek V4 has not been released as of April 2026.
Sources & References
- Meta Llama 4 — Official Blog
  Llama 4 Scout (10M context) and Maverick (1M context) released April 5, 2025
- DeepSeek V3.2 API Docs
  Released December 1, 2025; 163,840 context; MIT license
- Gemma 4 — Google Blog
  Released April 2, 2026; Apache 2.0; single H100 deployable
- MiniMax M2.5
  80.2% SWE-bench Verified — highest confirmed score among open-weight models
- Mistral Small 4
  Released March 16, 2026; Apache 2.0; 256K context; vision-capable
- Qwen3.5-397B — NVIDIA NIM
  17B active / 397B total MoE; Apache 2.0; 262K native context
- Open LLM Leaderboard
  Community leaderboard tracking open-weight model benchmarks