
Open-Weight LLM Comparison 2026: Llama 4, DeepSeek V3.2, Gemma 4, MiniMax M2.5, Mistral Small 4, Qwen3.5 — self-hostable models compared on SWE-bench, licensing, and cost

Open-weight models dramatically narrowed the gap with proprietary models in 2026. MiniMax M2.5 hits 80.2% on SWE-bench Verified — on par with Claude Opus 4.6 — at $0.30/$1.20 per 1M tokens. Meta's Llama 4 Scout offers a 10M-token context window on a single H100. Gemma 4 31B is the cleanest Apache 2.0 model on the market: no MAU caps, no EU restrictions, and it runs on a single H100. DeepSeek V3.2 remains the cheapest frontier API at $0.28/$0.42 per 1M tokens under an MIT license.

Open-Weight Frontier Models

Self-hostable models with public weights, evaluated on licensing, cost, benchmarks, and deployment requirements as of April 2026.

Llama 4 Maverick (Meta)
  • Context window: 1M tokens
  • License: Meta Llama 4 Community — EU entities prohibited; >700M MAU requires a Meta agreement
  • API price (per 1M tokens): ~$0.17 input / ~$0.60 output (third-party providers)
  • SWE-bench Verified: not publicly confirmed
  • Min. self-hosting GPU: 4× H100 80GB (INT4); 8× H100 for production
  • Multimodal: yes — text + image input
  • EU deployment safe: no — Meta license explicitly prohibits EU-domiciled entities
  • Best for: non-EU teams wanting the top open MoE at 1M context with multimodal

Llama 4 Scout (Meta)
  • Context window: 10M tokens — the largest of any production model
  • License: Meta Llama 4 Community — EU entities prohibited; >700M MAU requires a Meta agreement
  • API price (per 1M tokens): ~$0.08 input / ~$0.30 output (third-party providers)
  • SWE-bench Verified: not publicly confirmed
  • Min. self-hosting GPU: 1× H100 80GB (INT4) — single-GPU deployable
  • Multimodal: yes — text + image input
  • EU deployment safe: no — Meta license explicitly prohibits EU-domiciled entities
  • Best for: non-EU teams needing 10M-token context on a single H100 at minimal API cost

DeepSeek V3.2 (MIT)
  • Context window: 163,840 tokens
  • License: MIT — unrestricted commercial use
  • API price (per 1M tokens): $0.28 input / $0.42 output (DeepSeek API)
  • SWE-bench Verified: 67.8% (V3.2-Speciale)
  • Min. self-hosting GPU: 8× H100 80GB (FP8); ~5–6× H100 (INT4)
  • Multimodal: no — text/code only (DeepSeek-VL2 is a separate model)
  • EU deployment safe: yes — MIT license; note: Chinese lab (data-handling considerations)
  • Best for: cheapest frontier API; MIT license; high-volume coding and reasoning

Gemma 4 31B (Apache 2.0)
  • Context window: up to 262K tokens
  • License: Apache 2.0 — no MAU caps, no EU restrictions
  • API price (per 1M tokens): $0.13 input / $0.38 output (direct API)
  • SWE-bench Verified: not confirmed in available benchmarks
  • Min. self-hosting GPU: 1× H100 80GB (BF16, 8K context); B200 for the full 262K
  • Multimodal: yes — text + image + audio + video (all model sizes)
  • EU deployment safe: yes — Apache 2.0; Google (US-based)
  • Best for: cleanest license (no restrictions); single H100; multimodal incl. video; EU-safe

MiniMax M2.5 (Modified MIT)
  • Context window: 1M input + 1M output tokens
  • License: Modified MIT — attribution clause for large-scale commercial use
  • API price (per 1M tokens): $0.30 input / $1.20 output (MiniMax API)
  • SWE-bench Verified: 80.2% — highest confirmed score among open-weight models
  • Min. self-hosting GPU: 2× B200 or 4× H100 minimum
  • Multimodal: no — text/code only
  • EU deployment safe: yes — Modified MIT; note: Chinese lab
  • Best for: highest SWE-bench score (80.2%); 1M in/out context; coding-heavy workloads

Mistral Small 4 (Apache 2.0)
  • Context window: 256K tokens
  • License: Apache 2.0 — unrestricted commercial use
  • API price (per 1M tokens): $0.15 input / $0.60 output (Mistral API)
  • SWE-bench Verified: not confirmed
  • Min. self-hosting GPU: 1× H100 estimated (6B active params)
  • Multimodal: yes — text + image (vision-capable)
  • EU deployment safe: yes — Apache 2.0; French company, EU-native
  • Best for: EU data residency; Apache 2.0; efficient small model; multimodal; low cost

Qwen3.5-397B (Apache 2.0)
  • Context window: 262K native (extensible to ~1M tokens)
  • License: Apache 2.0 — unrestricted commercial use
  • API price (per 1M tokens): varies by provider
  • SWE-bench Verified: 72.4% (27B dense variant, multilingual benchmark)
  • Min. self-hosting GPU: 8× GPU via vLLM (tensor parallel for the full 262K context)
  • Multimodal: yes — native vision-language across all sizes
  • EU deployment safe: yes — Apache 2.0; note: Alibaba / Chinese lab
  • Best for: 201 languages; ~1M context; Apache 2.0; adaptive reasoning; multilingual
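The per-token prices above translate directly into monthly API spend. A minimal sketch, using only the listed prices (Qwen3.5 is omitted since its pricing varies by provider) and a hypothetical workload of 50M input / 10M output tokens per month:

```python
# Estimate monthly API cost from the per-1M-token prices listed in the table.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "Llama 4 Maverick": (0.17, 0.60),
    "Llama 4 Scout": (0.08, 0.30),
    "DeepSeek V3.2": (0.28, 0.42),
    "Gemma 4 31B": (0.13, 0.38),
    "MiniMax M2.5": (0.30, 1.20),
    "Mistral Small 4": (0.15, 0.60),
}

def monthly_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Cost in USD for a given monthly token volume."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Hypothetical workload: 50M input + 10M output tokens per month.
for model in sorted(PRICES, key=lambda m: monthly_cost(m, 50e6, 10e6)):
    print(f"{model:18s} ${monthly_cost(model, 50e6, 10e6):8.2f}/mo")
```

Under this mix, Llama 4 Scout's third-party pricing comes out cheapest; a more output-heavy mix penalizes MiniMax M2.5's $1.20 output rate.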

When to choose each

Llama 4 Maverick

  • Best open MoE model at 1M context — strong on multimodal and reasoning
  • Non-EU teams wanting Meta's strongest open release at $0.17/$0.60
  • Multimodal applications requiring open-weight models (text + image)
  • Teams comfortable with 4+ H100 self-hosting requirements
Llama 4 Scout

  • Ultra-long context tasks — 10M tokens on a single H100 (INT4)
  • Non-EU teams needing the cheapest third-party API ($0.08/$0.30)
  • Privacy-sensitive workloads requiring single-GPU on-premise deployment
  • RAG systems with massive document sets exceeding 1M-token limits
DeepSeek V3.2

  • Cheapest frontier API at $0.28/$0.42 per 1M tokens — ~17× cheaper than Claude
  • MIT license for maximum commercial flexibility
  • High-volume coding or reasoning pipelines on a tight budget
  • Self-hosted deployments (5–6× H100 INT4 or 8× H100 FP8)
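The "~17× cheaper than Claude" figure depends on the input/output token mix and on which Claude tier you compare against. A quick sanity check, assuming Claude-class pricing of $3 input / $15 output per 1M tokens (our assumption, not a figure from this page) and a typical 80/20 input/output traffic split:

```python
# Sanity-check the "~17x cheaper than Claude" claim for DeepSeek V3.2.
# The $3/$15 Claude prices are an assumption for illustration; the ratio
# shifts with both the assumed prices and the input/output mix.
def blended_price(in_price: float, out_price: float, input_share: float = 0.8) -> float:
    """Blended $/1M tokens for a given input-token share of total traffic."""
    return in_price * input_share + out_price * (1 - input_share)

deepseek = blended_price(0.28, 0.42)  # listed DeepSeek V3.2 API prices
claude = blended_price(3.00, 15.00)   # assumed Claude-class prices

print(f"DeepSeek blended: ${deepseek:.3f}/1M tokens")
print(f"Ratio: {claude / deepseek:.1f}x cheaper")
```

With these assumptions the ratio lands near 17.5×, consistent with the rough figure quoted above.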
Gemma 4 31B

  • The only frontier open-weight model with Apache 2.0 and no usage restrictions
  • EU deployment with no data-residency concerns from a US-based lab
  • Single H100 80GB self-hosting for teams with limited GPU budget
  • Multimodal tasks including video and audio — all supported natively
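Single-H100 self-hosting of the kind described above is typically done with an OpenAI-compatible server such as vLLM. A minimal sketch, in which the Hugging Face model id `google/gemma-4-31b-it` is a guess on our part and the 8K context matches the single-H100 BF16 figure in the comparison table:

```shell
# Serve Gemma 4 31B on one H100 80GB with vLLM (model id is assumed).
# --max-model-len 8192 keeps the KV cache within a single 80GB card at BF16;
# raise it (or quantize) on larger hardware to approach the full 262K context.
vllm serve google/gemma-4-31b-it \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --port 8000
```

Any OpenAI client pointed at `http://localhost:8000/v1` can then talk to the model.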
MiniMax M2.5

  • Highest SWE-bench Verified score of any open-weight model (80.2%)
  • 1M input + 1M output context — unique among open models
  • Coding-heavy workloads where benchmark quality is the priority
  • Teams willing to use 2× B200 or 4× H100 for the performance ceiling
Mistral Small 4

  • EU data residency — French company with EU-native legal structure
  • Apache 2.0 with a small resource footprint (~1× H100 estimated)
  • Multimodal applications on a tight budget ($0.15/$0.60 per 1M)
  • Multilingual European applications across major EU languages
Qwen3.5-397B

  • Multilingual applications spanning 201 languages
  • ~1M-token context extensibility under an Apache 2.0 license
  • Adaptive reasoning modes (Thinking, Fast, Auto) in a single model
  • Asia-Pacific teams integrated with Alibaba / NVIDIA NIM infrastructure

Our verdict

Workload-dependent — Gemma 4 is the most versatile for most teams

For the highest SWE-bench score among open models, MiniMax M2.5 (80.2%) is unmatched — on par with Claude Opus 4.6 at a fraction of the cost. For the cleanest license, Gemma 4 31B (Apache 2.0, single H100) is the standout. For the longest context window, Llama 4 Scout (10M tokens, single H100) wins — but Meta's license bans EU entities. For the cheapest API, DeepSeek V3.2 at $0.28/$0.42 per 1M wins. For EU data residency with Apache 2.0, Mistral Small 4 is the strongest pick from a Western lab. Note: DeepSeek V4 has not been released as of April 2026.
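The verdict above reduces to a small constraint filter. A sketch, using values transcribed from the comparison table (the GPU counts are the table's minimums, with DeepSeek's "~5–6× H100" rounded up to 6; the selection logic itself is ours, not from any vendor):

```python
# Shortlist models by deployment constraints, using values from the table.
MODELS = {
    "Llama 4 Maverick": {"eu_safe": False, "license": "Llama 4 Community", "min_gpus": 4},
    "Llama 4 Scout":    {"eu_safe": False, "license": "Llama 4 Community", "min_gpus": 1},
    "DeepSeek V3.2":    {"eu_safe": True,  "license": "MIT",               "min_gpus": 6},
    "Gemma 4 31B":      {"eu_safe": True,  "license": "Apache 2.0",        "min_gpus": 1},
    "MiniMax M2.5":     {"eu_safe": True,  "license": "Modified MIT",      "min_gpus": 4},
    "Mistral Small 4":  {"eu_safe": True,  "license": "Apache 2.0",        "min_gpus": 1},
    "Qwen3.5-397B":     {"eu_safe": True,  "license": "Apache 2.0",        "min_gpus": 8},
}

def shortlist(eu_required: bool = False, max_gpus: int = 8, license_prefix: str = "") -> list[str]:
    """Return models meeting the given deployment constraints."""
    return [
        name for name, m in MODELS.items()
        if (not eu_required or m["eu_safe"])
        and m["min_gpus"] <= max_gpus
        and m["license"].startswith(license_prefix)
    ]

# EU team, a single H100, and a need for an unrestricted license:
print(shortlist(eu_required=True, max_gpus=1, license_prefix="Apache"))
# → ['Gemma 4 31B', 'Mistral Small 4']
```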

Sources & References

  1. Meta Llama 4 — Official Blog. Llama 4 Scout (10M context) and Maverick (1M context) released April 5, 2025.
  2. DeepSeek V3.2 API Docs. Released December 1, 2025; 163,840-token context; MIT license.
  3. Gemma 4 — Google Blog. Released April 2, 2026; Apache 2.0; single-H100 deployable.
  4. MiniMax M2.5. 80.2% SWE-bench Verified — highest confirmed score among open-weight models.
  5. Mistral Small 4. Released March 16, 2026; Apache 2.0; 256K context; vision-capable.
  6. Qwen3.5-397B — NVIDIA NIM. 17B active / 397B total MoE; Apache 2.0; 262K native context.
  7. Open LLM Leaderboard. Community leaderboard tracking open-weight model benchmarks.
