Open-Weight LLM Comparison 2026: Llama 4, DeepSeek V3.2, Gemma 4, MiniMax M2.5, Mistral Small 4, Qwen3.5 — self-hostable models compared on SWE-bench, licensing, and cost
Open-weight models dramatically narrowed the gap with proprietary models in 2026. MiniMax M2.5 hits 80.2% on SWE-bench Verified — on par with Claude Opus 4.6 — at $0.30/$1.20 per 1M tokens. Meta's Llama 4 Scout offers a 10M-token context window on a single H100. Gemma 4 31B is the cleanest Apache 2.0 model on the market: no MAU caps, no EU restrictions, and it runs on a single H100. DeepSeek V3.2 remains the cheapest frontier API at $0.28/$0.42 per 1M tokens under an MIT license.
Open-Weight Frontier Models
Self-hostable models with public weights, evaluated on licensing, cost, benchmarks, and deployment requirements as of April 2026.
| Dimension | Llama 4 Maverick | Llama 4 Scout | DeepSeek V3.2 | Gemma 4 31B | MiniMax M2.5 | Mistral Small 4 | Qwen3.5-397B |
|---|---|---|---|---|---|---|---|
| Context window | 1M tokens | 10M tokens — largest of any production model | 163,840 tokens | Up to 262K tokens | 1M input + 1M output tokens | 256K tokens | 262K native (extensible to ~1M tokens) |
| License | Meta Llama 4 Community — EU entities prohibited; >700M MAU needs Meta agreement | Meta Llama 4 Community — EU entities prohibited; >700M MAU needs Meta agreement | MIT — unrestricted commercial use | Apache 2.0 — no MAU caps, no EU restriction | Modified MIT — attribution clause for large-scale commercial | Apache 2.0 — unrestricted commercial use | Apache 2.0 — unrestricted commercial use |
| API input price (per 1M tokens) | ~$0.17 (third-party providers) | ~$0.08 (third-party providers) | $0.28 (DeepSeek API) | $0.13 (direct API) | $0.30 (MiniMax API) | $0.15 (Mistral API) | Varies by provider |
| API output price (per 1M tokens) | ~$0.60 (third-party providers) | ~$0.30 (third-party providers) | $0.42 (DeepSeek API) | $0.38 (direct API) | $1.20 (MiniMax API) | $0.60 (Mistral API) | Varies by provider |
| SWE-bench Verified | Not publicly confirmed | Not publicly confirmed | 67.8% (V3.2-Speciale) | Not confirmed in available benchmarks | 80.2% — highest confirmed score among open-weight models | Not confirmed | 72.4% (27B dense variant, multilingual benchmark) |
| Min. self-hosting GPU | 4× H100 80GB (INT4); 8× H100 for production | 1× H100 80GB (INT4) — single-GPU deployable | 8× H100 80GB (FP8); ~5–6× H100 (INT4) | 1× H100 80GB (BF16, 8K ctx); B200 for full 262K | 2× B200 or 4× H100 minimum | 1× H100 estimated (6B active params) | 8× GPU via vLLM (tensor parallel for full 262K context) |
| Multimodal support | Yes — text + image input | Yes — text + image input | No — text/code only (DeepSeek-VL2 is a separate model) | Yes — text + image + audio + video (all model sizes) | No — text/code only | Yes — text + image (vision-capable) | Yes — native vision-language across all sizes |
| EU deployment safe | No — Meta license explicitly prohibits EU-domiciled entities | No — Meta license explicitly prohibits EU-domiciled entities | Yes — MIT license; note: Chinese lab (data handling considerations) | Yes — Apache 2.0; Google (US-based) | Yes — Modified MIT; note: Chinese lab | Yes — Apache 2.0; French company, EU-native | Yes — Apache 2.0; note: Alibaba / Chinese lab |
| Best for | Non-EU teams wanting top open MoE at 1M context with multimodal | Non-EU teams needing 10M-token context on a single H100 at minimal API cost | Cheapest frontier API; MIT license; high-volume coding and reasoning | Cleanest license (no restrictions); single H100; multimodal incl. video; EU-safe | Highest SWE-bench score (80.2%); 1M in/out context; coding-heavy workloads | EU data residency; Apache 2.0; efficient small model; multimodal; low cost | 201 languages; ~1M context; Apache 2.0; adaptive reasoning; multilingual |
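The API prices in the table translate directly into workload cost. As a rough sketch, the helper below computes monthly spend from the listed per-1M-token rates (Qwen3.5 is omitted because its pricing varies by provider); treat the numbers as illustrative, since third-party provider pricing fluctuates.

```python
# Rough per-workload API cost comparison using the prices from the table above.
# Prices are USD per 1M tokens as (input, output); illustrative only.
PRICES = {
    "Llama 4 Maverick": (0.17, 0.60),
    "Llama 4 Scout": (0.08, 0.30),
    "DeepSeek V3.2": (0.28, 0.42),
    "Gemma 4 31B": (0.13, 0.38),
    "MiniMax M2.5": (0.30, 1.20),
    "Mistral Small 4": (0.15, 0.60),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given token volume at the listed per-1M rates."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example: 10M input + 2M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 10_000_000, 2_000_000):.2f}")
```

At that volume, Llama 4 Scout via third-party providers ($1.40) undercuts even DeepSeek V3.2's direct API ($3.64), though third-party pricing is the least stable input here.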
When to choose each
Llama 4 Maverick
- Best open MoE model at 1M context — strong on multimodal and reasoning
- Non-EU teams wanting Meta's strongest open release at $0.17/$0.60
- Multimodal applications requiring open-weight models (text + image)
- Teams comfortable with 4+ H100 self-hosting requirements
Llama 4 Scout
- Ultra-long context tasks — 10M tokens on a single H100 (INT4)
- Non-EU teams needing the cheapest third-party API ($0.08/$0.30)
- Privacy-sensitive workloads requiring single-GPU on-premise deployment
- RAG systems with massive document sets exceeding 1M-token limits
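Whether a document set fits in Scout's 10M-token window without RAG can be estimated with a back-of-the-envelope check. The ~4 characters per token figure below is a common heuristic for English text, not a tokenizer-accurate count:

```python
# Back-of-the-envelope check: does a document set fit in a model's context
# window without RAG? Assumes ~4 characters per token for English text,
# which is a rough heuristic, not a tokenizer-accurate count.
def estimated_tokens(total_chars: int, chars_per_token: float = 4.0) -> int:
    return int(total_chars / chars_per_token)

def fits_in_context(total_chars: int, window_tokens: int) -> bool:
    # Leave ~10% headroom for the prompt and the model's response.
    return estimated_tokens(total_chars) <= window_tokens * 0.9

# A ~30MB plain-text corpus (~7.5M tokens) fits a 10M-token window
# but far exceeds a 1M-token window.
print(fits_in_context(30_000_000, 10_000_000))  # True
print(fits_in_context(30_000_000, 1_000_000))   # False
```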
DeepSeek V3.2
- Cheapest frontier API at $0.28/$0.42 per 1M tokens — ~17× cheaper than Claude
- MIT license for maximum commercial flexibility
- High-volume coding or reasoning pipelines on a tight budget
- Self-hosted deployments (5–6× H100 INT4 or 8× H100 FP8)
Gemma 4 31B
- The only frontier open-weight model with Apache 2.0 and no usage restrictions
- EU deployment with no data-residency concerns from a US-based lab
- Single H100 80GB self-hosting for teams with limited GPU budget
- Multimodal tasks including video and audio — all supported natively
MiniMax M2.5
- Highest SWE-bench Verified score of any open-weight model (80.2%)
- 1M input + 1M output context — unique among open models
- Coding-heavy workloads where benchmark quality is the priority
- Teams willing to use 2× B200 or 4× H100 for the performance ceiling
Mistral Small 4
- EU data residency — French company with EU-native legal structure
- Apache 2.0 with a small resource footprint (~1× H100 estimated)
- Multimodal applications on a tight budget ($0.15/$0.60 per 1M)
- Multilingual European applications across major EU languages
Qwen3.5-397B
- Multilingual applications spanning 201 languages
- ~1M-token context extensibility under an Apache 2.0 license
- Adaptive reasoning modes (Thinking, Fast, Auto) in a single model
- Asia-Pacific teams integrated with Alibaba / NVIDIA NIM infrastructure
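The recommendations above can be condensed into a minimal decision sketch. The rule order (EU licensing first, then context length, then benchmark score) is one reasonable priority, not the only one, and the thresholds are simplifications of the table:

```python
# Minimal decision sketch encoding the recommendations above. Rule order
# (EU licensing first, then context, then coding benchmarks) is one
# reasonable priority among several.
def pick_model(eu_entity: bool, max_context_tokens: int, coding_heavy: bool) -> str:
    if eu_entity:
        # Meta's Llama 4 license excludes EU-domiciled entities.
        if max_context_tokens > 262_000:
            return "Qwen3.5-397B"   # Apache 2.0, context extensible to ~1M
        return "Mistral Small 4"    # EU-native company, Apache 2.0
    if max_context_tokens > 1_000_000:
        return "Llama 4 Scout"      # only open model with a 10M-token window
    if coding_heavy:
        return "MiniMax M2.5"       # highest SWE-bench Verified (80.2%)
    return "DeepSeek V3.2"          # cheapest frontier API

print(pick_model(eu_entity=False, max_context_tokens=200_000, coding_heavy=True))
```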
Our verdict
For the highest SWE-bench score among open models, MiniMax M2.5 (80.2%) is unmatched — on par with Claude Opus 4.6 at a fraction of the cost. For the cleanest license, Gemma 4 31B (Apache 2.0, single H100) is the standout. For the longest context window, Llama 4 Scout (10M tokens, single H100) wins — but Meta's license bans EU entities. For the cheapest API, DeepSeek V3.2 at $0.28/$0.42 per 1M wins. For EU data residency with Apache 2.0, Mistral Small 4 is the strongest pick from a Western lab. Note: DeepSeek V4 has not been released as of April 2026.
Sources & References
- Meta Llama 4 — Official Blog
  Llama 4 Scout (10M context) and Maverick (1M context) released April 5, 2025
- DeepSeek V3.2 API Docs
  Released December 1, 2025; 163,840 context; MIT license
- Gemma 4 — Google Blog
  Released April 2, 2026; Apache 2.0; single H100 deployable
- MiniMax M2.5
  80.2% SWE-bench Verified — highest confirmed score among open-weight models
- Mistral Small 4
  Released March 16, 2026; Apache 2.0; 256K context; vision-capable
- Qwen3.5-397B — NVIDIA NIM
  17B active / 397B total MoE; Apache 2.0; 262K native context
- Open LLM Leaderboard
  Community leaderboard tracking open-weight model benchmarks