Key Takeaways
- Alibaba Cloud’s Qwen team released Qwen3.5-397B-A17B, a 397-billion-parameter open-weight vision-language model that activates only 17 billion parameters per token, delivering frontier performance at a fraction of the inference cost.
- The hybrid Gated DeltaNet plus sparse Mixture-of-Experts (MoE) architecture delivers 8.6x–19x faster decoding throughput than the previous Qwen3-Max generation, breaking the cost-performance barrier for multimodal AI agents.
- Qwen3.5 natively supports 201 languages and a 262K-token context window extensible to over 1 million tokens.
- API pricing starts at $0.40/M input tokens and $2.40/M output tokens internationally, roughly 1/18th the cost of Gemini 3 Pro and a fraction of GPT-5.2.
Quick Recap
Alibaba Cloud officially released Qwen3.5-397B-A17B on February 15, 2026 — its flagship open-weight vision-language model designed to power the next generation of AI agents. The announcement, made via the Qwen team’s official channels and Alibaba Cloud’s social media, positions Qwen3.5 as a “native multimodal agent” built for coding, reasoning, GUI interaction, video comprehension, and agentic workflows across 201 languages. The model weights are available on Hugging Face under the Apache 2.0 license, making it freely deployable for developers and enterprises alike.
Inside the Architecture: Gated Delta Networks Meet Sparse MoE
Qwen3.5’s technical design marks a significant departure from standard Transformer architectures. At its core, the model uses a hybrid layout across 60 layers: a repeating pattern of 3 Gated DeltaNet-plus-MoE blocks followed by 1 Gated Attention-plus-MoE block, cycled 15 times. The Gated Delta Networks replace traditional quadratic-complexity attention with linear attention, enabling near-linear scaling as context lengths grow — a critical advantage for processing documents and conversations exceeding 32K tokens.
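The reported layout can be sketched in a few lines: 15 cycles of (3 Gated DeltaNet blocks + 1 Gated Attention block), each paired with a sparse MoE feed-forward layer. This is an illustrative reconstruction of the layer schedule only, not Qwen's actual model code; the string labels are hypothetical.

```python
# Illustrative sketch of Qwen3.5's reported hybrid layer schedule:
# (3 x Gated DeltaNet + 1 x Gated Attention) repeated for 15 cycles = 60 layers.
CYCLES = 15
PATTERN = ["gated_deltanet"] * 3 + ["gated_attention"]

layers = [kind for _ in range(CYCLES) for kind in PATTERN]

assert len(layers) == 60                          # 60 total layers
assert layers.count("gated_deltanet") == 45       # 45 linear-attention layers
assert layers.count("gated_attention") == 15      # 15 full-attention layers
```

With only 15 of 60 layers using full attention, the bulk of the stack scales near-linearly with sequence length, which is where the long-context efficiency comes from.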
The Mixture-of-Experts layer contains 512 total experts, activating only 10 routed experts plus 1 shared expert per token — totaling 11 active experts. This sparse routing is why the model’s effective compute footprint resembles a ~20B-parameter model rather than a 400B one, dramatically cutting inference costs. Alibaba reports that this design delivers 60% lower cost and 8.6x–19x higher decoding throughput compared to its previous generation.
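A minimal sketch of how top-k expert routing of this shape typically works, using the reported numbers (512 experts, 10 routed + 1 shared). The router function and random logits are illustrative assumptions, not Qwen's actual implementation.

```python
import math
import random

NUM_EXPERTS = 512   # total routed experts (reported)
TOP_K = 10          # routed experts activated per token (reported)
NUM_SHARED = 1      # always-on shared expert (reported)

def route(logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    m = max(logits[i] for i in top)                     # for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: w / z for i, w in exps.items()}

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
weights = route(logits)

assert len(weights) == TOP_K
assert abs(sum(weights.values()) - 1.0) < 1e-9
print(f"active experts per token: {TOP_K + NUM_SHARED}")  # prints 11
```

Because only 11 of 512 experts fire per token, the forward pass touches roughly a ~20B-parameter slice of the 397B weight pool, which is the mechanism behind the cost figures Alibaba cites.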
On benchmarks, the results are competitive with much larger closed-source models. Qwen3.5 scores 93.3% on AIME 2026, 85.0 on LiveCodeBench v6, and 76.8% on SWE-bench Verified for coding agent tasks. It scores 45 on Artificial Analysis’s Intelligence Index — a 16-point leap over Qwen3 235B — placing it third among all models evaluated. Notably, all of this is achieved with 17B active parameters, far fewer than peers like Kimi K2.5 (32B active) and DeepSeek V3.2 (37B active).
The Open-Weight Multimodal Arms Race Heats Up: Why Now?
Qwen3.5’s release arrives in a fiercely competitive moment. February 2026 has already seen major model launches from multiple Chinese AI labs, including Zhipu’s GLM-5 (744B/40B) and Moonshot AI’s Kimi K2.5 (1T/32B), while U.S. labs maintain pressure with GPT-5.2 and Gemini 3 Pro. What differentiates Qwen3.5 is the convergence of three strategic vectors: open weights under Apache 2.0, extreme inference efficiency, and native multimodality.
The broader market context favors this approach. Enterprises are increasingly seeking models they can self-host and customize, rather than relying solely on closed APIs — a trend that open-weight releases directly serve. Meanwhile, the shift toward AI “agents” that can autonomously browse, code, and interact with software tools has made multimodal capability table-stakes rather than a premium feature. Alibaba’s aggressive pricing — positioning Qwen3.5-Plus at roughly $0.12/M tokens for Chinese domestic users — signals a willingness to subsidize adoption and capture developer ecosystem share, mirroring the hyperscaler playbook seen in cloud computing.
Competitive Landscape & Comparison
The table below compares Qwen3.5-397B-A17B against its two most relevant open-weight competitors: Moonshot AI’s Kimi K2.5 and DeepSeek V3.2, both of which target the same agent-focused, high-performance segment.
| Feature / Metric | Qwen3.5-397B-A17B | Kimi K2.5 (1T-A32B) | DeepSeek V3.2 (685B-A37B) |
| --- | --- | --- | --- |
| Total / Active Parameters | 397B / 17B | 1T / 32B | 685B / 37B |
| Context Window | 262K (extensible to 1M) | 256K | 164K |
| Pricing (Input / Output per 1M tokens) | $0.40 / $2.40 | $0.60 / $3.00 | $0.28 / $0.42 |
| Multimodal Support | Native vision-language (text, image, video) | Text, image, video | Text only (no vision) |
| Agentic Capabilities | GUI interaction, coding agents, tool use, MCP support | SWE-bench 76.8%, Agent Swarm support | Reasoning-focused, function calling |
| Language Support | 201 languages | Not disclosed | Not disclosed |
| License | Apache 2.0 | Modified MIT | Open-weight |
| Intelligence Index (AA) | 45 | 46 | 42 |
While Kimi K2.5 edges slightly ahead on the Artificial Analysis Intelligence Index (46 vs. 45), it does so with nearly double the active parameters (32B vs. 17B), making Qwen3.5 the clear efficiency leader on a per-parameter basis. DeepSeek V3.2 remains the most cost-effective option for text-only, high-volume API users at $0.28/$0.42 per million tokens, but its lack of native vision support and shorter 164K context window make it a poor fit for the multimodal agent workflows where Qwen3.5 and Kimi K2.5 compete most directly.
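To make the pricing gap concrete, here is a quick cost comparison using the per-million-token rates from the table above. The monthly traffic volumes are a hypothetical workload chosen for illustration.

```python
# Per-million-token API prices from the comparison table: (input $, output $).
PRICES = {
    "Qwen3.5-397B-A17B": (0.40, 2.40),
    "Kimi K2.5":         (0.60, 3.00),
    "DeepSeek V3.2":     (0.28, 0.42),
}

def monthly_cost(input_m_tokens, output_m_tokens, prices=PRICES):
    """Total monthly bill per model for a given token volume (in millions)."""
    return {model: round(p_in * input_m_tokens + p_out * output_m_tokens, 2)
            for model, (p_in, p_out) in prices.items()}

# Hypothetical agent workload: 500M input tokens, 100M output tokens per month.
costs = monthly_cost(500, 100)
# -> Qwen3.5: $440.00, Kimi K2.5: $600.00, DeepSeek V3.2: $182.00
```

The arithmetic bears out the prose: DeepSeek V3.2 wins on raw price for text-only traffic, while Qwen3.5 undercuts Kimi K2.5 by roughly a quarter at the same volume while offering native vision.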
Sci-Tech Today’s Takeaway
I’ll be honest — this one caught my attention more than most model drops. In my experience covering AI releases, the spec sheet alone rarely tells the full story, but Qwen3.5’s numbers are genuinely hard to dismiss. Activating only 17 billion parameters from a 397B pool, while matching or approaching GPT-5.2 and Claude 4.5 Opus on flagship benchmarks, is an engineering achievement that reshapes what “open-weight” means in practice.
I think this is a big deal because it collapses the cost moat that closed-model providers have relied on. When a self-hostable, Apache 2.0-licensed model can score 93.3% on AIME 2026 and handle native vision, video, and GUI agentic tasks — all at $0.40 per million input tokens — the value proposition for paying $5–$7 per million tokens to GPT-5.2 or Gemini 3 Pro becomes a much harder sell for many enterprise workloads.
I generally prefer to be cautious about “frontier killer” claims, and Qwen3.5 isn’t flawless — it still trails on agentic benchmarks like TAU2-Bench and DeepPlanning compared to top closed models. But for developer adoption and the broader open-source ecosystem, I’m firmly bullish. The Gated Delta Networks architecture is genuinely novel, the pricing is aggressive, and the 201-language support opens doors in markets that English-first models barely serve. This feels less like an incremental update and more like a statement: the open-weight frontier isn’t just catching up — it’s setting the pace.
Sources
- ArtificialAnalysis
- MarkTechPost
- Together.AI
- HuggingFace
- DigitalApplied
- Recodechinaai.Substack
- NXcode
- Nvidia-build
- AMD
- Introl
- FreeCodeCamp
- CaixinGlobal
- AtlasCloud
- CloudPrice
- Devtk.AI
- Qwen.AI
- VentureBeat
- Evolink.AI
- OpenRouter
- LLM
- Aicerts
- Aastocks
- PricePerToken
- Eesel
- AlibabaCloud
