
Qwen


TL;DR

Alibaba's open-weight LLM family · leads on Chinese-language benchmarks and performs strongly on multilingual tasks.

Level 1

Qwen (通义千问, Tongyi Qianwen) is Alibaba's foundation-model family, open-weight since 2023. Versions: Qwen 1, Qwen 2, Qwen 2.5, Qwen 3 (2025). The lineup spans dense models from 0.5B to 72B parameters and MoE variants scaling to 235B total parameters. As of 2026, Qwen 3 is the leading open-weight Chinese LLM · it also performs strongly on English and multilingual benchmarks, putting pressure on Llama in the open-source tier.

Level 2

Qwen architecture: decoder-only transformer with RoPE, SwiGLU, and RMSNorm, similar to Llama. Qwen 3 235B is an MoE with 22B active parameters. Qwen's differentiator is training data · optimized for Chinese and multilingual coverage with heavy filtering for quality. Qwen ships under Apache 2.0 for most variants (truly permissive) · more open than Llama's custom license. The Qwen ecosystem includes Qwen-VL (multimodal), Qwen-Audio, Qwen-Coder (code-specialized), and many community fine-tunes.
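The SwiGLU feed-forward block mentioned above is easy to sketch. This is a minimal NumPy illustration of the gated-FFN math (silu-gated projection, then down-projection), not Qwen's actual implementation; the weight names and dimensions here are made up for the example.

```python
import numpy as np

def swiglu(x, W_gate, W_up, W_down):
    """SwiGLU feed-forward: down-project silu(x @ W_gate) * (x @ W_up)."""
    def silu(z):
        # SiLU (a.k.a. swish): z * sigmoid(z)
        return z / (1.0 + np.exp(-z))
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

# Toy dimensions for illustration only; real models use thousands.
rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
x = rng.standard_normal((1, d_model))
W_gate = rng.standard_normal((d_model, d_ff))
W_up = rng.standard_normal((d_model, d_ff))
W_down = rng.standard_normal((d_ff, d_model))

y = swiglu(x, W_gate, W_up, W_down)
print(y.shape)  # (1, 8) — output stays in model dimension
```

The gate path is what distinguishes SwiGLU from a plain two-layer FFN: one projection is squashed through SiLU and multiplies the other elementwise before the down-projection.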

Level 3

Qwen 2.5 72B outperformed Llama 3.1 70B on many multilingual benchmarks. Qwen 3 235B MoE activates 22B parameters per token, putting it between dense 70B and dense 405B models in quality. Training: 18T+ tokens for Qwen 2.5. Alibaba trains on Huawei's Ascend 910 chips alongside NVIDIA hardware · one of the few Chinese training stacks not fully dependent on NVIDIA. Qwen-Coder is SWE-bench-competitive among open-weight models. The Apache 2.0 license removes commercial friction entirely · a contributing factor to Qwen's rapid adoption in 2025-2026.
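The "235B total, 22B active" trade-off comes from top-k expert routing: each token runs only a few experts, so most parameters sit idle on any given forward pass. A minimal sketch of the routing step, with illustrative expert counts (the real Qwen 3 router, expert sizes, and load-balancing details differ):

```python
import numpy as np

def top_k_route(router_logits, k):
    """Pick the top-k experts for one token; softmax their logits into mixing weights."""
    idx = np.argsort(router_logits)[::-1][:k]          # expert ids, best first
    w = np.exp(router_logits[idx] - router_logits[idx].max())
    return idx, w / w.sum()                            # weights sum to ~1.0

# Illustrative numbers: 128 experts, 8 active per token.
n_experts, k = 128, 8
rng = np.random.default_rng(0)
logits = rng.standard_normal(n_experts)                # router output for one token
experts, weights = top_k_route(logits, k)
print(len(experts))  # 8 — only these experts' FFNs run for this token
```

With only k of n_experts expert FFNs executed per token, per-token compute scales with active parameters (~22B) rather than total parameters (235B), which is why the MoE can sit between dense 70B and dense 405B in quality at a fraction of the inference cost.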

Why this matters now

Qwen 3 closed the open-weight quality gap to Llama. Apache 2.0 license makes it the default choice for enterprises avoiding Llama's custom terms.

The takeaway for you
If you are a
Researcher
  • Decoder-only transformer · SwiGLU + RoPE + RMSNorm
  • Qwen 3 235B MoE with 22B active
  • 18T+ token training, multilingual data emphasis
If you are a
Builder
  • Qwen 2.5 72B for balanced quality + cost
  • Qwen 3 235B MoE for frontier open-weight quality
  • Apache 2.0 license · no commercial restrictions unlike Llama
If you are a
Investor
  • China's answer to Llama · strong domestic AI ecosystem driver
  • Training on Huawei Ascend alongside NVIDIA reduces NVIDIA dependency
  • Qwen adoption signals willingness to deploy Chinese AI in Western enterprises
If you are a
Curious · Normie
  • Alibaba's open-source AI
  • Best for Chinese language · also competitive in English
  • Can be used for free in commercial products (Apache license)
Gecko's take

Qwen is the open-weight sleeper hit of 2025-2026. Apache 2.0 plus quality parity with Llama means it's winning the enterprise open-model rewrite.

Is Qwen free for commercial use?

Yes. Most Qwen variants ship under Apache 2.0 · fully permissive, commercial use allowed without restrictions.