TL;DR: Qwen 3.6 27B scores 77.2 on SWE-Bench Verified (Qwen3.6-27B card, 2026) and is a ~17GB download on Ollama, so it fits a 32GB Mac. The MoE sibling, 35B-A3B, activates only 3B parameters per token, which makes it faster on Apple Silicon. Each model pulls and runs with one command, e.g. ollama run qwen3.6:27b. Speeds below are ModelFit estimates.
Alibaba's Qwen team shipped Qwen 3.6 in April 2026, and its open-weight variants suit hardware many developers already own: a Mac with unified memory. Two models matter for local use: a 27B dense model and a 35B-A3B Mixture-of-Experts model. Both are Apache 2.0, both carry a 262K-token context (extensible toward 1M), and both run through Ollama or MLX. This guide covers which one fits your chip, benchmark numbers from the official model cards, and the fastest way to run them.
What is Qwen 3.6?
Qwen 3.6 is the April 2026 release of Alibaba's open-weight LLM family, and the two Mac-relevant variants are Qwen3.6-27B (dense) and Qwen3.6-35B-A3B (Mixture-of-Experts, 35B total / 3B active per token). Both ship under Apache 2.0, both are natively multimodal (text + image), and both advertise a 262,144-token context window (QwenLM/Qwen3.6 GitHub, 2026).
The headline is coding. The 27B model leads its size class on agentic and repository-level tasks, while the 35B-A3B trades a little accuracy for speed by activating just 3B parameters per token. On a Mac, that MoE design is the interesting part: you store all 35B parameters (hence the larger download), but each token costs roughly the compute of a 3B model.
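A quick way to see the compute gap is the common back-of-envelope rule that a forward pass costs about 2 FLOPs per active parameter per token. The sketch below applies that rule to each model's active parameter count. It is an approximation, not a measurement (it ignores attention cost, which grows with context), and gflops_per_token is a hypothetical helper name.

```shell
# Rough per-token forward-pass cost: ~2 FLOPs (one multiply, one add)
# per ACTIVE parameter. Back-of-envelope rule; ignores attention cost.
gflops_per_token() {
  # $1 = active parameters, in billions
  awk -v p="$1" 'BEGIN { printf "%.0f\n", 2 * p }'
}

dense=$(gflops_per_token 27)   # Qwen3.6-27B: every parameter is active
moe=$(gflops_per_token 3)      # Qwen3.6-35B-A3B: ~3B of 35B active per token
echo "27B dense: ~${dense} GFLOPs/token"
echo "35B-A3B:   ~${moe} GFLOPs/token"
awk -v d="$dense" -v m="$moe" 'BEGIN { printf "ratio: %.0fx\n", d / m }'
```

By this rule the 27B does about 54 GFLOPs of work per token and the 35B-A3B about 6, roughly a 9x difference, even though the 35B-A3B's weights take more memory.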
Can your Mac run Qwen 3.6?
Yes, with at least 24GB of unified memory, and 32GB or more to run comfortably. The downloads are modest, but you need headroom above the model weights for context and the OS. Here is how the two open variants map to Apple Silicon.

| Model | Ollama download | Min RAM | Comfortable on |
|---|---|---|---|
| Qwen3.6 27B (dense) | ~17GB | 24GB | 32GB+ (M4 Pro, M5 Pro, M-series Max) |
| Qwen3.6 35B-A3B (MoE) | ~24GB | 24GB | 36GB+ (M4 Max, M5 Max, Studio) |
| Qwen3.6 27B-MLX | ~20GB | 24GB | 32GB+ |
On a 24GB Mac the 27B loads but leaves little room for long prompts, and the 35B-A3B's ~24GB download is as large as total memory, so expect heavy memory pressure; close other apps and keep context modest. On 32GB the 27B is comfortable; on 36GB or more the 35B-A3B has room to stretch its context. If you are squeezing a large model onto 16GB, the mmap streaming technique helps, but 32GB+ is the honest recommendation here.
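To see why 32GB is the safer target, subtract each download from total memory. What remains has to cover the KV cache (which grows with context length), macOS, and your other apps. The sketch below is plain arithmetic on the table's numbers; headroom_gb and mac_ram_gb are hypothetical helper names, and mac_ram_gb relies on macOS's sysctl hw.memsize key, which reports bytes.

```shell
# Headroom = unified memory minus model download, both in whole GB.
headroom_gb() {
  # $1 = unified memory in GB, $2 = model download size in GB
  echo $(( $1 - $2 ))
}

# macOS only: total unified memory in GB (hw.memsize is reported in bytes).
mac_ram_gb() {
  echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
}

# Download sizes from the table above: 27B ~17GB, 35B-A3B ~24GB.
for ram in 24 32 36; do
  echo "${ram}GB Mac: 27B leaves $(headroom_gb "$ram" 17)GB, 35B-A3B leaves $(headroom_gb "$ram" 24)GB"
done
```

On your own Mac, headroom_gb "$(mac_ram_gb)" 17 gives the figure for the 27B. The 35B-A3B leaves 0GB on a 24GB machine, which is why it only becomes comfortable from 36GB up.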
Qwen 3.6 benchmarks: how good is it really?
Qwen 3.6 27B posts frontier-class scores for an open model you can run at home. Every number below is taken verbatim from the official Hugging Face model cards.

| Benchmark | Qwen3.6 27B | Qwen3.6 35B-A3B |
|---|---|---|
| SWE-Bench Verified | 77.2 | 73.4 |
| MMLU-Pro | 86.2 | 85.2 |
| GPQA Diamond | 87.8 | 86.0 |
| AIME26 | 94.1 | 92.7 |
| LiveCodeBench v6 | 83.9 | 80.4 |
Sources: Qwen3.6-27B card, Qwen3.6-35B-A3B card, 2026.
The 77.2 on SWE-Bench Verified is the figure to anchor on: it measures real software-engineering task completion, not trivia recall. The dense 27B edges the 35B-A3B on every metric, which is the expected trade. A dense model applies all of its parameters to every token; an MoE model routes each token through a small subset, so it runs faster but gives up a little accuracy.
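The per-benchmark gap is easy to check from the table. This small sketch pipes the scores through awk and prints how far the 27B leads on each; the variable and function names are illustrative, and benchmark names are hyphenated so awk can split fields on whitespace.

```shell
# Scores from the table above: benchmark, Qwen3.6 27B, Qwen3.6 35B-A3B.
QWEN_SCORES='SWE-Bench-Verified 77.2 73.4
MMLU-Pro 86.2 85.2
GPQA-Diamond 87.8 86.0
AIME26 94.1 92.7
LiveCodeBench-v6 83.9 80.4'

# Print the 27B's lead over the 35B-A3B on each benchmark, in points.
benchmark_gaps() {
  echo "$QWEN_SCORES" | awk '{ printf "%-20s +%.1f\n", $1, $2 - $3 }'
}

benchmark_gaps
```

The gaps run from 1.0 (MMLU-Pro) to 3.8 (SWE-Bench Verified), so the coding benchmarks are where the dense model pulls furthest ahead.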
Qwen 3.6 27B vs 35B-A3B: which for your Mac?
Pick the 27B dense model for maximum quality, the 35B-A3B for speed. On a 36GB+ Mac both fit comfortably, so the decision is really about compute, not RAM.

- Choose 27B dense if your priority is coding accuracy and you have a Max-tier chip that can push a dense model fast enough. It wins every benchmark above.
- Choose 35B-A3B if you want snappier responses for chat and agent loops. Activating 3B params per token makes it noticeably quicker on the same Mac, especially for long generations.
For most people running an agent or coding assistant all day, the 35B-A3B feels better despite the slightly lower scores: in daily use, latency matters more than a one-point gap on MMLU-Pro. If you batch-run evals or want the best single answer, the 27B is the pick.
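If you script your setup, the rule of thumb in this section can be encoded in a few lines. pick_qwen_tag is a hypothetical helper; its 24GB and 36GB thresholds come from the RAM table earlier in this guide, and the tags are the Ollama tags used throughout.

```shell
# Hypothetical helper encoding this guide's rule of thumb:
#   under 24GB    -> no good fit
#   24GB to 35GB  -> 27B dense (the 35B-A3B has little room)
#   36GB and up   -> 35B-A3B for speed, 27B for accuracy
pick_qwen_tag() {
  # $1 = unified memory in GB, $2 = priority: "speed" (default) or "quality"
  if [ "$1" -lt 24 ]; then
    echo "no good fit below 24GB of unified memory" >&2
    return 1
  elif [ "$1" -lt 36 ] || [ "${2:-speed}" = "quality" ]; then
    echo "qwen3.6:27b"
  else
    echo "qwen3.6:35b-a3b"
  fi
}
```

For example, ollama run "$(pick_qwen_tag 48 speed)" would start the 35B-A3B on a 48GB Mac, while a 32GB Mac gets the 27B regardless of priority.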
How to run Qwen 3.6 on Mac
The fastest path is Ollama: one command pulls and runs the model. Make sure you are on a recent build first (see the Ollama on Mac install guide).

```shell
# Dense 27B: best quality
ollama run qwen3.6:27b

# MoE 35B-A3B: faster on Apple Silicon
ollama run qwen3.6:35b-a3b
```
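Once a model is pulled, other tools can reach it through Ollama's local HTTP API, which listens on port 11434 by default. The sketch below sends one non-streaming request to the /api/generate endpoint with curl. json_escape, build_payload, and ask_qwen are hypothetical helper names, and the escaping only covers backslashes and double quotes, not newlines or other control characters.

```shell
# Minimal sketch: send one non-streaming prompt to a local Ollama server.
# Assumes the default port and a model tag that is already pulled.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Escape backslashes and double quotes so the prompt is valid inside a
# JSON string. Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for /api/generate.
build_payload() {
  # $1 = model tag, $2 = prompt text
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$(json_escape "$2")"
}

# POST the prompt; with "stream":false the reply is a single JSON object.
ask_qwen() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(build_payload "$1" "$2")"
}
```

Running ask_qwen qwen3.6:35b-a3b "Summarize what mmap does." prints one JSON object whose response field holds the answer.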
For the best throughput on Apple Silicon, use the MLX builds. Qwen ships official MLX support: "both mlx-lm (text-only) and mlx-vlm (vision + text) support Qwen3.6" (QwenLM/Qwen3.6 GitHub, 2026). Ollama also publishes an MLX tag (qwen3.6:27b-mlx) that runs the Apple-optimized kernels under the hood.
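The MLX tag only makes sense on Apple Silicon, so a setup script might choose the tag from the host. qwen_tag_for_host is a hypothetical helper that checks uname for Darwin on arm64; note that a Terminal running under Rosetta reports x86_64 and would get the default tag.

```shell
# Hypothetical helper: use the MLX tag only on Apple Silicon Macs
# (Darwin + arm64); fall back to the default tag everywhere else.
qwen_tag_for_host() {
  # $1 = OS name (defaults to uname -s), $2 = CPU arch (defaults to uname -m)
  host_os="${1:-$(uname -s)}"
  host_arch="${2:-$(uname -m)}"
  if [ "$host_os" = "Darwin" ] && [ "$host_arch" = "arm64" ]; then
    echo "qwen3.6:27b-mlx"
  else
    echo "qwen3.6:27b"
  fi
}
```

Typical use is ollama run "$(qwen_tag_for_host)". For the Python route, mlx-lm installs with pip install mlx-lm and provides an mlx_lm.generate command that takes --model and --prompt; this guide does not name the Qwen3.6 MLX repository, so look it up on Hugging Face rather than guessing.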
Qwen 3.6 vs Qwen 3.5: what changed?
Qwen 3.6 is a stability-and-utility release that builds directly on Qwen 3.5. The official GitHub README frames it around "stability and real-world utility," with stronger agentic coding for front-end workflows and repository-level reasoning, plus a new "Thinking Preservation" feature that keeps reasoning context across a conversation (QwenLM/Qwen3.6 GitHub, 2026).

Qwen has not published 3.5-vs-3.6 deltas as text, so treat "X% better" claims with caution. What is clear: the 3.6 27B's 77.2 on SWE-Bench Verified is a strong coding result for a model that fits a 32GB Mac, and the lineup keeps the same generous 262K context. If you already run a Qwen 3.5 model and it works, the upgrade is incremental: worth it for coding-heavy use, optional otherwise.
For the full picture of which model fits your specific chip and RAM, see the best LLM for MacBook guide.
FAQ
What is the best Qwen 3.6 model for a Mac?
For 32GB Macs, Qwen3.6 27B gives the best quality at a ~17GB download. For 36GB+ Macs that want faster responses, Qwen3.6 35B-A3B (MoE) gives up 1 to 4 benchmark points (3.8 on SWE-Bench Verified) for noticeably better latency.
How much RAM do I need to run Qwen 3.6?
Plan for 32GB of unified memory. Both open variants download at 17–24GB, and you need headroom above the weights for context and the OS. They load on 24GB but with little room for long prompts.
Is Qwen 3.6 free to use commercially?
Yes. The open-weight Qwen3.6-27B and 35B-A3B are released under the Apache 2.0 license (QwenLM/Qwen3.6 GitHub, 2026), which permits commercial use.
Does Qwen 3.6 support Apple's MLX framework?
Yes. Qwen ships official MLX support for both text (mlx-lm) and vision (mlx-vlm), and Ollama publishes an MLX tag for Apple-optimized inference.
How fast is Qwen 3.6 on Apple Silicon?
It depends on the chip and quantization. ModelFit estimates roughly 12–22 tok/s (27B dense) and 25–55 tok/s (35B-A3B MoE) across M4 Pro to M5 Max via MLX. These are estimates, not measured benchmarks.
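Why is the MoE model faster? Token generation on Apple Silicon tends to be limited by memory bandwidth, because each new token requires reading the active weights. Dividing bandwidth by the gigabytes read per token gives a theoretical ceiling. The sketch below is that back-of-envelope calculation, not a benchmark; max_tok_s is a hypothetical helper, it assumes an MoE reads only its active share of the weights, and 273 GB/s is Apple's quoted M4 Pro bandwidth, used here only as an example input.

```shell
# Decode-speed ceiling for a memory-bandwidth-bound model:
#   tok/s <= bandwidth (GB/s) / GB of weights read per generated token
# Assumption: an MoE reads only its active share (active / total * size).
# Ignores KV-cache reads, compute limits, and runtime overhead.
max_tok_s() {
  # $1 = bandwidth GB/s, $2 = model size GB,
  # $3 = active params (B), $4 = total params (B)
  awk -v bw="$1" -v size="$2" -v act="$3" -v tot="$4" \
    'BEGIN { printf "%.1f\n", bw / (size * act / tot) }'
}

# 273 GB/s: Apple's quoted M4 Pro bandwidth (example input; check your chip).
echo "27B dense ceiling: $(max_tok_s 273 17 27 27) tok/s"
echo "35B-A3B ceiling:   $(max_tok_s 273 24 3 35) tok/s"
```

On these assumptions the 27B tops out near 16 tok/s on an M4 Pro, in line with the ModelFit range, while the 35B-A3B's ceiling is above 130 tok/s. Real MoE speeds likely land far below that ceiling because KV-cache reads, expert routing, and compute overhead matter more once so few weights are read per token, but the gap shows why the MoE feels faster.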