2026-02-25
DeepSeek-V3 vs Qwen 3.5: Which Local LLM Wins on Mac?
We tested both extensively on Apple Silicon to find the winner.
The Contenders
| Model | Architecture | Total Params | Active Params | Context | MoE? |
|---|---|---|---|---|---|
| DeepSeek-V3 | MLA + MoE | 671B | 37B | 64K | ✅ Yes |
| Qwen 3.5-122B | Transformer + MoE | 122B | 10B | 128K | ✅ Yes |
Both use Mixture of Experts (MoE) — only activating a subset of parameters per token. This makes massive models runnable on consumer hardware.
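To see why active params matter more than total params for speed, here is a minimal sketch of top-k expert routing as used in MoE layers generally. The expert count, dimensions, and k below are illustrative only, not the actual DeepSeek-V3 or Qwen configurations.

```python
import numpy as np

def moe_layer(x, experts, gate_w, k=2):
    """Route one token through only its top-k experts.

    x: (d,) token activation; experts: list of (d, d) weight matrices;
    gate_w: (d, n_experts) router weights.
    """
    logits = x @ gate_w                    # one router score per expert
    top = np.argsort(logits)[-k:]          # indices of the k best experts
    z = np.exp(logits[top])
    weights = z / z.sum()                  # softmax over selected experts only
    # Only k expert matrices are multiplied; the rest are never touched,
    # which is why active params << total params per token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
y = moe_layer(rng.standard_normal(d), experts, gate_w, k=2)
print(y.shape)  # (16,)
```

Note the catch for local inference: all experts must still sit in memory, since any of them may be selected on the next token. MoE saves compute per token, not RAM.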
Test Setup
Hardware:
- MacBook Pro M4 Max (36GB RAM)
- Mac Studio M2 Ultra (128GB RAM)
Models:
- DeepSeek-V3 (Q4_K_M, ~380GB → doesn't fit, used Q3_K_M ~280GB)
- Qwen 3.5-122B-A10B (Q4_K_M, ~72GB)
- Qwen 3.5-35B-A3B (Q4_K_M, ~20GB) — for comparison
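The file sizes above follow directly from total params times average bits per weight. A back-of-the-envelope estimator (the bits-per-weight figures are rough averages for these llama.cpp quant types, not exact):

```python
# Approximate average bits per weight for common GGUF quants.
# These are ballpark figures; real averages vary per model.
BITS = {"Q4_K_M": 4.5, "Q3_K_M": 3.4}

def quant_gb(total_params_b: float, quant: str) -> float:
    """Rough on-disk/in-memory size of a quantized model, in GB."""
    return total_params_b * BITS[quant] / 8

def fits(model_gb: float, ram_gb: int, headroom_gb: int = 8) -> bool:
    """Leave headroom for the OS, KV cache, and other apps."""
    return model_gb + headroom_gb <= ram_gb

print(round(quant_gb(671, "Q4_K_M")))  # 377
```

Plugging in 671B at Q3_K_M gives roughly 285GB, and 122B at Q4_K_M roughly 69GB, in the same ballpark as the figures quoted above.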
Round 1: Quality (MMLU Benchmark)
| Model | MMLU Score | Notes |
|---|---|---|
| DeepSeek-V3 | 87.1% | Exceptional reasoning |
| Qwen 3.5-122B | 84.8% | Very strong |
| GPT-4 (reference) | 88.7% | Cloud baseline |
DeepSeek-V3 comes closest to GPT-4 quality. It's particularly strong in:
- Mathematical reasoning
- Code generation
- Complex multi-step tasks
Round 2: Speed (Tokens/Second)
On Mac Studio M2 Ultra (128GB)
| Model | Speed | Batch Size | Notes |
|---|---|---|---|
| Qwen 3.5-122B | 35 tok/s | 1024 | Smooth |
| DeepSeek-V3 (Q3) | 12 tok/s | 512 | Slower but usable |
On MacBook Pro M4 Max (36GB)
Can't run DeepSeek-V3 here: even the Q3_K_M weights (~280GB) dwarf 36GB of unified memory.
| Model | Speed | Notes |
|---|---|---|
| Qwen 3.5-35B-A3B | 42 tok/s | Excellent |
| DeepSeek-V2.5 (smaller) | 28 tok/s | Alternative |
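Tokens per second translates directly into how long you wait for an answer. A quick arithmetic sketch for a typical response length:

```python
def gen_seconds(tokens: int, tok_per_s: float) -> float:
    """Wall-clock time to generate `tokens` at a steady decode rate.
    Ignores prompt-processing (prefill) time, which adds more on long prompts."""
    return tokens / tok_per_s

# A ~500-token answer on the Mac Studio M2 Ultra:
for name, speed in [("Qwen 3.5-122B", 35), ("DeepSeek-V3 (Q3)", 12)]:
    print(f"{name}: {gen_seconds(500, speed):.0f}s")
```

Roughly 14 seconds versus 42 seconds per answer. That 3x gap is the difference between an interactive assistant and one you alt-tab away from.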
Round 3: RAM Requirements
| Model | Q4 Weights | Recommended RAM | VRAM Pressure |
|---|---|---|---|
| DeepSeek-V3 | ~380GB (Q3: ~280GB) | 128GB + heavy swap | 🔴 High |
| Qwen 3.5-122B | ~72GB | 96GB+ | 🔴 High |
| Qwen 3.5-35B-A3B | ~20GB | 24GB | 🟢 Low |
DeepSeek-V3's 671B parameters make it incredibly VRAM-hungry. Even Q3 quantization weighs in around ~280GB, which is why it only runs on a 128GB Mac Studio with heavy swap.
Winner: Qwen 3.5 (more flexible sizing)
Round 4: Context Window
| Model | Context | Effective Use |
|---|---|---|
| DeepSeek-V3 | 64K | Good |
| Qwen 3.5-122B | 128K | Better |
| Qwen 3.5-Flash | 1M | Best for RAG |
Longer context means:
- Larger codebases in one prompt
- Longer document analysis
- Better multi-turn conversations
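Longer context isn't free, though: the KV cache grows linearly with tokens. A rough estimator for a standard grouped-query-attention transformer (the hyperparameters below are illustrative, not the real Qwen 3.5 config; DeepSeek's MLA compresses its KV cache, so this formula doesn't apply to it):

```python
def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per: int = 2) -> float:
    """KV-cache memory for a GQA transformer; 2x covers K and V,
    bytes_per=2 assumes an fp16 cache."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per / 1e9

# Illustrative config: 60 layers, 8 KV heads, head_dim 128, fp16
print(round(kv_cache_gb(128_000, 60, 8, 128), 1))  # 31.5
```

So actually filling a 128K window can add tens of gigabytes on top of the weights. Budget for that before maxing out the context.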
Round 5: Coding Performance (HumanEval)
| Model | Pass@1 | Strengths |
|---|---|---|
| DeepSeek-V3 | 92.0% | Algorithmic problems |
| Qwen 3.5-122B | 82.5% | General coding |
| Claude 3.5 (ref) | 92.0% | Cloud benchmark |
DeepSeek-V3 is genuinely exceptional at coding — matching the best cloud models.
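For context on what "Pass@1" means: it is the probability that a single sampled completion passes the problem's unit tests, usually computed with the standard unbiased pass@k estimator from the HumanEval methodology. A sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples passes,
    given n generations per problem of which c were correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples on one problem, 184 correct:
print(round(pass_at_k(200, 184, 1), 2))  # 0.92
```

The reported benchmark score is this quantity averaged over all 164 HumanEval problems.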
Round 6: Practical Usage
When to Choose DeepSeek-V3
✅ Choose DeepSeek-V3 if:
- You have Mac Studio 128GB
- Maximum quality is priority
- Heavy coding workloads
- You're okay with slower generation (12 tok/s)
When to Choose Qwen 3.5
✅ Choose Qwen 3.5 if:
- You have MacBook Pro 24-36GB (35B-A3B model)
- Speed matters (35-45 tok/s)
- Long context needed (128K+)
- You want more model size options
The Verdict by Use Case
For MacBook Pro Users (24-36GB RAM)
Winner: Qwen 3.5-35B-A3B
DeepSeek-V3 simply won't fit. Qwen 3.5-35B-A3B delivers:
- 82.1% MMLU (excellent)
- 42 tok/s (fast)
- 20GB RAM usage (fits comfortably)
For Mac Studio Users (64GB+ RAM)
Winner: Depends on priority
| Priority | Winner | Model |
|---|---|---|
| Quality | DeepSeek-V3 | 671B, 87.1% MMLU |
| Speed | Qwen 3.5 | 122B, 35 tok/s |
| Balance | Qwen 3.5 | 35B-A3B on 24GB |
Our Recommendation
For most users with MacBook Pro 24-48GB:
→ Qwen 3.5-35B-A3B is the practical winner.
For power users with Mac Studio 128GB who want maximum quality:
→ DeepSeek-V3 is worth the VRAM investment.
Quick Reference
```bash
# Qwen 3.5-35B-A3B (recommended for most)
ollama run qwen3.5:35b-a3b

# DeepSeek-V3 (Mac Studio only)
ollama run deepseek-v3

# Qwen 3.5-122B (Mac Studio alternative)
ollama run qwen3.5:122b-a10b
```
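Once a model is pulled, you can also script it through Ollama's local REST API (`POST /api/generate` on port 11434). A minimal sketch; the model tag reuses the one from the commands above, and `generate` assumes `ollama serve` is running:

```python
import json
from urllib import request

def ollama_payload(model: str, prompt: str, stream: bool = False) -> dict:
    # Request body for Ollama's /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> str:
    # Needs a running `ollama serve`; blocks until the full answer arrives.
    body = json.dumps(ollama_payload(model, prompt)).encode()
    req = request.Request(f"{host}/api/generate", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

p = ollama_payload("qwen3.5:35b-a3b", "Explain MoE in one sentence.")
print(p["model"])  # qwen3.5:35b-a3b
```

With `stream=True` (the API default) Ollama returns newline-delimited JSON chunks instead of one object, which is what you want for interactive UIs.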
Related: Learn about the Qwen 3.5 Medium series in detail, see our MacBook Pro recommendations, or check the full local vs cloud benchmark.
Frequently Asked Questions
Can I run DeepSeek-V3 on a MacBook Pro?
No. Even at Q3_K_M quantization, DeepSeek-V3's weights are ~280GB (~380GB at Q4_K_M), far beyond any MacBook Pro. Only a Mac Studio with 128GB unified memory (leaning on heavy swap) or a Mac Pro with sufficient memory can attempt it. For MacBook Pro users, Qwen 3.5-35B-A3B is the recommended alternative.
Which model is better for coding on Mac?
DeepSeek-V3 scores 92% on HumanEval (matching Claude 3.5), while Qwen 3.5-122B scores 82.5%. For coding, DeepSeek-V3 wins if you have the RAM. On a MacBook Pro with 24-36GB, Qwen 3.5-35B-A3B (72.1% HumanEval) is the best available coding model.
How fast is Qwen 3.5 compared to DeepSeek-V3 on Apple Silicon?
Qwen 3.5-122B runs at 35 tokens per second on Mac Studio M2 Ultra, roughly 3x faster than DeepSeek-V3 at 12 tok/s. The smaller Qwen 3.5-35B-A3B achieves 42 tok/s on MacBook Pro M4 Max, making it the fastest high-quality option.
Should I use Q3 or Q4 quantization for DeepSeek-V3?
Q4_K_M gives better quality but needs ~380GB (won't fit on any Mac). Q3_K_M reduces this to ~280GB, fitting on a Mac Studio M2 Ultra with 128GB via heavy swap usage. Quality loss from Q3 is noticeable but acceptable for most tasks.
What is the best Mac configuration for running both models?
A Mac Studio M2 Ultra with 128GB unified memory can run both DeepSeek-V3 (Q3) and Qwen 3.5-122B (Q4). For the best experience with DeepSeek-V3, 192GB is ideal. Use modelfit.io to get personalized recommendations.
---
Tested February 2025 on macOS 15.3 with llama.cpp b4589. Results may vary with quantization and hardware.
Have questions? Reach out on X/Twitter