2026-02-25

DeepSeek-V3 vs Qwen 3.5: Which Local LLM Wins on Mac?

The two giants of open-source local AI are here. DeepSeek-V3 and Qwen 3.5 both promise frontier-level quality on consumer hardware. But which one actually delivers the best experience on your Mac?

We tested both extensively on Apple Silicon to find the winner.

The Contenders

Model          Architecture       Total Params  Active Params  Context  MoE?
DeepSeek-V3    MLA + MoE          671B          37B            64K      ✅ Yes
Qwen 3.5-122B  Transformer + MoE  122B          10B            128K     ✅ Yes

Both use Mixture of Experts (MoE) — only activating a subset of parameters per token. This makes massive models runnable on consumer hardware.
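A quick back-of-envelope check makes the MoE math concrete. The sketch below estimates quantized file size and the fraction of parameters active per token; the ~4.5 bits-per-weight figure for Q4_K_M is a rough average, not a measured value:

```python
# Rough MoE sizing math. The bits-per-weight value is an approximate
# average for llama.cpp K-quants, so sizes are estimates, not exact.

def gguf_size_gb(total_params_b: float, bits_per_weight: float = 4.5) -> float:
    """Approximate quantized model size in GB for a given parameter count."""
    return total_params_b * bits_per_weight / 8

def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters that fire per token in an MoE model."""
    return active_b / total_b

# DeepSeek-V3: 671B total, 37B active
print(f"DeepSeek-V3 Q4 ≈ {gguf_size_gb(671):.0f} GB, "
      f"{active_fraction(37, 671):.1%} of params active per token")
# Qwen 3.5-122B: 122B total, 10B active
print(f"Qwen 3.5-122B Q4 ≈ {gguf_size_gb(122):.0f} GB, "
      f"{active_fraction(10, 122):.1%} of params active per token")
```

The estimates (~377GB and ~69GB) land close to the observed file sizes, and the active fractions (5.5% and 8.2%) show why a 671B model can decode at all on consumer hardware: memory must hold everything, but each token only touches a small slice.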

Test Setup

Hardware:
  • MacBook Pro M4 Max (36GB RAM)
  • Mac Studio M2 Ultra (128GB RAM)
Models tested:
  • DeepSeek-V3 (Q4_K_M, ~380GB → doesn't fit, used Q3_K_M ~280GB)
  • Qwen 3.5-122B-A10B (Q4_K_M, ~72GB)
  • Qwen 3.5-35B-A3B (Q4_K_M, ~20GB) — for comparison
Framework: llama.cpp (latest master)

Round 1: Quality (MMLU Benchmark)

Model              MMLU Score  Notes
DeepSeek-V3        87.1%       Exceptional reasoning
Qwen 3.5-122B      84.8%       Very strong
GPT-4 (reference)  88.7%       Cloud baseline
Winner: DeepSeek-V3 (+2.3 points)

DeepSeek-V3 comes closest to GPT-4 quality. It's particularly strong in:

  • Mathematical reasoning
  • Code generation
  • Complex multi-step tasks

Round 2: Speed (Tokens/Second)

On Mac Studio M2 Ultra (128GB)

Model             Speed     Batch Size  Notes
Qwen 3.5-122B     35 tok/s  1024        Smooth
DeepSeek-V3 (Q3)  12 tok/s  512         Slower but usable
Winner: Qwen 3.5-122B (~3× faster)
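Raw tok/s figures are easier to feel as wall-clock time. This tiny helper converts a steady decode rate into the time for a typical response:

```python
def generation_time_s(tokens: int, tok_per_s: float) -> float:
    """Seconds to generate `tokens` at a steady decode rate.
    Ignores prompt-processing time, which adds extra latency up front."""
    return tokens / tok_per_s

# A ~500-token answer at the rates measured on the Mac Studio M2 Ultra:
for name, rate in [("Qwen 3.5-122B", 35), ("DeepSeek-V3 Q3", 12)]:
    print(f"{name}: {generation_time_s(500, rate):.0f}s")
```

That works out to roughly 14s versus 42s for the same answer; at interactive chat lengths the 3× gap is very noticeable.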

On MacBook Pro M4 Max (36GB)

Can't run DeepSeek-V3: even the Q3_K_M quantization weighs ~280GB, far beyond 36GB of unified memory.

Model                    Speed     Notes
Qwen 3.5-35B-A3B         42 tok/s  Excellent
DeepSeek-V2.5 (smaller)  28 tok/s  Alternative
Winner: Qwen 3.5-35B-A3B — DeepSeek-V3 simply doesn't fit.

Round 3: RAM Requirements

Model             Quantized Size         Recommended       Memory Pressure
DeepSeek-V3       ~380GB Q4 / ~280GB Q3  128GB+ with swap  🔴 Extreme
Qwen 3.5-122B     ~72GB (Q4)             96GB+             🔴 High
Qwen 3.5-35B-A3B  ~20GB (Q4)             24GB              🟢 Low

DeepSeek-V3's 671B parameters make it enormously memory-hungry. Even Q3 quantization weighs ~280GB on disk, so a 128GB Mac Studio runs it only with memory-mapped weights and heavy swapping.

Winner: Qwen 3.5 (more flexible sizing)
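A simple fit check captures the table above, using the on-disk sizes from the test setup plus headroom for KV cache and the OS. The ~8GB overhead figure is an assumption for illustration, not a measurement:

```python
def fits_in_ram(model_gb: float, ram_gb: float, overhead_gb: float = 8.0) -> bool:
    """True if quantized weights plus OS/KV-cache headroom fit in unified memory.
    The 8GB overhead is a rough placeholder; real headroom varies with context size."""
    return model_gb + overhead_gb <= ram_gb

print(fits_in_ram(20, 36))    # Qwen 3.5-35B-A3B on a 36GB MacBook Pro
print(fits_in_ram(72, 36))    # Qwen 3.5-122B on the same machine
print(fits_in_ram(280, 128))  # DeepSeek-V3 Q3 on a 128GB Mac Studio: swap territory
```

Only the first case fits cleanly, which matches what we saw in practice: anything that fails this check either refuses to load or crawls under swap pressure.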

Round 4: Context Window

Model           Context  Effective Use
DeepSeek-V3     64K      Good
Qwen 3.5-122B   128K     Better
Qwen 3.5-Flash  1M       Best for RAG
Winner: Qwen 3.5 (double the context)

Longer context means:

  • Larger codebases in one prompt
  • Longer document analysis
  • Better multi-turn conversations
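To make those context sizes concrete: at roughly 4 characters per token (a common rule of thumb for English text and code; real tokenizers vary), a window translates into this much source:

```python
def context_capacity(tokens: int, chars_per_token: float = 4.0,
                     chars_per_line: float = 40.0) -> int:
    """Very rough estimate of how many lines of code fit in a context window.
    Both ratios are rules of thumb, not tokenizer measurements."""
    return int(tokens * chars_per_token / chars_per_line)

for name, ctx in [("DeepSeek-V3", 64_000), ("Qwen 3.5-122B", 128_000)]:
    print(f"{name}: {ctx:,} tokens ≈ {context_capacity(ctx):,} lines of code")
```

Roughly 6,400 versus 12,800 lines: the difference between fitting one large file and a small project in a single prompt.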

Round 5: Coding Performance (HumanEval)

Model             Pass@1  Strengths
DeepSeek-V3       92.0%   Algorithmic problems
Qwen 3.5-122B     82.5%   General coding
Claude 3.5 (ref)  92.0%   Cloud benchmark
Winner: DeepSeek-V3 (tied with Claude 3.5!)

DeepSeek-V3 is genuinely exceptional at coding — matching the best cloud models.

Round 6: Practical Usage

When to Choose DeepSeek-V3

Choose DeepSeek-V3 if:

  • You have Mac Studio 128GB
  • Maximum quality is priority
  • Heavy coding workloads
  • You're okay with slower generation (12 tok/s)

When to Choose Qwen 3.5

Choose Qwen 3.5 if:

  • You have MacBook Pro 24-36GB (35B-A3B model)
  • Speed matters (35-45 tok/s)
  • Long context needed (128K+)
  • You want more model size options

The Verdict by Use Case

For MacBook Pro Users (24-36GB RAM)

Winner: Qwen 3.5-35B-A3B

DeepSeek-V3 simply won't fit. Qwen 3.5-35B-A3B delivers:

  • 82.1% MMLU (excellent)
  • 42 tok/s (fast)
  • 20GB RAM usage (fits comfortably)

For Mac Studio Users (64GB+ RAM)

Winner: Depends on priority

Priority  Winner       Model
Quality   DeepSeek-V3  671B, 87.1% MMLU
Speed     Qwen 3.5     122B, 35 tok/s
Balance   Qwen 3.5     35B-A3B on 24GB
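The verdict above can be folded into a small, hypothetical chooser. The thresholds come from this article's benchmarks, not from any official sizing guide:

```python
def pick_model(ram_gb: int, priority: str = "balance") -> str:
    """Suggest a local model per this article's Mac results (illustrative helper)."""
    if ram_gb < 64:
        return "Qwen 3.5-35B-A3B"      # the only contender that fits comfortably
    if priority == "quality" and ram_gb >= 128:
        return "DeepSeek-V3 (Q3_K_M)"  # slow, swap-heavy, but highest quality
    if priority == "speed" and ram_gb >= 96:
        return "Qwen 3.5-122B"         # ~72GB of weights need real headroom
    return "Qwen 3.5-35B-A3B"          # balanced default

print(pick_model(36))              # MacBook Pro M4 Max
print(pick_model(128, "quality"))  # Mac Studio, quality first
print(pick_model(128, "speed"))    # Mac Studio, speed first
```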

Our Recommendation

For most users with MacBook Pro 24-48GB:

Qwen 3.5-35B-A3B is the practical winner.

For power users with Mac Studio 128GB who want maximum quality:

DeepSeek-V3 is worth the VRAM investment.

Quick Reference

# Qwen 3.5-35B-A3B (recommended for most)
ollama run qwen3.5:35b-a3b

# DeepSeek-V3 (Mac Studio only)
ollama run deepseek-v3

# Qwen 3.5-122B (Mac Studio alternative)
ollama run qwen3.5:122b-a10b

Related: Learn about the Qwen 3.5 Medium series in detail, see our MacBook Pro recommendations, or check the full local vs cloud benchmark.

Frequently Asked Questions

Can I run DeepSeek-V3 on a MacBook Pro?

No. Even at Q3_K_M quantization, DeepSeek-V3's weights are roughly 280GB, far more than any MacBook Pro configuration offers. Only a maxed-out Mac Studio can run it, and even then only with heavy swapping. For MacBook Pro users, Qwen 3.5-35B-A3B is the recommended alternative.

Which model is better for coding on Mac?

DeepSeek-V3 scores 92% on HumanEval (matching Claude 3.5), while Qwen 3.5-122B scores 82.5%. For coding, DeepSeek-V3 wins if you have the RAM. On a MacBook Pro with 24-36GB, Qwen 3.5-35B-A3B (72.1% HumanEval) is the best available coding model.

How fast is Qwen 3.5 compared to DeepSeek-V3 on Apple Silicon?

Qwen 3.5-122B runs at 35 tokens per second on Mac Studio M2 Ultra, roughly 3x faster than DeepSeek-V3 at 12 tok/s. The smaller Qwen 3.5-35B-A3B achieves 42 tok/s on MacBook Pro M4 Max, making it the fastest high-quality option.

Should I use Q3 or Q4 quantization for DeepSeek-V3?

Q4_K_M gives better quality but needs ~380GB, which won't fit on any Mac we tested. Q3_K_M reduces this to ~280GB, which a Mac Studio M2 Ultra with 128GB can load only via heavy swap usage. Quality loss from Q3 is noticeable but acceptable for most tasks.

What is the best Mac configuration for running both models?

A Mac Studio M2 Ultra with 128GB unified memory can run both DeepSeek-V3 (Q3, with heavy swapping) and Qwen 3.5-122B (Q4). For the best experience with DeepSeek-V3, 192GB is ideal. Use modelfit.io to get personalized recommendations.

---

Tested February 2025 on macOS 15.3 with llama.cpp b4589. Results may vary with quantization and hardware.

Have questions? Reach out on X/Twitter