Best Local LLMs for 32GB RAM

Ranked open-weight models that run well on a 32 GB machine, with an estimated speed and the exact ollama command for each.

Best local models for ~32GB

12 picks

Estimates assume a representative Apple Silicon machine with 32 GB of unified memory. Tokens-per-second (tok/s) figures are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
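Because the speeds in this list are estimates, it can help to measure your own. The sketch below is one way to do that, assuming a local Ollama server on its default port (11434) and the `/api/generate` endpoint with streaming turned off, whose response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The function names `tokens_per_second` and `measure` and the default prompt are illustrative, not part of ModelFit or Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama's generation stats into tokens per second.

    eval_count is the number of generated tokens and eval_duration_ns
    is the generation time in nanoseconds.
    """
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str = "Explain unified memory in two sentences.") -> float:
    """Run one non-streaming generation and return the measured tok/s.

    The model must already be pulled (for example via `ollama run <tag>`).
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        stats = json.load(response)
    return tokens_per_second(stats["eval_count"], stats["eval_duration"])


# Example (needs a running Ollama server and a pulled model):
#   print(f"{measure('qwen3:14b-q4_K_M'):.1f} tok/s")
```

A single run is noisy, and the first call also pays the model load time, so averaging a few runs after a warm-up call gives a fairer comparison against the estimates.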

01

Qwen3.5 9B Instruct

Qwen / 9B / Q4_K_M / ~7 GB

Best for: Quality, Coding, Reasoning · Perf: ~119.2 tok/s (est.) · first token ~0.5s

Perfect fit

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen3.5:9b
02

LFM2 24B-A2B Instruct

LFM2 / 24B / Q4_K_M / ~14 GB

Best for: Local AI agents, privacy-first tool calling, MCP workflows · Perf: ~49.3 tok/s (est.) · first token ~0.7s

Runs well

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run lfm2:24b-a2b
03

Gemma 4 12B

Gemma / 12B / Q4_K_M / ~8 GB

Best for: Chat, Coding, Multimodal · Perf: ~92 tok/s (est.) · first token ~0.6s

Perfect fit

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run gemma4:12b
04

Qwen3 14B

Qwen / 14B / Q4_K_M / ~11 GB

Best for: Coding, Quality · Perf: ~80.1 tok/s (est.) · first token ~0.6s

Runs well

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen3:14b-q4_K_M
05

Mistral Nemo 12B

Mistral / 12B / Q4_K_M / ~9.5 GB

Best for: Chat, Translation · Perf: ~92 tok/s (est.) · first token ~0.6s

Runs well

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run mistral-nemo:12b
06

Gemma 3 12B Instruct

Gemma / 12B / Q4_K_M / ~9.5 GB

Best for: Chat, Quality · Perf: ~92 tok/s (est.) · first token ~0.6s

Runs well

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run gemma3:12b
07

Gemma 2 9B Instruct

Gemma / 9B / Q4_K_M / ~7 GB

Best for: Chat, Coding · Perf: ~119.2 tok/s (est.) · first token ~0.5s

Perfect fit

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run gemma2:9b-instruct-q4_K_M
08

Qwen2.5 Coder 14B

Qwen / 14B / Q4_K_M / ~11 GB

Best for: Coding · Perf: ~80.1 tok/s (est.) · first token ~0.6s

Runs well

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen2.5-coder:14b
09

Qwen2.5 14B Instruct

Qwen / 14B / Q4_K_M / ~11 GB

Best for: Coding, Chat · Perf: ~80.1 tok/s (est.) · first token ~0.6s

Runs well

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen2.5:14b-instruct-q4_K_M
10

DeepSeek-R1 Distill Qwen 14B

DeepSeek / 14B / Q4_K_M / ~11 GB

Best for: Reasoning, Quality · Perf: ~80.1 tok/s (est.) · first token ~0.6s

Runs well

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run deepseek-r1:14b
11

Phi-4 14B

Phi / 14B / Q4_K_M / ~11.5 GB

Best for: Coding, Quality · Perf: ~80.1 tok/s (est.) · first token ~0.6s

Runs well

Strong fit for 32 GB RAM with balanced speed and quality.

ollama
$ ollama run phi4:14b-q4_K_M
12

Qwen3.5 35B-A3B Instruct

Qwen / 35B / Q4_K_M / ~20 GB

Best for: Reasoning, Coding, Agent scenarios · Perf: ~30.8 tok/s (est.) · first token ~1.6s

Runs well

At ~20 GB, this model is memory-heavy on a 32 GB machine, but it is still listed for its balance of speed and quality.

ollama
$ ollama run qwen3.5:35b-a3b
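The fit labels in this list line up with model size as a share of RAM: the ~7 GB and ~8 GB picks are marked Perfect fit, and everything from ~9.5 GB to ~20 GB is marked Runs well. A minimal sketch of such a rule follows, assuming a 25% cutoff inferred from those entries. It is not ModelFit's actual logic, and `fit_label`, `headroom_gb`, and the "Too large" label are hypothetical names used only for illustration.

```python
def headroom_gb(model_gb: float, ram_gb: float = 32.0) -> float:
    """Memory left for the OS, other apps, and context after loading the model."""
    return ram_gb - model_gb


def fit_label(model_gb: float, ram_gb: float = 32.0, perfect_ratio: float = 0.25) -> str:
    """Classify a model by its share of total RAM.

    perfect_ratio is an assumption: 0.25 reproduces the labels in this
    list (8 GB of 32 GB is still "Perfect fit", 9.5 GB is "Runs well").
    """
    if model_gb <= 0 or ram_gb <= 0:
        raise ValueError("sizes must be positive")
    ratio = model_gb / ram_gb
    if ratio <= perfect_ratio:
        return "Perfect fit"
    if ratio < 1.0:
        return "Runs well"
    return "Too large"  # hypothetical label, not used in the list


# Approximate sizes taken from three entries in the list.
for name, size_gb in [("qwen3.5:9b", 7.0), ("lfm2:24b-a2b", 14.0), ("qwen3.5:35b-a3b", 20.0)]:
    print(f"{name}: {fit_label(size_gb)}, ~{headroom_gb(size_gb):.0f} GB headroom")
```

For the 35B-A3B pick this leaves roughly 12 GB for everything else, which is why it sits at the bottom of the list despite still being labeled Runs well.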

Want an exact recommendation?

The wizard tunes picks and speed estimates to your exact device, chip, and RAM.