Best Local LLMs for 24GB RAM

Ranked open-weight models that run well on a 24GB machine, with estimated speed and the exact ollama command for each.


Best local models for ~24GB

12 picks

Estimates assume a representative Apple Silicon machine with 24 GB of unified memory. Speeds (tok/s) and first-token times are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
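Token generation on Apple Silicon is usually limited by memory bandwidth, because each new token reads roughly all of a dense model's weights once. A common back-of-envelope estimate divides usable bandwidth by the bytes read per token. This is a generic rule of thumb, not ModelFit's published method. The dense Q4_K_M figures on this page are consistent with it: 28.5 × 11 ≈ 314, 33.3 × 9.5 ≈ 316, and 44.3 × 7 ≈ 310. The bandwidth figure and the `efficiency` factor below are assumptions to replace with numbers for your own chip.

```python
def estimate_decode_tok_s(gb_read_per_token: float,
                          bandwidth_gb_s: float,
                          efficiency: float = 0.6) -> float:
    """Back-of-envelope decode speed for a memory-bandwidth-bound model.

    gb_read_per_token: weights read per generated token, in GB
                       (roughly the model file size for a dense model)
    bandwidth_gb_s:    peak memory bandwidth of the machine (assumed)
    efficiency:        share of peak bandwidth actually achieved (assumed)
    """
    if gb_read_per_token <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("size and bandwidth must be positive")
    return efficiency * bandwidth_gb_s / gb_read_per_token


# Hypothetical machine with 273 GB/s peak bandwidth; replace with your chip's figure.
for label, size_gb in [("~11 GB dense model", 11.0), ("~7 GB dense model", 7.0)]:
    print(f"{label}: ~{estimate_decode_tok_s(size_gb, 273):.1f} tok/s")
```

The printed values will not match the ModelFit figures, which come from their own model. What carries over is the inverse relationship: a smaller file decodes proportionally faster.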

GPT-OSS 20B

GPT-OSS / 21B / MXFP4 / ~13.8 GB

Best for: Chat, Coding, Reasoning · Perf: ~48.4 tok/s · first token ~0.7s

Runs well

At ~13.8 GB, this model takes a large share of 24 GB and may feel memory-heavy alongside other apps, but it is still listed for its balance of speed and quality.

ollama

$ ollama run gpt-oss:20b
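Because the speeds on this page are estimates, you may want to measure your own. A locally running Ollama server reports per-request timing through its REST API (`/api/generate` on the default port 11434), with durations in nanoseconds. The sketch below uses only the standard library. `timing_summary` is a hypothetical helper name, and treating prompt-evaluation time as first-token latency is an approximation that ignores model load time.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def generate(model: str, prompt: str) -> dict:
    """Send one non-streaming request to a locally running Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def timing_summary(resp: dict) -> dict:
    """Convert Ollama's nanosecond timing fields into tok/s and seconds."""
    ns = 1e9
    return {
        "decode_tok_s": resp["eval_count"] / (resp["eval_duration"] / ns),
        "prefill_s": resp["prompt_eval_duration"] / ns,  # approx. first-token latency
        "load_s": resp.get("load_duration", 0) / ns,     # time spent loading weights
    }

# Usage (requires the Ollama server running and the model already pulled):
#   print(timing_summary(generate("gpt-oss:20b", "Explain memory bandwidth briefly.")))
```

For a quick check without code, `ollama run --verbose gpt-oss:20b` prints similar statistics, including the eval rate, after each reply.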

LFM2 24B-A2B Instruct

LFM2 / 24B / Q4_K_M / ~14 GB

Best for: Local AI agents, privacy-first tool calling, MCP workflows · Perf: ~57.6 tok/s · first token ~0.6s

Runs well

At ~14 GB, this model takes a large share of 24 GB and may feel memory-heavy alongside other apps, but it is still listed for its balance of speed and quality.

ollama

$ ollama run lfm2:24b-a2b
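LFM2 24B-A2B is listed faster (~57.6 tok/s) than the dense 14B picks (~28.5 tok/s) despite a larger file. The likely reason is that it is a mixture-of-experts (MoE) model. All expert weights must sit in memory, but each token reads only the active subset. By the usual naming convention, "A2B" means about 2B active parameters. GPT-OSS 20B works the same way, activating about 3.6B of its 21B parameters. The sketch below assumes both models use the same ~4.5 bits per weight and are purely bandwidth-bound. Both are simplifications.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: billions of params x bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def relative_decode_speed(active_a_billion: float, active_b_billion: float) -> float:
    """How many times faster model A decodes than model B, assuming both are
    purely bandwidth-bound and use the same quantization (a simplification)."""
    return active_b_billion / active_a_billion


BPW = 4.5  # assumed average bits per weight for a ~4-bit quantization

moe_resident = weights_gb(24, BPW)   # all 24B parameters must fit in memory
moe_per_token = weights_gb(2, BPW)   # only ~2B active parameters read per token
dense = weights_gb(14, BPW)          # a dense 14B model reads everything per token

print(f"MoE 24B-A2B: ~{moe_resident:.1f} GB resident, ~{moe_per_token:.1f} GB read per token")
print(f"Dense 14B:   ~{dense:.1f} GB resident, ~{dense:.1f} GB read per token")
print(f"Naive speedup: {relative_decode_speed(2, 14):.0f}x")
```

The naive 7x is an upper bound. The page's estimates show closer to 2x, which is plausible because attention, KV-cache reads, and expert routing add work the sketch ignores.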

Qwen3 14B

Qwen / 14B / Q4_K_M / ~11 GB

Best for: Coding, Quality · Perf: ~28.5 tok/s · first token ~0.8s

Runs well

At ~11 GB, a strong fit for 24 GB, with a good balance of speed and quality for coding.

ollama

$ ollama run qwen3:14b-q4_K_M

Gemma 3 12B Instruct

Gemma / 12B / Q4_K_M / ~9.5 GB

Best for: Chat, Quality · Perf: ~33.3 tok/s · first token ~0.8s

Runs well

At ~9.5 GB, a strong fit for 24 GB, with a good balance of speed and quality for general chat.

ollama

$ ollama run gemma3:12b

Gemma 4 26B-A4B

Gemma / 26B / Q4_K_M / ~16 GB

Best for: Chat, Coding, Multimodal · Perf: ~37 tok/s · first token ~0.7s

Runs well

At ~16 GB, this model takes a large share of 24 GB and may feel memory-heavy alongside other apps, but it is still listed for its balance of speed and quality.

ollama

$ ollama run gemma4:26b

Mistral Nemo 12B

Mistral / 12B / Q4_K_M / ~9.5 GB

Best for: Chat, Translation · Perf: ~33.3 tok/s · first token ~0.8s

Runs well

At ~9.5 GB, a strong fit for 24 GB, with a good balance of speed and quality for chat and translation.

ollama

$ ollama run mistral-nemo:12b

Qwen3.5 9B Instruct (Q8)

Qwen / 9B / Q8_0 / ~10.7 GB

Best for: Quality, Coding, Reasoning · Perf: ~24.3 tok/s · first token ~0.9s

Runs well

At ~10.7 GB, a strong fit for 24 GB. The 8-bit quantization gives up some speed relative to the Q4_K_M build in exchange for higher fidelity on coding and reasoning tasks.

ollama

$ ollama run qwen3.5:9b-q8_0

Qwen3.5 27B Instruct

Qwen / 27B / Q4_K_M / ~16 GB

Best for: Chat, Coding, Complex reasoning · Perf: ~14 tok/s · first token ~1.2s

Runs well

At ~16 GB, this model takes a large share of 24 GB and may feel memory-heavy alongside other apps, but it is still listed for its balance of speed and quality.

ollama

$ ollama run qwen3.5:27b
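Whether a ~16 GB model fits comfortably depends on more than its file size. The KV cache grows linearly with context length. On Apple Silicon, macOS by default also caps how much unified memory the GPU can use, keeping it below the full 24 GB. The sketch below adds these costs together. The layer and head counts are hypothetical, not Qwen3.5 27B's real configuration. The 0.75 GPU fraction and the 0.5 GB runtime overhead are also assumptions, so check your model's config and your machine's actual limit.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size: keys and values (x2) for every layer, KV head and position.
    bytes_per_elem=2 assumes an fp16 cache."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9


def gpu_headroom_gb(weights_gb: float, kv_gb: float, ram_gb: float,
                    gpu_fraction: float = 0.75, overhead_gb: float = 0.5) -> float:
    """Memory left in the assumed GPU budget after weights, KV cache and runtime
    overhead. A negative result means the model will not fit entirely on the GPU."""
    return ram_gb * gpu_fraction - (weights_gb + kv_gb + overhead_gb)


# Hypothetical architecture values, NOT taken from any model card.
kv = kv_cache_gb(n_layers=64, n_kv_heads=4, head_dim=128, context_len=8192)
print(f"KV cache at 8K context: ~{kv:.2f} GB")
print(f"Headroom for a ~16 GB model on 24 GB: {gpu_headroom_gb(16, kv, 24):.2f} GB")
```

Under these assumptions the headroom is well under 1 GB. Doubling the context doubles the KV cache, so long-context sessions are where picks like this one feel tight first.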

Qwen3.5 9B Instruct

Qwen / 9B / Q4_K_M / ~7 GB

Best for: Quality, Coding, Reasoning · Perf: ~44.3 tok/s · first token ~0.7s

Runs well

At ~7 GB, the lightest pick on this list and a strong fit for 24 GB, with a good balance of speed and quality for coding and reasoning.

ollama

$ ollama run qwen3.5:9b
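The two Qwen3.5 9B entries show the cost of quantization precision. Q8_0 is ~10.7 GB at ~24.3 tok/s, while Q4_K_M is ~7 GB at ~44.3 tok/s. The sketch below derives bits per weight for GGUF's simple block formats: Q8_0 stores 32 int8 values plus one 16-bit scale per block. For Q4_K_M, which mixes block types, it uses an approximate average of 4.85 bits per weight. It estimates weight storage only, so it comes out below the page's figures, which presumably include some additional allowance.

```python
def block_bits_per_weight(block_size: int, bits: int, scale_bits: int = 16) -> float:
    """Bits per weight for a simple block format: `block_size` quantized values
    plus one shared scale per block (the layout of GGUF's Q8_0 and Q4_0)."""
    return (block_size * bits + scale_bits) / block_size


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given average bits per weight."""
    return params_billion * bits_per_weight / 8


Q8_0_BPW = block_bits_per_weight(32, 8)  # (32*8 + 16) / 32 = 8.5
Q4_K_M_BPW = 4.85                        # approximate average; K-quants mix block types

for name, bpw in [("Q8_0", Q8_0_BPW), ("Q4_K_M", Q4_K_M_BPW)]:
    print(f"9B @ {name}: {bpw:.2f} bits/weight -> ~{weights_gb(9, bpw):.1f} GB of weights")
```

The ratio of bits per weight (about 1.75x) is close to the ratio of the page's speed estimates (44.3 / 24.3 ≈ 1.8x). This is what a bandwidth-bound model predicts.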

Gemma 4 12B (Q8)

Gemma / 12B / Q8_0 / ~12.8 GB

Best for: Chat, Coding, Multimodal · Perf: ~18.2 tok/s · first token ~1.0s

Runs well

At ~12.8 GB, this model takes a large share of 24 GB and may feel memory-heavy alongside other apps, but it is still listed for its balance of speed and quality.

ollama

$ ollama run gemma4:12b-it-q8_0

Qwen2.5 Coder 14B

Qwen / 14B / Q4_K_M / ~11 GB

Best for: Coding · Perf: ~28.5 tok/s · first token ~0.8s

Runs well

At ~11 GB, a strong fit for 24 GB, with a good balance of speed and quality for code generation.

ollama

$ ollama run qwen2.5-coder:14b

DeepSeek-R1 Distill Qwen 14B

DeepSeek / 14B / Q4_K_M / ~11 GB

Best for: Reasoning, Quality · Perf: ~28.5 tok/s · first token ~0.8s

Runs well

At ~11 GB, a strong fit for 24 GB, with a good balance of speed and quality for reasoning tasks.

ollama

$ ollama run deepseek-r1:14b


Want an exact recommendation?

The wizard tunes picks and speed estimates to your exact device, chip, and RAM.
