Best Local LLMs for 24GB RAM

Ranked open-weight models that run well on a 24GB machine — with estimated speed and the exact ollama command for each.

Best local models for ~24GB

12 picks

Estimates assume a representative Apple-Silicon machine with 24GB unified memory. Tok/s figures are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
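
Since the speeds listed here are estimates, a quick way to get a measured number on your own machine is to read the statistics that `ollama run --verbose` prints after a response. The sketch below is only an illustration: `extract_eval_rate` and `measure_tok_s` are hypothetical helper names, and the parsing assumes the statistics contain a line shaped like `eval rate:   63.40 tokens/s`.

```shell
# Sketch: pull the generation speed out of `ollama run --verbose` statistics.
# Assumption: the stats include a line of the form "eval rate:   63.40 tokens/s".

# Print the number from the "eval rate:" line read on stdin.
# "prompt eval rate:" lines are skipped because the pattern is anchored at line start.
extract_eval_rate() {
  awk '/^eval rate:/ { print $3 }'
}

# Hypothetical helper (not an ollama command): run one prompt and print
# the measured generation speed in tokens per second.
measure_tok_s() {
  model="$1"
  prompt="${2:-Explain unified memory in two sentences.}"
  # 2>&1 merges stderr into the pipe in case the stats are written there.
  ollama run --verbose "$model" "$prompt" 2>&1 | extract_eval_rate
}

# Usage (needs ollama installed and the model already pulled):
#   measure_tok_s qwen3:8b-q4_K_M
```

A single short prompt gives a rough figure only. Longer prompts, larger context windows, and other running apps can all shift the result, so it is worth repeating the run a few times before comparing it with the estimates.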

01

Qwen3.5 9B Instruct

Qwen / 9B / Q4_K_M / ~7 GB

Best for: Quality, Coding, Reasoning · Perf: ~102.7 tok/s (est.) · first token ~0.5s

Runs well

Best for quality, coding, reasoning. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen3.5:9b
02

Qwen3 8B

Qwen / 8B / Q4_K_M / ~6.5 GB

Best for: Chat, Coding · Perf: ~114.2 tok/s (est.) · first token ~0.5s

Perfect fit

Best for chat, coding. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen3:8b-q4_K_M
03

LFM2.5 8B-A1B

LFM2 / 8.3B / Q4_K_M / ~5.5 GB

Best for: On-device agents, tool calling, multilingual chat · Perf: ~110.5 tok/s (est.) · first token ~0.5s

Perfect fit

Best for on-device agents, tool calling, multilingual chat. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run lfm2.5:8b-a1b-q4_K_M
04

Llama 3.1 8B Instruct

Llama / 8B / Q4_K_M / ~6.5 GB

Best for: Chat, Coding · Perf: ~114.2 tok/s (est.) · first token ~0.5s

Perfect fit

Best for chat, coding. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run llama3.1:8b-instruct-q4_K_M
05

Gemma 4 12B

Gemma / 12B / Q4_K_M / ~8 GB

Best for: Chat, Coding, Multimodal · Perf: ~79.3 tok/s (est.) · first token ~0.6s

Runs well

Best for chat, coding, multimodal. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run gemma4:12b
06

Qwen3 14B

Qwen / 14B / Q4_K_M / ~11 GB

Best for: Coding, Quality · Perf: ~69 tok/s (est.) · first token ~0.6s

Runs well

Best for coding, quality. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen3:14b-q4_K_M
07

Qwen2.5 Coder 7B

Qwen / 7B / Q4_K_M / ~5.5 GB

Best for: Coding · Perf: ~128.8 tok/s (est.) · first token ~0.5s

Perfect fit

Best for coding. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen2.5-coder:7b
08

DeepSeek-R1 Distill Qwen 7B

DeepSeek / 7B / Q4_K_M / ~5.5 GB

Best for: Reasoning, Coding · Perf: ~128.8 tok/s (est.) · first token ~0.5s

Perfect fit

Best for reasoning, coding. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run deepseek-r1:7b
09

Mistral Nemo 12B

Mistral / 12B / Q4_K_M / ~9.5 GB

Best for: Chat, Translation · Perf: ~79.3 tok/s (est.) · first token ~0.6s

Runs well

Best for chat, translation. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run mistral-nemo:12b
10

Gemma 3 12B Instruct

Gemma / 12B / Q4_K_M / ~9.5 GB

Best for: Chat, Quality · Perf: ~79.3 tok/s (est.) · first token ~0.6s

Runs well

Best for chat, quality. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run gemma3:12b
11

Mistral 7B Instruct

Mistral / 7B / Q4_K_M / ~5.5 GB

Best for: Chat, Coding · Perf: ~128.8 tok/s (est.) · first token ~0.5s

Perfect fit

Best for chat, coding. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run mistral:7b-instruct-q4_K_M
12

Qwen2.5 7B Instruct

Qwen / 7B / Q4_K_M / ~5.5 GB

Best for: Chat, Coding · Perf: ~128.8 tok/s (est.) · first token ~0.5s

Perfect fit

Best for chat, coding. Strong fit for 24 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen2.5:7b-instruct-q4_K_M
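
To put the footprint figures in the list against the 24 GB budget, a small helper can show what share of memory a model takes and how much is left. This is a plain arithmetic sketch, not how ModelFit assigns its "Perfect fit" or "Runs well" labels. `memory_share` is a hypothetical name, and the remaining memory still has to cover the operating system, other apps, and the model's context cache.

```shell
# Sketch: share of total memory that a model's listed footprint takes up.
# Arguments: model footprint in GB, then total memory in GB (defaults to 24).
memory_share() {
  model_gb="$1"
  total_gb="${2:-24}"
  awk -v m="$model_gb" -v t="$total_gb" \
    'BEGIN { printf "%.0f%% used, %.1f GB left\n", m / t * 100, t - m }'
}

# Usage, with footprints taken from the list:
#   memory_share 5.5    # a 7B pick at Q4_K_M
#   memory_share 11     # Qwen3 14B
```

Passing a second argument, for example `memory_share 11 16`, shows how the same model would sit on a machine with less memory.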

Want an exact recommendation?

The wizard tunes picks and speed estimates to your exact device, chip, and RAM.