Best Local LLMs for 8GB RAM

Ranked open-weight models that run well on an 8 GB machine, with estimated speed and the exact ollama command for each.

Best local models for ~8GB

12 picks

Estimates assume a representative Apple Silicon machine with 8 GB of unified memory. Tokens-per-second (tok/s) figures are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
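To check these estimates against your own machine, Ollama's local REST API returns timing fields with each non-streaming `/api/generate` response: `eval_count` (generated tokens) plus `eval_duration`, `prompt_eval_duration`, and `load_duration`, all in nanoseconds. (`ollama run <tag> --verbose` prints similar timings on the command line.) The sketch below assumes a server on the default port 11434; the helper names `timing_summary` and `measure` are hypothetical, and treating load time plus prompt processing as "time to first token" is an approximation, not how ModelFit computes its figures.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local port with no auth.
OLLAMA_URL = "http://localhost:11434/api/generate"
NS_PER_S = 1e9  # Ollama reports durations in nanoseconds


def timing_summary(resp: dict) -> dict:
    """Turn the timing fields of a non-streaming /api/generate response
    into a generation speed and an approximate time to first token."""
    tok_per_s = resp["eval_count"] / (resp["eval_duration"] / NS_PER_S)
    # Approximation: time to first token ~= model load + prompt processing.
    # Missing fields default to 0 so a partial response does not raise.
    first_token_ns = resp.get("load_duration", 0) + resp.get("prompt_eval_duration", 0)
    return {
        "tok_per_s": round(tok_per_s, 1),
        "approx_first_token_s": round(first_token_ns / NS_PER_S, 2),
    }


def measure(model: str, prompt: str) -> dict:
    """Send one prompt to a running Ollama server and summarise its timings."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return timing_summary(json.load(resp))
```

With a model pulled (for example `ollama pull gemma3:4b`), calling `measure("gemma3:4b", "Explain unified memory in one paragraph.")` returns a dict with `tok_per_s` and `approx_first_token_s`. Run it twice: the first call may include loading the model from disk, which inflates the first-token figure.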

01

Qwen3.5 4B Instruct

Qwen / 4B / Q4_K_M / ~3.5 GB

Best for: Coding, Agents, Multimodal · Perf: ~79.2 tok/s (est.) · first token ~0.6s

Runs well

Best for coding, agents, and multimodal tasks. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run qwen3.5:4b

02

Gemma 3 4B Instruct

Gemma / 4B / Q4_K_M / ~3.5 GB

Best for: Chat, Coding · Perf: ~79.2 tok/s (est.) · first token ~0.6s

Runs well

Best for chat and coding. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run gemma3:4b

03

Gemma 4 E2B

Gemma / 2.3B / Q4_K_M / ~2.3 GB

Best for: IoT, Mobile, Edge · Perf: ~130.3 tok/s (est.) · first token ~0.5s

Runs well

Best for IoT, mobile, and edge deployments. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run gemma4:e2b

04

Phi-4 Mini 3.8B

Phi / 3.8B / Q4_K_M / ~3.2 GB

Best for: Coding, Chat · Perf: ~82.9 tok/s (est.) · first token ~0.6s

Runs well

Best for coding and chat. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run phi4-mini:3.8b

05

Qwen3.5 2B Instruct

Qwen / 2B / Q4_K_M / ~1.8 GB

Best for: Chat, Edge tasks · Perf: ~147.7 tok/s (est.) · first token ~0.5s

Perfect fit

Best for chat and edge tasks. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run qwen3.5:2b

06

Llama 3.2 3B Instruct

Llama / 3B / Q4_K_M / ~2.5 GB

Best for: Chat · Perf: ~102.6 tok/s (est.) · first token ~0.5s

Runs well

Best for chat. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run llama3.2:3b-instruct-q4_K_M

07

Phi-3 Mini 3.8B

Phi / 3.8B / Q4_K_M / ~3.2 GB

Best for: Coding, Chat · Perf: ~82.9 tok/s (est.) · first token ~0.6s

Runs well

Best for coding and chat. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run phi3:mini

08

Qwen2.5 3B Instruct

Qwen / 3B / Q4_K_M / ~2.5 GB

Best for: Chat, Coding · Perf: ~102.6 tok/s (est.) · first token ~0.5s

Runs well

Best for chat and coding. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run qwen2.5:3b-instruct-q4_K_M

09

Gemma 2 2B Instruct

Gemma / 2B / Q4_K_M / ~1.8 GB

Best for: Chat · Perf: ~147.7 tok/s (est.) · first token ~0.5s

Perfect fit

Best for chat. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run gemma2:2b-instruct-q4_K_M

10

Granite 4.1 3B Instruct

Granite / 3B / Q4_K_M / ~2 GB

Best for: Lightweight chat, classification, edge tasks · Perf: ~102.6 tok/s (est.) · first token ~0.5s

Perfect fit

Best for lightweight chat, classification, and edge tasks. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run granite4.1:3b

11

Gemma 4 E4B

Gemma / 4.5B / Q4_K_M / ~4 GB

Best for: On-device, Mobile, Chat · Perf: ~71.2 tok/s (est.) · first token ~0.6s

Runs well

Best for on-device, mobile, and chat use. A strong fit for 8 GB of RAM, balancing speed and quality.

ollama
$ ollama run gemma4:e4b

12

LFM2.5 8B-A1B

LFM2 / 8.3B / Q4_K_M / ~5.5 GB

Best for: On-device agents, tool calling, multilingual chat · Perf: ~32.7 tok/s (est.) · first token ~0.8s

Runs well

At ~5.5 GB, this model may feel memory-heavy on an 8 GB machine, but it stays on the list for its balance of speed and quality.

ollama
$ ollama run lfm2.5:8b-a1b-q4_K_M
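The "memory-heavy" caveat comes down to subtraction: on Apple Silicon the model weights share the same 8 GB of unified memory with the OS, other apps, and the model's context (KV cache). A minimal Python sketch using the approximate sizes from this list; the `headroom_gb` helper and its optional `reserve_gb` allowance are hypothetical illustrations, not part of ollama or ModelFit.

```python
# Memory left on an 8 GB machine once a model's weights are loaded.
# Sizes are the approximate Q4_K_M figures from this list; reserve_gb is a
# hypothetical allowance you can set for macOS and other running apps.
TOTAL_GB = 8.0

MODELS_GB = {
    "lfm2.5:8b-a1b-q4_K_M": 5.5,
    "gemma4:e4b": 4.0,
    "qwen3.5:4b": 3.5,
    "qwen3.5:2b": 1.8,
}


def headroom_gb(model_gb: float, total_gb: float = TOTAL_GB, reserve_gb: float = 0.0) -> float:
    """GB left after the weights (and an optional reserve) for everything
    else, including the model's context (KV cache)."""
    return round(total_gb - reserve_gb - model_gb, 2)


# Print the largest models first so the tightest fits stand out.
for tag, size in sorted(MODELS_GB.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tag:<22} ~{size:.1f} GB weights -> {headroom_gb(size):.1f} GB left")
```

By this arithmetic LFM2.5 8B-A1B leaves roughly 2.5 GB for macOS, other apps, and its context, versus about 6.2 GB for Qwen3.5 2B. Longer contexts grow the KV cache, so the tighter fits are the first to slow down or swap.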

Want an exact recommendation?

The wizard tunes picks and speed estimates to your exact device, chip, and RAM.