Best Local LLMs for 256GB RAM

Ranked open-weight models that run well on a 256GB machine, with estimated speed and the exact ollama command for each.


Best local models for ~256GB

12 picks

Estimates assume a representative Apple Silicon machine with 256 GB of unified memory. Speeds in tokens per second (tok/s) are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
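Since the speeds on this page are estimates, you may want to measure your own. The following is a minimal sketch, assuming a local Ollama server whose generate response reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds), as Ollama's API documentation describes. The helper name `tokens_per_second` and the sample numbers are illustrative assumptions, not measurements.

```python
def tokens_per_second(response: dict) -> float:
    """Decode speed computed from an Ollama generate-response dict."""
    eval_count = response["eval_count"]            # tokens generated
    eval_duration_ns = response["eval_duration"]   # time spent generating, in ns
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Illustrative response fragment (made-up numbers, not a benchmark).
sample = {"eval_count": 300, "eval_duration": 12_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/s")
```

Running a model with `ollama run <model> --verbose` should print comparable timing statistics after each reply, if your Ollama version supports that flag.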

Qwen3 235B A22B

Qwen / 235B / Q4_K_M / ~130 GB

Best for: Quality, Reasoning · Perf: ~9.2 tok/s · first token ~2.3s

Runs well

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3:235b-a22b-q4_K_M

GPT-OSS 120B

GPT-OSS / 117B / MXFP4 / ~65.4 GB

Best for: Reasoning, Coding, Agents · Perf: ~28.6 tok/s · first token ~1.6s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run gpt-oss:120b

Qwen3.5 122B-A10B Instruct

Qwen / 122B / Q4_K_M / ~72 GB

Best for: Frontier-level reasoning, Complex tasks · Perf: ~18.9 tok/s · first token ~1.8s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3.5:122b-a10b

Llama 4 Scout

Llama / 109B / Q4_K_M / ~67 GB

Best for: Long context, Quality, Multimodal · Perf: ~15.4 tok/s · first token ~1.9s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run llama4:scout

Qwen3-Next 80B-A3B (Q8)

Qwen / 80B / Q8_0 / ~84.8 GB

Best for: Chat, Coding, Long context · Perf: ~23.4 tok/s · first token ~1.7s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3-next:80b-a3b-instruct-q8_0

Qwen3-Next 80B-A3B

Qwen / 80B / Q4_K_M / ~50.4 GB

Best for: Chat, Coding, Long context · Perf: ~42.7 tok/s · first token ~1.5s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3-next:80b

Qwen3.6 35B-A3B (Q8)

Qwen / 35B / Q8_0 / ~38.7 GB

Best for: Reasoning, Coding, Agents · Perf: ~35.3 tok/s · first token ~1.5s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3.6:35b-a3b-q8_0

Qwen3.5 35B-A3B Instruct (Q8)

Qwen / 35B / Q8_0 / ~38.7 GB

Best for: Reasoning, Coding, Agent scenarios · Perf: ~35.3 tok/s · first token ~1.5s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3.5:35b-a3b-q8_0

Qwen3.6 35B-A3B

Qwen / 35B / Q4_K_M / ~22 GB

Best for: Reasoning, Coding, Agents · Perf: ~64.6 tok/s · first token ~1.4s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3.6:35b-a3b

Qwen3.5 35B-A3B Instruct

Qwen / 35B / Q4_K_M / ~20 GB

Best for: Reasoning, Coding, Agent scenarios · Perf: ~64.6 tok/s · first token ~1.4s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3.5:35b-a3b

Gemma 4 26B-A4B

Gemma / 26B / Q4_K_M / ~16 GB

Best for: Chat, Coding, Multimodal · Perf: ~64.9 tok/s · first token ~0.6s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run gemma4:26b

Qwen3.5 4B Instruct

Qwen / 4B / Q4_K_M / ~3.5 GB

Best for: Coding, Agents, Multimodal · Perf: ~165.4 tok/s · first token ~0.5s

Perfect fit

Strong fit for 256 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3.5:4b
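To compare the picks above side by side, the listed sizes can be turned into two rough figures: effective bits per weight and naive memory headroom on a 256 GB machine. This is a sketch under stated assumptions: it treats 1 GB as 10^9 bytes, treats the listed footprint as weights only, and ignores the operating system, other applications, and the KV cache. The `Pick` class and its method names are hypothetical, and the numbers are copied from the entries on this page.

```python
from dataclasses import dataclass

TOTAL_RAM_GB = 256  # the RAM tier this page targets


@dataclass(frozen=True)
class Pick:
    name: str
    params_b: float  # parameter count in billions, as listed
    quant: str       # quantization label, as listed
    size_gb: float   # approximate footprint, as listed

    def bits_per_weight(self) -> float:
        # Assumption: 1 GB = 1e9 bytes and the footprint is all weights.
        return self.size_gb * 8 / self.params_b

    def headroom_gb(self, total_gb: float = TOTAL_RAM_GB) -> float:
        # Naive headroom: ignores the OS, other apps, and the KV cache.
        return total_gb - self.size_gb


PICKS = [
    Pick("Qwen3 235B A22B", 235, "Q4_K_M", 130.0),
    Pick("GPT-OSS 120B", 117, "MXFP4", 65.4),
    Pick("Qwen3.5 122B-A10B Instruct", 122, "Q4_K_M", 72.0),
    Pick("Llama 4 Scout", 109, "Q4_K_M", 67.0),
    Pick("Qwen3-Next 80B-A3B (Q8)", 80, "Q8_0", 84.8),
    Pick("Qwen3-Next 80B-A3B", 80, "Q4_K_M", 50.4),
    Pick("Qwen3.6 35B-A3B (Q8)", 35, "Q8_0", 38.7),
    Pick("Qwen3.5 35B-A3B Instruct (Q8)", 35, "Q8_0", 38.7),
    Pick("Qwen3.6 35B-A3B", 35, "Q4_K_M", 22.0),
    Pick("Qwen3.5 35B-A3B Instruct", 35, "Q4_K_M", 20.0),
    Pick("Gemma 4 26B-A4B", 26, "Q4_K_M", 16.0),
    Pick("Qwen3.5 4B Instruct", 4, "Q4_K_M", 3.5),
]

for p in PICKS:
    print(
        f"{p.name:<30} {p.quant:<7} {p.size_gb:>6.1f} GB "
        f"{p.bits_per_weight():>5.2f} bits/weight "
        f"{p.headroom_gb():>6.1f} GB headroom"
    )
```

The headroom column is an upper bound: long contexts add memory on top of the listed size, so the real margin is likely smaller.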


Want an exact recommendation?

The wizard tunes picks and speed estimates to your exact device, chip, and RAM.
