Best Local LLMs for 48GB RAM

Ranked open-weight models that run well on a 48GB machine, with estimated speed and the exact ollama command for each.


Best local models for ~48GB

12 picks

Estimates assume a representative Apple silicon machine with 48 GB of unified memory. Tok/s figures are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
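Each pick below lists a parameter count, a quantization format, and an approximate memory footprint. As a rough sanity check on those footprints, here is a minimal sketch that estimates weight size from parameter count and bits per weight. The bits-per-weight values, the `reserve_fraction` headroom, and both function names are assumptions made for illustration. This is not how ModelFit computes its figures, and real model files mix tensor types, so the results will not match the listed sizes exactly.

```python
# Approximate bits per weight for each quantization format.
# These are assumed averages; real files vary by architecture.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # commonly cited average for llama.cpp Q4_K_M
    "Q8_0": 8.5,     # 32 int8 weights plus one fp16 scale per block
    "MXFP4": 4.25,   # 32 4-bit values plus one 8-bit scale per block
}


def estimate_weights_gb(params_billion: float, quant: str) -> float:
    """Rough size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(params_billion: float, quant: str,
         ram_gb: float = 48.0, reserve_fraction: float = 0.25) -> bool:
    """True if the weights fit after reserving part of RAM.

    reserve_fraction is a hypothetical allowance for the OS, the KV cache,
    and other applications; tune it to your own workload.
    """
    budget_gb = ram_gb * (1 - reserve_fraction)
    return estimate_weights_gb(params_billion, quant) <= budget_gb


# Example: a few parameter-count / quantization pairs from the list below.
for label, params, quant in [
    ("35B at Q4_K_M", 35, "Q4_K_M"),
    ("26B at Q8_0", 26, "Q8_0"),
    ("21B at MXFP4", 21, "MXFP4"),
]:
    size = estimate_weights_gb(params, quant)
    print(f"{label}: ~{size:.1f} GB, fits in 48 GB: {fits(params, quant)}")
```

Weights are only part of the picture: context length drives KV-cache size, so long-context use needs more headroom than the default reserve assumed here.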

Qwen3.5 35B-A3B Instruct

Qwen / 35B / Q4_K_M / ~20 GB

Best for: Reasoning, Coding, Agent scenarios · Perf: ~66.3 tok/s · first token ~1.4s

Runs well

Best for reasoning, coding, agent scenarios. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3.5:35b-a3b

Qwen3.6 35B-A3B

Qwen / 35B / Q4_K_M / ~22 GB

Best for: Reasoning, Coding, Agents · Perf: ~66.3 tok/s · first token ~1.4s

Runs well

Best for reasoning, coding, agents. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3.6:35b-a3b

Qwen3.6 27B

Qwen / 27B / Q4_K_M / ~18 GB

Best for: Coding, Quality, Long context · Perf: ~25.1 tok/s · first token ~0.8s

Runs well

Best for coding, quality, long context. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3.6:27b

Laguna XS 2.1

Laguna / 33B / Q4_K_M / ~20.3 GB

Best for: Agentic coding, Long-horizon tasks · Perf: ~68.2 tok/s · first token ~1.4s

Runs well

Best for agentic coding, long-horizon tasks. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run laguna-xs-2.1:q4_K_M

Ornith 1.0 35B

Ornith / 35B / Q4_K_M / ~21.2 GB

Best for: Agentic coding · Perf: ~19.4 tok/s · first token ~1.8s

Runs well

Best for agentic coding. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run ornith:35b

Qwen3 30B

Qwen / 30B / Q4_K_M / ~22 GB

Best for: Quality, Coding · Perf: ~71.6 tok/s · first token ~1.4s

Runs well

Best for quality, coding. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3:30b

Gemma 4 31B

Gemma / 31B / Q4_K_M / ~20 GB

Best for: Quality, Coding, Multimodal · Perf: ~21.9 tok/s · first token ~1.7s

Runs well

Best for quality, coding, multimodal. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run gemma4:31b

Gemma 4 26B-A4B (Q8)

Gemma / 26B / Q8_0 / ~28.1 GB

Best for: Chat, Coding, Multimodal · Perf: ~36.4 tok/s · first token ~0.7s

Runs well

At ~28.1 GB, this Q8_0 build is memory-heavy on 48 GB RAM, but it is still listed for its balance of speed and quality.

ollama

$ ollama run gemma4:26b-a4b-it-q8_0

Gemma 4 26B-A4B

Gemma / 26B / Q4_K_M / ~16 GB

Best for: Chat, Coding, Multimodal · Perf: ~66.6 tok/s · first token ~0.6s

Runs well

Best for chat, coding, multimodal. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run gemma4:26b

Qwen3.5 27B Instruct

Qwen / 27B / Q4_K_M / ~16 GB

Best for: Chat, Coding, Complex reasoning · Perf: ~25.1 tok/s · first token ~0.8s

Runs well

Best for chat, coding, complex reasoning. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run qwen3.5:27b

GPT-OSS 20B

GPT-OSS / 21B / MXFP4 / ~13.8 GB

Best for: Chat, Coding, Reasoning · Perf: ~82.4 tok/s · first token ~0.6s

Perfect fit

Best for chat, coding, reasoning. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run gpt-oss:20b

Gemma 3 27B Instruct

Gemma / 27B / Q4_K_M / ~21 GB

Best for: Quality, Coding · Perf: ~25.1 tok/s · first token ~0.8s

Runs well

Best for quality, coding. Strong fit for 48 GB RAM with balanced speed and quality.

ollama

$ ollama run gemma3:27b
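The commands above open an interactive session. For scripted use, here is a minimal sketch that sends one prompt to a model through Ollama's local HTTP API. It assumes the Ollama server is running on its default port (11434) and that the model has already been pulled. The helper names `build_request` and `generate` are hypothetical, chosen for this example.

```python
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes default port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for a single prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text.

    Requires a running Ollama server; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, a call such as `generate("gpt-oss:20b", "Summarize this list in one sentence.")` returns the reply as a string. Swap in any tag from the commands above.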
