Best Local LLMs for 48GB RAM

Ranked open-weight models that run well on a 48GB machine — with estimated speed and the exact ollama command for each.

Best local models for ~48GB

12 picks

Estimates assume a representative Apple-Silicon machine with 48GB unified memory. Tok/s are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.

01

Qwen3.6 35B-A3B

Qwen / 35B / Q4_K_M / ~22 GB

Best for: Reasoning, Coding, Agents·Perf: ~42.7 tok/s (est.) · first token ~1.5s

Runs well

Best for reasoning, coding, agents. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run qwen3.6:35b-a3b
02

Qwen3.5 35B-A3B Instruct

Qwen / 35B / Q4_K_M / ~20 GB

Best for: Reasoning, Coding, Agent scenarios·Perf: ~42.7 tok/s (est.) · first token ~1.5s

Runs well

Best for reasoning, coding, agent scenarios. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run qwen3.5:35b-a3b
03

Qwen3.5 27B Instruct

Qwen / 27B / Q4_K_M / ~16 GB

Best for: Chat, Coding, Complex reasoning·Perf: ~53.9 tok/s (est.) · first token ~0.6s

Runs well

Best for chat, coding, complex reasoning. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run qwen3.5:27b
04

Qwen3.6 27B

Qwen / 27B / Q4_K_M / ~18 GB

Best for: Coding, Quality, Long context·Perf: ~53.9 tok/s (est.) · first token ~0.6s

Runs well

Best for coding, quality, long context. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run qwen3.6:27b
05

Gemma 4 26B-A4B

Gemma / 26B / Q4_K_M / ~16 GB

Best for: Chat, Coding, Multimodal·Perf: ~55.8 tok/s (est.) · first token ~0.6s

Runs well

Best for chat, coding, multimodal. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run gemma4:26b
06

LFM2 24B-A2B Instruct

LFM2 / 24B / Q4_K_M / ~14 GB

Best for: Local AI agents, privacy-first tool calling, MCP workflows·Perf: ~59.9 tok/s (est.) · first token ~0.6s

Runs well

Best for local ai agents, privacy-first tool calling, mcp workflows. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run lfm2:24b-a2b
07

Qwen3 14B

Qwen / 14B / Q4_K_M / ~11 GB

Best for: Coding, Quality·Perf: ~97.4 tok/s (est.) · first token ~0.6s

Perfect fit

Best for coding, quality. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run qwen3:14b-q4_K_M
08

Gemma 4 31B

Gemma / 31B / Q4_K_M / ~20 GB

Best for: Quality, Coding, Multimodal·Perf: ~47.6 tok/s (est.) · first token ~1.5s

Runs well

Best for quality, coding, multimodal. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run gemma4:31b
09

Qwen2.5 Coder 14B

Qwen / 14B / Q4_K_M / ~11 GB

Best for: Coding·Perf: ~97.4 tok/s (est.) · first token ~0.6s

Perfect fit

Best for coding. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run qwen2.5-coder:14b
10

Qwen3 30B

Qwen / 30B / Q4_K_M / ~22 GB

Best for: Quality, Coding·Perf: ~49 tok/s (est.) · first token ~1.5s

Runs well

Best for quality, coding. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run qwen3:30b
11

Qwen2.5 14B Instruct

Qwen / 14B / Q4_K_M / ~11 GB

Best for: Coding, Chat·Perf: ~97.4 tok/s (est.) · first token ~0.6s

Perfect fit

Best for coding, chat. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run qwen2.5:14b-instruct-q4_K_M
12

DeepSeek-R1 Distill Qwen 14B

DeepSeek / 14B / Q4_K_M / ~11 GB

Best for: Reasoning, Quality·Perf: ~97.4 tok/s (est.) · first token ~0.6s

Perfect fit

Best for reasoning, quality. Strong fit for 48 GB RAM with balanced speed and quality.

ollama
$ollama run deepseek-r1:14b

Want an exact recommendation?

The wizard tunes picks and speed estimates to your exact device, chip, and RAM.