Best Local LLMs for 192GB RAM

Ranked open-weight models that run well on a 192GB machine — with estimated speed and the exact ollama command for each.

Best local models for ~192GB

12 picks

Estimates assume a representative Apple Silicon machine with 192GB unified memory. The tok/s figures are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
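Because the speeds listed here are estimates, it can be worth checking one model on your own machine. A minimal sketch, assuming ollama is installed and the model tag is available to it: the `--verbose` flag makes `ollama run` print timing statistics after the reply, including an eval rate in tokens per second. The `measure_tps` helper name and the prompt text are illustrative and not part of this page.

```shell
# Hypothetical helper: send one prompt to a model and let ollama
# print its own timing statistics after the reply.
measure_tps() {
  model="$1"
  # Bail out early if the ollama CLI is not installed.
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama not found on PATH" >&2
    return 1
  fi
  # --verbose prints timing stats (including "eval rate") after the response.
  ollama run --verbose "$model" "Explain unified memory in two sentences."
}

# Example (downloads the model on first use):
# measure_tps llama3.3:70b-instruct-q4_K_M
```

A single short prompt gives only a rough figure; results vary with prompt length, context size, and whatever else is using memory at the time.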

01

Llama 4 Scout

Llama / 109B / Q4_K_M / ~67 GB

Best for: Long context, Quality, Multimodal · Perf: ~22.7 tok/s (est.) · first token ~1.7s

Runs well

Best for long context, quality, multimodal. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run llama4:scout
02

Qwen3.5 122B-A10B Instruct

Qwen / 122B / Q4_K_M / ~72 GB

Best for: Frontier-level reasoning, Complex tasks · Perf: ~20.5 tok/s (est.) · first token ~1.7s

Runs well

Best for frontier-level reasoning, complex tasks. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen3.5:122b-a10b
03

Llama 3.3 70B Instruct

Llama / 70B / Q4_K_M / ~42 GB

Best for: Quality, Coding · Perf: ~33.9 tok/s (est.) · first token ~1.5s

Perfect fit

Best for quality, coding. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run llama3.3:70b-instruct-q4_K_M
04

Llama 3.1 70B Instruct

Llama / 70B / Q4_K_M / ~42 GB

Best for: Quality, Coding · Perf: ~33.9 tok/s (est.) · first token ~1.5s

Perfect fit

Best for quality, coding. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run llama3.1:70b-instruct-q4_K_M
05

DeepSeek-R1 Distill Llama 70B

DeepSeek / 70B / Q4_K_M / ~42 GB

Best for: Reasoning, Quality · Perf: ~33.9 tok/s (est.) · first token ~1.5s

Perfect fit

Best for reasoning, quality. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run deepseek-r1:70b
06

Qwen3.6 35B-A3B

Qwen / 35B / Q4_K_M / ~22 GB

Best for: Reasoning, Coding, Agents · Perf: ~63.2 tok/s (est.) · first token ~1.4s

Perfect fit

Best for reasoning, coding, agents. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen3.6:35b-a3b
07

Qwen3 30B

Qwen / 30B / Q4_K_M / ~22 GB

Best for: Quality, Coding · Perf: ~72.6 tok/s (est.) · first token ~1.4s

Perfect fit

Best for quality, coding. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen3:30b
08

Gemma 3 27B Instruct

Gemma / 27B / Q4_K_M / ~21 GB

Best for: Quality, Coding · Perf: ~79.8 tok/s (est.) · first token ~0.6s

Perfect fit

Best for quality, coding. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run gemma3:27b
09

Mixtral 8x7B Instruct

Mistral / 46.7B / Q4_K_M / ~30 GB

Best for: Coding, Quality · Perf: ~48.8 tok/s (est.) · first token ~1.5s

Perfect fit

Best for coding, quality. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run mixtral:8x7b
10

Qwen3 235B A22B

Qwen / 235B / Q4_K_M / ~130 GB

Best for: Quality, Reasoning · Perf: ~9.2 tok/s (est.) · first token ~2.3s

Runs well

At ~130 GB, this model is memory-heavy on 192 GB RAM and leaves less headroom than the other picks, but it is still listed for its balance of speed and quality.

ollama
$ ollama run qwen3:235b-a22b-q4_K_M
11

Gemma 2 27B Instruct

Gemma / 27B / Q4_K_M / ~21 GB

Best for: Quality, Coding · Perf: ~79.8 tok/s (est.) · first token ~0.6s

Perfect fit

Best for quality, coding. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run gemma2:27b-instruct-q4_K_M
12

Qwen3.5 35B-A3B Instruct

Qwen / 35B / Q4_K_M / ~20 GB

Best for: Reasoning, Coding, Agent scenarios · Perf: ~63.2 tok/s (est.) · first token ~1.4s

Perfect fit

Best for reasoning, coding, agent scenarios. Strong fit for 192 GB RAM with balanced speed and quality.

ollama
$ ollama run qwen3.5:35b-a3b

Want an exact recommendation?

The wizard tunes picks and speed estimates to your exact device, chip, and RAM.