Llama 4 Scout
Llama / 109B / Q4_K_M / ~67 GB
Best for: Long context, Quality, Multimodal·Perf: ~22.7 tok/s (est.) · first token ~1.7s
Best for long context, quality, multimodal. Strong fit for 192 GB RAM with balanced speed and quality.
Ranked open-weight models that run well on a 192GB machine — with estimated speed and the exact ollama command for each.
Try: RTX 4090, MacBook Pro M4, 16GB, iPhone 17 Pro
Estimates assume a representative Apple-Silicon machine with 192GB unified memory. Tok/s are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
Llama / 109B / Q4_K_M / ~67 GB
Best for: Long context, Quality, Multimodal·Perf: ~22.7 tok/s (est.) · first token ~1.7s
Best for long context, quality, multimodal. Strong fit for 192 GB RAM with balanced speed and quality.
Qwen / 122B / Q4_K_M / ~72 GB
Best for: Frontier-level reasoning, Complex tasks·Perf: ~20.5 tok/s (est.) · first token ~1.7s
Best for frontier-level reasoning, complex tasks. Strong fit for 192 GB RAM with balanced speed and quality.
Llama / 70B / Q4_K_M / ~42 GB
Best for: Quality, Coding·Perf: ~33.9 tok/s (est.) · first token ~1.5s
Best for quality, coding. Strong fit for 192 GB RAM with balanced speed and quality.
Llama / 70B / Q4_K_M / ~42 GB
Best for: Quality, Coding·Perf: ~33.9 tok/s (est.) · first token ~1.5s
Best for quality, coding. Strong fit for 192 GB RAM with balanced speed and quality.
DeepSeek / 70B / Q4_K_M / ~42 GB
Best for: Reasoning, Quality·Perf: ~33.9 tok/s (est.) · first token ~1.5s
Best for reasoning, quality. Strong fit for 192 GB RAM with balanced speed and quality.
Qwen / 35B / Q4_K_M / ~22 GB
Best for: Reasoning, Coding, Agents·Perf: ~63.2 tok/s (est.) · first token ~1.4s
Best for reasoning, coding, agents. Strong fit for 192 GB RAM with balanced speed and quality.
Qwen / 30B / Q4_K_M / ~22 GB
Best for: Quality, Coding·Perf: ~72.6 tok/s (est.) · first token ~1.4s
Best for quality, coding. Strong fit for 192 GB RAM with balanced speed and quality.
Gemma / 27B / Q4_K_M / ~21 GB
Best for: Quality, Coding·Perf: ~79.8 tok/s (est.) · first token ~0.6s
Best for quality, coding. Strong fit for 192 GB RAM with balanced speed and quality.
Mistral / 46.7B / Q4_K_M / ~30 GB
Best for: Coding, Quality·Perf: ~48.8 tok/s (est.) · first token ~1.5s
Best for coding, quality. Strong fit for 192 GB RAM with balanced speed and quality.
Qwen / 235B / Q4_K_M / ~130 GB
Best for: Quality, Reasoning·Perf: ~9.2 tok/s (est.) · first token ~2.3s
This model may feel memory-heavy on 192 GB RAM, but it is still listed for balanced speed and quality.
Gemma / 27B / Q4_K_M / ~21 GB
Best for: Quality, Coding·Perf: ~79.8 tok/s (est.) · first token ~0.6s
Best for quality, coding. Strong fit for 192 GB RAM with balanced speed and quality.
Qwen / 35B / Q4_K_M / ~20 GB
Best for: Reasoning, Coding, Agent scenarios·Perf: ~63.2 tok/s (est.) · first token ~1.4s
Best for reasoning, coding, agent scenarios. Strong fit for 192 GB RAM with balanced speed and quality.
The wizard tunes picks and speed estimates to your exact device, chip, and RAM.