Qwen3 235B A22B
Qwen / 235B / Q4_K_M / ~130 GB
Best for: Quality, Reasoning · Perf: ~11.4 tok/s (est.) · first token ~2.1s
Strong fit for 512 GB RAM with balanced speed and quality.
Ranked open-weight models that run well on a 512 GB machine, with estimated speed and the exact ollama command for each.
Estimates assume a representative Apple Silicon machine with 512 GB unified memory. Tok/s figures are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
Llama / 400B / Q4_K_M / ~245 GB
Best for: Frontier quality, Long context · Perf: ~7.1 tok/s (est.) · first token ~2.7s
Strong fit for 512 GB RAM with balanced speed and quality.
Llama / 405B / Q4_K_M / ~243 GB
Best for: Quality, Reasoning, Coding · Perf: ~7 tok/s (est.) · first token ~2.7s
Strong fit for 512 GB RAM with balanced speed and quality.
Llama / 109B / Q4_K_M / ~67 GB
Best for: Long context, Quality, Multimodal · Perf: ~22.7 tok/s (est.) · first token ~1.7s
Strong fit for 512 GB RAM with balanced speed and quality.
Qwen / 122B / Q4_K_M / ~72 GB
Best for: Frontier-level reasoning, Complex tasks · Perf: ~20.5 tok/s (est.) · first token ~1.7s
Strong fit for 512 GB RAM with balanced speed and quality.
Qwen / 35B / Q4_K_M / ~22 GB
Best for: Reasoning, Coding, Agents · Perf: ~63.2 tok/s (est.) · first token ~1.4s
Strong fit for 512 GB RAM with balanced speed and quality.
Qwen / 35B / Q4_K_M / ~20 GB
Best for: Reasoning, Coding, Agent scenarios · Perf: ~63.2 tok/s (est.) · first token ~1.4s
Strong fit for 512 GB RAM with balanced speed and quality.
Qwen / 4B / Q4_K_M / ~3.5 GB
Best for: Coding, Agents, Multimodal · Perf: ~180 tok/s (est.) · first token ~0.5s
Strong fit for 512 GB RAM with balanced speed and quality.
Qwen / 9B / Q4_K_M / ~7 GB
Best for: Quality, Coding, Reasoning · Perf: ~180 tok/s (est.) · first token ~0.5s
Strong fit for 512 GB RAM with balanced speed and quality.
Qwen / 27B / Q4_K_M / ~16 GB
Best for: Chat, Coding, Complex reasoning · Perf: ~79.8 tok/s (est.) · first token ~0.6s
Strong fit for 512 GB RAM with balanced speed and quality.
Qwen / 27B / Q4_K_M / ~18 GB
Best for: Coding, Quality, Long context · Perf: ~79.8 tok/s (est.) · first token ~0.6s
Strong fit for 512 GB RAM with balanced speed and quality.
Gemma / 26B / Q4_K_M / ~16 GB
Best for: Chat, Coding, Multimodal · Perf: ~82.6 tok/s (est.) · first token ~0.6s
Strong fit for 512 GB RAM with balanced speed and quality.
The wizard tunes picks and speed estimates to your exact device, chip, and RAM.
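The size figures on the cards can be sanity-checked with back-of-envelope arithmetic: parameter count times bits per weight, divided by 8. The sketch below is only an illustration. The 4.5 bits-per-weight figure for Q4_K_M, the 25% headroom margin, and the helper names are assumptions, not values from this page or from ModelFit. Listed sizes will differ somewhat from the estimate, since a quantized file typically mixes quantization types across tensors.

```python
# Rough sketch: estimate a quantized model's weight size and check it against RAM.
# ASSUMPTION: ~4.5 bits per weight for Q4_K_M (illustrative, not from the page).
ASSUMED_BITS_PER_WEIGHT = 4.5
RAM_GB = 512


def estimated_size_gb(params_billions: float,
                      bits_per_weight: float = ASSUMED_BITS_PER_WEIGHT) -> float:
    """Weights only: billions of parameters x bits per weight / 8 = gigabytes."""
    return params_billions * bits_per_weight / 8


def fits(size_gb: float, ram_gb: float = RAM_GB, headroom: float = 0.25) -> bool:
    """True if the model leaves a `headroom` fraction of RAM free.

    The headroom is a hypothetical margin for the KV cache, the OS, and other apps.
    """
    return size_gb <= ram_gb * (1 - headroom)


# (parameter count in billions, size listed on the page in GB)
listed = [(235, 130), (405, 243), (109, 67), (27, 16), (4, 3.5)]

for params_b, page_gb in listed:
    est = estimated_size_gb(params_b)
    print(f"{params_b}B: estimated ~{est:.0f} GB, listed ~{page_gb} GB, "
          f"fits in {RAM_GB} GB: {fits(page_gb)}")
```

With these assumptions every listed model clears the margin on a 512 GB machine; tightening `headroom` or raising the bits-per-weight figure shows how quickly the largest entries approach the limit.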