Qwen3.5 4B Instruct
Qwen / 4B / Q4_K_M / ~3.5 GB
Best for: Coding, Agents, Multimodal · Perf: ~99 tok/s (est.) · first token ~0.6s
Best for coding, agents, multimodal. Strong fit for 16 GB RAM with balanced speed and quality.
Ranked open-weight models that run well on a 16 GB machine, with estimated speed and the exact ollama command for each.
Estimates assume a representative Apple-Silicon machine with 16 GB unified memory. The tok/s figures are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
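For readers who want a rough sanity check of their own, one common rule of thumb is that token generation is limited by memory bandwidth: each generated token reads roughly the whole quantized model once, so speed scales inversely with file size. The sketch below illustrates that rule only. It is not ModelFit's method, and the function names, the bandwidth figure, the efficiency factor, and the memory headroom are all hypothetical placeholders, so it is not expected to reproduce the figures in this list.

```python
def estimate_decode_tok_s(model_size_gb: float,
                          bandwidth_gb_s: float,
                          efficiency: float = 0.7) -> float:
    """Rule-of-thumb decode speed for a bandwidth-bound model.

    Assumes every generated token streams all weights from memory once.
    `efficiency` is an assumed fraction of peak bandwidth actually achieved.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s * efficiency / model_size_gb


def fits_in_memory(model_size_gb: float,
                   total_ram_gb: float,
                   headroom_gb: float = 4.0) -> bool:
    """Check that the model plus an assumed headroom fits in RAM.

    `headroom_gb` is a placeholder for the OS, other apps, and the KV cache.
    """
    return model_size_gb + headroom_gb <= total_ram_gb


if __name__ == "__main__":
    # Quantized file sizes taken from the listing (Q4_K_M, approximate).
    sizes_gb = {"4B": 3.5, "7B": 5.5, "8B": 6.5, "9B": 7.0}
    assumed_bandwidth_gb_s = 120.0  # placeholder; check your chip's spec sheet

    for label, size in sizes_gb.items():
        speed = estimate_decode_tok_s(size, assumed_bandwidth_gb_s)
        ok = fits_in_memory(size, total_ram_gb=16.0)
        print(f"{label}: ~{speed:.1f} tok/s (rule of thumb), fits in 16 GB: {ok}")
```

Under this rule a model half the size generates about twice as fast on the same chip, which matches the general pattern in the list of smaller models showing higher estimated tok/s.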
Qwen / 9B / Q4_K_M / ~7 GB
Best for: Quality, Coding, Reasoning · Perf: ~47.7 tok/s (est.) · first token ~0.7s
Best for quality, coding, reasoning. Strong fit for 16 GB RAM with balanced speed and quality.
Qwen / 8B / Q4_K_M / ~6.5 GB
Best for: Chat, Coding · Perf: ~53 tok/s (est.) · first token ~0.6s
Best for chat, coding. Strong fit for 16 GB RAM with balanced speed and quality.
LFM2 / 8.3B / Q4_K_M / ~5.5 GB
Best for: On-device agents, tool calling, multilingual chat · Perf: ~51.3 tok/s (est.) · first token ~0.6s
Best for on-device agents, tool calling, multilingual chat. Strong fit for 16 GB RAM with balanced speed and quality.
Gemma / 4.5B / Q4_K_M / ~4 GB
Best for: On-device, Mobile, Chat · Perf: ~89 tok/s (est.) · first token ~0.6s
Best for on-device, mobile, chat. Strong fit for 16 GB RAM with balanced speed and quality.
Llama / 8B / Q4_K_M / ~6.5 GB
Best for: Chat, Coding · Perf: ~53 tok/s (est.) · first token ~0.6s
Best for chat, coding. Strong fit for 16 GB RAM with balanced speed and quality.
Gemma / 4B / Q4_K_M / ~3.5 GB
Best for: Chat, Coding · Perf: ~99 tok/s (est.) · first token ~0.6s
Best for chat, coding. Strong fit for 16 GB RAM with balanced speed and quality.
Qwen / 7B / Q4_K_M / ~5.5 GB
Best for: Coding · Perf: ~59.8 tok/s (est.) · first token ~0.6s
Best for coding. Strong fit for 16 GB RAM with balanced speed and quality.
DeepSeek / 7B / Q4_K_M / ~5.5 GB
Best for: Reasoning, Coding · Perf: ~59.8 tok/s (est.) · first token ~0.6s
Best for reasoning, coding. Strong fit for 16 GB RAM with balanced speed and quality.
Mistral / 7B / Q4_K_M / ~5.5 GB
Best for: Chat, Coding · Perf: ~59.8 tok/s (est.) · first token ~0.6s
Best for chat, coding. Strong fit for 16 GB RAM with balanced speed and quality.
Qwen / 7B / Q4_K_M / ~5.5 GB
Best for: Chat, Coding · Perf: ~59.8 tok/s (est.) · first token ~0.6s
Best for chat, coding. Strong fit for 16 GB RAM with balanced speed and quality.
Gemma / 9B / Q4_K_M / ~7 GB
Best for: Chat, Coding · Perf: ~47.7 tok/s (est.) · first token ~0.7s
Best for chat, coding. Strong fit for 16 GB RAM with balanced speed and quality.
The wizard tunes picks and speed estimates to your exact device, chip, and RAM.