Qwen3.5 35B-A3B Instruct
Qwen / 35B / Q4_K_M / ~20 GB
Best for: Reasoning, Coding, Agent scenarios · Perf: ~128.3 tok/s · first token ~1.3s
At ~20 GB, this model may feel memory-heavy on 36 GB RAM, but it is still listed for its balance of speed and quality.
Ranked open-weight models that run well on a machine with 36 GB of RAM, with an estimated speed and the exact Ollama command for each.
Estimates assume a representative Apple Silicon machine with 36 GB of unified memory. The tok/s figures are ModelFit estimates, not measured benchmarks. Run the wizard for figures tuned to your exact chip.
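To compare entries, the two performance figures on each card (tok/s and first-token latency) can be combined into a rough end-to-end response time. The sketch below is only an illustration: the function name and the 500-token reply length are assumptions, and it treats generation speed as constant, which real runs will not match exactly.

```python
def estimated_response_seconds(tokens_out: int, tok_per_s: float, first_token_s: float) -> float:
    """Rough wall-clock time: first-token latency plus steady-state generation time."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return first_token_s + tokens_out / tok_per_s


# Figures copied from two cards in this listing (estimates, not benchmarks):
# ~128.3 tok/s with ~1.3s first token, and ~53.9 tok/s with ~0.6s first token.
fast = estimated_response_seconds(500, 128.3, 1.3)
slow = estimated_response_seconds(500, 53.9, 0.6)
print(f"{fast:.1f}s vs {slow:.1f}s")  # prints "5.2s vs 9.9s"
```

For a reply of this length, the higher tok/s figure outweighs the slower first token; for very short replies the first-token latency matters more.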
Gemma / 26B / Q4_K_M / ~16 GB
Best for: Chat, Coding, Multimodal · Perf: ~129.5 tok/s · first token ~0.5s
Strong fit for 36 GB RAM with balanced speed and quality.
Qwen / 27B / Q4_K_M / ~16 GB
Best for: Chat, Coding, Complex reasoning · Perf: ~53.9 tok/s · first token ~0.6s
Strong fit for 36 GB RAM with balanced speed and quality.
Qwen / 27B / Q4_K_M / ~18 GB
Best for: Coding, Quality, Long context · Perf: ~53.9 tok/s · first token ~0.6s
Strong fit for 36 GB RAM with balanced speed and quality.
GPT-OSS / 21B / MXFP4 / ~13.8 GB
Best for: Chat, Coding, Reasoning · Perf: ~134.5 tok/s · first token ~0.5s
Strong fit for 36 GB RAM with balanced speed and quality.
LFM2 / 24B / Q4_K_M / ~14 GB
Best for: Local AI agents, privacy-first tool calling, MCP workflows · Perf: ~180 tok/s · first token ~0.5s
Strong fit for 36 GB RAM with balanced speed and quality.
Qwen / 35B / Q4_K_M / ~22 GB
Best for: Reasoning, Coding, Agents · Perf: ~116.6 tok/s · first token ~1.3s
At ~22 GB, this model may feel memory-heavy on 36 GB RAM, but it is still listed for its balance of speed and quality.
Gemma / 31B / Q4_K_M / ~20 GB
Best for: Quality, Coding, Multimodal · Perf: ~47.4 tok/s · first token ~1.5s
At ~20 GB, this model may feel memory-heavy on 36 GB RAM, but it is still listed for its balance of speed and quality.
Gemma / 12B / Q8_0 / ~12.8 GB
Best for: Chat, Coding, Multimodal · Perf: ~69.4 tok/s · first token ~0.6s
Strong fit for 36 GB RAM with balanced speed and quality.
Gemma / 27B / Q4_K_M / ~21 GB
Best for: Quality, Coding · Perf: ~51.1 tok/s · first token ~0.6s
At ~21 GB, this model may feel memory-heavy on 36 GB RAM, but it is still listed for its balance of speed and quality.
Qwen / 30B / Q4_K_M / ~22 GB
Best for: Quality, Coding · Perf: ~125 tok/s · first token ~1.3s
At ~22 GB, this model may feel memory-heavy on 36 GB RAM, but it is still listed for its balance of speed and quality.
Mistral / 24B / Q4_K_M / ~15 GB
Best for: Chat, Coding · Perf: ~59.9 tok/s · first token ~0.6s
Strong fit for 36 GB RAM with balanced speed and quality.
The wizard tunes picks and speed estimates to your exact device, chip, and RAM.
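The "strong fit" and "memory-heavy" notes on the cards above are consistent with a simple size-to-RAM ratio: every entry up to ~18 GB is marked a strong fit, and every entry from ~20 GB up is marked memory-heavy. The sketch below encodes that as a 50% cutoff. The cutoff and the function name are assumptions inferred from the listed entries, not ModelFit's documented rule.

```python
def fit_label(model_gb: float, ram_gb: float = 36.0, heavy_ratio: float = 0.5) -> str:
    """Label a model by the share of RAM its weights occupy (hypothetical rule)."""
    return "memory-heavy" if model_gb / ram_gb > heavy_ratio else "strong fit"


# Approximate download sizes copied from cards in this listing.
sizes_gb = [12.8, 13.8, 15, 16, 18, 20, 21, 22]
for size in sizes_gb:
    print(f"~{size} GB -> {fit_label(size)}")
```

The weights are only part of the footprint: context length and other running applications also consume memory, so a model near the cutoff may behave differently on a given machine.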