Mistral Nemo 12B
Mistral / 12B / Q4_K_M / ~9.5 GB
Best for: Chat, Translation · Pop: 78/100
Perf: ~46.0 tok/s · first token ~0.7s
Strong fit for 32 GB RAM with balanced speed and quality.
ollama run mistral-nemo:12b-q4_K_M
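Beyond the interactive `ollama run` session, a pulled model can also be called from a script through Ollama's local HTTP API. The sketch below is a minimal Python example, assuming an Ollama server on the default port 11434 and that the tag shown above has already been pulled. The `build_translation_request` and `translate` helpers and the prompt wording are illustrative assumptions, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint
MODEL_TAG = "mistral-nemo:12b-q4_K_M"               # tag as listed on this page


def build_translation_request(text: str, target_language: str, model: str = MODEL_TAG) -> dict:
    """Build a non-streaming /api/generate payload (prompt wording is an assumption)."""
    prompt = (
        f"Translate the following text into {target_language}. "
        f"Reply with the translation only.\n\n{text}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def translate(text: str, target_language: str) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    payload = json.dumps(build_translation_request(text, target_language)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

# Example (requires a running Ollama server):
# print(translate("Good morning, how are you?", "French"))
```

Setting `stream` to false asks the server for a single JSON reply instead of a token stream, which keeps the sketch short at the cost of waiting for the full translation.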
A MacBook Pro with an Apple M4 and 32 GB of RAM can dedicate about 22 GB to AI inference. For translation tasks, Mistral Nemo 12B is the top pick: at roughly 9.5 GB it fits comfortably in memory and delivers strong translation performance. The models below are ranked for your hardware.
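The fit claim comes down to simple arithmetic: compare a model's approximate size with the roughly 22 GB inference budget. A minimal Python sketch follows, using sizes listed on this page. The `fits` and `headroom_gb` helpers and the 2 GB allowance for context cache and runtime overhead are illustrative assumptions, not ModelFit's actual method.

```python
# Approximate Q4_K_M sizes in GB, copied from the listings on this page.
MODEL_SIZES_GB = {
    "mistral-nemo:12b-q4_K_M": 9.5,
    "mistral:7b-instruct-q4_K_M": 5.5,
    "qwen2.5:3b-instruct-q4_K_M": 2.5,
    "qwen2.5:1.5b-instruct-q4_K_M": 1.5,
}

BUDGET_GB = 22.0   # memory the page says is available for inference
OVERHEAD_GB = 2.0  # assumed allowance for KV cache and runtime; not from the source


def headroom_gb(size_gb: float, budget_gb: float = BUDGET_GB,
                overhead_gb: float = OVERHEAD_GB) -> float:
    """Memory left over after loading the model plus the assumed overhead."""
    return budget_gb - size_gb - overhead_gb


def fits(size_gb: float, budget_gb: float = BUDGET_GB,
         overhead_gb: float = OVERHEAD_GB) -> bool:
    """Return True if the model plus the assumed overhead fits in the budget."""
    return headroom_gb(size_gb, budget_gb, overhead_gb) >= 0


if __name__ == "__main__":
    for tag, size in MODEL_SIZES_GB.items():
        print(f"{tag}: fits={fits(size)}, headroom={headroom_gb(size):.1f} GB")
```

Under these assumptions every listed model fits, and Mistral Nemo 12B still leaves about 10.5 GB free, which is why the largest model in the list can be the top pick.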
Mistral / 7B / Q4_K_M / ~5.5 GB
Best for: Chat, Coding · Pop: 90/100
Perf: ~74.8 tok/s · first token ~0.6s
Strong fit for 32 GB RAM with balanced speed and quality.
ollama run mistral:7b-instruct-q4_K_M
Qwen / 2B / Q4_K_M / ~1.8 GB
Best for: Chat, Edge tasks · Pop: 75/100
Perf: ~180.0 tok/s · first token ~0.5s
Strong fit for 32 GB RAM with balanced speed and quality.
ollama run qwen3.5:2b-instruct-q4_K_M
Qwen / 3B / Q4_K_M / ~2.5 GB
Best for: Chat, Coding · Pop: 74/100
Perf: ~160.2 tok/s · first token ~0.5s
Strong fit for 32 GB RAM with balanced speed and quality.
ollama run qwen2.5:3b-instruct-q4_K_M
Gemma / 2B / Q4_K_M / ~1.8 GB
Best for: Chat · Pop: 73/100
Perf: ~180.0 tok/s · first token ~0.5s
Strong fit for 32 GB RAM with balanced speed and quality.
ollama run gemma2:2b-instruct-q4_K_M
Gemma / 1B / Q4_K_M / ~1 GB
Best for: Chat, Mobile · Pop: 78/100
Perf: ~180.0 tok/s · first token ~0.5s
Strong fit for 32 GB RAM with balanced speed and quality.
ollama run gemma3:1b-instruct-q4_K_M
Qwen / 1.5B / Q4_K_M / ~1.5 GB
Best for: Chat, Translation · Pop: 66/100
Perf: ~180.0 tok/s · first token ~0.5s
Strong fit for 32 GB RAM with balanced speed and quality.
ollama run qwen2.5:1.5b-instruct-q4_K_M
Qwen / 0.8B / Q4_K_M / ~0.8 GB
Best for: Chat, Mobile · Pop: 70/100
Perf: ~180.0 tok/s · first token ~0.5s
Strong fit for 32 GB RAM with balanced speed and quality.
ollama run qwen3.5:0.8b-instruct-q4_K_M
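The throughput and first-token figures in the cards can be combined into a rough response-time estimate: first-token latency plus output length divided by tokens per second. The Python sketch below assumes a steady generation speed. The `estimate_seconds` helper and the 500-token output length are illustrative assumptions, and real timings vary with prompt length and system load.

```python
def estimate_seconds(output_tokens: int, tokens_per_second: float,
                     first_token_s: float) -> float:
    """Rough wall-clock estimate: time to first token plus steady-state generation."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_s + output_tokens / tokens_per_second


# (tokens per second, first-token latency in seconds) as listed in the cards.
LISTED_PERF = {
    "mistral-nemo:12b-q4_K_M": (46.0, 0.7),
    "mistral:7b-instruct-q4_K_M": (74.8, 0.6),
    "qwen2.5:1.5b-instruct-q4_K_M": (180.0, 0.5),
}

if __name__ == "__main__":
    output_tokens = 500  # hypothetical length of a translated passage
    for tag, (tok_s, ttft) in LISTED_PERF.items():
        seconds = estimate_seconds(output_tokens, tok_s, ttft)
        print(f"{tag}: ~{seconds:.1f} s for {output_tokens} tokens")
```

For a 500-token output this gives roughly 11.6 s for Mistral Nemo 12B and 7.3 s for the 7B Mistral model, which makes the speed side of the speed and quality trade-off concrete.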
Use the ModelFit wizard to test different RAM and chip configurations for your exact MacBook Pro setup.