Qwen3.5 4B Instruct
Qwen / 4B / Q4_K_M / ~3.5 GB
Best for: Coding, Agents, Multimodal · Pop: 88/100
Perf: ~18.6 tok/s · first token ~1.0s
Best for coding, agents, multimodal. Strong fit for 8 GB RAM with balanced speed and quality.
The iPhone 16 Pro is the notebook in your pocket: it runs a 4B model well enough to capture ideas, sketch dialogue, and unblock a scene wherever inspiration strikes. Capture on the phone; compose on the Mac.
Gemma / 2.3B / Q4_K_M / ~2.3 GB
Best for: IoT, Mobile, Edge · Pop: 76/100
Perf: ~30.5 tok/s · first token ~0.8s
Best for IoT, mobile, edge. Strong fit for 8 GB RAM with balanced speed and quality.
Qwen / 2B / Q4_K_M / ~1.8 GB
Best for: Chat, Edge tasks · Pop: 75/100
Perf: ~34.6 tok/s · first token ~0.7s
Best for chat, edge tasks. Strong fit for 8 GB RAM with balanced speed and quality.
Gemma / 4B / Q4_K_M / ~3.5 GB
Best for: Chat, Coding · Pop: 81/100
Perf: ~18.6 tok/s · first token ~1.0s
Best for chat, coding. Strong fit for 8 GB RAM with balanced speed and quality.
Phi / 3.8B / Q4_K_M / ~3.2 GB
Best for: Coding, Chat · Pop: 75/100
Perf: ~19.4 tok/s · first token ~1.0s
Best for coding, chat. Strong fit for 8 GB RAM with balanced speed and quality.
Llama / 3B / Q4_K_M / ~2.5 GB
Best for: Chat · Pop: 72/100
Perf: ~24.0 tok/s · first token ~0.9s
Best for chat. Strong fit for 8 GB RAM with balanced speed and quality.
Qwen / 3B / Q4_K_M / ~2.5 GB
Best for: Chat, Coding · Pop: 64/100
Perf: ~24.0 tok/s · first token ~0.9s
Best for chat, coding. Strong fit for 8 GB RAM with balanced speed and quality.
Gemma / 2B / Q4_K_M / ~1.8 GB
Best for: Chat · Pop: 62/100
Perf: ~34.6 tok/s · first token ~0.7s
Best for chat. Strong fit for 8 GB RAM with balanced speed and quality.
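To get a feel for what the Perf figures mean in practice, the rough Python sketch below turns tokens per second and first-token latency into an approximate wait for a short note. It assumes the two figures simply add up and that the decoding rate stays constant, which real devices only approximate; the function name and the 150-token note length are hypothetical choices for illustration, and only three of the listed models are shown.

```python
def estimated_seconds(tokens: int, tok_per_s: float, first_token_s: float) -> float:
    """Rough wall-clock estimate: first-token latency plus steady-rate decoding.

    Assumption: the listed tok/s stays constant for the whole response.
    """
    if tokens <= 0:
        return 0.0
    return first_token_s + tokens / tok_per_s


# (tok/s, first-token seconds) copied from the Perf lines in the list; all approximate.
MODELS = {
    "Qwen 4B": (18.6, 1.0),
    "Gemma 2.3B": (30.5, 0.8),
    "Qwen 2B": (34.6, 0.7),
}

NOTE_TOKENS = 150  # hypothetical length of a short idea sketch

for name, (tps, ttft) in MODELS.items():
    secs = estimated_seconds(NOTE_TOKENS, tps, ttft)
    print(f"{name}: about {secs:.1f} s for {NOTE_TOKENS} tokens")
```

On these numbers, the smaller models return a short sketch noticeably sooner, while the 4B model trades a few extra seconds for its broader strengths.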
Use it as a thinking tool. Voice-memo a premise and have the model expand it into bullets; ask for five complications to a scene while in line for coffee; draft a character monologue on the train. The 4B class is good at generating a high volume of ideas and rough sketches, which is exactly what mobile moments are for.
Do not draft chapters here: small-model prose plus a phone keyboard is the wrong tool twice over. Apps with iCloud-synced history make the handoff natural, and the sketch you made at lunch is waiting in context when you sit down at the Mac.
Use the ModelFit wizard to test different RAM and chip configurations for your exact iPhone 16 Pro setup.