TL;DR: The MacBook Air M5 with 16GB RAM has the same ~14B-parameter ceiling as the M4 Air, but its 153 GB/s memory bandwidth, a 28% jump over the M4's 120 GB/s (Apple), makes every model faster. Qwen3.5 9B is the best all-rounder at ~7GB loaded. Qwen3.5 4B is the speed pick, and Gemma 4 E4B covers fast multimodal chat. Token speeds below are ModelFit estimates, not measured benchmarks.
Apple announced the M5 MacBook Air on March 3, 2026, with units shipping March 11 (Apple Newsroom). For local AI, the headline is memory bandwidth: the M5 moves data at 153 GB/s versus 120 GB/s on the M4. Because LLM token generation is memory-bandwidth-bound, that 28% gain is the single spec that matters most for inference speed on a fanless laptop.
This guide ranks which models to run on the base 16GB M5 Air, how fast each one goes, and where the 24GB and 32GB configurations change the picture. For the full chip rundown, see the MacBook Air device page; for sizing any model to any RAM tier, see the how much RAM for a local LLM guide.
What Changed from the M4 Air?
The M5 Air keeps the same memory ceilings — 16GB base, configurable to 24GB or 32GB (Apple specs) — so the model sizes you can load are unchanged. What moved is speed and price:
| Spec | MacBook Air M4 | MacBook Air M5 |
|---|---|---|
| Memory bandwidth | 120 GB/s | 153 GB/s (+28%) |
| Unified memory | 16 / 24 / 32 GB | 16 / 24 / 32 GB |
| Neural Engine | 16-core | 16-core |
| GPU | up to 10-core | up to 10-core, Neural Accelerator per core |
| Base storage | 256 GB | 512 GB |
| Starting price (13") | $999 | $1,099 |
Apple states the M5 Air delivers "up to 4x faster performance for AI tasks than MacBook Air with M4, and up to 9.5x faster than MacBook Air with M1" (Apple). That figure measures specific Neural-Engine and GPU-accelerator workloads. For Ollama token generation — which runs on the GPU cores and is bandwidth-bound — the realistic gain tracks the 28% bandwidth uplift, not the 4x marketing number. We size our estimates to bandwidth and label them as estimates.
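Why does the gain track bandwidth rather than the 4x headline? In the usual first-order model of decoding, each generated token streams the model's weights from memory once, so tokens/sec is roughly an efficiency factor times bandwidth divided by the bytes read per token. The sketch below assumes that model; the helper names, the 0.7 efficiency, and the example weight sizes are illustrative placeholders, not measurements. They cancel out of the M5/M4 ratio, which is the point.

```python
# First-order decode model (an assumption, not a benchmark):
# tokens/sec ~= efficiency * bandwidth / bytes_read_per_token.

M4_BANDWIDTH_GBPS = 120.0  # Apple spec, GB/s
M5_BANDWIDTH_GBPS = 153.0  # Apple spec, GB/s


def decode_tok_per_s(bandwidth_gbps: float, weights_gb: float, efficiency: float = 0.7) -> float:
    """Estimate generation speed when every token reads all weights once.

    `efficiency` is a hypothetical fraction of peak bandwidth actually achieved.
    """
    return efficiency * bandwidth_gbps / weights_gb


def speedup(weights_gb: float, efficiency: float = 0.7) -> float:
    """M5-over-M4 speedup: weight size and efficiency cancel, leaving the bandwidth ratio."""
    m5 = decode_tok_per_s(M5_BANDWIDTH_GBPS, weights_gb, efficiency)
    m4 = decode_tok_per_s(M4_BANDWIDTH_GBPS, weights_gb, efficiency)
    return m5 / m4


if __name__ == "__main__":
    for size_gb in (2.5, 5.0, 8.0):  # illustrative weight sizes in GB
        print(f"{size_gb} GB of weights: {speedup(size_gb):.3f}x")  # ~1.275x for every size
```

Under this model the uplift is 153 / 120 ≈ 1.275 regardless of model size, which is where the ~28% figure comes from.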
How Much RAM Do You Actually Have for Models?
macOS reserves memory aggressively, so the 16GB on the box is not all yours.
| Allocation | Typical Size |
|---|---|
| macOS kernel + services | ~2–3 GB |
| Active apps (browser, editor) | ~2–4 GB |
| Available for LLM | ~9–12 GB |
The rule of thumb holds across every Apple Silicon Mac: Q4_K_M weights cost roughly 0.6 GB per billion parameters, and the context cache and runtime add roughly another 1–1.5 GB on top. A 4B model needs ~3.5GB. A 9B model needs ~7GB. A 14B model needs ~9.5GB, which is doable on 16GB but tight. For the full model-size-to-memory matrix, see how much RAM you need for a local LLM.
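The rule of thumb fits in a few lines of code. This is a minimal sketch: the 0.6 GB-per-billion figure comes from the text, while the flat 1.1 GB overhead for KV cache and runtime is our assumption, chosen to approximate the quoted figures; real overhead grows with context length. The function name is illustrative.

```python
# Rule-of-thumb Q4_K_M memory estimate (not a measurement).

GB_PER_BILLION_Q4 = 0.6  # weights, from the rule of thumb in the text
OVERHEAD_GB = 1.1        # assumed KV cache + runtime; grows with context length


def q4_footprint_gb(params_billion: float, overhead_gb: float = OVERHEAD_GB) -> float:
    """Estimated loaded size in GB: quantized weights plus a flat overhead."""
    return GB_PER_BILLION_Q4 * params_billion + overhead_gb


if __name__ == "__main__":
    for size in (4, 9, 14):
        print(f"{size}B -> ~{q4_footprint_gb(size):.1f} GB loaded")
    # Weights alone for a 70B model, ignoring overhead:
    print(f"70B weights -> ~{q4_footprint_gb(70, overhead_gb=0):.0f} GB")
```

With these assumptions the sketch reproduces the ~3.5GB (4B) and ~9.5GB (14B) figures and lands slightly under the ~7GB quoted for 9B. Treat it as a planning estimate, not a guarantee that a model will load.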
Performance Expectations
Here are estimated token-generation speeds on the M5 Air 16GB with Ollama at Q4_K_M. These are ModelFit estimates, scaled from M4 community numbers by the 28% bandwidth increase, not measured benchmarks.
| Model | RAM Used | Est. Tokens/sec | Best For |
|---|---|---|---|
| Qwen3.5 4B Q4_K_M | ~3.5 GB | 50–62 tok/s (est.) | Speed, coding |
| Gemma 4 E4B Q4_K_M | ~4.0 GB | 44–56 tok/s (est.) | Multimodal chat |
| Qwen3 8B Q4_K_M | ~5.5 GB | 38–50 tok/s (est.) | Proven runner-up |
| Qwen3.5 9B Q4_K_M | ~7.0 GB | 28–35 tok/s (est.) | Quality all-rounder |
| Gemma 3 12B QAT | ~8.0 GB | 28–35 tok/s (est.) | Quality writing |
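The scaling step behind these estimates is a single multiplication by the bandwidth ratio (153 / 120 ≈ 1.275). A minimal sketch, assuming speed tracks bandwidth linearly; the function name is illustrative, and the 22–28 tok/s input is the M4 figure cited in the M4-vs-M5 comparison, not a new measurement.

```python
# Scale an M4 tok/s range to an M5 estimate by the bandwidth ratio (estimate only).

M4_BANDWIDTH_GBPS = 120.0
M5_BANDWIDTH_GBPS = 153.0
RATIO = M5_BANDWIDTH_GBPS / M4_BANDWIDTH_GBPS  # ~1.275, the "28%" uplift


def scale_to_m5(m4_low: float, m4_high: float) -> tuple[float, float]:
    """Return an estimated M5 tok/s range, assuming speed scales with bandwidth."""
    return m4_low * RATIO, m4_high * RATIO


if __name__ == "__main__":
    low, high = scale_to_m5(22, 28)
    print(f"M4 22-28 tok/s -> M5 ~{low:.1f}-{high:.1f} tok/s (est.)")
```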
Models above ~14B parameters still do not fit comfortably in 16GB. They may load, but Ollama then splits layers between the GPU and CPU or macOS starts swapping to the SSD, and speed drops below 5 tok/s. The extra bandwidth does not change the memory ceiling. For 14B-27B models, configure 24GB or 32GB at purchase, since Apple Silicon memory cannot be upgraded later.
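To turn the memory ceiling into a purchase decision, the sketch below picks the smallest Air tier that holds a model with comfortable headroom. It uses the ~0.6 GB-per-billion rule and assumes a flat ~1.1 GB of cache and runtime overhead plus 4–7 GB reserved for macOS and apps (the ranges from the allocation table). These are planning assumptions, not measured limits, and the helper names are hypothetical.

```python
# Pick the smallest MacBook Air memory tier for a Q4 model (planning sketch only).
from typing import Optional

AIR_CONFIGS_GB = (16, 24, 32)
GB_PER_BILLION_Q4 = 0.6   # rule of thumb from the text
OVERHEAD_GB = 1.1         # assumed KV cache + runtime overhead
RESERVED_WORST_GB = 7.0   # macOS + apps, high end of the allocation table
RESERVED_BEST_GB = 4.0    # macOS + apps, low end of the allocation table


def footprint_gb(params_billion: float) -> float:
    """Estimated loaded size: Q4 weights plus flat overhead."""
    return GB_PER_BILLION_Q4 * params_billion + OVERHEAD_GB


def fit(params_billion: float, ram_gb: int) -> str:
    """'comfortable' if it fits even with heavy app use, 'tight' if only with light use."""
    need = footprint_gb(params_billion)
    if need <= ram_gb - RESERVED_WORST_GB:
        return "comfortable"
    if need <= ram_gb - RESERVED_BEST_GB:
        return "tight"
    return "no"


def smallest_comfortable_config(params_billion: float) -> Optional[int]:
    """Smallest Air tier with comfortable headroom, or None if no tier qualifies."""
    for ram in AIR_CONFIGS_GB:
        if fit(params_billion, ram) == "comfortable":
            return ram
    return None


if __name__ == "__main__":
    for size in (9, 14, 27, 70):
        print(f"{size}B -> {smallest_comfortable_config(size)}")
```

Under these assumptions a 9B model lands on 16GB, 14B on 24GB, 27B on 32GB, and 70B fits no Air tier, in line with the guidance in this section.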
The Top Picks
1. Qwen3.5 9B — Best All-Rounder
Qwen3.5 9B is the model the 16GB Air was built for. At ~7GB loaded it fits with your browser open, ships native multimodal input, and carries a 262K context window. On the M5's faster bus it lands around 28–35 tok/s — comfortably interactive.
ollama run qwen3.5:9b
Why it wins: near-frontier quality under 10GB of memory, covering writing, analysis, coding, and image questions from one model.
2. Qwen3.5 4B — Best Speed
When responsiveness beats depth, Qwen3.5 4B is the fastest quality model in its class — an estimated 50–62 tok/s on the M5 Air at ~3.5GB. It shares the 9B's multimodal input and answers faster than you can read. Pair it with an editor via our coding on MacBook Air guide.
ollama run qwen3.5:4b
3. Gemma 4 E4B — Best Efficient Multimodal
Google's Gemma 4 E4B uses Per-Layer Embeddings to act like a larger model while loading only ~4GB. It handles text and image input and runs at an estimated 44–56 tok/s on the M5 — a strong fit for screenshot questions and chart reading on a fanless machine.
ollama run gemma4:e4b
4. Qwen3 8B — Proven Runner-Up
The previous-generation favorite is still excellent: battle-tested, widely documented, ~5.5GB, and an estimated 38–50 tok/s on the M5. If you already run it, there is no urgency to switch — but new installs should start with Qwen3.5 9B.
ollama run qwen3:8b
5. Gemma 3 12B QAT — Quality Writing Fallback
Google's Quantization-Aware Training variant of Gemma 3 12B survives aggressive quantization with minimal quality loss. At ~8GB it is a solid creative-writing pick, though Qwen3.5 9B now matches it in less memory.
ollama run gemma3:12b-it-qat
Should You Buy the M5 Air, or Something Else?
M5 Air vs M4 Air for local AI: if you already own an M4 Air, the 28% bandwidth gain is real but not transformative for inference; a model that runs at 22–28 tok/s on the M4 lands at roughly 28–35 tok/s on the M5. Buy the M5 for the larger 512GB base storage and a new machine's longevity, not for an AI speed revolution. If you are buying new, the M5 is the obvious pick over an M4 at the same tier.
Base M5 Air vs M5 Pro MacBook Pro: the Air tops out at 32GB and, being fanless, throttles under sustained load. If you run 27B-70B models or generate for hours at a stretch, the actively cooled MacBook Pro is the better tool; see our M5 Pro and M5 Max local LLM guide. For 7B-14B models and bursty interactive use, the Air handles the job and stays silent.
Which RAM should you configure? 16GB covers 7B-9B models comfortably. Step up to 24GB for clean 14B headroom, or 32GB if you want to run 14B models alongside a full app stack. The 16GB vs 32GB breakdown walks through the trade-off.
Cooling Reality Check
The MacBook Air M5 is fanless, like every Air. For interactive chat you will never notice. Under continuous load — a long reasoning chain or batch document processing — the chip throttles after 20–30 minutes, costing roughly 15–25% of peak speed. For sustained workloads, the MacBook Pro or Mac Mini with active cooling holds throughput steady.
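To put the 15–25% throttling penalty into numbers, multiply the interactive estimate by the fraction that remains. A quick sketch applied to the Qwen3.5 9B estimate of 28–35 tok/s; both inputs are estimates quoted in this guide, so the output is an estimate too, and the function name is illustrative.

```python
# Sustained speed after fanless throttling (an estimate built from estimates).

THROTTLE_LOSS = (0.15, 0.25)  # fraction of peak lost after 20-30 minutes of load


def sustained_range(peak_low: float, peak_high: float) -> tuple[float, float]:
    """Worst case pairs the low peak with the high loss; best case does the reverse."""
    worst = peak_low * (1 - THROTTLE_LOSS[1])
    best = peak_high * (1 - THROTTLE_LOSS[0])
    return worst, best


if __name__ == "__main__":
    low, high = sustained_range(28, 35)  # Qwen3.5 9B estimate on the M5 Air
    print(f"sustained ~{low:.1f}-{high:.1f} tok/s (est.)")
```

That works out to roughly 21–30 tok/s once the chip has settled, still usable for batch work, just slower.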
Quick Comparison Table
| Use Case | Recommended Model | Command |
|---|---|---|
| General assistant | Qwen3.5 9B | ollama run qwen3.5:9b |
| Maximum speed | Qwen3.5 4B | ollama run qwen3.5:4b |
| Multimodal chat | Gemma 4 E4B | ollama run gemma4:e4b |
| Proven fallback | Qwen3 8B | ollama run qwen3:8b |
| Quality writing | Gemma 3 12B QAT | ollama run gemma3:12b-it-qat |
New to Ollama? The Ollama setup guide installs it in under five minutes, and the best LLM for MacBook overview ranks picks across every configuration. To match a model to your exact chip and RAM, run the ModelFit wizard or browse the open compatibility dataset.
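Because every speed in this guide is an estimate, it is worth measuring your own machine. Ollama's local REST API returns generation statistics with each non-streamed `/api/generate` response, including `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). The sketch below assumes Ollama is running on its default port 11434 and that the model is already pulled; the helper names and the prompt are placeholders.

```python
# Measure generation speed from Ollama's /api/generate statistics (stdlib only).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def tokens_per_second(response: dict) -> float:
    """Generation speed from eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] * 1e9 / response["eval_duration"]


def measure(model: str, prompt: str = "Explain unified memory in three sentences.") -> float:
    """Run one non-streamed generation and return its tokens/sec."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

For example, call `measure("qwen3.5:9b")` a few times and average the results. If you prefer the CLI, `ollama run qwen3.5:9b --verbose` prints an "eval rate" line after each reply.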
FAQ
Is the MacBook Air M5 good for running local LLMs?
Yes. The M5 Air runs models up to ~14B parameters at Q4 on 16GB, and its 153 GB/s memory bandwidth makes token generation about 28% faster than on the M4 Air. Unified memory lets the GPU draw on most of the system RAM for inference, unlike a PC limited to its discrete GPU's VRAM.
How much faster is the M5 Air than the M4 Air for AI?
For Ollama token generation, expect roughly 28% faster output, tracking the bandwidth increase from 120 GB/s to 153 GB/s. Apple's headline "up to 4x faster AI" claim measures specific Neural-Engine and GPU-accelerator tasks, not general LLM inference.
What is the best LLM for a 16GB MacBook Air M5?
Qwen3.5 9B is the best all-rounder at ~7GB loaded, with an estimated 28–35 tok/s on the M5. For maximum speed, Qwen3.5 4B runs at an estimated 50–62 tok/s. Both ship native multimodal input and a 262K context window.
How much RAM should I get on the M5 Air for AI?
16GB handles 7B-9B models comfortably. 24GB gives clean headroom for 14B models, and 32GB lets you run 14B models alongside a full app stack. Apple Silicon memory is soldered and cannot be upgraded later, so choose carefully at purchase.
Can the M5 Air run 70B models?
No. A 70B model at Q4 needs about 42GB for the weights alone, far beyond the Air's 32GB maximum. For 70B models you need a MacBook Pro or Mac Studio with 64GB or more; see the M5 Pro and M5 Max guide.
Related Model Families:
- Qwen Models: all Qwen variants, RAM requirements, and benchmarks
- Gemma Models: Google's efficient models from E2B to 31B
- Phi Models: Microsoft's small-but-mighty models for low-RAM devices
Where to Buy for Local AI
ModelFit may earn a commission on purchases made through these links, at no extra cost to you. Recommendations are based on local-AI performance, not commissions.