Blog

Field notes — guides, comparisons, and insights on running local LLMs on Apple Silicon.

All articles

2026-06-09

Run a Local LLM on Mac With No Terminal (2026)

The fastest way to run a local AI on a Mac without touching the command line is LM Studio: a free desktop app with a ChatGPT-style chat window, a built-in model browser, and automatic Apple Silicon (M...

2026-06-08

Qwen 3.6 on Mac: The 17GB Coder Scoring 77.2 (2026)

> TL;DR: Qwen 3.6 27B scores 77.2 on SWE-Bench Verified (Qwen3.6-27B card, 2026) and downloads at ~17GB on Ollama, so it fits a 32GB Mac. The MoE sibling, 35B-A3B, activates only 3B params per token —...

2026-06-03

Best LLM for MacBook Pro M4 Max 64GB (2026)

> TL;DR: The MacBook Pro M4 Max with 64GB RAM is a 70B-capable laptop. Qwen3.6 27B is the quality pick at ~20–30 tok/s (est.), fitting in ~18GB. Qwen3.6 35B-A3B delivers near-flagship reasoning at sma...

2026-06-03

Best LLM for MacBook Pro M4 Pro with 24GB RAM (2026)

> TL;DR: The MacBook Pro M4 Pro with 24GB RAM is one of the best local AI machines you can buy. Qwen3.5 9B is the clear all-rounder — near-frontier quality, native multimodal, ~7GB loaded at interacti...

2026-06-03

Best LLM for MacBook Pro M5 Pro 64GB (2026)

> TL;DR: The MacBook Pro M5 Pro with 64GB is the sweet spot for serious local AI. Qwen3.6 27B at Q4 needs only ~18GB and runs at usable speeds, while 7B-class daily drivers hit 80–100 t/s (Apple). The...

2026-06-03

Qwen 3.5 on Mac: The 20GB Model That Beats a 235B (2026)

> TL;DR: Qwen3.5-35B-A3B activates only 3B of its 35B parameters per token, needs ~20GB at Q4 (vs ~140GB for the 235B model it outscores), and runs on a 24GB MacBook Pro. With Ollama 0.19's MLX backen...

2026-06-01

Best Local AI Coder for Mac: Qwen3.6 vs Gemma 4 (2026)

> TL;DR: Qwen3.6-35B-A3B is the best local coder for Mac: 73.4 on SWE-bench Verified with only 3B active parameters, fitting a 24GB Apple Silicon machine. Gemma 4 31B matches it on knowledge (85.2 MML...

2026-05-30

Your Local LLMs Just Got 2x Faster on Mac (May 2026)

> TL;DR: May 2026 shipped no new must-run open-weight model — the gains came from the runtime layer. Ollama added Gemma 4 MTP speculative decoding ("over a 2x speed increase" on coding) and reworked i...

2026-05-14

The April 2026 Local LLM Wave: Qwen3.6, Gemma 4, Llama 4, DeepSeek V4

> TL;DR: Four frontier open-weight families shipped in April 2026: Qwen3.6, Gemma 4, Llama 4 Scout/Maverick, and DeepSeek V4. Qwen3.6-27B leads dense open models on SWE-Bench Verified at 77.2%, closin...

2026-04-14

Run a 35B LLM on a $599 Mac Mini M4 (16GB): The mmap Trick

> TL;DR: A base Mac Mini M4 with 16 GB RAM can run Qwen3.5-35B-A3B at 17.3 tok/s with zero swap using llama.cpp's `--mmap` flag. The model is a Mixture-of-Experts (35B total, only 3B active per tok...

2026-04-08

Run Gemma on iPhone: Google AI Edge Gallery Guide (2026)

> TL;DR: Google AI Edge Gallery is a free Apple App Store app that runs Gemma 4 E2B (~2.5 GB) and E4B (~5 GB) fully on-device on iPhone. Real-world speed: ~30 tok/s on iPhone 16 Pro, ~12 tok/s on iPho...

2026-04-04

Run Claude Code Free: Ollama Local Setup in 4 Steps (2026)

Claude Code is Anthropic's AI coding agent — and you can run it locally with Ollama instead of paying $100/month for Claude Max. Since Ollama v0.14 shipped native Anthropic Messages API compatibi...

2026-03-09

LFM2 + LocalCowork: Offline AI Agent for Mac (2026)

Liquid AI just shipped a working on-device AI agent — not a demo, a real one. LFM2-24B-A2B paired with LocalCowork runs 75 MCP tools entirely on your Mac: security scanning, file operations, audit log...

2026-03-09

Qwen 3.5 4B Beats GPT-4o: 1,000-Prompt Test Results (2026)

A Johns Hopkins researcher ran both Qwen 3.5 4B and GPT-4o on 1,000 real-world prompts. Qwen won 499, lost 431, and tied 70 — a statistically significant edge over OpenAI's flagship API (N8Progra...

2026-03-08

DeepSeek V4 Is Coming: What Mac Users Need to Know (2026)

> Update (May 2026): DeepSeek V4 has since shipped. The official DeepSeek-V4-Pro model card now lists 80.6% on SWE-Bench Verified — confirming the pre-release leaks were in the right range. See our lo...

2026-03-08

Mac Mini for Local AI: The Best Value Setup in 2026

> TL;DR: The Mac Mini M4 Pro with 64GB ($1,999–$2,499) is the best value local AI machine in 2026. It runs 30B-class models at 12–18 tok/s, costs ~$25/year in electricity, and gives every gigabyte of ...

2026-03-07

Apple Core AI Framework: Core ML Replacement Coming at WWDC 2026

Apple is replacing Core ML with a brand-new framework called Core AI, set to debut at WWDC 2026 this June. The rename from "Machine Learning" to "AI" isn't cosmetic — it signals a fundamental shi...

2026-03-06

LLMfit: Find the Best LLM for Your Mac in Seconds (2026)

> TL;DR: LLMfit is a Rust CLI that detects your RAM, CPU, and GPU, then scores 200+ models across quality, speed, fit, and context. It picks the best quantization for your memory and estimates tokens/...

2026-03-05

Apple M5 Pro & M5 Max: The Local LLM Leap (2026)

> TL;DR: The M5 Pro and M5 Max bring Neural Accelerators to every GPU core, cutting prompt processing time 3.3-4x versus M4 — a prompt that took 81 seconds now takes 18. The M5 Max's 128GB unified mem...

2026-03-04

Qwen 3.5 Small Models: 4B Beats 20B Models on Any Mac (2026)

Alibaba just dropped four small Qwen 3.5 models that rewrite what "small" means in local AI. The Qwen3.5-4B scores 88.8 on MMLU-Redux — higher than GPT-class 20B open-source models (HuggingFace, March...

2026-03-04

Qwen Team Exodus: 3 Key Leaders Leave Alibaba (2026)

> TL;DR: Qwen tech lead Lin Junyang, post-training head Yu Bowen, and staff researcher Binyuan Hui all left Alibaba in Q1 2026 — during Qwen's most productive stretch ever: 9 models in 16 days, 1B+ do...

2026-02-26

Ollama 0.17: Up to 40% Faster — Apple Silicon Benchmarks

> TL;DR: Ollama 0.17 brings 10-15% faster prompt processing on Apple Silicon (up to 40% on NVIDIA), an 8-bit KV cache that halves context memory, and automatic context sizing based on your RAM. Update...

2026-02-25

Claude Code on Local LLMs: Complete Setup Guide (2026)

> TL;DR: Claude Code can drive a local model through LiteLLM + llama-server. On a ~$1,400 dual RTX 3090 rig, Qwen3-Coder-Next (80B MoE, 3B active) hit 79 tok/s with expert offloading and passed 7/7 co...

2026-02-25

DeepSeek V3 vs Qwen 3.5 on Mac: Speed, RAM and Winner (2026)

> TL;DR: DeepSeek-V3 (671B total, 37B active) posts 88.5 on MMLU and 82.6 on HumanEval-Mul — but its size limits it to 96GB+ Mac Studios. Qwen3.5-35B-A3B scores 85.3 MMLU-Pro, fits a 24GB MacBook Pro,...

2026-02-24

Local LLMs vs GPT-4 and Claude: Benchmark Results (2026)

> TL;DR: The best local models now sit within ~5% of cloud flagships on MMLU. DeepSeek-R1 matches Claude 3.5 Sonnet on HumanEval (92%), and Qwen3.5-35B-A3B beats GPT-3.5 Turbo while running free on a ...

2026-02-24

How to Install Ollama on Mac (Apple Silicon, 2026): M1–M4

> TL;DR: Run `brew install ollama`, then `ollama run qwen3.5:4b`, and you have a capable local LLM on any Apple Silicon Mac with 8GB+ RAM in about 5 minutes. 16GB Macs should step up to Qwen3.5 9B; 24GB+ unloc...

2026-02-24

MacBook Air vs Pro for Local LLMs (2026): Which Mac to Buy

> TL;DR: For most local AI users, the MacBook Pro M4 24GB (~$1,900) is the best buy: active cooling sustains long sessions and 24GB fits Qwen3.5-35B-A3B at an estimated 38 tok/s. The Air M4 throttles ...
