Blog

Guides, comparisons, and insights on running local LLMs on Apple Silicon.

Run a 35B LLM on a $599 Mac Mini M4 (16GB): The mmap Trick

2026-04-14

> TL;DR: A base Mac Mini M4 with 16 GB RAM can run Qwen3.5-35B-A3B at 17.3 tok/s with zero swap using llama.cpp's --mmap flag. The model is a Mixture-of-Experts (35B total, only 3B active per tok...

Run Gemma on iPhone: Google AI Edge Gallery Tested (2026)

2026-04-08

> TL;DR: Google AI Edge Gallery is a free Apple App Store app that runs Gemma 4 E2B (~2.5 GB) and E4B (~5 GB) fully on-device on iPhone. Real-world speed: ~30 tok/s on iPhone 16 Pro, ~12 tok/s on iPho...

Qwen 3.5 Medium Review: 7x Less RAM, Same Quality (April 2026)

2026-04-04

Alibaba's Qwen 3.5 Medium series remains a masterclass in one thing: smart architecture beats raw parameters. And in April 2026, the Qwen ecosystem has grown even stronger.

Run Claude Code Free: Ollama Local Setup in 4 Steps (2026)

2026-04-04

Claude Code is Anthropic's AI coding agent — and you can run it locally with Ollama instead of paying $100/month for Claude Max. Since Ollama v0.14 shipped native Anthropic Messages API compatibi...

Best LLMs for Mac Mini M4 16GB RAM — Top 5 Tested (2026)

2026-03-09

> TL;DR: The Mac Mini M4 with 16GB RAM runs models up to ~13B parameters at Q4 quantization without breaking a sweat. Qwen3 8B is the best daily driver at 28–35 tok/s. Unlike the fanless MacBook Air, ...

LFM2 + LocalCowork: Offline AI Agent for Mac (2026)

2026-03-09

Liquid AI just shipped a working on-device AI agent — not a demo, a real one. LFM2-24B-A2B paired with LocalCowork runs 75 MCP tools entirely on your Mac: security scanning, file operations, audit log...

Qwen 3.5 4B Beats GPT-4o: 1,000-Prompt Test Results (2026)

2026-03-09

A Johns Hopkins researcher ran both Qwen 3.5 4B and GPT-4o on 1,000 real-world prompts. Qwen won 499, lost 431, and tied 70 — a statistically significant edge over OpenAI's flagship API (N8Progra...

Best LLM for MacBook Pro M4 Pro with 24GB RAM (2026)

2026-03-08

> TL;DR: The MacBook Pro M4 Pro with 24GB RAM is one of the best local AI machines you can buy. Qwen3 14B is the clear all-rounder at 28–38 tok/s, fitting comfortably in ~9.5GB. For reasoning, DeepSee...

DeepSeek V4 Is Coming: What Mac Users Need to Know (2026)

2026-03-08

DeepSeek V4 could drop this week — a 1-trillion-parameter multimodal monster with a 1M-token context window and leaked coding benchmarks that reportedly beat GPT-5.3 and Claude Opus 4.6. Three release...

Mac Mini for Local AI: The Best Value Setup in 2026

2026-03-08

> TL;DR: The Mac Mini M4 Pro with 64GB ($1,999–$2,499) is the best value local AI machine in 2026. It runs 30B-class models at 12–18 tok/s, costs ~$25/year in electricity, and gives every gigabyte of ...

Apple Core AI Framework: Core ML Replacement Coming at WWDC 2026

2026-03-07

Apple is replacing Core ML with a brand-new framework called Core AI, set to debut at WWDC 2026 this June. The rename from "Machine Learning" to "AI" isn't cosmetic — it signals a fundamental shi...

Best LLM for MacBook Air M4 16GB: 5 Models Ranked (2026)

2026-03-07

> TL;DR: The MacBook Air M4 with 16GB RAM can comfortably run models up to ~14B parameters at Q4 quantization. Qwen3 8B is the best all-rounder — 30–40 tok/s, fits in ~5.5GB, and outperforms models tw...

LLMfit: Find the Best LLM for Your Mac in Seconds (2026)

2026-03-06

> TL;DR: LLMfit is a Rust CLI that detects your RAM, CPU, and GPU, then scores 200+ models across quality, speed, fit, and context. It picks the best quantization that fits your memory and estimates tok...

Apple M5 Pro & M5 Max: The Local LLM Leap (2026)

2026-03-05

Apple just announced the M5 Pro and M5 Max — and the local AI community is paying close attention. With up to 4x faster LLM prompt processing versus M4, 128GB of unified memory, and Neural Accelerator...

Qwen 3.5 Small Models: 4B Beats 20B Models on Any Mac (2026)

2026-03-04

Alibaba just dropped four small Qwen 3.5 models that rewrite what "small" means in local AI. The Qwen3.5-4B scores 88.8 on MMLU-Redux — higher than GPT-class 20B open-source models (HuggingFace, March...

Qwen Team Exodus: 3 Key Leaders Leave Alibaba (2026)

2026-03-04

Three senior leaders have left Alibaba's Qwen team in Q1 2026 — including tech lead Lin Junyang, the architect behind the world's most downloaded open-source AI project. The departures came days after...

Ollama 0.17: Up to 40% Faster — Apple Silicon Benchmarks

2026-02-26

Ollama 0.17 dropped on February 21, 2026, with a major overhaul of its inference engine. Performance gains reach up to 40% on NVIDIA GPUs and 10–15% on Apple Silicon. Here's what actually changes for Mac user...

Claude Code on Local LLMs: Complete Setup Guide (2026)

2026-02-25

Want to use Claude Code without Anthropic's API? Here's how @sudoingX hacked it to run on local Qwen models with impressive results.

DeepSeek V3 vs Qwen 3.5 on Mac: Speed, RAM and Winner (2026)

2026-02-25

> Update (April 2026): DeepSeek V4 (1T params) is expected imminently. V3 remains the latest released DeepSeek model. Qwen 3.5 is still Alibaba's current flagship. All benchmarks below remain val...

Local LLMs vs GPT-4 and Claude: Benchmark Results (2026)

2026-02-24

Verdict: 100B+ local models come within 5% of GPT-4 and Claude. The remaining gap shows up mainly on very long contexts.

How to Install Ollama on Mac in 2026 — M1 to M4 Guide

2026-02-24

Want to run language models locally on your Mac? This guide shows you how to install Ollama and launch your first model in 5 minutes.

MacBook Air vs Pro M4 for LLMs: Real Benchmarks (2026)

2026-02-24

Apple recently updated its entire Mac lineup with M4 chips. But which one should you choose for running language models locally? We compare the MacBook Air and the MacBook Pro.