2026-03-09
Run Claude Code for Free with Ollama on Mac (2026)
Claude Code is Anthropic's AI coding agent, and you can run it locally with Ollama instead of paying $100/month for Claude Max. Two environment variable swaps point Claude Code at your local models. The catch: local 4B–27B models won't match real Claude Opus for complex agentic tasks. But for file reading, code generation, refactoring, and git operations, it works surprisingly well.
TL;DR: Run ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_AUTH_TOKEN=ollama claude --model qwen3.5:27b to use Claude Code with free local models. Works on any Apple Silicon Mac. Best model depends on your RAM: 4B for 8GB, 9B for 16GB, 27B for 24GB+.
This setup went viral after @itsafiz shared it on X, pulling 51,000 views and 781 bookmarks in 48 hours (March 2026). The community response confirmed what many solo developers suspected: Claude Code's real value is in its CLI harness, not just the model behind it.
How Does Claude Code Work with Local Models?
Claude Code accepts two environment variables that redirect all of its API calls to any Anthropic-compatible endpoint. Ollama exposes exactly that interface on localhost:11434. No proxy needed. No translation layer. Just point and run.
The architecture is simple:
Claude Code CLI → localhost:11434 → Ollama → Local Model (Qwen, etc.)
Claude Code's file operations, git integration, and project context management all work the same. The only difference is which model generates the responses.
Step 1: Install Ollama
Open Terminal and run:
brew install ollama
Start the Ollama server:
ollama serve
Verify it's running:
curl http://localhost:11434/api/tags
You should see a JSON response with your available models. If you just installed Ollama, the list will be empty — that's fine.
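If you plan to script around this setup, it helps to wrap that health check in a reusable guard. A minimal sketch (`check_ollama` is just an illustrative name, not part of Ollama itself):

```shell
# Returns 0 if the Ollama server answers on the default port, nonzero otherwise.
check_ollama() {
  curl -sf --max-time 2 http://localhost:11434/api/tags >/dev/null
}

if check_ollama; then
  echo "Ollama is up"
else
  echo "Ollama is not running -- start it with 'ollama serve'"
fi
```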
Step 2: Pull a Coding Model
Which model you pull depends on how much RAM your Mac has. Ollama downloads models on first pull and caches them locally.
RAM-to-Model Guide
| Model | Download Size | RAM Needed | Best For | Mac Compatibility |
|---|---|---|---|---|
| qwen3.5:4b | ~2.5 GB | ~4 GB total | Quick fixes, simple tasks | Any Mac (8GB+) |
| qwen3.5:9b | ~5 GB | ~7 GB total | General coding | MacBook Air 16GB |
| qwen3.5:27b | ~16 GB | ~20 GB total | Complex refactoring | Mac Mini/Pro 24GB+ |
| Qwopus 27B | ~16.5 GB | ~20 GB total | Opus-style reasoning | Mac Mini/Pro 24GB+ |
| qwen3.5:35b-a3b | ~22 GB | ~22 GB total | Best MoE efficiency | Mac Mini/Pro 24GB+ |
Pull your model:
# For 8GB Macs — small but surprisingly capable
ollama pull qwen3.5:4b
# For 16GB MacBook Air — the sweet spot
ollama pull qwen3.5:9b
# For 24GB+ Mac — best local coding quality
ollama pull qwen3.5:27b
Qwen 3.5 4B matches GPT-4o on independent testing with a 49.9% win rate across 1,000 real-world prompts (N8Programs, March 2026). Even the smallest model here is genuinely useful.
Step 3: Install Claude Code
You need Node.js 18+ installed. Then:
npm install -g @anthropic-ai/claude-code
Verify the installation:
claude --version
Step 4: Run Claude Code with Your Local Model
This is the one-liner that makes it all work:
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_AUTH_TOKEN=ollama \
claude --model qwen3.5:27b
Replace qwen3.5:27b with whichever model you pulled in Step 2.
Make It Permanent
Add this to your ~/.zshrc so you don't have to type it every time:
# Claude Code with local Ollama
alias claude-local='ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_AUTH_TOKEN=ollama claude --model qwen3.5:27b'
Then reload your shell:
source ~/.zshrc
Now just type claude-local to start a session.
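If you switch models often, a shell function beats a fixed alias because it takes the model as an argument. A sketch (underscore name, since hyphenated function names aren't POSIX):

```shell
# Drop into ~/.zshrc. Usage:
#   claude_local               # defaults to qwen3.5:27b
#   claude_local qwen3.5:9b    # lighter model for battery/RAM headroom
claude_local() {
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_AUTH_TOKEN=ollama \
  claude --model "${1:-qwen3.5:27b}"
}
```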
What Works and What Doesn't?
Be honest with yourself about what local models can and cannot do. Claude Code's CLI is excellent, but the model quality gap is real.
Works Well
- File reading and context — Claude Code indexes your project the same way regardless of backend
- Code generation — Single-file functions, components, utilities
- Refactoring — Renaming, restructuring, pattern application
- Git operations — Commits, diffs, branch management
- Debugging — Reading error messages and suggesting fixes
- Explaining code — Summarizing what files and functions do
Works Poorly with Small Models
- Multi-step agentic tasks — Local 4B–9B models lack the reasoning depth for complex chains of edits across multiple files
- Tool calling reliability — Varies significantly by model; smaller models fail more often
- Large context windows — 27B models handle this better, but 4B models struggle past 8K tokens
- Architectural decisions — Don't expect a 4B model to design your system architecture
As @debdoot_x noted: "So basically running Claude Code without a Claude model. Funny." It's a fair point. You're getting Claude Code's interface and tooling with a local model's intelligence. For many tasks, that's enough.
RAM Reality Check
This matters more than anything else for Mac users. As @pinkham warned: "Dont expect to run Chrome, VS Code, Slack and Zoom while doing so."
Here's the real math: macOS itself uses 4–6 GB of RAM, and every browser tab, editor, and communication app adds more. The model weights have to fit in unified memory alongside all of it.
| Your Mac | Available for Model | Recommended Model | What You'll Close |
|---|---|---|---|
| 8GB MacBook Air | ~3–4 GB | qwen3.5:4b | Everything except Terminal |
| 16GB MacBook Air | ~8–10 GB | qwen3.5:9b | Browser tabs, Slack |
| 24GB Mac Mini/Pro | ~16–18 GB | qwen3.5:27b | Maybe Chrome |
| 36GB+ Mac Pro | ~28+ GB | qwen3.5:27b + apps | Nothing |
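The table above reduces to a small helper. A sketch (`pick_model` is an illustrative name, and the thresholds are rough cutoffs, not hard limits):

```shell
# Map total Mac RAM (GB) to a model that leaves headroom for macOS,
# following the table above.
pick_model() {
  if   [ "$1" -ge 24 ]; then echo "qwen3.5:27b"
  elif [ "$1" -ge 16 ]; then echo "qwen3.5:9b"
  else                       echo "qwen3.5:4b"
  fi
}

# On macOS, hw.memsize reports total RAM in bytes (fallback: assume 16 GB):
pick_model "$(( $(sysctl -n hw.memsize 2>/dev/null || echo 17179869184) / 1073741824 ))"
```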
The Mac Mini M4 Pro with 64GB is the sweet spot for running 27B models alongside your normal workflow. At $1,999–$2,499, it pays for itself in 20–25 months compared to Claude Max at $100/month.
How Much Money Does This Actually Save?
The cost comparison is straightforward:
| Option | Monthly Cost | Annual Cost | What You Get |
|---|---|---|---|
| Claude Max | $100/month | $1,200/year | Full Opus 4.6, unlimited usage |
| Claude Pro | $20/month | $240/year | Opus with usage limits |
| Claude API | Variable | $50–$500+/year | Pay per token, unpredictable |
| Ollama local | ~$2/month electricity | ~$24/year | Free inference, local models |
As @bygregorr pointed out: this "eliminates unpredictable API costs for solo devs." If you're burning through $50–$200/month in API credits for coding assistance, switching to local models for routine tasks and saving the API budget for complex ones is a legitimate strategy.
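The payback arithmetic from the hardware section is a one-liner. A sketch using ceiling division so a partial month counts as a full one:

```shell
# Months until local hardware costs less than an equivalent subscription.
breakeven_months() {
  hardware=$1
  monthly=$2
  echo $(( (hardware + monthly - 1) / monthly ))
}

breakeven_months 1999 100   # Mac Mini M4 Pro vs Claude Max -> 20
breakeven_months 2499 100   # higher-spec config -> 25
```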
Advanced: LiteLLM Proxy for Model Routing
For power users, LiteLLM lets you map different Claude model names to different local models. This means Claude Code can automatically use a small model for simple queries and a large one for complex tasks.
pip install litellm
Create a litellm_config.yaml:
model_list:
- model_name: claude-opus-4-6-20250915
litellm_params:
model: ollama/qwen3.5:27b
api_base: http://localhost:11434
- model_name: claude-sonnet-4-6-20250514
litellm_params:
model: ollama/qwen3.5:9b
api_base: http://localhost:11434
- model_name: claude-haiku-4-5-20251001
litellm_params:
model: ollama/qwen3.5:4b
api_base: http://localhost:11434
Start the proxy:
litellm --config litellm_config.yaml --port 4000
Then point Claude Code at LiteLLM instead of Ollama directly:
ANTHROPIC_BASE_URL=http://localhost:4000 \
ANTHROPIC_AUTH_TOKEN=local \
claude
This adds a layer of complexity but gives you model routing without changing your Claude Code workflow.
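One way to sanity-check the routing is to build a request body by hand and send it to the proxy. A sketch: the body follows the Anthropic Messages API shape, and the model name must match an entry in your `litellm_config.yaml`.

```shell
# Build a minimal Anthropic-style request body and validate the JSON locally.
cat > /tmp/probe.json <<'EOF'
{
  "model": "claude-opus-4-6-20250915",
  "max_tokens": 64,
  "messages": [{"role": "user", "content": "Say hello"}]
}
EOF
python3 -m json.tool /tmp/probe.json > /dev/null && echo "payload ok"

# With the proxy running, send it; LiteLLM routes it to ollama/qwen3.5:27b:
# curl -s http://localhost:4000/v1/messages \
#   -H 'content-type: application/json' -H 'x-api-key: local' \
#   -d @/tmp/probe.json
```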
Alternatives Worth Knowing
Claude Code with Ollama isn't the only option. Other tools work with local models natively:
- OpenCode — Open-source Claude Code alternative with native Ollama support. No environment variable hacks needed.
- Aider — Another AI coding agent with direct Ollama integration and git awareness.
- Bifrost — Lightweight API proxy as an alternative to LiteLLM.
Each has trade-offs. Claude Code has the most polished CLI experience. Aider has better git integration. OpenCode is fully open-source.
Security Considerations
@tasa2379 pointed out that most guides skip security hardening. Fair criticism. A few things to lock down:
1. Bind Ollama to localhost only — The default ollama serve already binds to 127.0.0.1:11434. Don't change this unless you know what you're doing.
2. Don't expose port 11434 to your network — No need for external access if Claude Code runs on the same machine.
3. Review model permissions — Claude Code can read and modify files in your project directory. Local doesn't mean safer if the model hallucinates a destructive command.
4. Keep Ollama updated — brew upgrade ollama regularly for security patches.
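Point 1 can be checked mechanically. Ollama reads the `OLLAMA_HOST` environment variable to decide its bind address (unset means the 127.0.0.1 default); a sketch of a quick audit, with `check_ollama_bind` as an illustrative name:

```shell
# Warn if OLLAMA_HOST would expose the server beyond loopback.
check_ollama_bind() {
  case "${OLLAMA_HOST:-127.0.0.1}" in
    127.0.0.1*|localhost*) echo "ok: Ollama bound to loopback" ;;
    *) echo "warning: OLLAMA_HOST=${OLLAMA_HOST} may expose port 11434 to the network" ;;
  esac
}

check_ollama_bind                      # with nothing set: ok
OLLAMA_HOST=0.0.0.0 check_ollama_bind  # prints a warning
```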
FAQ
Can I run this on a Mac Mini?
Yes. The Mac Mini M4 is the best value local AI machine in 2026. The 24GB model runs qwen3.5:27b comfortably. The 16GB base runs qwen3.5:4b or qwen3.5:9b.
Is the local version as good as real Claude Code?
No. You get Claude Code's CLI, file management, and tooling — but the model intelligence depends on what you run locally. A 4B model won't match Opus 4.6 for complex multi-step reasoning. For single-file edits and routine coding, local models handle 80%+ of tasks well.
Which model should I use for coding?
Qwen 3.5 models are the current best for local coding. Start with qwen3.5:4b if you have limited RAM. Move to qwen3.5:27b if you have 24GB+. The 9B variant is the sweet spot for 16GB MacBook Air users.
Does this work on Intel Macs?
Technically yes, but performance will be poor. Ollama on Intel Macs uses CPU-only inference, which is 5–10x slower than Apple Silicon's GPU acceleration. We don't recommend it for practical use.
Can I switch between local and cloud models?
Yes. Use the alias approach — set claude-local for Ollama and keep the default claude command pointed at Anthropic's API. Use local for routine tasks and cloud for complex agentic work. This hybrid approach gives you the best of both worlds while keeping API costs low.
Have questions? Reach out on X/Twitter