2026-03-09

Run Claude Code for Free with Ollama on Mac (2026)

Claude Code is Anthropic's AI coding agent — and you can run it locally with Ollama instead of paying $100/month for Claude Max. One environment variable swap points Claude Code at your local models. The catch: local 4B–27B models won't match real Claude Opus for complex agentic tasks. But for file reading, code generation, refactoring, and git operations, it works surprisingly well.

TL;DR: Run ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_AUTH_TOKEN=ollama claude --model qwen3.5:27b to use Claude Code with free local models. Works on any Apple Silicon Mac. Best model depends on your RAM: 4B for 8GB, 9B for 16GB, 27B for 24GB+.

This setup went viral after @itsafiz shared it on X, pulling 51,000 views and 781 bookmarks in 48 hours (March 2026). The community response confirmed what many solo developers suspected: Claude Code's real value is in its CLI harness, not just the model behind it.

How Does Claude Code Work with Local Models?

Claude Code accepts two environment variables that redirect all API calls to any Anthropic-compatible endpoint. Ollama exposes exactly that interface on localhost:11434. No proxy needed. No translation layer. Just point and run.

The architecture is simple:

Claude Code CLI → localhost:11434 → Ollama → Local Model (Qwen, etc.)

Claude Code's file operations, git integration, and project context management all work the same. The only difference is which model generates the responses.

Step 1: Install Ollama

Open Terminal and run:

brew install ollama

Start the Ollama server:

ollama serve

Verify it's running:

curl http://localhost:11434/api/tags

You should see a JSON response with your available models. If you just installed Ollama, the list will be empty — that's fine.
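If you want to script that check, a small helper can wrap the curl call. This is a sketch written for this guide (the `ollama_ready` name is invented here); it assumes curl is installed and defaults to Ollama's standard localhost:11434 address:

```shell
# Sketch: report whether an Ollama server answers on the given host:port.
# Defaults to Ollama's standard localhost:11434.
ollama_ready() {
  if curl -sf --max-time 2 "http://${1:-localhost:11434}/api/tags" >/dev/null 2>&1; then
    echo "ollama is running"
  else
    echo "ollama is not running"
  fi
}

ollama_ready
```

Drop it in your `~/.zshrc` alongside the alias from Step 4 and you can sanity-check the server before starting a session.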

Step 2: Pull a Coding Model

Which model you pull depends on how much RAM your Mac has. Ollama downloads models on first pull and caches them locally.

RAM-to-Model Guide

| Model | Download Size | RAM Needed | Best For | Mac Compatibility |
|---|---|---|---|---|
| qwen3.5:4b | ~2.5 GB | ~4 GB total | Quick fixes, simple tasks | Any Mac (8GB+) |
| qwen3.5:9b | ~5 GB | ~7 GB total | General coding | MacBook Air 16GB |
| qwen3.5:27b | ~16 GB | ~20 GB total | Complex refactoring | Mac Mini/Pro 24GB+ |
| Qwopus 27B | ~16.5 GB | ~20 GB total | Opus-style reasoning | Mac Mini/Pro 24GB+ |
| qwen3.5:35b-a3b | ~22 GB | ~22 GB total | Best MoE efficiency | Mac Mini/Pro 24GB+ |
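The table above reduces to a simple RAM-to-model rule. As a sketch (the `pick_model` helper is a name invented for this guide), given your Mac's total RAM in GB:

```shell
# Sketch: map total Mac RAM (GB) to the recommended model from the table above.
pick_model() {
  if   [ "$1" -ge 24 ]; then echo "qwen3.5:27b"
  elif [ "$1" -ge 16 ]; then echo "qwen3.5:9b"
  else                       echo "qwen3.5:4b"
  fi
}

pick_model 16   # → qwen3.5:9b
```

On macOS you could feed it real hardware info with `pick_model $(( $(sysctl -n hw.memsize) / 1073741824 ))` (sysctl hw.memsize is macOS-specific).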

Pull your model:

# For 8GB Macs — small but surprisingly capable
ollama pull qwen3.5:4b

# For 16GB MacBook Air — the sweet spot
ollama pull qwen3.5:9b

# For 24GB+ Mac — best local coding quality
ollama pull qwen3.5:27b

Qwen 3.5 4B matches GPT-4o on independent testing with a 49.9% win rate across 1,000 real-world prompts (N8Programs, March 2026). Even the smallest model here is genuinely useful.

Step 3: Install Claude Code

You need Node.js 18+ installed. Then:

npm install -g @anthropic-ai/claude-code

Verify the installation:

claude --version

Step 4: Run Claude Code with Your Local Model

This is the one-liner that makes it all work:

ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_AUTH_TOKEN=ollama \
claude --model qwen3.5:27b

Replace qwen3.5:27b with whichever model you pulled in Step 2.

Make It Permanent

Add this to your ~/.zshrc so you don't have to type it every time:

# Claude Code with local Ollama
alias claude-local='ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_AUTH_TOKEN=ollama claude --model qwen3.5:27b'

Then reload your shell:

source ~/.zshrc

Now just type claude-local to start a session.
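If you switch models between sessions, a shell function beats a fixed alias. This is a sketch using the same two environment variables as above; the model becomes an optional argument:

```shell
# Sketch: like the alias above, but with the model as an optional argument.
# claude-local              → uses qwen3.5:27b
# claude-local qwen3.5:9b   → uses the 9B model instead
claude-local() {
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_AUTH_TOKEN=ollama \
  claude --model "${1:-qwen3.5:27b}"
}
```

Replace the alias in your `~/.zshrc` with this function if you want both behaviors under the same `claude-local` name.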

What Works and What Doesn't?

Be honest with yourself about what local models can and cannot do. Claude Code's CLI is excellent, but the model quality gap is real.

Works Well

  • File reading and context — Claude Code indexes your project the same way regardless of backend
  • Code generation — Single-file functions, components, utilities
  • Refactoring — Renaming, restructuring, pattern application
  • Git operations — Commits, diffs, branch management
  • Debugging — Reading error messages and suggesting fixes
  • Explaining code — Summarizing what files and functions do

Works Poorly with Small Models

  • Multi-step agentic tasks — Local 4B–9B models lack the reasoning depth for complex chains of edits across multiple files
  • Tool calling reliability — Varies significantly by model; smaller models fail more often
  • Large context windows — 27B models handle this better, but 4B models struggle past 8K tokens
  • Architectural decisions — Don't expect a 4B model to design your system architecture

As @debdoot_x noted: "So basically running Claude Code without a Claude model. Funny." It's a fair point. You're getting Claude Code's interface and tooling with a local model's intelligence. For many tasks, that's enough.

RAM Reality Check

This matters more than anything else for Mac users. As @pinkham warned: "Dont expect to run Chrome, VS Code, Slack and Zoom while doing so."

Here's the real math. macOS itself uses 4–6 GB of RAM. Every browser tab, editor, and communication app adds more. The model weights have to fit in unified memory alongside all of it.

| Your Mac | Available for Model | Recommended Model | What You'll Close |
|---|---|---|---|
| 8GB MacBook Air | ~3–4 GB | qwen3.5:4b | Everything except Terminal |
| 16GB MacBook Air | ~8–10 GB | qwen3.5:9b | Browser tabs, Slack |
| 24GB Mac Mini/Pro | ~16–18 GB | qwen3.5:27b | Maybe Chrome |
| 36GB+ Mac Pro | ~28+ GB | qwen3.5:27b + apps | Nothing |

The Mac Mini M4 Pro with 64GB is the sweet spot for running 27B models alongside your normal workflow. At $1,999–$2,499, it pays for itself in about 20 months compared to Claude Max at $100/month.
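That payback figure is simple division. A sketch of the arithmetic (integer months, rounded up; the `payback_months` helper is a name invented here):

```shell
# Months until hardware cost equals cumulative subscription spend, rounded up.
payback_months() {
  hardware=$1; monthly=$2
  echo $(( (hardware + monthly - 1) / monthly ))
}

payback_months 1999 100   # Mac Mini M4 Pro vs Claude Max → 20
payback_months 2499 100   # higher-spec configuration → 25
```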

How Much Money Does This Actually Save?

The cost comparison is straightforward:

| Option | Monthly Cost | Annual Cost | What You Get |
|---|---|---|---|
| Claude Max | $100/month | $1,200/year | Full Opus 4.6, unlimited usage |
| Claude Pro | $20/month | $240/year | Opus with usage limits |
| Claude API | Variable | $50–$500+/year | Pay per token, unpredictable |
| Ollama local | ~$2/month electricity | ~$24/year | Free inference, local models |

As @bygregorr pointed out: this "eliminates unpredictable API costs for solo devs." If you're burning through $50–$200/month in API credits for coding assistance, switching to local models for routine tasks and saving the API budget for complex ones is a legitimate strategy.

Advanced: LiteLLM Proxy for Model Routing

For power users, LiteLLM lets you map different Claude model names to different local models. This means Claude Code can automatically use a small model for simple queries and a large one for complex tasks.

pip install litellm

Create a litellm_config.yaml:

model_list:
  - model_name: claude-opus-4-6-20250915
    litellm_params:
      model: ollama/qwen3.5:27b
      api_base: http://localhost:11434
  - model_name: claude-sonnet-4-6-20250514
    litellm_params:
      model: ollama/qwen3.5:9b
      api_base: http://localhost:11434
  - model_name: claude-haiku-4-5-20251001
    litellm_params:
      model: ollama/qwen3.5:4b
      api_base: http://localhost:11434

Start the proxy:

litellm --config litellm_config.yaml --port 4000

Then point Claude Code at LiteLLM instead of Ollama directly:

ANTHROPIC_BASE_URL=http://localhost:4000 \
ANTHROPIC_AUTH_TOKEN=local \
claude

This adds a layer of complexity but gives you model routing without changing your Claude Code workflow.
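The routing in the config above boils down to a name-prefix lookup: each Claude model ID maps to one local model. A sketch of that mapping as a shell function (the `route_model` name is invented here; the model IDs come from the config):

```shell
# Sketch of the routing table from the LiteLLM config, as a prefix lookup.
route_model() {
  case "$1" in
    claude-opus-*)   echo "ollama/qwen3.5:27b" ;;
    claude-sonnet-*) echo "ollama/qwen3.5:9b"  ;;
    claude-haiku-*)  echo "ollama/qwen3.5:4b"  ;;
    *)               echo "unmapped: $1"       ;;
  esac
}

route_model claude-opus-4-6-20250915   # → ollama/qwen3.5:27b
```

When Claude Code asks for an "Opus" model it gets the 27B, "Sonnet" requests get the 9B, and "Haiku" requests get the 4B, which is how the automatic small-vs-large routing works.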

Alternatives Worth Knowing

Claude Code with Ollama isn't the only option. Other tools work with local models natively:

  • OpenCode — Open-source Claude Code alternative with native Ollama support. No environment variable hacks needed.
  • Aider — Another AI coding agent with direct Ollama integration and git awareness.
  • Bifrost — Lightweight API proxy as an alternative to LiteLLM.

Each has trade-offs. Claude Code has the most polished CLI experience. Aider has better git integration. OpenCode is fully open-source.

Security Considerations

@tasa2379 pointed out that most guides skip security hardening. Fair criticism. A few things to lock down:

1. Bind Ollama to localhost only — The default ollama serve already binds to 127.0.0.1:11434. Don't change this unless you know what you're doing.

2. Don't expose port 11434 to your network — No need for external access if Claude Code runs on the same machine.

3. Review model permissions — Claude Code can read and modify files in your project directory. Local doesn't mean safer if the model hallucinates a destructive command.

4. Keep Ollama updated — Run brew upgrade ollama regularly for security patches.
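Point 1 can be checked from the shell. Ollama's OLLAMA_HOST environment variable controls the listen address (unset means the default 127.0.0.1:11434); this sketch (the `check_ollama_host` name is invented here) flags values that would bind beyond loopback:

```shell
# Sketch: warn when an OLLAMA_HOST value would bind beyond loopback.
# Unset or empty means Ollama's default, 127.0.0.1:11434.
check_ollama_host() {
  case "${1:-127.0.0.1:11434}" in
    127.0.0.1*|localhost*) echo "ok: loopback only" ;;
    *) echo "warning: '$1' may be reachable from your network" ;;
  esac
}

check_ollama_host "$OLLAMA_HOST"
```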

FAQ

Can I run this on a Mac Mini?

Yes. The Mac Mini M4 is the best value local AI machine in 2026. The 24GB model runs qwen3.5:27b comfortably. The 16GB base runs qwen3.5:4b or qwen3.5:9b.

Is the local version as good as real Claude Code?

No. You get Claude Code's CLI, file management, and tooling — but the model intelligence depends on what you run locally. A 4B model won't match Opus 4.6 for complex multi-step reasoning. For single-file edits and routine coding, local models handle 80%+ of tasks well.

Which model should I use for coding?

Qwen 3.5 models are the current best for local coding. Start with qwen3.5:4b if you have limited RAM. Move to qwen3.5:27b if you have 24GB+. The 9B variant is the sweet spot for 16GB MacBook Air users.

Does this work on Intel Macs?

Technically yes, but performance will be poor. Ollama on Intel Macs uses CPU-only inference, which is 5–10x slower than Apple Silicon's GPU acceleration. We don't recommend it for practical use.

Can I switch between local and cloud models?

Yes. Use the alias approach — set claude-local for Ollama and keep the default claude command pointed at Anthropic's API. Use local for routine tasks and cloud for complex agentic work. This hybrid approach gives you the best of both worlds while keeping API costs low.

Have questions? Reach out on X/Twitter