2026-04-04

Run Claude Code Free: Ollama Local Setup in 4 Steps (2026)

Claude Code is Anthropic's AI coding agent — and you can run it locally with Ollama instead of paying $100/month for Claude Max. Since Ollama v0.14 shipped native Anthropic Messages API compatibility in January 2026, the setup is dead simple: two environment variables, one command. As of April 2026, Ollama 0.19 with MLX support makes this even better — local inference is now up to 2x faster on Apple Silicon.

TL;DR: Run ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_AUTH_TOKEN=ollama claude --model qwen3.5:27b to use Claude Code with free local models. Works on any Apple Silicon Mac. Ollama 0.19 with MLX nearly doubles decode speed (58 to 112 tok/s). Best model depends on your RAM: 4B for 8GB, 9B for 16GB, 27B for 24GB+.

This setup went viral after @itsafiz shared it on X, pulling 51,000 views and 781 bookmarks in 48 hours (March 2026). The community response confirmed what many solo developers suspected: Claude Code's real value is in its CLI harness, not just the model behind it.

How Does Claude Code Work with Local Models?

Claude Code accepts two environment variables that redirect all API calls to any Anthropic-compatible endpoint. Since Ollama v0.14 (January 2026), Ollama natively exposes an Anthropic Messages API on localhost:11434 — no proxy needed, no translation layer. Just point and run.

The architecture is simple:

Claude Code CLI → localhost:11434 → Ollama (Anthropic API) → Local Model

Claude Code's file operations, git integration, and project context management all work the same. The only difference is which model generates the responses. Tool calling, multi-turn conversations, vision input, and extended thinking all work through this native API layer.
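For the curious, here is roughly what one of those calls looks like on the wire. This is a sketch: the request body follows Anthropic's public Messages API format, and the commented-out curl shows how you would send it once ollama serve is running. The endpoint path and headers are assumptions based on that API; the model name and prompt are placeholders.

```shell
# Build a request body in the Anthropic Messages format (placeholders throughout).
cat > /tmp/messages_req.json <<'EOF'
{
  "model": "qwen3.5:27b",
  "max_tokens": 512,
  "messages": [
    {"role": "user", "content": "Explain what this function does."}
  ]
}
EOF

# With `ollama serve` running, you would POST it like this (not executed here):
#   curl -s http://localhost:11434/v1/messages \
#     -H "x-api-key: ollama" \
#     -H "anthropic-version: 2023-06-01" \
#     -H "content-type: application/json" \
#     -d @/tmp/messages_req.json

# Sanity-check that the body is valid JSON and print the target model:
python3 -c "import json; print(json.load(open('/tmp/messages_req.json'))['model'])"
```

Claude Code builds these requests for you; the environment variables in Step 4 only change where they are sent.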

Step 1: Install Ollama (0.19+ Recommended)

Open Terminal and run:

brew install ollama

If you already have Ollama installed, update to 0.19 for MLX acceleration:

brew upgrade ollama

Start the Ollama server:

ollama serve

Verify it's running:

curl http://localhost:11434/api/tags

You should see a JSON response with your available models. If you just installed Ollama, the list will be empty — that's fine. For a full walkthrough, see our complete Ollama installation guide for Mac.
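If you want just the model names rather than the raw JSON, a short parse does it. This sketch works on a saved sample response so it runs offline; the sample mirrors the general shape of the tags response, and with the server up you would pipe the curl output through the same one-liner instead.

```shell
# Sample of the tags response shape (an illustration, not live server output).
cat > /tmp/tags.json <<'EOF'
{"models":[{"name":"qwen3.5:27b"},{"name":"qwen3.5:4b"}]}
EOF

# Print just the model names, one per line.
python3 -c "import json; [print(m['name']) for m in json.load(open('/tmp/tags.json'))['models']]"
```

Against a live server, the equivalent is `curl -s http://localhost:11434/api/tags | python3 -c "import json,sys; [print(m['name']) for m in json.load(sys.stdin)['models']]"`.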

What's New in Ollama 0.19 (March 2026)

Ollama 0.19 is the biggest Mac performance update yet. It rebuilds Apple Silicon inference on top of Apple's MLX framework, taking full advantage of unified memory. The results (Ollama blog, March 31, 2026):

Metric          Ollama 0.18   Ollama 0.19 (MLX)   Improvement
Prefill speed   1,154 tok/s   1,810 tok/s         +57%
Decode speed    58 tok/s      112 tok/s           +93%

That's nearly 2x faster response generation. On M5 chips, Ollama also leverages the new GPU Neural Accelerators for even bigger gains. The improved cache system reuses data across conversations to lower memory use and speed up prompt processing — a big win for coding workflows with branching prompts.
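The percentages in the table follow directly from the raw throughput numbers; a quick check:

```shell
# Recompute the 0.18 -> 0.19 improvements from the benchmark figures above.
awk 'BEGIN {
  printf "prefill: +%.0f%%\n", (1810 - 1154) / 1154 * 100;  # 1,154 -> 1,810 tok/s
  printf "decode:  +%.0f%%\n", (112 - 58) / 58 * 100;       # 58 -> 112 tok/s
}'
# prefill: +57%
# decode:  +93%
```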

For more Ollama performance details, see our Ollama 0.17 Apple Silicon benchmarks article, which covers the previous inference engine overhaul.

Note: MLX preview currently requires 32GB+ unified memory and supports Qwen3.5 models. Support for more models is rolling out. Macs with less than 32GB still get the standard llama.cpp backend, which also improved in 0.19.

Step 2: Pull a Coding Model

Which model you pull depends on how much RAM your Mac has. Ollama downloads models on first pull and caches them locally.

RAM-to-Model Guide (April 2026)

Model             Download Size   RAM Needed     Best For                      Mac Compatibility
qwen3.5:4b        ~2.5 GB         ~4 GB total    Quick fixes, simple tasks     Any Mac (8GB+)
qwen3.5:9b        ~5 GB           ~7 GB total    General coding                MacBook Air M4 16GB
qwen3.5:27b       ~16 GB          ~20 GB total   Complex refactoring           MacBook Pro 24GB+
glm-4.7:9b        ~5.5 GB         ~8 GB total    Fast + large context (128K)   MacBook Air 16GB
qwen3.5:35b-a3b   ~22 GB          ~22 GB total   Best MoE efficiency           Mac Mini/Pro 24GB+

Pull your model:

# For 8GB Macs — small but surprisingly capable
ollama pull qwen3.5:4b

# For 16GB MacBook Air — the sweet spot
ollama pull qwen3.5:9b

# For 24GB+ Mac — best local coding quality
ollama pull qwen3.5:27b

# Alternative: GLM 4.7 for speed + large context
ollama pull glm-4.7:9b

Qwen 3.5 4B matches GPT-4o on independent testing with a 49.9% win rate across 1,000 real-world prompts (N8Programs, March 2026). Even the smallest model here is genuinely useful. For a deeper comparison between model families, read our DeepSeek V3 vs Qwen 3.5 Mac comparison.
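The RAM table above collapses into a tiny helper. This is a sketch: the thresholds follow the table's recommendations and are rules of thumb, not hard limits.

```shell
# Suggest a model tier from installed RAM in GB (thresholds per the table above).
pick_model() {
  local ram_gb=$1
  if   [ "$ram_gb" -ge 24 ]; then echo "qwen3.5:27b"
  elif [ "$ram_gb" -ge 16 ]; then echo "qwen3.5:9b"
  else                            echo "qwen3.5:4b"
  fi
}

pick_model 8    # -> qwen3.5:4b
pick_model 16   # -> qwen3.5:9b
pick_model 24   # -> qwen3.5:27b
```

On a Mac you could feed it the real figure with `pick_model $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))`.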

Step 3: Install Claude Code

You need Node.js 18+ installed. Then:

npm install -g @anthropic-ai/claude-code

Verify the installation:

claude --version

Claude Code gets frequent updates — multi-agent collaboration, computer use, and auto mode all shipped in Q1 2026. Keep it current with npm update -g @anthropic-ai/claude-code.

Step 4: Run Claude Code with Your Local Model

This is the one-liner that makes it all work:

ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_AUTH_TOKEN=ollama \
claude --model qwen3.5:27b

Replace qwen3.5:27b with whichever model you pulled in Step 2.

Make It Permanent

Add this to your ~/.zshrc so you don't have to type it every time:

# Claude Code with local Ollama
alias claude-local='ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_AUTH_TOKEN=ollama claude --model qwen3.5:27b'

Then reload your shell:

source ~/.zshrc

Now just type claude-local to start a session.
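If you sometimes want a different model without editing the alias, a shell function is a small step up. Same environment variables as above, just parameterized; a sketch:

```shell
# Like the alias, but the first argument (if any) selects the model.
claude-local() {
  local model="${1:-qwen3.5:27b}"   # default when called with no arguments
  if [ $# -gt 0 ]; then shift; fi   # remaining args pass through to claude
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_AUTH_TOKEN=ollama \
  claude --model "$model" "$@"
}
```

`claude-local` starts with the 27B default; `claude-local qwen3.5:9b` switches models for one session.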

What Works and What Doesn't?

Be honest with yourself about what local models can and cannot do. Claude Code's CLI is excellent, but the model quality gap is real. See our local LLMs vs cloud flagships benchmark for detailed numbers.

Works Well

  • File reading and context — Claude Code indexes your project the same way regardless of backend
  • Code generation — Single-file functions, components, utilities
  • Refactoring — Renaming, restructuring, pattern application
  • Git operations — Commits, diffs, branch management
  • Debugging — Reading error messages and suggesting fixes
  • Explaining code — Summarizing what files and functions do

Works Poorly with Small Models

  • Multi-step agentic tasks — Local 4B-9B models lack the reasoning depth for complex chains of edits across multiple files
  • Tool calling reliability — Varies significantly by model; smaller models fail more often
  • Large context windows — 27B models handle this better, but 4B models struggle past 8K tokens
  • Architectural decisions — Don't expect a 4B model to design your system architecture

As @debdoot_x noted: "So basically running Claude Code without a Claude model. Funny." It's a fair point. You're getting Claude Code's interface and tooling with a local model's intelligence. For many tasks, that's enough.

Ollama 0.19 MLX: What It Means for Claude Code Users

The March 31, 2026 release of Ollama 0.19 is a game-changer for this workflow. Here is why it matters specifically for Claude Code with local models.

Faster First Response

Prefill speed jumped from 1,154 to 1,810 tok/s. When Claude Code sends your project context to the model, it processes that context 57% faster. You wait less for the first token of every response.

Faster Code Generation

Decode speed nearly doubled — from 58 to 112 tok/s. When the model writes a 200-line function, it now takes roughly half the time. Over a full coding session, this adds up to minutes saved.
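To put that in coding terms, assume roughly 10 tokens per line of code (an assumption; real tokenization varies by language and style), so a 200-line function is about 2,000 output tokens:

```shell
# Rough generation-time estimate for ~2,000 output tokens (200 lines x ~10 tok/line).
awk 'BEGIN {
  tokens = 2000
  printf "Ollama 0.18: %.0f s\n", tokens / 58    # pre-MLX decode speed
  printf "Ollama 0.19: %.0f s\n", tokens / 112   # MLX decode speed
}'
# Ollama 0.18: 34 s
# Ollama 0.19: 18 s
```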

Smarter Caching for Coding Workflows

Ollama 0.19 takes intelligent cache snapshots for branching prompts. Claude Code sessions are full of branching — "try this approach, no wait, try that instead." The new cache reuses previous computation instead of reprocessing everything from scratch.

How to Enable MLX

If you have 32GB+ unified memory and run Qwen3.5 models, MLX activates automatically in Ollama 0.19. No configuration needed. Just update:

brew upgrade ollama

ollama serve

For Macs with 16GB or less, you still get the improved llama.cpp backend. Not as fast as MLX, but still faster than 0.18.

RAM Reality Check

This matters more than anything else for Mac users. As @pinkham warned: "Dont expect to run Chrome, VS Code, Slack and Zoom while doing so."

Here's the real math. macOS itself uses 4-6 GB of RAM, and every browser tab, editor, and communication app adds more. Whatever is left is what the model weights and context have to fit into.

Your Mac            Available for Model   Recommended Model    What You'll Close
8GB MacBook Air     ~3-4 GB               qwen3.5:4b           Everything except Terminal
16GB MacBook Air    ~8-10 GB              qwen3.5:9b           Browser tabs, Slack
24GB Mac Mini/Pro   ~16-18 GB             qwen3.5:27b          Maybe Chrome
36GB+ Mac Pro       ~28+ GB               qwen3.5:27b + apps   Nothing

The Mac Mini M4 Pro with 64GB is the sweet spot for running 27B models alongside your normal workflow. At $1,999-$2,499, it pays for itself in about 20 months compared to Claude Max at $100/month. For a complete breakdown of which models run best on each configuration, see our MacBook Air M4 16GB guide and MacBook Pro M4 Pro 24GB guide.
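The payback estimate is simple division, using the low end of the price range:

```shell
# Months until a $1,999 Mac Mini offsets Claude Max at $100/month.
awk 'BEGIN { printf "payback: %.0f months\n", 1999 / 100 }'
# payback: 20 months
```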

How Much Money Does This Actually Save?

The cost comparison is straightforward:

Option         Monthly Cost            Annual Cost      What You Get
Claude Max     $100/month              $1,200/year      Full Opus 4.6, unlimited usage
Claude Pro     $20/month               $240/year        Opus with usage limits
Claude API     Variable                $50-$500+/year   Pay per token, unpredictable
Ollama local   ~$2/month electricity   ~$24/year        Free inference, local models

As @bygregorr pointed out: this "eliminates unpredictable API costs for solo devs." If you're burning through $50-$200/month in API credits for coding assistance, switching to local models for routine tasks and saving the API budget for complex ones is a legitimate strategy.

Pro tip: Use claude-local for routine refactoring, file edits, and code explanation. Switch to the real Claude API for complex multi-file agentic tasks. This hybrid approach can cut your monthly bill by 60-80%.

Advanced: LiteLLM Proxy for Model Routing

For power users, LiteLLM lets you map different Claude model names to different local models. This means Claude Code can automatically use a small model for simple queries and a large one for complex tasks.

pip install litellm

Create a litellm_config.yaml:

model_list:
  - model_name: claude-opus-4-6-20250915
    litellm_params:
      model: ollama/qwen3.5:27b
      api_base: http://localhost:11434
  - model_name: claude-sonnet-4-6-20250514
    litellm_params:
      model: ollama/qwen3.5:9b
      api_base: http://localhost:11434
  - model_name: claude-haiku-4-5-20251001
    litellm_params:
      model: ollama/qwen3.5:4b
      api_base: http://localhost:11434

Start the proxy:

litellm --config litellm_config.yaml --port 4000

Then point Claude Code at LiteLLM instead of Ollama directly:

ANTHROPIC_BASE_URL=http://localhost:4000 \
ANTHROPIC_AUTH_TOKEN=local \
claude

This adds a layer of complexity but gives you model routing without changing your Claude Code workflow.

Alternatives Worth Knowing

Claude Code with Ollama isn't the only option. Other tools work with local models natively:

  • OpenCode — Open-source Claude Code alternative with native Ollama support. No environment variable hacks needed.
  • Aider — AI coding agent with direct Ollama integration and strong git awareness.
  • LM Studio — Added Anthropic-compatible /v1/messages endpoint in v0.4.1, works as a drop-in Ollama alternative for Claude Code.
  • Bifrost — Lightweight API proxy as an alternative to LiteLLM.

Each has trade-offs. Claude Code has the most polished CLI experience. Aider has better git integration. OpenCode is fully open-source. LM Studio offers a GUI for model management.

Security Considerations

@tasa2379 pointed out that most guides skip security hardening. Fair criticism. A few things to lock down:

1. Bind Ollama to localhost only — The default ollama serve already binds to 127.0.0.1:11434. Don't change this unless you know what you're doing.

2. Don't expose port 11434 to your network — No need for external access if Claude Code runs on the same machine.

3. Review model permissions — Claude Code can read and modify files in your project directory. Local doesn't mean safer if the model hallucinates a destructive command.

4. Keep Ollama updated: run brew upgrade ollama regularly for security patches.

FAQ

Can I run this on a Mac Mini?

Yes. The Mac Mini M4 is the best value local AI machine in 2026. The 24GB model runs qwen3.5:27b comfortably. The 16GB base runs qwen3.5:4b or qwen3.5:9b.

Is the local version as good as real Claude Code?

No. You get Claude Code's CLI, file management, and tooling — but the model intelligence depends on what you run locally. A 4B model won't match Opus 4.6 for complex multi-step reasoning. For single-file edits and routine coding, local models handle 80%+ of tasks well.

Which model should I use for coding?

Qwen 3.5 models are the current best for local coding. Start with qwen3.5:4b if you have limited RAM. Move to qwen3.5:27b if you have 24GB+. The 9B variant is the sweet spot for 16GB MacBook Air users. GLM-4.7 is a strong alternative if you need a 128K context window.

Does Ollama 0.19 MLX work on all Macs?

The MLX preview requires 32GB+ unified memory and currently supports Qwen3.5 models. Macs with less than 32GB use the standard llama.cpp backend, which also received performance improvements in 0.19. MLX support for more models is planned.

Does this work on Intel Macs?

Technically yes, but performance will be poor. Ollama on Intel Macs uses CPU-only inference, which is 5-10x slower than Apple Silicon's GPU acceleration. We don't recommend it for practical use.

Can I switch between local and cloud models?

Yes. Use the alias approach — set claude-local for Ollama and keep the default claude command pointed at Anthropic's API. Use local for routine tasks and cloud for complex agentic work. This hybrid approach gives you the best of both worlds while keeping API costs low.

What about Claude Code's new features like computer use?

Claude Code's Q1 2026 features — computer use, multi-agent collaboration, auto mode — work with the cloud API. With local models, you get the core CLI features: file operations, git integration, code generation, and tool calling. The advanced agentic features require the full Claude API.


Have questions? Reach out on X/Twitter