CLI agent · 11 models ranked

Best Local LLMs for Open Claude Code

Open-source Claude Code CLI reimplementation, run on local models

Open Claude Code (by ruvnet) is a fully open-source reimplementation of Anthropic's Claude Code CLI, built from the same architecture, with 25 tools, MCP transports, and permission modes. Unlike the official client, it reads OPENAI_BASE_URL (in addition to ANTHROPIC_BASE_URL), so you can point it straight at a local Ollama, llama.cpp, or LM Studio server and run it entirely on open-source models. Because it mirrors Claude Code's agentic, tool-call-heavy loop, the model requirements are the same: tool-calling reliability matters above all.

Best pick

Qwen3.6 35B-A3B

Agent-tuned MoE with the strongest open-weight tool-calling here; 100K+ context, runs on a 24GB Mac.

What Open Claude Code needs

Reliable structured tool-calling plus a 32K+ context window. Open Claude Code drives the same 25-tool agentic loop as Claude Code, so weak tool-callers break it.
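One way to see what "reliable structured tool-calling" means in practice is to inspect a single assistant message returned by an OpenAI-compatible server. The sketch below is only an illustration, not part of Open Claude Code: the function name classify_tool_call and its four labels are hypothetical, and the "leaked" check is a rough heuristic. It assumes the standard chat-completions message shape, where each entry in tool_calls carries its arguments as a JSON string.

```python
import json
import re


def classify_tool_call(message: dict) -> str:
    """Classify one assistant message from an OpenAI-compatible chat completion.

    Hypothetical helper for illustration. Returns:
      "structured" - tool_calls is present and every arguments string is a JSON object
      "malformed"  - tool_calls is present but some arguments are missing or invalid
      "leaked"     - no tool_calls, but the text content looks like a tool call
      "none"       - plain text answer
    """
    tool_calls = message.get("tool_calls") or []
    if tool_calls:
        for call in tool_calls:
            try:
                args = json.loads(call["function"]["arguments"])
            except (KeyError, TypeError, json.JSONDecodeError):
                return "malformed"
            if not isinstance(args, dict):
                return "malformed"
        return "structured"

    content = message.get("content") or ""
    # Heuristic (an assumption): text that mentions both a "name" and an
    # "arguments" key is probably a tool call emitted as plain text.
    if re.search(r'"name"\s*:', content) and re.search(r'"arguments"\s*:', content):
        return "leaked"
    return "none"
```

A model that mostly yields "structured" across a multi-step session is the kind the agent loop can work with; frequent "leaked" or "malformed" results are the failure modes that weak tool-callers tend to show.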

Open Claude Code Local LLM Tier List

S: Best in class

Qwen3.6 35B-A3B · 35B · 24GB RAM

Agent-tuned MoE with the strongest open-weight tool-calling here; 100K+ context, runs on a 24GB Mac.

Qwen3.5 35B-A3B Instruct · 35B · 24GB RAM

Proven agentic MoE used in local Claude Code setups; disables internal CoT so the agent drives reasoning.

A: Strong, reliable

Qwen2.5 Coder 14B · 14B · 22GB RAM

Trained for tool use; the most dependable mid-size coding workhorse for an OpenAI-compatible agent CLI.

Gemma 4 31B · 31B · 32GB RAM

Low tool-call error rates in agentic testing; run with thinking off.

Qwen3 30B · 30B · 28GB RAM

Capable agentic MoE; reliable tool calls and 256K context for repo-scale work.

B: Usable with caveats

LFM2 24B-A2B Instruct · 24B · 16GB RAM

Liquid's tool-dispatch MoE: an efficient on-device agent that fits in ~14.5GB, with lighter coding depth.

Qwen2.5 Coder 7B · 7B · 10GB RAM

Solid 8GB fallback with real tool-use training; weaker on long multi-step chains.

Mistral Small 22B · 22B · 26GB RAM

Decent function calling, fits ~32GB Macs, but long tool sequences can hit format errors.

C: Works, but not recommended

DeepSeek-R1 Distill Qwen 14B · 14B · 22GB RAM

Strong reasoning, but R1 distills drop tool calls into content instead of the tool_calls array.

Phi-4 14B · 14B · 22GB RAM

Lacks proper tool-calling; fine as a chat model, unreliable as an agent.

Qwen3.5 4B Instruct · 4B · 8GB RAM

Agent-tuned but too small to stay coherent across the long agentic loop.

Tiers weigh tool-calling reliability, context window, and coding quality for Open Claude Code specifically. A model can rank higher for one tool than another. RAM figures are for Q4 quantization. Sources are listed below.
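To make the tier list easier to apply, the sketch below encodes the models, tiers, and Q4 RAM figures from the list and picks the highest-tier entry that fits a given memory budget. It is only a convenience illustration: the names Model, TIER_LIST, and best_fit are hypothetical, and treating each RAM figure as the minimum machine memory is an assumption, not something the list guarantees.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    tier: str      # "S", "A", "B", or "C", as in the tier list
    ram_gb: int    # RAM figure from the list (Q4 quantization)


# Entries copied from the tier list, in the order they appear.
TIER_LIST = [
    Model("Qwen3.6 35B-A3B", "S", 24),
    Model("Qwen3.5 35B-A3B Instruct", "S", 24),
    Model("Qwen2.5 Coder 14B", "A", 22),
    Model("Gemma 4 31B", "A", 32),
    Model("Qwen3 30B", "A", 28),
    Model("LFM2 24B-A2B Instruct", "B", 16),
    Model("Qwen2.5 Coder 7B", "B", 10),
    Model("Mistral Small 22B", "B", 26),
    Model("DeepSeek-R1 Distill Qwen 14B", "C", 22),
    Model("Phi-4 14B", "C", 22),
    Model("Qwen3.5 4B Instruct", "C", 8),
]

TIER_ORDER = "SABC"  # best tier first


def best_fit(ram_gb: int) -> Optional[Model]:
    """Return the highest-tier model whose RAM figure fits, or None.

    Ties within a tier go to the model listed first, because min()
    returns the first minimal element.
    """
    candidates = [m for m in TIER_LIST if m.ram_gb <= ram_gb]
    if not candidates:
        return None
    return min(candidates, key=lambda m: TIER_ORDER.index(m.tier))
```

For example, best_fit(24) returns the Qwen3.6 35B-A3B entry, while best_fit(16) falls back to LFM2 24B-A2B Instruct from the B tier.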

Local setup notes

Install with npx @ruvnet/open-claude-code or npm i -g @ruvnet/open-claude-code. Run a local OpenAI-compatible server (Ollama, llama.cpp, LM Studio), then set OPENAI_BASE_URL to it (e.g. http://localhost:11434/v1) with OPENAI_API_KEY set to any dummy value, and select your model. Raise the context window to 32K+ for the agentic loop.
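The environment setup described above can be sketched as a small Python launcher. This is a minimal illustration under stated assumptions: the helper names local_env and launch are hypothetical, the default URL is the example address from the notes above, and the launcher simply shells out to the npx command given there. How the model is selected is not specified here, so the sketch leaves that step out.

```python
import os
import subprocess


def local_env(base_url: str = "http://localhost:11434/v1",
              api_key: str = "dummy") -> dict:
    """Copy the current environment and point it at a local server.

    OPENAI_BASE_URL and OPENAI_API_KEY are the variables named in the
    setup notes; the key can be any placeholder for a local server.
    """
    env = dict(os.environ)
    env["OPENAI_BASE_URL"] = base_url
    env["OPENAI_API_KEY"] = api_key
    return env


def launch(env: dict) -> int:
    """Start the CLI through npx with the given environment.

    Requires Node.js and npx on PATH; returns the process exit code.
    """
    return subprocess.call(["npx", "@ruvnet/open-claude-code"], env=env)


# Example usage (not run automatically):
#   exit_code = launch(local_env())
```

Building the environment in a copy rather than mutating os.environ keeps the dummy key and local URL scoped to the launched process.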

Frequently Asked Questions

Can Open Claude Code run on local open-source models?

Yes. It exposes a configurable OPENAI_BASE_URL, so you point it at a local OpenAI-compatible server (Ollama, llama.cpp, LM Studio) with OPENAI_API_KEY set to any dummy value. Then it runs fully on open-weight models with no cloud API.

How is Open Claude Code different from the official Claude Code?

It is an open-source reimplementation built from the same architecture, with 25 tools, MCP transports, slash commands, and permission modes. The difference is that it is model-agnostic: first-class environment variables let you swap in OpenAI-compatible or local endpoints. The official client is closed-source and defaults to Anthropic's API.

Which local model is best for Open Claude Code?

The agent-tuned Qwen3.5/3.6 35B-A3B MoE models. They emit reliable structured tool calls with 100K+ context and run on a 24GB Apple Silicon Mac. For smaller machines, Qwen2.5 Coder 14B is the dependable lighter pick.