$_CLI agent· 11 models ranked

Best Local LLMs for OpenCode

Open-source terminal coding agent (Ollama / OpenAI-compatible)

OpenCode is a terminal AI coding agent that reads, writes, and runs code through an agentic loop, fully offline via Ollama. Because every action is a tool call, a model that describes edits instead of firing the read/write/bash tools is useless here regardless of raw coding skill. Hands-on testing separates the models cleanly, and the right context-window setting matters as much as the model.

Best pick

Qwen3.5 27B Instruct

The tested OpenCode winner: complete working projects, lowest structured-task error rate, precise tool format.

What OpenCode needs

Rock-solid tool-call emission, with the context window raised well above Ollama's 4K default (16K minimum, 64K recommended).

OpenCode Local LLM Tier List

SS: Best in class

Qwen3.5 27B Instruct27B· 20GB RAM

The tested OpenCode winner: complete working projects, lowest structured-task error rate, precise tool format.

Gemma 4 26B-A4B26B· 24GB RAM

Close runner-up: clean tool calls and compiling output from an efficient MoE.

AA: Strong, reliable

Qwen3 30B30B· 28GB RAM

Qwen3-Coder-class MoE; fast, clean first-attempt code with reliable tool calls.

Qwen3.6 35B-A3B35B· 24GB RAM

Newest Qwen3.6 coder MoE, with strong agentic coding and tool-calling lineage.

Qwen2.5 Coder 14B14B· 22GB RAM

Proven coder with solid native tool-calling; dependable daily driver.

BB: Usable with caveats

Qwen3.5 35B-A3B Instruct35B· 24GB RAM

Capable but unpredictable on structured output (slug/edit hallucination); add a validation step.

Qwen2.5 Coder 7B7B· 10GB RAM

Smallest model that still calls tools reliably enough for light edits at 16K+ ctx.

Mistral Small 22B22B· 26GB RAM

Decent native tool calling; workable but weaker agentic coding than Qwen peers.

CC: Works, but not recommended

Qwen3 14B14B· 20GB RAM

Failure mode OpenCode hates: hallucinates results instead of using tools.

DeepSeek-R1 Distill Qwen 14B14B· 22GB RAM

Verbose reasoning traces disrupt the tool-call loop.

Phi-4 14B14B· 22GB RAM

Weak, unreliable tool calling for agentic editing.

Tiers weigh tool-calling reliability, context window, and coding quality for OpenCode specifically. A model can rank higher for one tool than another. RAM figures are for Q4 quantization. Sources are listed below.

Local setup notes

Install OpenCode, run Ollama, and select your local model. Critically, raise num_ctx. Ollama defaults every model to a 4K window, which silently breaks agentic tool use; set 16K at minimum and 64K for serious work.

OpenCode official site ↗

New open-weight models, real Apple Silicon benchmarks, and the one model worth running on your Mac this week. Free, one email a week, unsubscribe anytime.

By subscribing you agree to our Privacy Policy and to receive the weekly email. Unsubscribe anytime.

Frequently Asked Questions

Which local model is best for OpenCode?+

Qwen 3.5 27B. In hands-on OpenCode testing it delivered complete working projects with the lowest structured-task error rate and the most reliable tool calls. Gemma 4 26B is the closest alternative.

Why do some local models fail in OpenCode?+

They generate text describing edits or summaries instead of emitting actual tool calls, or they hallucinate results rather than calling read or bash. Smaller and reasoning-distill models showed exactly this in testing.

What context window does OpenCode need with Ollama?+

At least 16K tokens for agentic tools to work, with 64K recommended. Ollama defaults every model to 4K, so you must raise num_ctx or tool use silently breaks.