~>CLI agent· 13 models ranked

Best Local LLMs for Goose

Block's open-source autonomous developer agent (MCP-driven)

Goose (by Block) is an open-source autonomous developer agent (CLI and desktop) that executes, edits, and tests code using any LLM. It runs fully local via Ollama and is built around MCP extensions, so it leans almost entirely on the model's tool-calling ability. Block states it plainly: models without tool calling can only do chat completion, and every extension must be disabled. Choose accordingly.

Best pick

Qwen2.5 Coder 14B

Qwen2.5 is Goose's explicitly recommended local family; 14B is large enough to call tools consistently.

What Goose needs

Reliable native JSON tool/function calling plus enough context (8K+). Without it, Goose can only chat, because every extension and action is a tool call.

Goose Local LLM Tier List

SS: Best in class

Qwen2.5 Coder 14B14B· 22GB RAM

Qwen2.5 is Goose's explicitly recommended local family; 14B is large enough to call tools consistently.

Qwen3.5 35B-A3B Instruct35B· 24GB RAM

Large Qwen MoE, strong native tool calling with headroom for many MCP tools.

Llama 3.3 70B Instruct70B· 48GB RAM

Recommended Llama family; 70B is the most reliable local tier for multi-tool agent loops.

AA: Strong, reliable

Qwen3 30B30B· 28GB RAM

Large Qwen3 MoE with native tool calling; a popular Goose-local choice.

Qwen3.5 27B Instruct27B· 20GB RAM

Dense large Qwen; reliable tool calls.

Qwen2.5 Coder 7B7B· 10GB RAM

Recommended Qwen coder; works but smaller, so slightly less consistent on the 11-tool default set.

Mistral Small 22B22B· 26GB RAM

Goose-recommended family with native function calling; solid mid-size agent model.

BB: Usable with caveats

Qwen3.5 9B Instruct9B· 14GB RAM

Tool calling present, but ~9B struggles to stay consistent across the full extension set.

Qwen3 8B8B· 12GB RAM

Usable for few-tool setups; flaky once many tools are enabled.

Qwen2.5 7B Instruct7B· 10GB RAM

Recommended family but small, with reported tool-calling failures on M-series in community threads.

CC: Works, but not recommended

Phi-4 14B14B· 22GB RAM

Flagged as lacking native tool calling in Goose discussions.

Gemma 4 31B31B· 32GB RAM

Gemma family reported with no native tool calling for Goose; chat only.

DeepSeek-R1 Distill Qwen 14B14B· 22GB RAM

Native R1 does not support tool calling; the 14B distill is too small to compensate.

Tiers weigh tool-calling reliability, context window, and coding quality for Goose specifically. A model can rank higher for one tool than another. RAM figures are for Q4 quantization. Sources are listed below.

Local setup notes

Configure Goose with the Ollama provider. Raise OLLAMA_CONTEXT_LENGTH to 8K+ so the model can hold Goose's MCP tool definitions (the 4K default is too small and silently drops extensions and .goosehints). Stick to native-tool-calling families.

Goose official site ↗

New open-weight models, real Apple Silicon benchmarks, and the one model worth running on your Mac this week. Free, one email a week, unsubscribe anytime.

By subscribing you agree to our Privacy Policy and to receive the weekly email. Unsubscribe anytime.

Frequently Asked Questions

What is the best local model for Goose?+

A Qwen2.5/Qwen3 model or a Llama 70B. Block recommends Qwen2.5, Llama, and Mistral variants because they support native tool calling, which Goose relies on for every action.

Why does Goose ignore my extensions or .goosehints with local models?+

The model's context is usually too small. Goose's default 4096-token context cannot hold the MCP tool definitions; raise OLLAMA_CONTEXT_LENGTH to 8K+ so the model can see them.

Can I use Gemma, Phi-4, or DeepSeek-R1 with Goose?+

Not well. These lack native tool calling, so Goose can only chat with them and extensions must be disabled. Only Block's custom deepseek-r1-goose 70B build adds tool support.