IDE extension · 14 models ranked

Best Local LLMs for Continue.dev

Open-source chat, edit & autocomplete for VS Code / JetBrains

Continue.dev is an open-source AI code assistant for VS Code and JetBrains that can run chat, edit, and tab-autocomplete fully locally via Ollama. It is less agentic than Cline or Roo: its core roles are chat, edit, and autocomplete, so tool-calling matters little and a wide range of instruct models work well. The catch is that autocomplete needs a separate fill-in-the-middle (FIM) model, not the same one you chat with.

Best pick

Qwen2.5 Coder 14B

Continue's code-specialist sweet spot; the top locally-runnable chat/edit pick that fits common RAM.

What Continue.dev needs

Strong coding-instruct quality at a size that fits in RAM. Continue's chat/edit roles tolerate non-tool-calling models, so raw code reasoning beats agentic polish.

Continue.dev Local LLM Tier List

S: Best in class

Qwen2.5 Coder 14B · 14B · 22GB RAM

Continue's code-specialist sweet spot; the top locally-runnable chat/edit pick that fits common RAM.

Qwen3.5 35B-A3B Instruct · 35B · 24GB RAM

Newest large Qwen MoE, desktop-class speed with big-model quality for chat/edit.

A: Strong, reliable

Qwen2.5 Coder 7B · 7B · 10GB RAM

Continue's explicitly recommended local chat + code-gen model; light enough for 16GB.

Qwen3 30B · 30B · 28GB RAM

Named in Continue docs as a recommended Ollama chat model.

Qwen3.5 27B Instruct · 27B · 20GB RAM

Dense high-quality Qwen for strong chat/edit when RAM allows.

Mistral Small 22B · 22B · 26GB RAM

Recommended by Continue for fast, versatile chat; reliable edit behavior.

Llama 3.3 70B Instruct · 70B · 48GB RAM

Top-tier reasoning/edit quality for high-RAM Macs; strongest Llama for chat.

B: Usable with caveats

Qwen3.5 9B Instruct · 9B · 14GB RAM

Solid mid-size dense chat/edit at lower RAM cost.

Gemma 4 31B · 31B · 32GB RAM

Strong newer Gemma for chat/edit when RAM allows.

Phi-4 14B · 14B · 22GB RAM

Strong reasoning-for-size; good chat/edit on modest hardware.

DeepSeek-R1 Distill Qwen 14B · 14B · 22GB RAM

Useful for debugging/refactor chat, but thinking models are slower and not for autocomplete.

C: Works, but not recommended

Qwen2.5 14B Instruct · 14B · 20GB RAM

General (non-coder) Qwen2.5; the coder-14b is the better pick for code roles.

Gemma 2 9B Instruct · 9B · 14GB RAM

Older Gemma; weak for coding, generic chat only.

DeepSeek-R1 Distill Llama 70B · 70B · 48GB RAM

Heavy reasoning distill; slow and overkill for Continue's mostly non-agentic roles.

Tiers weigh coding quality, context window, and tool-calling reliability for Continue.dev specifically, so a model can rank higher for one tool than another. Tool-calling carries the least weight here because Continue's roles are mostly non-agentic. RAM figures are for Q4 quantization. Sources are listed below.
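To narrow the list to what a given machine can run, the tiers and RAM figures above can be put into a small lookup. This is a minimal sketch, not part of Continue: the `Model` type, `TIER_LIST`, and `models_that_fit` are hypothetical names, the tier letters run S through C, and the RAM values are copied from this page's Q4 figures and treated as the memory budget each model needs.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    tier: str    # S, A, B, or C, as ranked on this page
    ram_gb: int  # Q4 RAM figure listed on this page


# Data copied from the tier list above (hypothetical container name).
TIER_LIST = [
    Model("Qwen2.5 Coder 14B", "S", 22),
    Model("Qwen3.5 35B-A3B Instruct", "S", 24),
    Model("Qwen2.5 Coder 7B", "A", 10),
    Model("Qwen3 30B", "A", 28),
    Model("Qwen3.5 27B Instruct", "A", 20),
    Model("Mistral Small 22B", "A", 26),
    Model("Llama 3.3 70B Instruct", "A", 48),
    Model("Qwen3.5 9B Instruct", "B", 14),
    Model("Gemma 4 31B", "B", 32),
    Model("Phi-4 14B", "B", 22),
    Model("DeepSeek-R1 Distill Qwen 14B", "B", 22),
    Model("Qwen2.5 14B Instruct", "C", 20),
    Model("Gemma 2 9B Instruct", "C", 14),
    Model("DeepSeek-R1 Distill Llama 70B", "C", 48),
]

TIER_ORDER = {"S": 0, "A": 1, "B": 2, "C": 3}


def models_that_fit(ram_budget_gb: int) -> list[Model]:
    """Return models whose listed RAM figure fits the budget, best tier first."""
    fitting = [m for m in TIER_LIST if m.ram_gb <= ram_budget_gb]
    # Sort by tier, then by RAM so lighter models come first within a tier.
    return sorted(fitting, key=lambda m: (TIER_ORDER[m.tier], m.ram_gb))


if __name__ == "__main__":
    for model in models_that_fit(24):
        print(f"{model.tier}: {model.name} ({model.ram_gb}GB)")
```

The budget passed in should be the memory you can spare for the model, which is normally less than the machine's total RAM once the editor and operating system are accounted for.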

Local setup notes

Continue assigns models to roles. Use an instruct coder (7B-32B) for chat and edit, and pair it with a small FIM-trained model for tab-autocomplete. Continue's validated local stack is qwen2.5-coder:7b for chat plus qwen2.5-coder:1.5b for autocomplete.
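The role split described above can be sketched as a small config builder. The field names (`name`, `version`, `schema`, `models`, `provider`, `model`, `roles`) are modeled on Continue's `config.yaml` layout as an assumption, and `continue_local_config` is a hypothetical helper; check the current Continue configuration reference before copying the output into a real config file.

```python
import json


def continue_local_config(chat_model: str, autocomplete_model: str) -> dict:
    """Build a role-based model config (assumed layout, not verified against Continue)."""
    return {
        "name": "Local Assistant",  # display name, chosen for illustration
        "version": "1.0.0",
        "schema": "v1",
        "models": [
            {
                "name": "Chat and edit",
                "provider": "ollama",
                "model": chat_model,
                "roles": ["chat", "edit"],
            },
            {
                "name": "Autocomplete",
                "provider": "ollama",
                "model": autocomplete_model,
                "roles": ["autocomplete"],
            },
        ],
    }


# The local stack named above: a 7B coder for chat, a 1.5B coder for autocomplete.
config = continue_local_config("qwen2.5-coder:7b", "qwen2.5-coder:1.5b")

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

Both model tags would need to be pulled into Ollama first. Keeping the autocomplete model in its own entry makes it easy to swap the chat model for a larger one without slowing down completions.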



Frequently Asked Questions

Which local model is best for Continue.dev?

For most setups, a Qwen2.5-Coder model: the 7B or 14B coder handles chat and edits, paired with a tiny 1.5B coder for tab-autocomplete. Continue's docs name these as its primary local picks.

Can I use one model for both chat and autocomplete?

Not ideally. Autocomplete needs a fill-in-the-middle (FIM) model, while chat and edit use instruct models. Continue lets you assign separate models per role, so most users run a small coder for autocomplete and a larger one for chat.
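To see why the two roles differ, it helps to look at what a fill-in-the-middle prompt looks like. The sketch below assembles one using the FIM special tokens documented for Qwen2.5-Coder; other model families use different tokens, and the editor extension normally builds this prompt for you. `build_fim_prompt` is a hypothetical helper for illustration only.

```python
# FIM special tokens as documented for Qwen2.5-Coder (other models differ).
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the cursor so the model generates the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Code before and after the cursor position in the editor.
before_cursor = "def add(a, b):\n    return "
after_cursor = "\n\nprint(add(1, 2))\n"

prompt = build_fim_prompt(before_cursor, after_cursor)

if __name__ == "__main__":
    print(prompt)
```

A chat prompt, by contrast, is a list of instruction and response turns, which is why a model tuned only for chat tends to complete code poorly at the cursor.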

Do reasoning models like DeepSeek-R1 work in Continue.dev?

Yes for chat, debugging, and refactoring, but Continue warns thinking models are slower and explicitly unsuitable for autocomplete, where speed matters most.