Best Local LLM for Coding (2026): Ranked by RAM

Q: What is the best local LLM for coding in 2026?

Qwen3.6 27B is the best local coding model for most setups in 2026. Its model card reports 77.2 on SWE-bench Verified in a 17GB download that fits a 32GB Mac or 24GB GPU. Devstral Small 2 (24B, 68.0% SWE-bench Verified per Mistral) is the strongest agentic-coding alternative at the same tier.

Q: How much RAM do I need to run a coding LLM locally?

16GB runs 7B-class coding models (Qwen2.5-Coder 7B) plus a fast autocomplete model. 32GB is the sweet spot: it fits Qwen3.6 27B or Devstral Small 2, the current best local coders. 128GB unified memory unlocks Devstral 2 123B and gpt-oss-120b.

Q: What is the best local model for code autocomplete?

Qwen2.5-Coder 1.5B is the autocomplete model officially recommended by continue.dev. At under 1GB it returns completions fast enough for inline suggestions. Do not use reasoning/thinking models for autocomplete; they are too slow.

Q: Can a local LLM replace GitHub Copilot or Claude?

For autocomplete and single-file edits, yes. continue.dev or Zed with a local Qwen model covers most of what Copilot does, fully offline. For long agentic sessions across a large repo, frontier cloud models still lead, but the gap narrowed sharply in 2026: top local 24-27B models now post SWE-bench Verified scores in the high 60s to high 70s.

Which Local LLM Is Best for Coding? The Tier List

Your RAM (or VRAM) is the hard ceiling, so the ranking below is organized by what actually fits. Benchmark figures shown are the ones published in each vendor's own model card and confirmed in the raw source; where no confirmed figure exists, we say so instead of inventing one. Every ollama run command was probed against the Ollama registry (HTTP 200) on 2026-06-12.

Model	RAM tier	Download	Context	Sourced score
Qwen3.6 27B	32GB	17GB	256K	77.2 SWE-bench Verified
Devstral Small 2 (24B)	24-32GB	15GB	384K	68.0% SWE-bench Verified
Qwen3-Coder 30B (MoE)	32GB	19GB	256K	—
gpt-oss-20b	16-24GB	14GB	128K	—
Qwen2.5-Coder 7B	8-16GB	4.7GB	32K	—
Qwen2.5-Coder 1.5B	any 8GB+	986MB	32K	—
gpt-oss-120b	96-128GB	65GB	128K	41.8% aider polyglot
Devstral 2 (123B)	128GB	75GB	256K	72.2% SWE-bench Verified

Qwen3.6 27B

32GB

The headline local coder of 2026, with a frontier-class agentic coding score in a download that fits a 32GB Mac or 24GB GPU.

ollama run qwen3.6:27b

Devstral Small 2 (24B)

24-32GB

Mistral’s agentic-coding specialist. Slightly smaller than Qwen3.6 27B with the largest context window at this tier.

ollama run devstral-small-2:24b

Qwen3-Coder 30B (MoE)

32GB

Mixture-of-experts with only 3.3B active parameters, noticeably faster per token than dense 27B models on the same hardware.

ollama run qwen3-coder:30b

gpt-oss-20b

16-24GB

OpenAI's small open-weight MoE, the strongest coding option below the 32GB tier.

ollama run gpt-oss:20b

Qwen2.5-Coder 7B

8-16GB

The 16GB-and-under workhorse for chat-style coding help. Pair it with the 1.5B variant for autocomplete.

ollama run qwen2.5-coder:7b

Qwen2.5-Coder 1.5B

any 8GB+

continue.dev's officially recommended autocomplete model, small enough to return inline completions instantly.

ollama run qwen2.5-coder:1.5b

gpt-oss-120b

96-128GB

For Mac Studio / 128GB-class machines. Scores 41.8% on the aider polyglot benchmark, above any 32B-class open model on that board.

ollama run gpt-oss:120b

Devstral 2 (123B)

128GB

The strongest open-weight coder that runs on a single high-memory machine.

ollama run devstral-2:123b

Sources: Qwen3.6-27B model card (77.2 SWE-bench Verified, 59.3 Terminal-Bench 2.0), Devstral Small 2 model card (68.0% SWE-bench Verified; Devstral 2 123B at 72.2%), aider polyglot leaderboard (gpt-oss-120b 41.8%). SWE-bench figures are vendor self-reported on their model cards; aider scores are independently run.

What Changed in 2026: the Local Coding Gap Collapsed

A year ago the standard answer was Qwen2.5-Coder 32B, which scores just 16.4% on the aider polyglot benchmark. The 2026 generation rewrote that completely: Devstral Small 2 posts a 68.0% SWE-bench Verified score in a 15GB download, and Qwen3.6 27B reports 77.2, numbers that were frontier-cloud territory in 2025. If your "best local coding model" knowledge dates from the Qwen2.5-Coder era, every recommendation you remember is out of date.

Two caveats keep cloud models ahead for some work: very long agentic sessions (local context windows fill up faster in practice) and the biggest open-weight coders (DeepSeek, Kimi K2) are 600B-1T-parameter models that no consumer machine runs. Open-weight is not the same as locally runnable. For a deeper comparison, see our local vs cloud flagship benchmark breakdown.

How Do I Use a Local LLM in VS Code? (continue.dev)

continue.dev is the most popular open-source Copilot alternative for VS Code and JetBrains. Configuration now lives in ~/.continue/config.yaml (the old config.json is deprecated). A complete local setup: chat on a strong model, autocomplete on a fast one:

models:
  - name: Qwen3.6 27B
    provider: ollama
    model: qwen3.6:27b

  - name: Autocomplete
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete

Qwen2.5-Coder 1.5B is continue.dev's officially recommended autocomplete model. Their docs also warn against using thinking/reasoning models for autocomplete. Completions need to arrive in milliseconds, not after a reasoning chain.

How Do I Use aider with Ollama?

aider is the terminal-native AI pair programmer. Point it at your local Ollama server:

export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama_chat/qwen3.6:27b

Use the ollama_chat/ prefix; aider's docs state it is recommended over plain ollama/. The critical gotcha: Ollama defaults to a 2K context window and silently discards anything beyond it (aider's own warning). Coding needs far more. Raise it server-side:

OLLAMA_CONTEXT_LENGTH=32768 ollama serve

How Do I Use a Local LLM in Zed?

Zed auto-discovers every model Ollama has pulled. Pull a model, then pick it from the Agent panel's model dropdown. One thing worth overriding: Zed defaults local models to a 4096-token context. In settings.json:

{
  "language_models": {
    "ollama": {
      "api_url": "http://localhost:11434",
      "available_models": [
        {
          "name": "qwen3.6:27b",
          "display_name": "Qwen3.6 27B",
          "max_tokens": 32768,
          "supports_tools": true
        }
      ]
    }
  }
}

Frequently Asked Questions

What is the best local LLM for coding in 2026?

Qwen3.6 27B for most machines: 77.2 SWE-bench Verified (per its model card) in a 17GB download that fits a 32GB Mac or 24GB GPU. Devstral Small 2 is the strongest agentic-coding alternative at the same tier, with a 384K context window.

How much RAM do I need to run a coding LLM locally?

16GB runs 7B-class models plus a fast autocomplete model. 32GB is the sweet spot for the current best local coders (Qwen3.6 27B, Devstral Small 2). 96-128GB unlocks gpt-oss-120b and Devstral 2 123B.

What is the best local model for code autocomplete?

Qwen2.5-Coder 1.5B, continue.dev's official recommendation. Under 1GB, fast enough for inline suggestions. Never use a thinking model for autocomplete.

Can a local LLM replace GitHub Copilot or Claude?

For autocomplete and single-file edits, yes, fully offline. For long agentic sessions on large repos, frontier cloud models still lead, but 2026's local 24-27B models closed most of the gap.

Best Local LLM for Coding (2026): Ranked by RAM

TL;DR

Contents

Which Local LLM Is Best for Coding? The Tier List

Qwen3.6 27B

Devstral Small 2 (24B)

Qwen3-Coder 30B (MoE)

gpt-oss-20b

Qwen2.5-Coder 7B

Qwen2.5-Coder 1.5B

gpt-oss-120b

Devstral 2 (123B)

What Changed in 2026: the Local Coding Gap Collapsed

How Do I Use a Local LLM in VS Code? (continue.dev)

How Do I Use aider with Ollama?

How Do I Use a Local LLM in Zed?

Frequently Asked Questions

Machine Too Small for Devstral 2? Rent a Cloud GPU

Related Guides

Best Local LLM for Coding (2026): Ranked by RAM

TL;DR

Contents

Which Local LLM Is Best for Coding? The Tier List

Qwen3.6 27B

Devstral Small 2 (24B)

Qwen3-Coder 30B (MoE)

gpt-oss-20b

Qwen2.5-Coder 7B

Qwen2.5-Coder 1.5B

gpt-oss-120b

Devstral 2 (123B)

What Changed in 2026: the Local Coding Gap Collapsed

How Do I Use a Local LLM in VS Code? (continue.dev)

How Do I Use aider with Ollama?

How Do I Use a Local LLM in Zed?

Frequently Asked Questions

Machine Too Small for Devstral 2? Rent a Cloud GPU

The weekly local-AI refresh

Related Guides