Best Local LLM for Coding (2026): Ranked by RAM

Qwen3.6 27B is the best local coding model for most machines in 2026 — a 17GB download whose model card reports 77.2 on SWE-bench Verified. Here is the full tier list by RAM, every command registry-verified, plus working setups for continue.dev, aider and Zed.

By ModelFit Team · Published 2026-06-12

TL;DR

32GB machine: run qwen3.6:27b (77.2 SWE-bench Verified per Qwen's model card) or devstral-small-2:24b (68.0% per Mistral's card). 16GB: qwen2.5-coder:7b for chat + qwen2.5-coder:1.5b for autocomplete. 128GB: devstral-2:123b (72.2% SWE-bench Verified). All commands verified against the Ollama registry on 2026-06-12.

Contents

Which Local LLM Is Best for Coding? The Tier List

Your RAM (or VRAM) is the hard ceiling, so the ranking below is organized by what actually fits. Benchmark figures shown are the ones published in each vendor's own model card and confirmed in the raw source; where no confirmed figure exists, we say so instead of inventing one. Every ollama run command was probed against the Ollama registry (HTTP 200) on 2026-06-12.

ModelRAM tierDownloadContextSourced score
Qwen3.6 27B32GB17GB256K77.2 SWE-bench Verified
Devstral Small 2 (24B)24-32GB15GB384K68.0% SWE-bench Verified
Qwen3-Coder 30B (MoE)32GB19GB256K
gpt-oss-20b16-24GB14GB128K
Qwen2.5-Coder 7B8-16GB4.7GB32K
Qwen2.5-Coder 1.5Bany 8GB+986MB32K
gpt-oss-120b96-128GB65GB128K41.8% aider polyglot
Devstral 2 (123B)128GB75GB256K72.2% SWE-bench Verified

Qwen3.6 27B

32GB

The headline local coder of 2026 — frontier-class agentic coding score in a download that fits a 32GB Mac or 24GB GPU.

ollama run qwen3.6:27b

Devstral Small 2 (24B)

24-32GB

Mistral’s agentic-coding specialist. Slightly smaller than Qwen3.6 27B with the largest context window at this tier.

ollama run devstral-small-2:24b

Qwen3-Coder 30B (MoE)

32GB

Mixture-of-experts with only 3.3B active parameters — noticeably faster per token than dense 27B models on the same hardware.

ollama run qwen3-coder:30b

gpt-oss-20b

16-24GB

OpenAI’s small open-weight MoE — the strongest coding option below the 32GB tier.

ollama run gpt-oss:20b

Qwen2.5-Coder 7B

8-16GB

The 16GB-and-under workhorse for chat-style coding help. Pair it with the 1.5B variant for autocomplete.

ollama run qwen2.5-coder:7b

Qwen2.5-Coder 1.5B

any 8GB+

continue.dev’s officially recommended autocomplete model — small enough to return inline completions instantly.

ollama run qwen2.5-coder:1.5b

gpt-oss-120b

96-128GB

For Mac Studio / 128GB-class machines. Scores 41.8% on the aider polyglot benchmark — above any 32B-class open model on that board.

ollama run gpt-oss:120b

Devstral 2 (123B)

128GB

The strongest open-weight coder that runs on a single high-memory machine.

ollama run devstral-2:123b

Sources: Qwen3.6-27B model card (77.2 SWE-bench Verified, 59.3 Terminal-Bench 2.0), Devstral Small 2 model card (68.0% SWE-bench Verified; Devstral 2 123B at 72.2%), aider polyglot leaderboard (gpt-oss-120b 41.8%). SWE-bench figures are vendor self-reported on their model cards; aider scores are independently run.

What Changed in 2026: the Local Coding Gap Collapsed

A year ago the standard answer was Qwen2.5-Coder 32B — which scores just 16.4% on the aider polyglot benchmark. The 2026 generation rewrote that completely: Devstral Small 2 posts a 68.0% SWE-bench Verified score in a 15GB download, and Qwen3.6 27B reports 77.2 — numbers that were frontier-cloud territory in 2025. If your "best local coding model" knowledge dates from the Qwen2.5-Coder era, every recommendation you remember is out of date.

Two caveats keep cloud models ahead for some work: very long agentic sessions (local context windows fill up faster in practice) and the biggest open-weight coders (DeepSeek, Kimi K2) are 600B-1T-parameter models that no consumer machine runs — open-weight is not the same as locally runnable. For a deeper comparison, see our local vs cloud flagship benchmark breakdown.

How Do I Use a Local LLM in VS Code? (continue.dev)

continue.dev is the most popular open-source Copilot alternative for VS Code and JetBrains. Configuration now lives in ~/.continue/config.yaml (the old config.json is deprecated). A complete local setup — chat on a strong model, autocomplete on a fast one:

models:
  - name: Qwen3.6 27B
    provider: ollama
    model: qwen3.6:27b

  - name: Autocomplete
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete

Qwen2.5-Coder 1.5B is continue.dev's officially recommended autocomplete model. Their docs also warn against using thinking/reasoning models for autocomplete — completions need to arrive in milliseconds, not after a reasoning chain.

How Do I Use aider with Ollama?

aider is the terminal-native AI pair programmer. Point it at your local Ollama server:

export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama_chat/qwen3.6:27b

Use the ollama_chat/ prefix — aider's docs state it is recommended over plain ollama/. The critical gotcha: Ollama defaults to a 2K context window and silently discards anything beyond it (aider's own warning). Coding needs far more. Raise it server-side:

OLLAMA_CONTEXT_LENGTH=32768 ollama serve

How Do I Use a Local LLM in Zed?

Zed auto-discovers every model Ollama has pulled — pull a model, then pick it from the Agent panel's model dropdown. One thing worth overriding: Zed defaults local models to a 4096-token context. In settings.json:

{
  "language_models": {
    "ollama": {
      "api_url": "http://localhost:11434",
      "available_models": [
        {
          "name": "qwen3.6:27b",
          "display_name": "Qwen3.6 27B",
          "max_tokens": 32768,
          "supports_tools": true
        }
      ]
    }
  }
}

Frequently Asked Questions

What is the best local LLM for coding in 2026?

Qwen3.6 27B for most machines — 77.2 SWE-bench Verified (per its model card) in a 17GB download that fits a 32GB Mac or 24GB GPU. Devstral Small 2 is the strongest agentic-coding alternative at the same tier, with a 384K context window.

How much RAM do I need to run a coding LLM locally?

16GB runs 7B-class models plus a fast autocomplete model. 32GB is the sweet spot for the current best local coders (Qwen3.6 27B, Devstral Small 2). 96-128GB unlocks gpt-oss-120b and Devstral 2 123B.

What is the best local model for code autocomplete?

Qwen2.5-Coder 1.5B — continue.dev's official recommendation. Under 1GB, fast enough for inline suggestions. Never use a thinking model for autocomplete.

Can a local LLM replace GitHub Copilot or Claude?

For autocomplete and single-file edits, yes — fully offline. For long agentic sessions on large repos, frontier cloud models still lead, but 2026's local 24-27B models closed most of the gap.

Machine Too Small for Devstral 2? Rent a Cloud GPU

by the hour

The 123B-class open coders need ~128GB of memory. An hourly rented GPU runs them with the same Ollama workflow — no hardware purchase, billed by the hour.

RunPod: Hourly GPU pods (RTX 4090 to H100) with one-click Ollama/vLLM templates.

Vast.ai: Marketplace of rented GPUs — usually the cheapest per-hour prices.

ModelFit may earn a commission on sign-ups made through these links, at no extra cost to you.

Related Guides

Not sure what fits your machine?
Run the wizard — it ranks coding models for your exact hardware.
Open the wizard
modelfit.io