Local LLM Refresh: A Coding Wave Hits Ollama (June 2026)

Three new 30B-class models landed on Ollama this fortnight, and all three share the same trick: a Mixture-of-Experts design with about 3 billion active parameters. That means frontier-style coding and reasoning at a memory footprint a 32GB Mac can hold. Cohere, Poolside, and NVIDIA each shipped one. Here is what fits your machine.

What shipped this week?

The headline is a coding surge. Two labs released open-weight agentic coding models within days of each other, and a strong NVIDIA reasoner reached the Ollama library. All run locally in Q4 to Q6 quantization.

Model	Maker	Total / active	Context	License	Focus
North Mini Code	Cohere	30B / 3B	256K	Apache 2.0	Agentic coding
Laguna XS.2	Poolside	33B / 3B	128K	Apache 2.0	Agentic coding
Nemotron Cascade 2	NVIDIA	30B / 3B	256K	Open weights	Reasoning, math

The Mixture-of-Experts pattern is the throughline. A 30B model with 3B active params loads its full weights into memory but only computes a fraction per token, so it runs much faster than a dense 30B model at the same quality tier.
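A back-of-envelope sketch makes the split concrete: weight memory follows total parameters, while per-token compute follows active parameters. The figure of about 4.85 bits per weight for a Q4_K_M-style quant and the rule of thumb of 2 FLOPs per active parameter per token are approximations assumed here, not numbers from any model card. Real files come out somewhat different because some tensors are stored at higher precision.

```python
# Back-of-envelope estimate: memory follows TOTAL params, compute follows ACTIVE params.
# Assumptions (not from any model card): ~4.85 bits/weight for a Q4_K_M-style quant,
# and ~2 FLOPs per active parameter per generated token.

def weight_memory_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

def flops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass FLOPs needed to generate one token."""
    return 2 * active_params_b * 1e9

moe_memory = weight_memory_gb(30, 4.85)    # MoE: all 30B weights stay resident
dense_memory = weight_memory_gb(30, 4.85)  # dense 30B: same weight footprint
compute_ratio = flops_per_token(30) / flops_per_token(3)

print(f"weights in memory: ~{moe_memory:.1f} GB for both designs")
print(f"dense 30B needs ~{compute_ratio:.0f}x the per-token compute of a 3B-active MoE")
```

Under these assumptions the two designs occupy the same memory, and the whole speed advantage comes from the smaller per-token compute.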

Cohere North Mini Code: a 30B coding agent

North Mini Code is Cohere's first developer-focused model, released June 11, 2026 under Apache 2.0. It is a 30B Mixture-of-Experts with 3B active parameters, a 256K context window, and up to 64K output tokens. Together, the long context and output budget are enough to reason across a whole repository.

Cohere reports a score of 33.4 on the Artificial Analysis Coding Index (Cohere model card). Independent measurements have raised questions about how it compares with similarly sized rivals, so treat the score as the maker's own figure rather than a settled result.

Pull and run it with:

ollama run north-mini-code-1.0:q4_K_M

The Q4_K_M build is about 18.6GB on disk, so a 24GB machine is the practical floor.
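Once the model is pulled, a script can talk to it through Ollama's local HTTP API instead of the interactive prompt. The following is a minimal sketch, assuming the Ollama server is running on its default port (11434) and that the tag above resolves on your machine. The helper names build_payload and generate are illustrative, not part of Ollama.

```python
# Minimal sketch: send one prompt to a locally running Ollama server.
# Assumes Ollama is serving on its default port and the tag is already pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """POST the prompt and return the model's full text response."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

# Example call (requires a running server, so it is left commented out):
# print(generate("north-mini-code-1.0:q4_K_M", "Write a function that reverses a string."))
```

Setting "stream" to False asks the server for a single JSON reply rather than a stream of partial chunks, which keeps the sketch short.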

Poolside Laguna XS.2: strong SWE-bench scores

Laguna XS.2 is Poolside AI's open-weight coding model (Apache 2.0). It is a 33B Mixture-of-Experts with 3B active parameters and a 128K context window, tuned for long-horizon software engineering.

Poolside reports the following on its release page (Poolside blog):

Benchmark	Score
SWE-bench Verified	68.2%
SWE-bench Multilingual	62.4%
SWE-bench Pro	44.5%
Terminal-Bench 2.0	30.1%

That SWE-bench Verified figure is competitive for a model in the 30B to 35B range, though it sits below the current open-weight leaders we track on our benchmark page. Pull and run it with:

ollama run laguna-xs.2:q4_K_M

The Q4_K_M build is roughly 23GB, so plan for a 32GB machine.

NVIDIA Nemotron Cascade 2: gold-medal reasoning

Nemotron Cascade 2 is not brand new. NVIDIA released it on March 19, 2026, but the Ollama tag arrived this month, which is why it shows up now. It is a 30B Mixture-of-Experts with 3B active parameters, a 256K context window, and both thinking and instruct modes.

NVIDIA's model card claims gold-medal performance at both the 2025 International Mathematical Olympiad and the International Olympiad in Informatics, with an IOI score of 439.3 (NVIDIA model card). Unlike the two coding specialists above, this one is a general reasoner you can use for chat, math, and agentic work.

ollama run nemotron-cascade-2:30b

The :30b tag ships around 24GB (a Q6_K build), so a 32GB Mac is the target.

Which one should you run?

Pick by job, not by score:

General work, math, agents: Nemotron Cascade 2. It is the only one of the three built as a general-purpose model.
Coding on a 24GB machine: North Mini Code. Its Q4 build is the smallest of the three.
Coding on a 32GB machine: Laguna XS.2, if its SWE-bench profile matches your stack.
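The three rules above can be written as a small lookup. This is only a sketch of this article's recommendations, not how the ModelFit wizard works. The job labels and the pick_model name are made up for illustration, and the minimum RAM values are the ones suggested in the sections above.

```python
# Sketch of the "pick by job" rules above. Minimum RAM values are the ones this
# article suggests; the job labels and function name are illustrative only.
from typing import Optional

PICKS = {
    # job: (model, minimum RAM in GB) pairs, tried in order
    "general": [("Nemotron Cascade 2", 32)],
    "coding": [("Laguna XS.2", 32), ("North Mini Code", 24)],
}

def pick_model(job: str, ram_gb: int) -> Optional[str]:
    """Return the first suggested model whose RAM floor the machine meets."""
    for model, min_ram in PICKS.get(job, []):
        if ram_gb >= min_ram:
            return model
    return None  # nothing from this roundup fits

for job, ram in [("coding", 24), ("coding", 32), ("general", 32), ("coding", 16)]:
    print(job, ram, "->", pick_model(job, ram))
```

The lookup treats Laguna XS.2 as the default coding pick at 32GB; whether its SWE-bench profile matches your stack is still a judgment the code cannot make.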

If you are unsure what your Mac can hold, the ModelFit wizard picks the single best model for your exact RAM and chip. None of these three changed our leaderboard, so the existing top picks for chat still stand.

Did the leaderboard move?

No. The headline metric we track is SWE-bench Verified, and the best open-weight scores we list did not change this week. Laguna XS.2 at 68.2% is strong but sits under the current open-weight front. We add new models to the catalog as they verify, and we move the leaderboard only when a source-confirmed score beats the current best.

FAQ

Can I run a 30B model on a 16GB Mac?

Not these three. A 30B Mixture-of-Experts model loads all of its weights into memory, so even the smallest Q4 build here needs about 19GB on disk and a 24GB machine in practice. On 16GB, stick to 8B to 14B models. Throughput figures on our device pages are estimates, not measured results.
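The arithmetic behind that answer can be sketched as a simple headroom check. The build sizes are the approximate figures quoted in this article. The 4 GB of headroom for the operating system and the context cache is an assumed round number, not a measured requirement.

```python
# Sketch: does a build fit in RAM once headroom is left for the OS and context cache?
# Build sizes are the approximate figures quoted in this article; the 4 GB
# headroom is an assumed round number, not a measured requirement.
BUILD_SIZE_GB = {
    "north-mini-code-1.0:q4_K_M": 18.6,
    "laguna-xs.2:q4_K_M": 23.0,
    "nemotron-cascade-2:30b": 24.0,
}

def fits(ram_gb: float, build_gb: float, headroom_gb: float = 4.0) -> bool:
    """True if the weights plus the assumed headroom fit in the machine's RAM."""
    return build_gb + headroom_gb <= ram_gb

for ram in (16, 24, 32):
    runnable = [tag for tag, size in BUILD_SIZE_GB.items() if fits(ram, size)]
    print(f"{ram} GB: {runnable or 'none of the three'}")
```

With these numbers, nothing fits at 16GB, only the North Mini Code build fits at 24GB, and all three fit at 32GB.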

What does "3B active parameters" mean?

A Mixture-of-Experts model splits its weights into many experts and routes each token to only a few of them. North Mini Code, Laguna XS.2, and Nemotron Cascade 2 each hold 30B to 33B of total weights in memory but compute with about 3B per token. The result is per-token compute close to a small model's, with the knowledge capacity of a larger one.
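A toy router shows the idea of sending a token to only a few experts. This is the general top-k routing pattern, not the actual router of any of the three models, whose internals this article does not describe. The expert count, the value of k, and the gate scores are invented for illustration.

```python
# Toy Mixture-of-Experts router: score every expert, run only the top-k.
# Expert count, k, and the gate scores are made up for illustration.
import math

def softmax(scores):
    """Turn raw gate scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, k=2):
    """Return (expert_index, weight) for the k highest-scoring experts,
    with the weights renormalized so they sum to 1."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

gate_scores = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]  # one token, 8 experts
chosen = route(gate_scores, k=2)
print(chosen)  # only these experts are computed for this token
```

All eight experts would sit in memory, but only the two selected ones do any work for this token, which is the gap between total and active parameters.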

Are these models free to use commercially?

North Mini Code and Laguna XS.2 are released under Apache 2.0, which permits commercial use. Nemotron Cascade 2 ships under NVIDIA's open-weights terms, so check its license before commercial deployment.

Why is Nemotron Cascade 2 listed if it shipped in March?

The model is from March 2026, but its Ollama tag went live this month, which is when it became a one-command local download. We list models when they become practical to run locally, not only on the day weights first appear.

Where do these benchmark numbers come from?

Every figure here is quoted from the maker's own model card or release page, linked inline. ModelFit does not run its own benchmarks. We verify that each Ollama tag resolves before listing a model, and we quote third-party scores with attribution rather than presenting them as our own.