How Much RAM Do You Need for a Local LLM?

At Q4_K_M, a local LLM needs ~0.6 GB of memory per billion parameters. 8 GB runs up to ~7B, 16 GB up to ~14B, 24 GB up to ~27B, 32 GB up to ~32B, and 64 GB up to ~70B. The full model-size-to-memory matrix is below; figures are ModelFit estimates, not measured benchmarks.

By ModelFit Team · Updated 2026-07-25

Quick answer

A local LLM needs roughly 0.6 GB of RAM per billion parameters at Q4_K_M quantization. Add ~30% headroom for the OS, app windows, and context. So an 8 GB device runs up to ~7B models, 16 GB up to ~14B, 24 GB up to ~27B, 32 GB up to ~32B, and 64 GB up to ~70B. ModelFit tracks 75 local models across 21 families against these tiers.

Free to cite with attribution to ModelFit (modelfit.io). Sizing is a first-party estimate from quantization math, not a measured benchmark.

Largest model each tier fits in ModelFit's catalog:

8 GB: ~8.3B
16 GB: ~14B
24 GB: ~27B
32 GB: ~35B

RAM Prices Exploded in 2026: Size Before You Buy

The AI data-center buildout is soaking up the world's memory supply. TrendForce reported PC DRAM contract prices rising by over 100% quarter over quarter in early 2026, with mobile LPDDR5X up around 90%. Micron's CEO expects tight conditions to persist beyond calendar 2027. Apple raised Mac and iPad prices in June 2026 on the back of it: the 14-inch MacBook Pro moved from $1,699 to $1,999, and the Mac Mini's entry price rose from $599 to $799 after Tim Cook noted customers are "snapping up Mac minis and Mac Studios to run artificial intelligence".

What that means for you: memory is now the expensive part of a local AI machine, and Apple unified memory cannot be upgraded later. Use the matrix below to buy exactly the tier your target model needs. If you plan to run 14B models, 16 GB covers them at Q4; paying shortage prices for 64 GB you will never use is the costliest mistake. If you genuinely need 70B-class models, buy that tier once and keep it, since supply relief is not expected before late 2027. The hardware calculator sizes a model for your exact RAM or GPU.

Model Size to RAM Matrix (Q4 and Q8)

How much memory each model size loads at Q4_K_M (the common default) and Q8 (higher quality), plus the smallest unified-memory tier that fits it with headroom. The same per-parameter math applies to GPU VRAM.

Parameters	Q4_K_M Size	Q8 Size	Min Unified RAM (Q4)
1B	~0.6 GB	~1.1 GB	8 GB
3B	~1.8 GB	~3.2 GB	8 GB
7B	~4.2 GB	~7.4 GB	8 GB
8B	~4.8 GB	~8.5 GB	8 GB
9B	~5.4 GB	~9.5 GB	8 GB
13B	~7.8 GB	~13.8 GB	16 GB
14B	~8.4 GB	~14.8 GB	16 GB
27B	~16.2 GB	~28.6 GB	24 GB
32B	~19.2 GB	~33.9 GB	32 GB
70B	~42 GB	~74.2 GB	64 GB
122B	~73.2 GB	~129.3 GB	96 GB

Estimates from quantization math (~0.6 GB/B at Q4, ~1.06 GB/B at Q8) and ModelFit's memory budget (~70% of unified memory up to 32 GB, scaling to ~85% at 128 GB+). Real usage varies with context length and the model's architecture. On high-RAM Macs you can raise the GPU-wired memory ceiling further with the iogpu.wired_limit_mb sysctl setting to run even larger models.
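The Min Unified RAM column can be reproduced from the two rules in the note above: load = parameters × ~0.6 GB, and a model budget of ~70% of memory up to 32 GB rising to ~85% at 128 GB and above. The sketch below is an illustration under assumptions, not ModelFit's code: it assumes the budget rises in a straight line between 32 GB and 128 GB, and it assumes the tier list shown. The helper names budget_gb, load_gb, and tier_for are hypothetical.

```python
from typing import Optional

# Per-parameter rates from the article's estimates (GB per billion params).
Q4_GB_PER_B = 0.6
Q8_GB_PER_B = 1.06

# Assumed tier list, mirroring the unified-memory sizes used in the tables.
TIERS_GB = [8, 16, 24, 32, 36, 48, 64, 72, 96, 128, 192, 256, 512]


def budget_gb(ram_gb: float) -> float:
    """Memory available to the model: ~70% up to 32 GB, ~85% from 128 GB."""
    if ram_gb <= 32:
        fraction = 0.70
    elif ram_gb >= 128:
        fraction = 0.85
    else:
        # Assumption: a straight-line ramp between the two stated points.
        fraction = 0.70 + (ram_gb - 32) / (128 - 32) * 0.15
    return ram_gb * fraction


def load_gb(params_b: float, gb_per_b: float = Q4_GB_PER_B) -> float:
    """Estimated in-memory size of the weights, in GB."""
    return params_b * gb_per_b


def tier_for(params_b: float, gb_per_b: float = Q4_GB_PER_B) -> Optional[int]:
    """Smallest tier whose budget holds the model, or None if none does."""
    need = load_gb(params_b, gb_per_b)
    for tier in TIERS_GB:
        if need <= budget_gb(tier):
            return tier
    return None


for params in [1, 3, 7, 8, 9, 13, 14, 27, 32, 70, 122]:
    print(f"{params}B: Q4 ~{load_gb(params):.1f} GB, "
          f"Q8 ~{load_gb(params, Q8_GB_PER_B):.1f} GB, "
          f"min tier {tier_for(params)} GB")
```

Under the straight-line assumption, budget_gb also lines up with the Model Budget column of the catalog table, for example ~25.4 GB at 36 GB and ~54.9 GB at 72 GB.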

What Does Each RAM Tier Run in ModelFit's Catalog?

Drawn straight from ModelFit's 75-model local catalog: the largest model each tier fits, how many models qualify, and the highest-quality pick. See the full breakdown on the hardware stats page.

Device RAM	Model Budget	Max Params	Models That Fit	Top-Quality Pick
8 GB	~5.6 GB	~8.3B	23	LFM2.5 8B-A1B
12 GB	~8.4 GB	~12B	29	Gemma 4 12B
16 GB	~11.2 GB	~14B	37	Qwen3.5 9B Instruct (Q8)
24 GB	~16.8 GB	~27B	45	Gemma 4 12B (Q8)
32 GB	~22.4 GB	~35B	54	Qwen3.6 35B-A3B
36 GB	~25.4 GB	~35B	56	Qwen3.6 35B-A3B
48 GB	~34.8 GB	~46.7B	59	Qwen3.6 27B (Q8)
64 GB	~48 GB	~70B	64	Qwen3.6 35B-A3B (Q8)
72 GB	~54.9 GB	~80B	65	Qwen3.6 35B-A3B (Q8)
96 GB	~76.8 GB	~122B	70	Qwen3.5 122B-A10B Instruct
128 GB	~108.8 GB	~122B	71	Qwen3.5 122B-A10B Instruct
192 GB	~163.2 GB	~235B	72	Qwen3 235B A22B
256 GB	~217.6 GB	~235B	72	Qwen3 235B A22B
512 GB	~435.2 GB	~671B	75	Qwen3 235B A22B

Can 8 GB Run a Local LLM?

Yes. An 8 GB device runs 3B-7B models (up to ~8.3B; 23 models fit), but headroom is tight: a 7B model loads ~4.2 GB, leaving under 4 GB for macOS and your browser. Close other apps during inference, or pick a 3B-4B model for a smoother experience. The base MacBook Air, Mac mini, and most iPhones live here; see MacBook Air picks.

What Fits in 16 GB?

16 GB is the sweet spot for local AI: it comfortably runs 7B-9B models (up to ~14B; 37 models fit) with room for context and other apps. A 14B model fits but leaves little headroom on a 16 GB machine. To run 14B models comfortably or move up to 27B-class models, step up to 24-32 GB; compare the tiers on the 16 GB vs 32 GB breakdown.

How Do You Size a Model to Your RAM?

Estimate the load. Parameter count × ~0.6 GB for Q4_K_M. A 7B model ≈ 4.2 GB; a 14B ≈ 8.4 GB.
Add headroom. Reserve ~30% of your memory for macOS, apps, and the KV-cache.
Match your tier. Use the matrix above, or run the ModelFit wizard for your exact chip and RAM.
Pull it. Install Ollama (setup guide) and run the one-command pull for your pick.
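The first three steps above can be folded into one check. This is a minimal sketch assuming the flat 30% reserve from step 2 (ModelFit's own budget relaxes toward a 15% reserve on 128 GB+ machines, per the matrix note); check_fit and FitResult are hypothetical names, and the result is an estimate, not a guarantee that a given runtime will load the model.

```python
from dataclasses import dataclass

GB_PER_B_Q4 = 0.6   # step 1: ~0.6 GB per billion parameters at Q4_K_M
HEADROOM = 0.30     # step 2: reserve ~30% for the OS, apps, and KV cache


@dataclass
class FitResult:
    load_gb: float
    budget_gb: float

    @property
    def fits(self) -> bool:
        return self.load_gb <= self.budget_gb

    @property
    def margin_gb(self) -> float:
        # Positive = spare room inside the budget; negative = shortfall.
        return self.budget_gb - self.load_gb


def check_fit(params_b: float, ram_gb: float,
              gb_per_b: float = GB_PER_B_Q4,
              headroom: float = HEADROOM) -> FitResult:
    load = params_b * gb_per_b             # step 1: estimate the load
    budget = ram_gb * (1.0 - headroom)     # step 2: subtract the headroom
    return FitResult(load_gb=load, budget_gb=budget)  # step 3: compare


for params, ram in [(7, 8), (14, 16), (27, 16), (27, 24)]:
    r = check_fit(params, ram)
    verdict = "fits" if r.fits else "too big"
    print(f"{params}B on {ram} GB: load ~{r.load_gb:.1f} GB, "
          f"budget ~{r.budget_gb:.1f} GB -> {verdict} "
          f"(margin {r.margin_gb:+.1f} GB)")
```

A 7B model on 8 GB leaves about 1.4 GB of margin inside the budget, while a 27B model needs the 24 GB tier. Step 4 (pulling the model) happens outside this check.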

Frequently Asked Questions

How much RAM do I need to run a local LLM?

At Q4_K_M quantization a local LLM needs roughly 0.6 GB of memory per billion parameters, plus headroom for the OS and context. In practice, 8 GB runs up to ~7B models, 16 GB up to ~14B, 24 GB up to ~27B, 32 GB up to ~32B, and 64 GB up to ~70B. ModelFit tracks 75 local models across these tiers.

Can 8 GB of RAM run a local LLM?

Yes. An 8 GB device runs local models up to ~8.3B parameters at Q4; 23 of ModelFit's 75 local models fit its ~5.6 GB budget. The top-quality fit is LFM2.5 8B-A1B. Stick to 3B-7B models and close other apps for the smoothest experience.

What size LLM fits in 16 GB of RAM?

A 16 GB device runs local models up to ~14B parameters at Q4, with 37 of ModelFit's 75 local models fitting its ~11.2 GB budget. A strong pick is Qwen3.5 9B Instruct at Q8, which loads ~10.7 GB. 16 GB is the sweet spot for 7B-9B models.

Does quantization change how much RAM I need?

Yes. Q4_K_M (the common default) needs ~0.6 GB per billion parameters. Q8 nearly doubles that, to ~1.06 GB/B, for higher quality, and full FP16 needs ~2 GB/B. Lower-bit quantization saves memory at a small quality cost; Q4_K_M is the best balance for most local setups.
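To see what the quantization choice costs for a single model, the per-billion-parameter rates from this answer can be applied side by side. A minimal sketch using those rule-of-thumb rates (FP16's ~2 GB/B follows from 2 bytes per weight); sizes_gb is a hypothetical helper and the 14B model is only an illustration.

```python
# Approximate GB of memory per billion parameters for each format.
GB_PER_B = {"Q4_K_M": 0.6, "Q8": 1.06, "FP16": 2.0}


def sizes_gb(params_b: float) -> dict:
    """Estimated weight size in GB for each quantization level."""
    return {quant: params_b * rate for quant, rate in GB_PER_B.items()}


sizes = sizes_gb(14)
base = sizes["Q4_K_M"]
for quant, gb in sizes.items():
    print(f"{quant:>6}: ~{gb:.1f} GB ({gb / base:.2f}x Q4_K_M)")
```

For a 14B model this gives roughly 8.4 GB, 14.8 GB, and 28 GB, so Q8 costs about 1.8× the memory of Q4_K_M and FP16 about 3.3×.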

Is unified memory (Mac) different from VRAM (GPU) for LLMs?

For sizing, the per-parameter rule is the same. The difference is the pool: Apple Silicon shares all unified memory between CPU and GPU, so a 32 GB Mac can dedicate ~22 GB to a model. A 12 GB GPU is hard-capped at 12 GB of VRAM. Exceed it and the model spills to system RAM and slows dramatically.
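The difference between the two pools can be made concrete with the same per-parameter rule. This sketch checks a 27B Q4 model against a 12 GB GPU and a 32 GB Mac, treating the GPU's VRAM as a hard cap and the Mac's pool as ~70% of unified memory, as described above. It assumes nothing else needs VRAM (no KV cache or runtime overhead), which makes the GPU check optimistic; fits_pool is a hypothetical helper.

```python
GB_PER_B_Q4 = 0.6  # ~0.6 GB per billion parameters at Q4_K_M


def fits_pool(params_b: float, pool_gb: float) -> bool:
    """True if the Q4 weights fit in the memory pool the model can use."""
    return params_b * GB_PER_B_Q4 <= pool_gb


# Discrete GPU: the pool is the card's VRAM, a hard cap (overhead ignored).
gpu_pool_gb = 12.0
# Apple Silicon: the pool is a share (~70%) of unified memory.
mac_pool_gb = 32 * 0.70

for label, pool in [("12 GB GPU", gpu_pool_gb), ("32 GB Mac", mac_pool_gb)]:
    verdict = "fits" if fits_pool(27, pool) else "exceeds the pool"
    print(f"27B at Q4 (~16.2 GB) on {label} (~{pool:.1f} GB usable): {verdict}")
```

The 27B model fits the Mac's ~22.4 GB pool but not the GPU's 12 GB, where the excess would spill to system RAM; a 14B model (~8.4 GB) would fit both.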

Why are RAM prices so high in 2026?

AI data centers are absorbing DRAM supply. TrendForce reported PC DRAM contract prices rising by over 100% quarter over quarter in early 2026, and Micron’s CEO expects tight supply to persist beyond calendar 2027. Apple raised Mac and iPad prices in June 2026 citing the same shortage. Memory is now the expensive part of a local AI machine.

Should I buy more RAM now or wait for prices to drop?

Buy the tier your target model actually needs, no more. Memory makers do not expect supply relief before late 2027, and Apple unified memory is fixed at purchase. If your target is a 14B model, 16 GB is enough at Q4. Only pay shortage prices for 32-64 GB if you genuinely plan to run 27B-70B class models.
