Local LLM Hardware Stats

Name: ModelFit Local LLM Hardware Compatibility Stats
Creator: ModelFit
License: https://creativecommons.org/licenses/by/4.0/

How much memory each model tier needs, and what fits your device. Derived from ModelFit's 75 local models. Updated 2026-07-25.

How much RAM do you need to run a local LLM? An 8GB device runs models up to ~8B, a 16GB device comfortably runs up to ~14B, 32GB unlocks ~31B-class models, and 64GB+ runs 70B-class models. At Q4_K_M a model needs roughly 0.6 GB per billion parameters, and ModelFit budgets ~70% of unified memory for the model on machines up to 32GB, scaling to ~85% at 128GB and above. High-RAM Macs can wire even more memory to the GPU by raising iogpu.wired_limit_mb. ModelFit tracks 75 local models across 21 families.
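As a rough illustration of the wired-limit adjustment mentioned above, the sketch below builds the sysctl command for a given machine. The helper name and the 8 GB reserve left for macOS are hypothetical choices, not ModelFit recommendations. It also assumes the setting is applied with `sudo sysctl` and expressed in megabytes, as the key name suggests. Check this against your macOS version before running the command.

```python
def wired_limit_command(total_ram_gb: int, reserve_gb: int = 8) -> str:
    """Build a sysctl command that raises the GPU wired-memory limit.

    reserve_gb is a hypothetical safety margin kept for macOS and other apps.
    """
    if reserve_gb <= 0 or reserve_gb >= total_ram_gb:
        raise ValueError("reserve_gb must be positive and smaller than total_ram_gb")
    # iogpu.wired_limit_mb takes a value in megabytes.
    limit_mb = (total_ram_gb - reserve_gb) * 1024
    return f"sudo sysctl iogpu.wired_limit_mb={limit_mb}"


print(wired_limit_command(128))  # sudo sysctl iogpu.wired_limit_mb=122880
```

The function only formats the command; it does not change any system setting.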

Local models: 75
Model families: 21
16GB sweet spot: ~14B
64GB+ ceiling: ~70B

Model size by memory budget

Memory (RAM / VRAM)	Model budget (~70-85%)	Runs up to (dense)	Models that fit	Strong pick
8 GB	~5.6 GB	~8B params	23 / 75	LFM2.5 8B-A1B
12 GB	~8.4 GB	~12B params	29 / 75	Gemma 4 12B
16 GB	~11.2 GB	~14B params	37 / 75	Qwen3.5 9B Instruct (Q8)
24 GB	~16.8 GB	~27B params	45 / 75	Gemma 4 12B (Q8)
32 GB	~22.4 GB	~31B params	54 / 75	Qwen3.6 35B-A3B
36 GB	~25.4 GB	~31B params	56 / 75	Qwen3.6 35B-A3B
48 GB	~34.8 GB	~31B params	59 / 75	Qwen3.6 27B (Q8)
64 GB	~48 GB	~70B params	64 / 75	Qwen3.6 35B-A3B (Q8)
72 GB	~54.9 GB	~70B params	65 / 75	Qwen3.6 35B-A3B (Q8)
96 GB	~76.8 GB	~70B params	70 / 75	Qwen3.5 122B-A10B Instruct
128 GB	~108.8 GB	~70B params	71 / 75	Qwen3.5 122B-A10B Instruct
192 GB	~163.2 GB	~70B params	72 / 75	Qwen3 235B A22B
256 GB	~217.6 GB	~70B params	72 / 75	Qwen3 235B A22B
512 GB	~435.2 GB	~405B params	75 / 75	Qwen3 235B A22B

Q4_K_M assumed. tok/s and fit are estimates from ModelFit's dataset, not measured benchmarks.
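The model budget column can be approximated with a short sketch. The page states only the endpoints (~70% up to 32GB, ~85% at 128GB and above); the straight-line interpolation between them is an assumption. It happens to reproduce the budget figures in the table above, but it is not confirmed as ModelFit's actual formula. The function names are illustrative.

```python
def budget_fraction(memory_gb: float) -> float:
    """Share of memory budgeted for the model (0.70 to 0.85)."""
    low_gb, high_gb = 32, 128          # endpoints stated on the page
    low_frac, high_frac = 0.70, 0.85
    if memory_gb <= low_gb:
        return low_frac
    if memory_gb >= high_gb:
        return high_frac
    # Assumption: straight-line interpolation between the two endpoints.
    t = (memory_gb - low_gb) / (high_gb - low_gb)
    return low_frac + t * (high_frac - low_frac)


def model_budget_gb(memory_gb: float) -> float:
    """Model budget in GB, rounded to one decimal like the table."""
    return round(memory_gb * budget_fraction(memory_gb), 1)


for gb in (16, 48, 72, 128):
    print(f"{gb} GB -> ~{model_budget_gb(gb)} GB budget")
```

For memory sizes that are not listed, such as 20 GB or 80 GB, the same function gives an estimate in the same spirit as the table rows.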

Key facts

ModelFit tracks 107 AI models across 21 families; 75 run locally via Ollama on Apple Silicon or NVIDIA GPUs (ModelFit, 2026).
At Q4_K_M quantization, a local LLM needs roughly 0.6 GB of memory per billion parameters for the weights, rising to about 0.8 GB per billion for models under 8B once runtime overhead is counted (ModelFit, 2026).
ModelFit sizes recommendations to ~70% of a device’s unified memory up to 32GB, scaling to ~85% at 128GB and above, leaving headroom for the OS, context, and KV-cache (ModelFit, 2026).
An 8GB device comfortably runs dense local models up to ~8B parameters at Q4, or up to ~8.3B total as a mixture-of-experts model; 23 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 12GB device comfortably runs dense local models up to ~12B parameters at Q4; 29 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 16GB device comfortably runs dense local models up to ~14B parameters at Q4; 37 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 24GB device comfortably runs dense local models up to ~27B parameters at Q4; 45 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 32GB device comfortably runs dense local models up to ~31B parameters at Q4, or up to ~35B total as a mixture-of-experts model; 54 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 36GB device comfortably runs dense local models up to ~31B parameters at Q4, or up to ~35B total as a mixture-of-experts model; 56 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 48GB device comfortably runs dense local models up to ~31B parameters at Q4, or up to ~46.7B total as a mixture-of-experts model; 59 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 64GB device comfortably runs dense local models up to ~70B parameters at Q4; 64 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 72GB device comfortably runs dense local models up to ~70B parameters at Q4, or up to ~80B total as a mixture-of-experts model; 65 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 96GB device comfortably runs dense local models up to ~70B parameters at Q4, or up to ~122B total as a mixture-of-experts model; 70 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 128GB device comfortably runs dense local models up to ~70B parameters at Q4, or up to ~122B total as a mixture-of-experts model; 71 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 192GB device comfortably runs dense local models up to ~70B parameters at Q4, or up to ~235B total as a mixture-of-experts model; 72 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 256GB device comfortably runs dense local models up to ~70B parameters at Q4, or up to ~235B total as a mixture-of-experts model; 72 of ModelFit’s 75 local models fit (ModelFit, 2026).
A 512GB device comfortably runs dense local models up to ~405B parameters at Q4, or up to ~671B total as a mixture-of-experts model; all 75 of ModelFit’s local models fit (ModelFit, 2026).
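The 0.6 GB per billion rule in the facts above can be sketched as simple arithmetic. The figure of 4.8 bits per weight is an assumed round value for Q4_K_M, chosen because it reproduces the 0.6 GB rule (4.8 / 8 = 0.6 bytes per parameter). The function names are hypothetical. The estimate covers weights only, so the per-model fit counts above, which come from ModelFit's dataset, may differ.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.8) -> float:
    """Approximate weight size in GB: parameters times bytes per parameter.

    bits_per_weight=4.8 is an assumed average for Q4_K_M.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, budget_gb: float, gb_per_billion: float = 0.6) -> bool:
    """True if the estimated footprint is within the model budget.

    Pass gb_per_billion=0.8 for models under 8B to count runtime overhead,
    following the figure quoted above.
    """
    return params_billion * gb_per_billion <= budget_gb


print(weights_gb(70))      # weights-only estimate for a 70B dense model
print(fits(70, 48.0))      # 64 GB device, ~48 GB budget
print(fits(70, 34.8))      # 48 GB device, ~34.8 GB budget
```

For a mixture-of-experts model, pass the total parameter count, since all experts must be resident in memory even though only some are active per token.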

Frequently asked questions

How much RAM do I need to run a local LLM?

8GB runs models up to ~8B, 16GB comfortably runs up to ~14B, 32GB unlocks ~31B-class models, and 64GB or more runs 70B-class models. These tiers assume Q4 quantization, at roughly 0.6 GB per billion parameters, and a model budget of ~70% of unified memory up to 32GB, scaling to ~85% at 128GB and above.

What size LLM can I run on 16GB of RAM?

On 16GB you can comfortably run local models up to ~14B parameters at Q4; 37 of ModelFit's 75 local models fit, with Qwen3.5 9B Instruct (Q8) a strong pick.

Can 8GB of RAM run a local LLM?

Yes. 23 of ModelFit's 75 local models fit in 8GB, ranging from small 3-4B models up to ~8B at Q4. Headroom is tight, so close other apps for the best speed.
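For memory sizes between the listed tiers, one conservative reading is to fall back to the nearest tier at or below the device's memory. The sketch below does that with the dense limits and fit counts copied from the table on this page. The fallback rule and the names `TIERS` and `tier_for` are assumptions for illustration, not part of ModelFit's dataset.

```python
from bisect import bisect_right

# (memory GB, largest dense model in billions of params, local models that fit)
TIERS = [
    (8, 8, 23), (12, 12, 29), (16, 14, 37), (24, 27, 45), (32, 31, 54),
    (36, 31, 56), (48, 31, 59), (64, 70, 64), (72, 70, 65), (96, 70, 70),
    (128, 70, 71), (192, 70, 72), (256, 70, 72), (512, 405, 75),
]


def tier_for(memory_gb: float):
    """Return the nearest tier at or below memory_gb, or None if under 8 GB."""
    sizes = [tier[0] for tier in TIERS]
    index = bisect_right(sizes, memory_gb) - 1
    return TIERS[index] if index >= 0 else None


print(tier_for(16))  # (16, 14, 37)
print(tier_for(20))  # (16, 14, 37)
print(tier_for(4))   # None
```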

Cite this page

Free to reuse with attribution (CC BY 4.0). Full machine-readable data: the compatibility dataset and a JSON export.

ModelFit: Local LLM Hardware Stats (2026).
https://modelfit.io/stats/ (accessed 2026-07-25).