LLM Hardware Requirements Calculator

Enter your RAM or GPU VRAM and see exactly which local AI models you can run, the single best pick, and how fast it will go.

What LLM can I run? A local model at Q4 quantization needs roughly 0.6 GB of memory per billion parameters. ModelFit budgets about 70% of unified memory for the model on machines up to 32 GB, scaling to ~85% at 128 GB and above, and about 90% of a discrete GPU's VRAM. That means an 8 GB device runs models up to ~8B, a 16 GB device comfortably runs up to ~14B, 32 GB unlocks ~35B-class models, and 64 GB or more runs 70B-class models. Use the calculator below to size your exact hardware against 75 local models.
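The budget rule above can be sketched in a few lines of Python. This is an illustration, not ModelFit's source: the function name is hypothetical, and the straight-line ramp between 32 GB and 128 GB is an assumption (the text only says the share "scales"), chosen because it reproduces the budget column of the tier table further down.

```python
def model_budget_gb(memory_gb: float, discrete_gpu: bool = False) -> float:
    """Memory (GB) set aside for the model under the budget rule described above."""
    if discrete_gpu:
        return 0.90 * memory_gb  # about 90% of a discrete GPU's VRAM
    if memory_gb <= 32:
        fraction = 0.70  # flat 70% on unified-memory machines up to 32 GB
    elif memory_gb >= 128:
        fraction = 0.85  # ~85% at 128 GB and above
    else:
        # Assumption: linear ramp from 70% at 32 GB to 85% at 128 GB.
        fraction = 0.70 + 0.15 * (memory_gb - 32) / (128 - 32)
    return fraction * memory_gb


for gb in (8, 16, 32, 64, 128):
    print(f"{gb:>3} GB unified -> ~{model_budget_gb(gb):.1f} GB for the model")
print(f" 24 GB VRAM    -> ~{model_budget_gb(24, discrete_gpu=True):.1f} GB for the model")
```

The budget is the ceiling for the whole model load, so the largest model that fits is somewhat smaller than the budget divided by 0.6.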

Requirements calculator

With 16 GB of unified memory, ModelFit budgets about 11 GB for the model and comfortably runs local LLMs up to ~12B parameters at Q4. The best single pick is Qwen3.5 9B Instruct.

TOP PICK · Runs well

Qwen3.5 9B Instruct

Qwen · top

Best for quality, coding, reasoning. Strong fit for 16 GB RAM with balanced speed and quality.

registry-verified
FOOTPRINT: 7 GB
SPEED: ~59 tok/s
QUANT: Q4_K_M
PARAMS: 9B

Tokens/sec are ModelFit estimates from chip bandwidth and model size, not measured benchmarks. Ollama commands are registry-verified.
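For intuition on how a speed estimate can come from bandwidth and model size: token generation is usually memory-bound, because producing each token streams roughly the full set of weights from memory once. That puts a ceiling near bandwidth divided by model size. The sketch below shows that general rule of thumb only; it is not ModelFit's actual formula, and the function name, the efficiency factor, and the 400 GB/s figure are hypothetical.

```python
def estimate_tokens_per_second(
    bandwidth_gb_s: float, model_gb: float, efficiency: float = 1.0
) -> float:
    """Rough decode-speed ceiling: one full pass over the weights per token."""
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    # efficiency < 1.0 models the gap between peak and achieved bandwidth.
    return efficiency * bandwidth_gb_s / model_gb


# Hypothetical chip with 400 GB/s of memory bandwidth, 7 GB model footprint.
print(f"~{estimate_tokens_per_second(400, 7):.0f} tok/s upper bound")
```

Real throughput also depends on context length, the runtime, and thermal limits, which is why these numbers are estimates rather than benchmarks.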

ALSO FITS
FAMILY | MODEL | FIT | SIZE | LOAD | SPEED
QWEN | Qwen3 8B | Runs well | 8B / Q4_K_M | 6.5 GB | ~65 tok/s
GEMMA | Gemma 4 12B | Runs well | 12B / Q4_K_M | 8 GB | ~45 tok/s
LLAMA | Llama 3.1 8B Instruct | Runs well | 8B / Q4_K_M | 6.5 GB | ~65 tok/s
GEMMA | Gemma 3 12B Instruct | Runs well | 12B / Q4_K_M | 9.5 GB | ~42 tok/s
MISTRAL | Mistral Nemo 12B | Runs well | 12B / Q4_K_M | 9.5 GB | ~42 tok/s
QWEN | Qwen3.5 4B Instruct | Perfect fit | 4B / Q4_K_M | 3.5 GB | ~122 tok/s
GEMMA | Gemma 2 9B Instruct | Runs well | 9B / Q4_K_M | 7 GB | ~59 tok/s

Local LLM memory requirements by RAM tier

Every row is derived from ModelFit's catalog of 75 local models across 20 families. Click a tier for the full list.

Memory | Model budget (~70-85%) | Max model size | Models that fit | Top pick
8 GB | ~5.6 GB | ~8.3B params | 23 / 75 | LFM2.5 8B-A1B
16 GB | ~11.2 GB | ~14B params | 37 / 75 | Qwen3.5 9B Instruct (Q8)
24 GB | ~16.8 GB | ~27B params | 45 / 75 | Gemma 4 12B (Q8)
32 GB | ~22.4 GB | ~35B params | 54 / 75 | Qwen3.6 35B-A3B
48 GB | ~34.8 GB | ~46.7B params | 59 / 75 | Qwen3.6 27B (Q8)
64 GB | ~48 GB | ~70B params | 64 / 75 | Qwen3.6 35B-A3B (Q8)
96 GB | ~76.8 GB | ~122B params | 70 / 75 | Qwen3.5 122B-A10B Instruct
128 GB | ~108.8 GB | ~122B params | 71 / 75 | Qwen3.5 122B-A10B Instruct

Q4_K_M assumed. Fit and tok/s are ModelFit estimates from the dataset, not measured benchmarks. Updated 2026-07-02.

Frequently asked questions

How much RAM do I need to run a local LLM?

At Q4 quantization a local LLM needs roughly 0.6 GB of memory per billion parameters, and ModelFit budgets ~70% of unified memory for the model up to 32 GB, scaling to ~85% at 128 GB and above. In practice 8 GB runs models up to ~8B, 16 GB comfortably runs up to ~14B, 32 GB unlocks ~35B-class models, and 64 GB or more runs 70B-class models.

What LLM can I run with my GPU VRAM?

VRAM is the hard ceiling for a discrete GPU, and about 90% of it is usable for model weights. 8 GB fits a 7-8B model, 12 GB fits 7-9B with more context, 16 GB reaches 14B, 24 GB runs 32B-class models, and 32 GB comfortably runs 32B with long context. Switch the calculator to GPU VRAM mode and enter your card memory to see the exact picks.

How does the calculator work?

Enter your memory amount and pick Apple unified memory or GPU VRAM. The calculator runs ModelFit’s recommendation engine in your browser: it sizes each model at ~0.6 GB per billion parameters, applies the memory budget for your hardware, and ranks the models that fit by quality and speed. Tokens per second are ModelFit estimates from memory bandwidth and model size, not measured benchmarks.
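The size, budget, and rank steps described above can be sketched as follows. This is a minimal illustration, not the engine itself: the `Model` class and function names are hypothetical, the four-model catalog is a small excerpt of names from this page, and ranking by parameter count is only a stand-in for ModelFit's quality-and-speed ranking, so the order will not necessarily match the calculator's picks.

```python
from dataclasses import dataclass

GB_PER_BILLION_PARAMS_Q4 = 0.6  # rule of thumb quoted above


@dataclass(frozen=True)
class Model:
    name: str
    params_b: float  # billions of parameters


def weights_gb(model: Model) -> float:
    """Approximate Q4 weight size; excludes KV cache and runtime overhead."""
    return model.params_b * GB_PER_BILLION_PARAMS_Q4


def models_that_fit(catalog: list[Model], budget_gb: float) -> list[Model]:
    """Keep models whose weights fit the budget, largest first (stand-in ranking)."""
    fitting = [m for m in catalog if weights_gb(m) <= budget_gb]
    return sorted(fitting, key=lambda m: m.params_b, reverse=True)


catalog = [
    Model("Qwen3 8B", 8),
    Model("Gemma 3 12B Instruct", 12),
    Model("Qwen3.5 4B Instruct", 4),
    Model("Qwen3.6 27B", 27),
]

budget_gb = 0.70 * 16  # 16 GB of unified memory -> ~11.2 GB for the model
for model in models_that_fit(catalog, budget_gb):
    print(f"{model.name}: ~{weights_gb(model):.1f} GB of weights")
```

The load figures the calculator reports are higher than this weights-only estimate, so treat the 0.6 GB figure as a lower bound when sizing by hand.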

Can I combine two GPUs to run bigger local models?

Yes. Ollama and llama.cpp split model layers across cards automatically, so two or three GPUs pool their VRAM for fit: about 90% of the combined VRAM is usable for weights. Expect real throughput below a single card with the same total VRAM, because inter-GPU transfers add overhead and mixed cards run at the slower card’s pace. Switch the calculator to Multi-GPU rig mode to pick your exact cards and see which models fit, and use the Copy link button to share the setup.
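The pooled-VRAM arithmetic can be sketched like this. The function names are hypothetical, and the check is weights-only at ~0.6 GB per billion parameters: it ignores KV cache, per-card overhead, and the fact that layers must be split across cards in whole units.

```python
GB_PER_BILLION_PARAMS_Q4 = 0.6  # rule of thumb used throughout this page


def pooled_vram_budget_gb(cards_gb: list[float], usable_fraction: float = 0.90) -> float:
    """About 90% of the combined VRAM is treated as usable for weights."""
    return usable_fraction * sum(cards_gb)


def fits_at_q4(params_b: float, cards_gb: list[float]) -> bool:
    """Weights-only fit check against the pooled budget."""
    return params_b * GB_PER_BILLION_PARAMS_Q4 <= pooled_vram_budget_gb(cards_gb)


rig = [16, 8]  # e.g. one 16 GB card plus one 8 GB card
print(f"Pooled budget: ~{pooled_vram_budget_gb(rig):.1f} GB")
print("32B fits:", fits_at_q4(32, rig))
print("70B fits:", fits_at_q4(70, rig))
```

A model that passes this check can still run slower than expected, for the throughput reasons given above.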

Is the ModelFit calculator free?

Yes. The calculator is completely free, needs no sign-up, and runs entirely in your browser with no data sent to a server. The underlying compatibility dataset is open under CC BY 4.0, and the same engine ships as the free npx @wecko-ai/modelfit command-line tool.


PREFER THE TERMINAL?

The same engine runs offline as a one-line command that detects your machine and names the best local model:

npx @wecko-ai/modelfit