Question 1

How much RAM do I need to run a local LLM?

Accepted Answer

At Q4 quantization, a local LLM needs roughly 0.6 GB of memory per billion parameters. ModelFit budgets ~70% of unified memory for the model on machines with up to 32 GB, scaling to ~85% at 128 GB and above. In practice, 8 GB runs models up to ~8B, 16 GB comfortably runs up to ~14B, 32 GB unlocks ~35B-class models, and 64 GB or more runs 70B-class models.
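To make the arithmetic concrete, here is a minimal Python sketch of the sizing rule. The 0.6 GB per billion parameters and the 70% and 85% endpoints come from the answer above. The linear ramp between 32 GB and 128 GB is an assumption: it happens to reproduce the budget figures in the RAM tier table (for example 34.8 GB at 48 GB), but it is not confirmed to be ModelFit's actual code. The function names are made up for illustration.

```python
GB_PER_BILLION_PARAMS_Q4 = 0.6  # rule of thumb quoted in the answer


def budget_fraction(memory_gb: float) -> float:
    """Share of unified memory budgeted for the model.

    Assumption: a linear ramp from 70% at 32 GB to 85% at 128 GB.
    """
    if memory_gb <= 32:
        return 0.70
    if memory_gb >= 128:
        return 0.85
    return 0.70 + 0.15 * (memory_gb - 32) / (128 - 32)


def model_budget_gb(memory_gb: float) -> float:
    """Memory (GB) available to the model on a unified-memory machine."""
    return memory_gb * budget_fraction(memory_gb)


def max_params_billion(memory_gb: float) -> float:
    """Upper bound on parameter count (billions) at Q4 for this budget."""
    return model_budget_gb(memory_gb) / GB_PER_BILLION_PARAMS_Q4


if __name__ == "__main__":
    for gb in (8, 16, 32, 64, 128):
        print(f"{gb:>3} GB -> budget {model_budget_gb(gb):.1f} GB, "
              f"ceiling ~{max_params_billion(gb):.1f}B params at Q4")
```

The raw ceiling this prints is somewhat higher than the "Max model size" column in the tier table, presumably because that column reports the largest model in the catalog that actually fits rather than the quotient itself.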

Question 2

What LLM can I run with my GPU VRAM?

Accepted Answer

VRAM is the hard ceiling for a discrete GPU, and about 90% of it is usable for model weights. 8 GB fits a 7-8B model, 12 GB fits 7-9B with more context, 16 GB reaches 14B, 24 GB runs 32B-class models, and 32 GB comfortably runs 32B with long context. Switch the calculator to GPU VRAM mode and enter your card's memory to see the exact picks.
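A quick way to sanity-check these tiers is to compare the Q4 weight size against 90% of the card's VRAM. The sketch below does only that. It assumes the same ~0.6 GB per billion parameters used elsewhere on this page and ignores context (KV cache) memory, which is likely why the tiers quoted above are more conservative than the raw ceiling. The function names are illustrative, not part of ModelFit.

```python
USABLE_VRAM_FRACTION = 0.90      # "about 90%" from the answer
GB_PER_BILLION_PARAMS_Q4 = 0.6   # Q4 rule of thumb used on this page


def weights_gb(params_billion: float) -> float:
    """Approximate size of the Q4 weights in GB."""
    return params_billion * GB_PER_BILLION_PARAMS_Q4


def fits_in_vram(params_billion: float, vram_gb: float) -> bool:
    """True if the Q4 weights fit within the usable share of VRAM."""
    return weights_gb(params_billion) <= vram_gb * USABLE_VRAM_FRACTION


def headroom_gb(params_billion: float, vram_gb: float) -> float:
    """Usable VRAM left over after the weights (room for context)."""
    return vram_gb * USABLE_VRAM_FRACTION - weights_gb(params_billion)


if __name__ == "__main__":
    for params, vram in ((8, 8), (8, 12), (14, 16), (32, 24), (70, 24)):
        verdict = "fits" if fits_in_vram(params, vram) else "does not fit"
        print(f"{params}B on {vram} GB: {verdict}, "
              f"headroom {headroom_gb(params, vram):.1f} GB")
```

The headroom figure shows why a 12 GB card is described as running the same 7-9B models "with more context": the weights are the same size, but more memory is left over.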

Question 3

How does the calculator work?

Accepted Answer

Enter your memory amount and pick Apple unified memory or GPU VRAM. The calculator runs ModelFit’s recommendation engine in your browser: it sizes each model at ~0.6 GB per billion parameters, applies the memory budget for your hardware, and ranks the models that fit by quality and speed. Tokens per second are ModelFit estimates from memory bandwidth and model size, not measured benchmarks.
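The steps in this answer (size, filter by budget, rank) can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the catalog entries and quality scores are invented, the speed estimate uses the common rule of thumb of memory bandwidth divided by model size, and sorting by quality first with speed as the tiebreaker is only one plausible reading of "ranks by quality and speed". ModelFit's real scoring is not documented on this page.

```python
from dataclasses import dataclass

GB_PER_BILLION_PARAMS_Q4 = 0.6  # sizing rule quoted in the answer


@dataclass(frozen=True)
class Model:
    name: str
    params_billion: float
    quality: float  # hypothetical 0-1 score, not a ModelFit value


def size_gb(model: Model) -> float:
    """Approximate Q4 memory footprint in GB."""
    return model.params_billion * GB_PER_BILLION_PARAMS_Q4


def estimate_tokens_per_second(model: Model, bandwidth_gb_s: float) -> float:
    """Rule-of-thumb estimate: each token reads the weights once."""
    return bandwidth_gb_s / size_gb(model)


def recommend(models, budget_gb: float, bandwidth_gb_s: float):
    """Keep models that fit, then rank by quality, then estimated speed."""
    fitting = [m for m in models if size_gb(m) <= budget_gb]
    return sorted(
        fitting,
        key=lambda m: (m.quality, estimate_tokens_per_second(m, bandwidth_gb_s)),
        reverse=True,
    )


# Invented catalog, for illustration only.
CATALOG = [
    Model("small-4b", 4, 0.50),
    Model("mid-8b", 8, 0.60),
    Model("large-14b", 14, 0.70),
    Model("huge-70b", 70, 0.90),
]

if __name__ == "__main__":
    # 11.2 GB is the budget the tier table lists for a 16 GB machine;
    # 200 GB/s is a placeholder bandwidth.
    for m in recommend(CATALOG, budget_gb=11.2, bandwidth_gb_s=200):
        tps = estimate_tokens_per_second(m, 200)
        print(f"{m.name}: {size_gb(m):.1f} GB, ~{tps:.0f} tok/s (estimate)")
```

As the answer notes, numbers produced this way are estimates from bandwidth and size, not measured benchmarks.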

Question 4

Can I combine two GPUs to run bigger local models?

Accepted Answer

Yes. Ollama and llama.cpp split model layers across cards automatically, so two or three GPUs pool their VRAM when checking fit, and about 90% of the combined VRAM is usable for weights. Expect real throughput to be lower than a single card with the same total VRAM would deliver, because inter-GPU transfers add overhead and mixed cards run at the slower card’s pace. Switch the calculator to Multi-GPU rig mode to pick your exact cards and see which models fit, and use the Copy link button to share the setup.
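The pooling rule is easy to check by hand, and the short Python sketch below does the same arithmetic. It assumes the ~0.6 GB per billion parameters Q4 rule from this page and treats the slowest card's memory bandwidth as the pace-setter, which is a simplification of "mixed cards run at the slower card's pace". The bandwidth figures in the example are placeholders, not specs for any real card, and the function names are illustrative.

```python
USABLE_VRAM_FRACTION = 0.90      # "about 90% of the combined VRAM"
GB_PER_BILLION_PARAMS_Q4 = 0.6   # Q4 rule of thumb used on this page


def pooled_budget_gb(card_vram_gb: list[float]) -> float:
    """Usable VRAM for weights across all cards in the rig."""
    return USABLE_VRAM_FRACTION * sum(card_vram_gb)


def fits_on_rig(params_billion: float, card_vram_gb: list[float]) -> bool:
    """True if the Q4 weights fit in the pooled budget.

    Simplification: ignores per-card layer granularity and context memory.
    """
    needed = params_billion * GB_PER_BILLION_PARAMS_Q4
    return needed <= pooled_budget_gb(card_vram_gb)


def pace_setting_bandwidth(card_bandwidth_gb_s: list[float]) -> float:
    """Assumption: the slowest card bounds the rig's effective bandwidth."""
    return min(card_bandwidth_gb_s)


if __name__ == "__main__":
    rig = [24, 24]  # two 24 GB cards
    print(f"pooled budget: {pooled_budget_gb(rig):.1f} GB")
    print("70B fits on one 24 GB card: ", fits_on_rig(70, [24]))
    print("70B fits on two 24 GB cards:", fits_on_rig(70, rig))
    # Placeholder bandwidths in GB/s for a mixed rig.
    print("pace-setting bandwidth:", pace_setting_bandwidth([1000.0, 450.0]))
```

Inter-GPU transfer overhead comes on top of the slow-card bound, so a real rig should be expected to land below any speed derived from that bandwidth.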

Question 5

Is the ModelFit calculator free?

Accepted Answer

Yes. The calculator is completely free, needs no sign-up, and runs entirely in your browser with no data sent to a server. The underlying compatibility dataset is open under CC BY 4.0, and the same engine ships as the free npx @wecko-ai/modelfit command-line tool.

LLM Hardware Requirements Calculator

Local LLM memory requirements by RAM tier

Memory	Model budget (~70-85%)	Max model size	Models that fit	Top pick
8 GB	~5.6 GB	~8.3B params	23 / 75	LFM2.5 8B-A1B
16 GB	~11.2 GB	~14B params	37 / 75	Qwen3.5 9B Instruct (Q8)
24 GB	~16.8 GB	~27B params	45 / 75	Gemma 4 12B (Q8)
32 GB	~22.4 GB	~35B params	54 / 75	Qwen3.6 35B-A3B
48 GB	~34.8 GB	~46.7B params	59 / 75	Qwen3.6 27B (Q8)
64 GB	~48 GB	~70B params	64 / 75	Qwen3.6 35B-A3B (Q8)
96 GB	~76.8 GB	~122B params	70 / 75	Qwen3.5 122B-A10B Instruct
128 GB	~108.8 GB	~122B params	71 / 75	Qwen3.5 122B-A10B Instruct
