Best Local AI Models for AMD Ryzen AI Max+ 395 (Strix Halo)

The Ryzen AI Max+ 395 is AMD's answer to Apple Silicon: a single APU with up to 128GB of unified memory, of which roughly 110GB is GPU-addressable on Linux. That capacity lets a sub-$2,000 mini PC load 70B and even 120B-class models that no consumer GPU can hold. The trade-off is bandwidth: at 256 GB/s, large dense models load but generate slowly, so MoE models in the 30B-120B range are the sweet spot.


Quick answer

The best local LLM for the Ryzen AI Max+ 395 is GPT-OSS 120B, at an estimated ~11 tok/s. It needs ~65.4GB of the ~110GB of GPU-addressable unified memory; overall, the Ryzen AI Max+ 395 handles models up to about 120B parameters at Q4.

$ ollama run gpt-oss:120b

Top pick: GPT-OSS 120B · Est. speed: ~11 tok/s · Memory needed: ~65.4 GB

Speeds are ModelFit estimates from memory bandwidth and model size, not measured benchmarks.
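Because these are estimates, it is worth measuring decode speed on your own machine. A minimal sketch, assuming a local Ollama server on its default port 11434: the non-streaming /api/generate response includes eval_count (generated tokens) and eval_duration (nanoseconds), which together give tokens per second. The prompt you pass is up to you.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption: server running on this machine).
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Decode speed from eval_count (tokens) and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def measure(model: str, prompt: str) -> float:
    """Run one non-streaming generation and return the measured decode tok/s."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_per_second(json.load(resp))
```

With the server running and the model pulled, call measure("gpt-oss:120b", "Explain unified memory in two sentences.") and compare the result with the estimates on this page. Running ollama run --verbose gpt-oss:120b also prints an eval rate after each reply.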

Unified memory: 128 GB LPDDR5x (~110 GB GPU-addressable on Linux)

Speed (8B Q4): ~30 tok/s (estimate)

Bandwidth: 256 GB/s

Architecture: Zen 5 + RDNA 3.5

Price: $1,999*

Max model size: up to 120B parameters

Compatibility: 10 models rated excellent, 0 workable

*128GB GMKtec EVO-X2

Ryzen AI Max+ 395 Estimated Tokens/sec by Model Size

Q4_K_M · ModelFit estimate

Model Size	Est. Speed	Fit on 110GB
7B	~34 tok/s	Fits in unified memory
14B	~19 tok/s	Fits in unified memory
20B MoE (3.6B active)	~29 tok/s	Fits in unified memory
32B	~9 tok/s	Fits in unified memory
35B MoE (3B active)	~24 tok/s	Fits in unified memory
70B	~5 tok/s	Fits in unified memory
120B MoE (5.1B active)	~11 tok/s	Fits in unified memory

ModelFit estimates, not measured benchmarks: anchored to an 8B-class Q4_K_M model at 16K context on the Ryzen AI Max+ 395's 256 GB/s bandwidth, then scaled by model size. MoE rows scale by active parameters (decode reads only the active experts), so a 35B MoE runs far faster than a dense 32B. Every model in the table fits in the 110GB of unified memory; anything larger would need CPU offload, where dense models slow to a crawl and MoE models degrade less because hot experts stay GPU-resident.
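Whether a model fits comes down to its full weight footprint; for MoE models that means total parameters, not active ones. A rough sketch, assuming an average of about 4.85 bits per weight for Q4_K_M (an approximation; real GGUF sizes vary by architecture) and the 110GB GPU-addressable figure from this page. It illustrates the idea rather than reproducing ModelFit's actual method, and it ignores KV cache and runtime overhead.

```python
# Assumed average bits per weight for Q4_K_M; real files vary by architecture.
Q4_K_M_BITS = 4.85
GPU_ADDRESSABLE_GB = 110.0  # ~110 GB exposed to the iGPU on Linux, per this page


def weights_gb(total_params_b: float, bits_per_weight: float = Q4_K_M_BITS) -> float:
    """Approximate weight size in GB. Every parameter is stored, even for MoE."""
    return total_params_b * bits_per_weight / 8


def fit_label(total_params_b: float, capacity_gb: float = GPU_ADDRESSABLE_GB) -> str:
    """Rough stand-in for the table's fit column (weights only, no KV cache)."""
    if weights_gb(total_params_b) <= capacity_gb:
        return "Fits in unified memory"
    return "CPU offload"


for size in (7, 14, 32, 70, 235):
    print(f"{size}B: ~{weights_gb(size):.1f} GB -> {fit_label(size)}")
```

The 70B result (~42.4 GB) matches the ~42GB quoted on this page; a 235B model at this bit width lands well past 110GB.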

Context costs unified memory too. GPT-OSS 120B loads ~65.4 GB of weights; at 16K context the KV cache adds ~6.0 GB, and at 64K it adds ~24.0 GB. Both totals (~71.4 GB and ~89.4 GB) still fit within the ~99 GB of unified memory this estimate treats as usable.

KV-cache figures assume an fp16 cache, the llama.cpp/Ollama default. Standard GQA models use a size-class estimate (8 KV heads × 128 head dim); hybrid linear-attention models (Qwen3.5/3.6, Qwen3-Next) use the exact per-token cost from their published configs, since only the full-attention layers interleaved among their linear-attention layers cache KV. A q8_0 KV cache roughly halves either figure. These are estimates, not measurements.
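For a standard GQA model, the KV cache stores K and V for every layer, KV head, head dimension, and token. This sketch uses the 8-KV-head × 128-dim class described above; the page does not state a layer count, so n_layers=96 is a hypothetical value that happens to reproduce its 6.0 GB (16K) and 24.0 GB (64K) figures for GPT-OSS 120B. The q8_0 entry assumes llama.cpp's block layout (32 int8 values plus one fp16 scale). Hybrid linear-attention models are not modeled here.

```python
GIB = 1024**3

# Bytes per cached element: fp16 is 2 bytes; q8_0 stores 32 int8 values plus
# one fp16 scale in 34 bytes, i.e. 1.0625 bytes per element.
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32}


def kv_cache_gib(context: int, n_layers: int = 96, n_kv_heads: int = 8,
                 head_dim: int = 128, cache_type: str = "f16") -> float:
    """K and V for each layer and token: 2 * layers * heads * dim * bytes."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * BYTES_PER_ELEM[cache_type]
    return per_token * context / GIB


WEIGHTS_GB = 65.4  # GPT-OSS 120B weights, per this page
USABLE_GB = 99.0   # usable unified memory figure used on this page

# GiB and GB are treated as interchangeable here for a rough check.
for ctx in (16_384, 65_536):
    kv = kv_cache_gib(ctx)
    print(f"{ctx:>6} tokens: KV ~{kv:.1f} GB, fits: {WEIGHTS_GB + kv <= USABLE_GB}")
```

Passing cache_type="q8_0" gives about 3.2 GB at 16K, in line with the "roughly halves" note above.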

Where to Buy the Ryzen AI Max+ 395

≈ $1,999 street · 128GB GMKtec EVO-X2


Storage & accessories for your model library

Internal NVMe SSD · 2TB · ~$170

A Gen4 M.2 drive keeps your whole GGUF and quant collection on fast local storage, loading models straight off NVMe.


USB4 NVMe Enclosure · ~$80

A 40Gbps enclosure gives you external storage fast enough to load models from. Pair it with an M.2 drive for a portable model vault.


ModelFit may earn a commission on purchases through these links, at no extra cost to you. Prices shown are approximate street references.
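For a model that fits in unified memory, storage speed mainly decides how long it takes to load. A back-of-the-envelope sketch; the sustained read rates below are illustrative assumptions, not measurements of any specific drive or enclosure.

```python
def load_seconds(model_gb: float, read_gb_per_s: float) -> float:
    """Best-case time to stream a model file off storage at a sustained read rate."""
    return model_gb / read_gb_per_s


# Illustrative sustained sequential reads (assumptions, not measurements):
# a Gen4 x4 NVMe drive, and a USB4 enclosure below its 40 Gbps (5 GB/s) raw link rate.
DRIVES = {"Gen4 NVMe (assumed 7.0 GB/s)": 7.0, "USB4 enclosure (assumed 3.5 GB/s)": 3.5}

for name, rate in DRIVES.items():
    print(f"{name}: GPT-OSS 120B (~65.4 GB) in ~{load_seconds(65.4, rate):.0f} s")
```

Once the weights are resident in memory, generation speed no longer depends on the drive.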

Ryzen AI Max+ 395 Unified Memory for AI: What Actually Fits?

Unlike a discrete GPU's fixed VRAM, Strix Halo shares one 128GB LPDDR5x pool between CPU and GPU. On Linux, kernel GTT tuning exposes about 110GB of that to the Radeon 8060S iGPU, more than triple an RTX 5090's 32GB. A 70B model at Q4 (~42GB) or a 120B MoE (~65GB) fits with headroom.

The catch is memory bandwidth: 256 GB/s (about 215 GB/s measured) is a fraction of a discrete GPU's, and because token generation is bandwidth-bound, dense 70B models run at around 5 tok/s. Mixture-of-Experts models, which activate only a few billion parameters per token, are where this chip shines, reaching 50-70+ tok/s; the ModelFit table above is considerably more conservative for these models.

On Windows the GPU is capped at a fixed BIOS allocation with no equivalent shared pool, so the big-model capability is mainly a Linux story today.
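The bandwidth argument can be written as a simple roofline: each generated token streams every active weight from memory once, so bandwidth divided by active-weight bytes is a ceiling on decode speed. A minimal sketch using the ~215 GB/s measured bandwidth quoted above; the 4.85 bits per weight for dense Q4 and ~4.25 bits for the MoE's active weights are assumptions, and the model ignores KV-cache reads and compute limits, so real speeds land below these ceilings.

```python
def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on decode tok/s if each token reads all active weights once."""
    gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


MEASURED_BW = 215.0  # measured bandwidth quoted above, GB/s

# Dense: all 70B weights are read per token (4.85 bits/weight assumed).
dense_70b = decode_ceiling_tok_s(70, 4.85, MEASURED_BW)
# MoE: only ~5.1B active parameters are read per token (4.25 bits/weight assumed).
moe_120b = decode_ceiling_tok_s(5.1, 4.25, MEASURED_BW)

print(f"dense 70B Q4 ceiling: ~{dense_70b:.1f} tok/s")
print(f"120B MoE (5.1B active) ceiling: ~{moe_120b:.0f} tok/s")
```

The dense 70B ceiling lands near the ~5 tok/s quoted above, while the MoE ceiling (around 79 tok/s) leaves room for the 50-70+ tok/s range.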

Ryzen AI Max+ 395 vs Top GPUs

Hardware	Memory	Est. speed (8B Q4)	Bandwidth	Price
Ryzen AI Max+ 395	110 GB	30 tok/s	256 GB/s	$1,999
RTX 5090	32 GB	145 tok/s	1,792 GB/s	$2,499
RTX 4090	24 GB	104 tok/s	1,008 GB/s	$2,574

Fits in 110 GB of unified memory with room to spare. Best for reasoning, coding, and agent workloads on the Ryzen AI Max+ 395.


Models Too Big for 110GB? Rent a Cloud GPU


The Ryzen AI Max+ 395 tops out at around 120B-parameter models. For anything bigger, a rented cloud GPU runs the same open weights with the same Ollama workflow, billed by the hour with no hardware purchase needed.

RunPod: hourly GPU pods (RTX 4090 to H100) with one-click Ollama/vLLM templates.

Vast.ai: a marketplace of rented GPUs, usually with the cheapest per-hour prices.

ModelFit may earn a commission on sign-ups made through these links, at no extra cost to you.


Compatible Model Families

Qwen

Alibaba Cloud: Widest size range (0.5B to 235B)

Llama

Meta: Most popular open-weight model family

DeepSeek

DeepSeek AI: Best-in-class reasoning with R1 models

Mistral

Mistral AI: Excellent performance-per-parameter ratio

Gemma

Google DeepMind: Excellent quality at small sizes (1B-9B)

Phi

Microsoft: Best quality-per-gigabyte at small sizes

Ryzen AI Max+ 395 FAQ: Common Questions

What size LLM can the Ryzen AI Max+ 395 run?

Up to 120B-parameter models. Its 128GB unified memory (~110GB GPU-addressable on Linux) holds a 70B model at Q4 (~42GB) or a 120B MoE (~65GB) with room to spare, far beyond any consumer GPU. Mixture-of-Experts models in the 30B-120B range run best.

How fast is the Ryzen AI Max+ 395 for local AI?

It depends on the model type. Dense 70B models generate around 5 tok/s because the 256 GB/s memory bandwidth is the bottleneck. MoE models like Qwen3 30B-A3B or gpt-oss-120b run much faster, 50-70+ tok/s, since only a few billion parameters are active per token. All figures are estimates, and the ModelFit table on this page is more conservative for MoE models (about 11-29 tok/s).

Ryzen AI Max+ 395 vs RTX 5090 for local LLMs?

They win on different axes. The RTX 5090 (32GB, 1,792 GB/s) is far faster per token for models that fit in 32GB. The Ryzen AI Max+ 395 (~110GB usable, 256 GB/s) is slower but holds models more than 3x larger. AMD's claims of up to 3x 5090-class performance apply only when a model exceeds the Nvidia card's VRAM and spills into system RAM.

Do I need Linux to run large models on Strix Halo?

For the largest models, effectively yes. On Linux, GTT kernel tuning lets the GPU address roughly 110GB of the 128GB pool. On Windows the GPU is limited to a fixed BIOS memory carve-out with no equivalent shared pool, so the very-large-model capability is mainly a Linux feature today.
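Some Linux setup guides for Strix Halo raise the kernel's TTM page limit (for example via a ttm.pages_limit boot parameter) so the GPU can map more of the pool. Treat the parameter name and safe values as assumptions to check against your kernel and distribution documentation; this sketch only does the unit conversion from GiB to 4 KiB pages.

```python
PAGE_SIZE = 4096  # bytes per page on x86-64


def ttm_pages_for(gib: float) -> int:
    """Convert a desired GPU-addressable size in GiB into a count of 4 KiB pages."""
    return int(gib * 1024**3) // PAGE_SIZE


pages = ttm_pages_for(110)
# Assumed kernel command-line form; verify the parameter against your kernel.
print(f"ttm.pages_limit={pages}")
```

Each GiB is 262,144 pages, so ~110 GiB works out to 28,835,840 pages.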

How much does a 128GB Strix Halo mini PC cost?

The 128GB GMKtec EVO-X2 launched at around $1,999, with street prices of roughly $1,800-$2,300. AMD's own first-party dev kit is reported at $3,999. The cheaper $1,499 EVO-X2 is the 64GB version, which cannot hold GPT-OSS 120B (~65.4GB of weights).

How fast is a 27B-class model like Qwen3.5 27B Instruct on the Ryzen AI Max+ 395?

ModelFit has no 27B row; its closest estimate, a dense 32B model at Q4_K_M, runs at roughly 9 tok/s on the Ryzen AI Max+ 395, so a 27B model should be somewhat faster. The current 27B-class pick in the catalog is Qwen3.5 27B Instruct (ollama run qwen3.5:27b).




Recommended Models

GPT-OSS 120B

Qwen3-Next 80B-A3B

Qwen3.5 122B-A10B Instruct

Llama 4 Scout

Qwen3.6 35B-A3B (Q8)

Qwen3.5 35B-A3B Instruct (Q8)

Qwen3.6 27B (Q8)

Qwen3-Next 80B-A3B (Q8)

Qwen3.6 35B-A3B

Qwen3.5 35B-A3B Instruct
