LFM2.5 on Mac: Run Liquid AI's Fastest Local Model with Ollama (2026)

Q: Is the Ollama tag verified?

Yes. The tag lfm2.5:8b-a1b-q4_K_M resolves with HTTP 200 on the Ollama registry and downloads a verified 5.1 GB model manifest.

Liquid AI shipped LFM2.5 in early 2026, and the 8B-A1B variant is the sweet spot for Apple Silicon. It packs 8.3 billion total parameters but activates only 1.5 billion per token. That means MacBook Air, MacBook Pro, and Mac Mini owners can run a reasoning-tuned, tool-calling model locally without upgrading RAM.

TL;DR: LFM2.5-8B-A1B (8.3B total / 1.5B active) runs in ~5 GB on Q4_K_M quantization via Ollama. It scores 91.84 on IFEval, 88.76 on MATH500, and 64.36 on BFCLv3, all with native tool-calling support. Any M-series Mac with 8 GB+ unified memory can run it. Command: ollama run lfm2.5:8b-a1b-q4_K_M.

What Is LFM2.5-8B-A1B?

LFM2.5 is Liquid AI's second-generation hybrid architecture. The 8B-A1B model is built for on-device personal assistants, agentic workflows, and multilingual chat.

The architecture is not a standard Transformer. It mixes 18 double-gated LIV convolution layers with 6 Grouped Query Attention (GQA) layers. The result is a model that processes tokens faster than dense Transformers at the same parameter scale while keeping a 128,000-token context window (HuggingFace model card, June 2026).

Spec	LFM2.5-8B-A1B
Total parameters	8.3B
Active per token	1.5B
Layers	24 (18 conv + 6 GQA)
Training tokens	38 trillion
Context length	128,000
Vocabulary	128,000
Languages	English, Arabic, Chinese, French, German, Italian, Japanese, Korean, Portuguese, Spanish
Supported frameworks	llama.cpp, MLX, vLLM, SGLang, Ollama

Day-one Ollama support means you do not need to download GGUF files manually or write a Modelfile. The model is available directly from the Ollama registry.

Benchmarks: How Good Is It?

Liquid AI published head-to-head results against same-size and larger models. These numbers come from the official HuggingFace model card, confirmed by raw source fetch.

Improvements over LFM2-8B-A1B

Benchmark	LFM2-8B-A1B	LFM2.5-8B-A1B	Change
AA-Omniscience Non-Hallucination Rate	7.46	63.47	+56.01
IFEval	79.44	91.84	+12.40
IFBench	26.00	56.47	+30.47
Multi-IF	58.54	79.93	+21.39
MATH500	74.80	88.76	+13.96
AIME25	20.00	42.53	+22.53
BFCLv3	45.07	64.36	+19.29
Tau² Telecom	13.60	88.07	+74.47

The jump in non-hallucination rate (+56 points) and instruction-following benchmarks makes LFM2.5 a clear upgrade over its predecessor.

Versus Same-Size Competitors

Model	Params	IFEval	MATH500	AIME25	BFCLv3
LFM2.5-8B-A1B	8B/1.5B	91.84	88.76	42.53	64.36
Granite-4.0-H-Tiny	7B/1B	82.23	59.20	4.93	56.89
Qwen3.5-4B	4B	87.80	80.76	54.28	71.06
Gemma-4-E4B-IT	8B	87.74	65.00	34.33	57.31

LFM2.5 leads on instruction following (IFEval 91.84) and math (MATH500 88.76). Qwen3.5-4B edges ahead on some tool-use and reasoning benchmarks, but LFM2.5's native tool-calling format and Ollama availability make it simpler to deploy.

How It Runs on Apple Silicon

LFM2.5-8B-A1B is available in three formats relevant to Mac users:

1. GGUF (via Ollama or LM Studio): cross-platform, runs on CPU and GPU

2. MLX: Apple Silicon-native, optimized for Metal GPU inference

3. ONNX: cross-platform runtime

The Ollama tag lfm2.5:8b-a1b-q4_K_M resolves and downloads a ~5.1 GB model. On a MacBook Air M3 with 16 GB unified memory, this leaves plenty of headroom for macOS and other apps.

Which Mac Can Run It?

Device	RAM	Can Run?	Notes
MacBook Air M1 8 GB	8 GB	✅ Yes	~5 GB model, tight but workable
MacBook Air M2/M3 16 GB	16 GB	✅ Yes	Comfortable headroom
MacBook Pro M3/M4 18 GB	18 GB	✅ Yes	Good balance
MacBook Pro M4 Pro 24 GB	24 GB	✅ Yes	Excellent
Mac Mini M4 16/24 GB	16-24 GB	✅ Yes	Fits easily
Mac Studio M2/M3 Ultra	64-512 GB	✅ Yes	Plenty of room

Unlike the 24B-A2B variant which needs 14.5 GB and excludes 8 GB Macs, the 8B-A1B runs on virtually every Apple Silicon machine sold since 2020.

One-Command Setup with Ollama

If Ollama is already installed, the entire setup is one line:

ollama run lfm2.5:8b-a1b-q4_K_M

The model will download automatically (~5.1 GB) and start an interactive chat session.

Recommended generation parameters (from Liquid AI):

ollama run lfm2.5:8b-a1b-q4_K_M --temperature 0.2 --top-k 80

For persistent settings, create a custom Modelfile:

cat > Modelfile << 'EOF' FROM lfm2.5:8b-a1b-q4_K_M PARAMETER temperature 0.2 PARAMETER top_k 80 PARAMETER repeat_penalty 1.05 SYSTEM "You are a helpful assistant trained by Liquid AI." EOF ollama create my-lfm2.5 -f Modelfile

ollama run my-lfm2.5

Tool Calling and Agent Workflows

LFM2.5 supports native function calling. The model outputs Pythonic function calls between special <|tool_call_start|> tokens. This makes it ideal for agentic workflows without needing a separate JSON parser layer.

The YouTube benchmark by Prompt Engineer (June 2026) tested LFM2.5-8B-A1B for tool calling, JSON structured output, multilingual responses, and reasoning. The model handled native tool calls correctly and generated structured JSON on demand. Inference speed on an RTX 4060 laptop reached ~76 tok/s in single-stream mode with the Q4_K_M GGUF.

On Apple Silicon, MLX-native inference will typically outperform GGUF/llama.cpp for this model size. You can run the MLX variant directly via the HuggingFace mlx-lm package:

pip install mlx-lm

python -m mlx_lm.server --model LiquidAI/LFM2.5-8B-A1B-MLX-8bit

When to Choose LFM2.5-8B-A1B

✅ Choose this model if:

You want a local AI assistant with tool-calling on a MacBook Air
You need multilingual support (10 languages out of the box)
You care about low RAM usage (5 GB vs 14+ GB for larger agents)
You want Ollama one-command deployment

❌ Look elsewhere if:

You need heavy programming (Liquid AI notes this is not its strongest area)
You need vision/multimodal (LFM2.5 is text-only)
You need maximum context at 262K+ (Qwen models offer longer context)

FAQ

What RAM do I need for LFM2.5-8B-A1B?

The Q4_K_M quantization uses approximately 5 GB of RAM. Any Mac with 8 GB unified memory can run it. For comfortable multitasking, 16 GB is recommended.

Does LFM2.5 support vision or images?

No. LFM2.5-8B-A1B is a text-only model. It does not have a vision encoder. If you need multimodal capabilities, consider Qwen3.5-4B or Gemma-4-IT models.

How does it compare to Qwen3.5-4B?

LFM2.5-8B-A1B leads on IFEval (91.84 vs 87.80) and MATH500 (88.76 vs 80.76). Qwen3.5-4B scores higher on AIME25 (54.28 vs 42.53) and BFCLv3 (71.06 vs 64.36). LFM2.5's advantage is native tool-calling format and broader framework support on day one.

Can I use it with Claude Code or other agent frameworks?

Yes. Any tool that supports Ollama as a backend can use LFM2.5. The model's native tool-calling output works with MCP servers and agent loops that expect function-call formatting.

Is the Ollama tag verified?

Yes. The tag lfm2.5:8b-a1b-q4_K_M resolves with HTTP 200 on the Ollama registry and downloads a verified 5.1 GB model manifest.

---

Published June 5, 2026. LFM2.5-8B-A1B is available now on Ollama, HuggingFace, and MLX. Resources: