By ModelFit Team · 2026-06-05

LFM2.5 on Mac: Run Liquid AI's Fastest Local Model with Ollama (2026)

Liquid AI shipped LFM2.5 in early 2026 — and the 8B-A1B variant is the sweet spot for Apple Silicon. It packs 8.3 billion total parameters but activates only 1.5 billion per token. That means MacBook Air, MacBook Pro, and Mac Mini owners can run a reasoning-tuned, tool-calling model locally without upgrading RAM.
TL;DR: LFM2.5-8B-A1B (8.3B total / 1.5B active) runs in ~5 GB on Q4_K_M quantization via Ollama. It scores 91.84 on IFEval, 88.76 on MATH500, and 64.36 on BFCLv3 — all with native tool-calling support. Any M-series Mac with 8 GB+ unified memory can run it. Command: ollama run lfm2.5:8b-a1b-q4_K_M.

What Is LFM2.5-8B-A1B?

LFM2.5 is Liquid AI's second-generation hybrid architecture. The 8B-A1B model is built for on-device personal assistants, agentic workflows, and multilingual chat.

The architecture is not a standard Transformer. It mixes 18 double-gated LIV convolution layers with 6 Grouped Query Attention (GQA) layers. The result is a model that processes tokens faster than dense Transformers at the same parameter scale while keeping a 128,000-token context window (HuggingFace model card, June 2026).

SpecLFM2.5-8B-A1B
Total parameters8.3B
Active per token1.5B
Layers24 (18 conv + 6 GQA)
Training tokens38 trillion
Context length128,000
Vocabulary128,000
LanguagesEnglish, Arabic, Chinese, French, German, Italian, Japanese, Korean, Portuguese, Spanish
Supported frameworksllama.cpp, MLX, vLLM, SGLang, Ollama

Day-one Ollama support means you do not need to download GGUF files manually or write a Modelfile. The model is available directly from the Ollama registry.

Benchmarks: How Good Is It?

Liquid AI published head-to-head results against same-size and larger models. These numbers come from the official HuggingFace model card, confirmed by raw source fetch.

Improvements over LFM2-8B-A1B

BenchmarkLFM2-8B-A1BLFM2.5-8B-A1BChange
AA-Omniscience Non-Hallucination Rate7.4663.47+56.01
IFEval79.4491.84+12.40
IFBench26.0056.47+30.47
Multi-IF58.5479.93+21.39
MATH50074.8088.76+13.96
AIME2520.0042.53+22.53
BFCLv345.0764.36+19.29
Tau² Telecom13.6088.07+74.47

The jump in non-hallucination rate (+56 points) and instruction-following benchmarks makes LFM2.5 a clear upgrade over its predecessor.

Versus Same-Size Competitors

ModelParamsIFEvalMATH500AIME25BFCLv3
LFM2.5-8B-A1B8B/1.5B91.8488.7642.5364.36
Granite-4.0-H-Tiny7B/1B82.2359.204.9356.89
Qwen3.5-4B4B87.8080.7654.2871.06
Gemma-4-E4B-IT8B87.7465.0034.3357.31

LFM2.5 leads on instruction following (IFEval 91.84) and math (MATH500 88.76). Qwen3.5-4B edges ahead on some tool-use and reasoning benchmarks, but LFM2.5's native tool-calling format and Ollama availability make it simpler to deploy.

How It Runs on Apple Silicon

LFM2.5-8B-A1B is available in three formats relevant to Mac users:

1. GGUF (via Ollama or LM Studio) — cross-platform, runs on CPU and GPU

2. MLX — Apple Silicon-native, optimized for Metal GPU inference

3. ONNX — cross-platform runtime

The Ollama tag lfm2.5:8b-a1b-q4_K_M resolves and downloads a ~5.1 GB model. On a MacBook Air M3 with 16 GB unified memory, this leaves plenty of headroom for macOS and other apps.

Which Mac Can Run It?

DeviceRAMCan Run?Notes
MacBook Air M1 8 GB8 GB✅ Yes~5 GB model, tight but workable
MacBook Air M2/M3 16 GB16 GB✅ YesComfortable headroom
MacBook Pro M3/M4 18 GB18 GB✅ YesGood balance
MacBook Pro M4 Pro 24 GB24 GB✅ YesExcellent
Mac Mini M4 16/24 GB16-24 GB✅ YesFits easily
Mac Studio M2/M4 Ultra64-128 GB✅ YesPlenty of room

Unlike the 24B-A2B variant which needs 14.5 GB and excludes 8 GB Macs, the 8B-A1B runs on virtually every Apple Silicon machine sold since 2020.

One-Command Setup with Ollama

If Ollama is already installed, the entire setup is one line:

ollama run lfm2.5:8b-a1b-q4_K_M

The model will download automatically (~5.1 GB) and start an interactive chat session.

Recommended generation parameters (from Liquid AI):
ollama run lfm2.5:8b-a1b-q4_K_M --temperature 0.2 --top-k 80

For persistent settings, create a custom Modelfile:

cat > Modelfile << 'EOF'

FROM lfm2.5:8b-a1b-q4_K_M

PARAMETER temperature 0.2

PARAMETER top_k 80

PARAMETER repeat_penalty 1.05

SYSTEM "You are a helpful assistant trained by Liquid AI."

EOF

ollama create my-lfm2.5 -f Modelfile

ollama run my-lfm2.5

Tool Calling and Agent Workflows

LFM2.5 supports native function calling. The model outputs Pythonic function calls between special <|tool_call_start|> tokens. This makes it ideal for agentic workflows without needing a separate JSON parser layer.

The YouTube benchmark by Prompt Engineer (June 2026) tested LFM2.5-8B-A1B for tool calling, JSON structured output, multilingual responses, and reasoning. The model handled native tool calls correctly and generated structured JSON on demand. Inference speed on an RTX 4060 laptop reached ~76 tok/s in single-stream mode with the Q4_K_M GGUF.

On Apple Silicon, MLX-native inference will typically outperform GGUF/llama.cpp for this model size. You can run the MLX variant directly via the HuggingFace mlx-lm package:

pip install mlx-lm

python -m mlx_lm.server --model LiquidAI/LFM2.5-8B-A1B-MLX-8bit

When to Choose LFM2.5-8B-A1B

Choose this model if:

  • You want a local AI assistant with tool-calling on a MacBook Air
  • You need multilingual support (10 languages out of the box)
  • You care about low RAM usage (5 GB vs 14+ GB for larger agents)
  • You want Ollama one-command deployment

Look elsewhere if:

  • You need heavy programming (Liquid AI notes this is not its strongest area)
  • You need vision/multimodal (LFM2.5 is text-only)
  • You need maximum context at 262K+ (Qwen models offer longer context)

FAQ

What RAM do I need for LFM2.5-8B-A1B?

The Q4_K_M quantization uses approximately 5 GB of RAM. Any Mac with 8 GB unified memory can run it. For comfortable multitasking, 16 GB is recommended.

Does LFM2.5 support vision or images?

No. LFM2.5-8B-A1B is a text-only model. It does not have a vision encoder. If you need multimodal capabilities, consider Qwen3.5-4B or Gemma-4-IT models.

How does it compare to Qwen3.5-4B?

LFM2.5-8B-A1B leads on IFEval (91.84 vs 87.80) and MATH500 (88.76 vs 80.76). Qwen3.5-4B scores higher on AIME25 (54.28 vs 42.53) and BFCLv3 (71.06 vs 64.36). LFM2.5's advantage is native tool-calling format and broader framework support on day one.

Can I use it with Claude Code or other agent frameworks?

Yes. Any tool that supports Ollama as a backend can use LFM2.5. The model's native tool-calling output works with MCP servers and agent loops that expect function-call formatting.

Is the Ollama tag verified?

Yes. The tag lfm2.5:8b-a1b-q4_K_M resolves with HTTP 200 on the Ollama registry and downloads a verified 5.1 GB model manifest.

---

Published June 5, 2026. LFM2.5-8B-A1B is available now on Ollama, HuggingFace, and MLX. Resources:
What hardware runs this?

Match this model to a machine that can run it — by RAM tier for Apple Silicon, or by VRAM for an NVIDIA GPU.

See how this changes your recommendation
Run the wizard

The weekly local-AI refresh

New open-weight models, real Apple Silicon benchmarks, and the one model worth running on your Mac this week. Free, one email a week, unsubscribe anytime.

Have questions? Reach out on X/Twitter