LFM2 Models: Liquid AI for Agentic Workflows
Liquid AI's LFM2 takes a different path from the big transformer families. Its 24B-A2B flagship is a hybrid mixture-of-experts model: 24B parameters in total, with only 2B active for each token. The result is large-model knowledge with small-model speed, loading in roughly 14GB on any Mac with 16GB of unified memory. Liquid AI aims it squarely at agent work. It powers LocalCowork, an open-source agent app with 75 MCP tools. If you want a local model that reliably calls tools and follows structured workflows without touching the cloud, LFM2 deserves a look.
Liquid AI2 local models
DEVELOPER
Liquid AI
MODELS
2
SIZE RANGE
8.3B–24B
RAM RANGE
10–16 GB
Key Features
Hybrid MoE design: 24B total parameters, only 2B active per token
Loads in about 14GB, fits any 16GB Mac
Built for agent workflows, tool calling, and structured output
Powers LocalCowork, an open-source agent app with 75 MCP tools
Strong CPU throughput thanks to the small active parameter count
Designed for zero-cloud, privacy-sensitive workflows
All LFM2 Models
| Model | Size | Quant | VRAM | Min RAM | Best For | Quality | Ollama |
|---|---|---|---|---|---|---|---|
| LFM2.5 8B-A1B | 8.3B | Q4_K_M | 5.5 GB | 10 GB | On-device agents, tool calling, multilingual chat | 84 | |
| LFM2 24B-A2B Instruct | 24B | Q4_K_M | 14 GB | 16 GB | Local AI agents, privacy-first tool calling, MCP workflows | 85 |
Device Compatibility
Which LFM2 models can run on each device class, based on minimum RAM requirements.
| Model | iPhone | Air | Pro | Studio | Mini |
|---|---|---|---|---|---|
| LFM2.5 8B-A1B (8.3B) | Possible | Possible | Excellent | Excellent | Excellent |
| LFM2 24B-A2B Instruct (24B) | No | Possible | Possible | Excellent | Possible |
RAM Requirements
5.5 GB · min 10 GB
14 GB · min 16 GB
Frequently Asked Questions
What is the LFM2 24B-A2B MoE and what Mac can run it?
It is a sparse mixture-of-experts model: 24B total parameters with only 2B active per token. It loads in about 14GB, so it runs on any Mac with 16GB of unified memory: a 16GB MacBook Air, base MacBook Pro, or Mac Mini.
What makes LFM2 different from other models?
LFM2 uses a hybrid architecture instead of a standard dense transformer. The MoE design activates a small slice of the model per token, which keeps inference fast while retaining 24B-scale knowledge. It is tuned for tool calling and agent reliability.
Is LFM2 good for general chat?
It handles chat fine but shines on structured tasks, tool use, and agent workflows. For pure conversational quality at a similar RAM budget, Qwen or Gemma alternatives are the stronger picks.
What Ollama command runs LFM2?
Run `ollama run lfm2:24b-a2b`. You need a 16GB machine at minimum, since the model itself occupies about 14GB once loaded.