LFM2 Models: Liquid AI for Agentic Workflows

Liquid AI's LFM2 takes a different path from the big transformer families. Its 24B-A2B flagship is a hybrid mixture-of-experts model: 24B parameters in total, with only 2B active for each token. The result is large-model knowledge at small-model speed, and the model loads in roughly 14GB, so it fits on a Mac with 16GB of unified memory. Liquid AI aims it squarely at agent work: it powers LocalCowork, an open-source agent app with 75 MCP tools. If you want a local model that reliably calls tools and follows structured workflows without touching the cloud, LFM2 deserves a look.

Liquid AI: 2 local models
Developer: Liquid AI
Models: 2
Size range: 8.3B–24B
RAM range: 10–16 GB

Key Features
Hybrid MoE design: 24B total parameters, only 2B active per token
Loads in about 14GB, fits any 16GB Mac
Built for agent workflows, tool calling, and structured output
Powers LocalCowork, an open-source agent app with 75 MCP tools
Strong CPU throughput thanks to the small active parameter count
Designed for zero-cloud, privacy-sensitive workflows
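
To make the "large total, small active" point concrete, here is a back-of-the-envelope sketch in Python. It uses only the parameter counts quoted above. The figure of 2 FLOPs per active parameter is a common rule of thumb for a forward pass, not a number published by Liquid AI, and it ignores attention and routing overhead.

```python
# Back-of-the-envelope comparison of total vs. active parameters.
# Parameter counts come from the feature list above; FLOPS_PER_PARAM is an
# assumed rule of thumb (one multiply and one add per active weight per token).

TOTAL_PARAMS = 24e9    # LFM2 24B-A2B: parameters that must sit in memory
ACTIVE_PARAMS = 2e9    # parameters actually used for each token
FLOPS_PER_PARAM = 2    # assumption: rough forward-pass cost per active weight


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that participate in one token."""
    return active / total


def flops_per_token(active: float, flops_per_param: float = FLOPS_PER_PARAM) -> float:
    """Rough forward-pass compute for a single token."""
    return active * flops_per_param


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    moe = flops_per_token(ACTIVE_PARAMS)
    dense = flops_per_token(TOTAL_PARAMS)  # hypothetical dense 24B model
    print(f"Active share per token: {frac:.1%}")
    print(f"Estimated compute vs. a dense 24B model: {moe / dense:.1%}")
```

Memory still scales with all 24B parameters, which is why the model needs about 14GB even though each token touches roughly a twelfth of the weights.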

All LFM2 Models

| Model | Size | Quant | VRAM | Min RAM | Best For | Quality |
| --- | --- | --- | --- | --- | --- | --- |
| LFM2.5 8B-A1B | 8.3B | Q4_K_M | 5.5 GB | 10 GB | On-device agents, tool calling, multilingual chat | 84 |
| LFM2 24B-A2B Instruct | 24B | Q4_K_M | 14 GB | 16 GB | Local AI agents, privacy-first tool calling, MCP workflows | 85 |
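
As a quick sanity check on the table, the sketch below derives two numbers from its figures: the implied bits per weight (loaded size divided by parameter count) and the RAM left over at the stated minimum. It assumes 1 GB = 1e9 bytes and that the VRAM column is dominated by model weights. Neither assumption is confirmed by the source, so treat the results as rough.

```python
# Sanity-check the model table: implied bits per weight and RAM headroom.
# Figures are copied from the table above. Assumptions: 1 GB = 1e9 bytes, and
# the VRAM column is mostly weights (it may also include runtime overhead,
# in which case the bits-per-weight values are overestimates).

MODELS = {
    # name: (parameter count, loaded size in GB, minimum RAM in GB)
    "LFM2.5 8B-A1B": (8.3e9, 5.5, 10),
    "LFM2 24B-A2B Instruct": (24e9, 14, 16),
}


def implied_bits_per_weight(params: float, loaded_gb: float) -> float:
    """Average bits stored per parameter, given the loaded size."""
    return loaded_gb * 1e9 * 8 / params


def headroom_gb(loaded_gb: float, min_ram_gb: float) -> float:
    """RAM left for the OS, other apps, and context at the minimum spec."""
    return min_ram_gb - loaded_gb


if __name__ == "__main__":
    for name, (params, loaded, min_ram) in MODELS.items():
        bpw = implied_bits_per_weight(params, loaded)
        spare = headroom_gb(loaded, min_ram)
        print(f"{name}: ~{bpw:.1f} bits/weight, {spare:.1f} GB headroom")
```

The headroom figure shows why 16 GB is a floor rather than a comfortable target for the 24B model: by the table's own numbers, only about 2 GB remains for everything else.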

Device Compatibility

Which LFM2 models can run on each device class, based on minimum RAM requirements.

| Model | iPhone | Air | Pro | Studio | Mini |
| --- | --- | --- | --- | --- | --- |
| LFM2.5 8B-A1B (8.3B) | Possible | Possible | Excellent | Excellent | Excellent |
| LFM2 24B-A2B Instruct (24B) | No | Possible | Possible | Excellent | Possible |
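
Since the ratings are described as following from minimum RAM, the basic fit check can be sketched in a few lines of Python. This is only the pass/fail part: the page does not say how "Possible" and "Excellent" are distinguished, so that grading is left out. The `Model` class and `runnable_models` function are hypothetical names introduced for illustration.

```python
# Minimal fit check: which LFM2 models meet a device's RAM?
# Minimum-RAM values come from this page; the class and function names
# are illustrative, not part of any published tool.
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    min_ram_gb: float


LFM2_MODELS = [
    Model("LFM2.5 8B-A1B", 10),
    Model("LFM2 24B-A2B Instruct", 16),
]


def runnable_models(device_ram_gb: float, models=LFM2_MODELS) -> list[str]:
    """Return the names of models whose minimum RAM the device satisfies."""
    return [m.name for m in models if device_ram_gb >= m.min_ram_gb]


if __name__ == "__main__":
    for ram in (8, 12, 16, 32):
        print(f"{ram} GB: {runnable_models(ram) or 'none'}")
```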

RAM Requirements

LFM2.5 8B-A1B: 5.5 GB VRAM · min 10 GB RAM
LFM2 24B-A2B Instruct: 14 GB VRAM · min 16 GB RAM

Frequently Asked Questions

What is the LFM2 24B-A2B MoE and what Mac can run it?
It is a sparse mixture-of-experts model: 24B total parameters with only 2B active per token. It loads in about 14GB, so it runs on any Mac with 16GB of unified memory: a 16GB MacBook Air, base MacBook Pro, or Mac Mini.
What makes LFM2 different from other models?
LFM2 uses a hybrid architecture instead of a standard dense transformer. The MoE design activates a small slice of the model per token, which keeps inference fast while retaining 24B-scale knowledge. It is tuned for tool calling and agent reliability.
Is LFM2 good for general chat?
It handles chat fine but shines on structured tasks, tool use, and agent workflows. For pure conversational quality at a similar RAM budget, Qwen or Gemma alternatives are the stronger picks.
What Ollama command runs LFM2?
Run `ollama run lfm2:24b-a2b`. You need a 16GB machine at minimum, since the model itself occupies about 14GB once loaded.
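
For scripted use, the same model can be reached through Ollama's local HTTP API instead of the interactive command. The sketch below uses only the Python standard library and assumes an Ollama server is running on its default port with the model already pulled. The model tag is taken from the answer above and has not been independently verified. `build_request` and `generate` are illustrative helper names.

```python
# Minimal sketch: send one prompt to a local Ollama server.
# Assumes Ollama is listening on its default address and that the tag
# below (copied from the FAQ answer) is available locally.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "lfm2:24b-a2b"  # as given in the FAQ; treat as an assumption


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 300.0) -> str:
    """Send the prompt and return the model's full text response."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example (needs a running Ollama server with the model pulled):
# print(generate("List three uses for a local tool-calling model."))
```

The request never leaves the machine, which matches the zero-cloud use case this page describes.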
