2026-03-09
LFM2-24B-A2B + LocalCowork: Run a Full AI Agent Locally on Your Mac (2026)
LFM2 scales from 350M to 24B with log-linear quality gains — every size stays fast
TL;DR: LFM2-24B-A2B (24B total / 2B active params) powers LocalCowork, a fully offline MCP agent with 75 tools. Runs at 14.5 GB RAM on Q4_K_M GGUF, 112 tok/s on AMD CPU, 293 tok/s on H100. Single-step tool accuracy hits 80%+ on the curated 20-tool set. No internet, no API key, no data egress. Requires 16+ GB unified memory on Apple Silicon.
What Is LFM2-24B-A2B?
LFM2 is Liquid AI's family of hybrid foundation models, built for on-device inference. The 24B-A2B variant is the flagship — a Sparse Mixture-of-Experts (MoE) model that activates only 2 billion of its 24 billion parameters per token.
The underlying architecture is not a standard Transformer. Liquid AI combines Gated Delta Networks (a linear attention variant) with sparse MoE routing. This hybrid approach dramatically reduces per-token compute while keeping the model's full knowledge base intact (HuggingFace model card, March 2026).
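Napkin math makes the MoE trade-off concrete. This is an illustrative sketch, not Liquid AI's internals: per-token decode compute scales with *active* parameters (the classic ~2 FLOPs per active weight), while RAM scales with *total* parameters (Q4_K_M averages roughly 4.85 bits per weight, which is an assumption here).

```python
# Back-of-envelope: why 2B-active-of-24B-total matters.
# Decode compute tracks ACTIVE params; RAM tracks TOTAL params.

TOTAL_PARAMS = 24e9       # every expert must stay resident in memory
ACTIVE_PARAMS = 2e9       # experts actually routed per token
Q4KM_BITS_PER_PARAM = 4.85  # rough average for Q4_K_M (assumption)

def decode_flops_per_token(active_params: float) -> float:
    """Classic estimate: ~2 FLOPs per active parameter per generated token."""
    return 2 * active_params

def weights_ram_gb(total_params: float, bits_per_param: float) -> float:
    """Weight storage only; KV cache and runtime overhead come on top."""
    return total_params * bits_per_param / 8 / 1e9

dense_flops = decode_flops_per_token(TOTAL_PARAMS)  # hypothetical dense 24B
moe_flops = decode_flops_per_token(ACTIVE_PARAMS)
print(f"Per-token compute vs dense 24B: {dense_flops / moe_flops:.0f}x less")
print(f"Weights RAM at ~{Q4KM_BITS_PER_PARAM} bits/param: "
      f"~{weights_ram_gb(TOTAL_PARAMS, Q4KM_BITS_PER_PARAM):.1f} GB")
```

The ~14.5 GB figure from the spec table falls out of the total-parameter side, while the 112 tok/s CPU decode falls out of the 12× smaller active-parameter side.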
| Spec | LFM2-24B-A2B | LFM2-8B-A1B |
|---|---|---|
| Total parameters | 24B | 8.3B |
| Active per token | ~2B | ~1.5B |
| RAM (Q4_K_M) | ~14.5 GB | ~6 GB |
| Decode speed (AMD CPU) | 112 tok/s | ~180 tok/s |
| Decode speed (H100) | 293 tok/s | ~400 tok/s |
| Serving support | llama.cpp, vLLM, SGLang | llama.cpp, vLLM, SGLang |
Day-one support for llama.cpp, vLLM, and SGLang means you can run it right now with Ollama or LM Studio — no waiting for framework updates.
What Is LocalCowork?
LocalCowork is an open-source desktop AI agent released alongside LFM2-24B-A2B on March 5, 2026. It runs fully offline using the Model Context Protocol (MCP) — the same protocol Claude and other agents use to connect to external tools, but here everything stays on your machine.
The system ships with 75 tools across 14 MCP servers. The production-ready demo focuses on a curated subset of 20 tools across 6 servers, each tested to achieve over 80% single-step accuracy (Liquid AI blog, March 2026).
What it can do out of the box:
- Security scanning — finds leaked API keys, AWS credentials, and personal data buried in your folders
- Audit trails — logs every tool call to a local file, generates compliance-ready reports
- Document processing — OCR, contract diffing, PDF generation
- File operations — list, read, search, write across the filesystem
- System info — disk usage, clipboard read/write, process inspection
- Cross-server chaining — clipboard output from one server feeds into another
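The audit-trail idea is simple enough to sketch: append every tool call as one JSON line to a local file. The field names below are illustrative, not LocalCowork's actual schema.

```python
import json
import time
from pathlib import Path

# Sketch of an append-only audit trail: one JSON object per tool call,
# written as a single line (JSONL). Schema is hypothetical.
AUDIT_LOG = Path("audit.jsonl")

def log_tool_call(server: str, tool: str, args: dict, result_summary: str) -> dict:
    """Append one audit entry and return it."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "server": server,
        "tool": tool,
        "args": args,
        "result": result_summary,
    }
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

entry = log_tool_call("filesystem", "read_file",
                      {"path": "~/Documents/contract.pdf"}, "ok, 14 pages")
print(entry["tool"])
```

Because the log is line-delimited JSON on local disk, a compliance report is just a `grep` or a ten-line script away — nothing leaves the machine.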
The code is open-source and available at github.com/Liquid4All/cookbook. You can add your own MCP servers or swap the model.
How Does It Perform on Apple Silicon?
Liquid AI tested LocalCowork on an Apple M4 Max with 36 GB unified memory. On M4 Pro with the Cactus inference engine (INT8 quantization), the model achieves 229 tokens/sec prefill and 27 tokens/sec decode (Liquid AI launch blog, February 2026). That is fast enough to feel interactive for agent tasks.
For single-step tool dispatch — the most critical metric for an agent — LFM2-24B-A2B hits 80%+ accuracy on the curated 20-tool set. Multi-step chains (3-6 steps) complete end-to-end 26% of the time. Liquid AI is clear about this: the model is best used as a fast, deterministic dispatcher in a guided loop, not a hands-off autopilot for long autonomous chains.
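The gap between 80%+ single-step accuracy and 26% end-to-end is roughly what you would predict from compounding, under the simplifying assumption that steps succeed independently:

```python
# If each step succeeds independently with probability p, an n-step chain
# succeeds end-to-end with probability p**n. A simplification -- real agent
# steps are not independent -- but it matches the reported numbers.
p = 0.80
for n in (1, 3, 6):
    print(f"{n}-step chain: {p**n:.0%}")
```

At six steps, 0.8^6 ≈ 0.26 — right on the reported 26% end-to-end rate, which is the mathematical case for a human-supervised loop that verifies each step instead of letting errors compound.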
Real-world comparison: The smaller LFM2-350M decodes at 255.7 tok/s vs Qwen3.5-0.8B at 83.4 tok/s — a 3.1× speed advantage for the LFM architecture, though note Qwen3.5-0.8B carries over twice the parameters, so part of that gap is scale rather than architecture (r/LocalLLaMA, March 2026).
Which Mac Can Run LFM2-24B-A2B?
At 14.5 GB in Q4_K_M, you need at least 16 GB unified memory. The 8 GB MacBook Air cannot run this model. Here's the full breakdown.
| Device | RAM | Can Run LFM2-24B-A2B? | Speed (est.) |
|---|---|---|---|
| MacBook Air M1/M2/M3 8GB | 8 GB | ❌ No | — |
| MacBook Air M3 16GB | 16 GB | ⚠️ Tight (system overhead) | ~20 tok/s |
| MacBook Pro M4 Pro 24GB | 24 GB | ✅ Yes | ~27 tok/s decode |
| MacBook Pro M4 Max 36GB | 36 GB | ✅ Yes | ~30+ tok/s |
| Mac Mini M4 Pro 24GB | 24 GB | ✅ Yes | ~25 tok/s decode |
| Mac Studio M4 Max 64GB | 64 GB | ✅ Yes (with headroom) | ~35+ tok/s |
For MacBook Air 16 GB users, system overhead (macOS plus your open apps) will leave less free memory than the ~14.5 GB the model needs, so expect swapping and degraded speed. The 8B-A1B variant is a better fit at ~6 GB and delivers similar tool-calling accuracy for simpler workflows.
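The table above reduces to a one-line budget check. The overhead and KV-cache reservations below are assumptions, not measured values:

```python
# Rough fit check: model weights + KV cache + system overhead vs unified RAM.
# SYSTEM_OVERHEAD_GB and KV_CACHE_GB are assumed round numbers.
MODEL_GB = {"LFM2-24B-A2B Q4_K_M": 14.5, "LFM2-8B-A1B Q4_K_M": 6.0}
SYSTEM_OVERHEAD_GB = 6.0   # macOS + typical apps (assumption)
KV_CACHE_GB = 1.5          # 32K context, quantized KV (assumption)

def fits(model_gb: float, total_ram_gb: float) -> bool:
    """True if the model plus reserves fits in unified memory."""
    return model_gb + KV_CACHE_GB + SYSTEM_OVERHEAD_GB <= total_ram_gb

for name, size in MODEL_GB.items():
    for ram in (16, 24):
        verdict = "fits" if fits(size, ram) else "tight/no"
        print(f"{name} on {ram} GB: {verdict}")
```

Under these assumptions the 24B model clears the bar at 24 GB but not at 16 GB, while the 8B variant fits comfortably on 16 GB — consistent with the table.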
How to Set It Up with llama.cpp
LocalCowork requires llama-server (from llama.cpp) rather than Ollama directly, but the model downloads from HuggingFace with standard tools.
Step 1: Install llama.cpp (for Apple Silicon)

```bash
brew install llama.cpp
```
Step 2: Download the GGUF

```bash
huggingface-cli download LiquidAI/LFM2-24B-A2B-GGUF \
  --include "LFM2-24B-A2B-Q4_K_M.gguf" \
  --local-dir ./lfm2-24b
```
Step 3: Start the server

```bash
llama-server \
  -m ./lfm2-24b/LFM2-24B-A2B-Q4_K_M.gguf \
  --flash-attn \
  -ngl 99 \
  --port 8080
```
Step 4: Clone and run LocalCowork

```bash
git clone https://github.com/Liquid4All/cookbook
cd cookbook/examples/localcowork
npm install && npm start
```
The app connects to llama-server on localhost and routes all tool calls through MCP — no internet connection required at any step.
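Before launching the full app, you can smoke-test the server yourself: llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so a minimal client is just an HTTP POST. This sketch builds the request with the standard library; the payload shape follows the OpenAI chat format.

```python
import json
from urllib import request

# Minimal client for llama-server's OpenAI-compatible chat endpoint.
# Run after Step 3; port 8080 matches the --port flag above.
URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt: str) -> request.Request:
    """Build (but do not send) a chat-completion POST request."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,   # keep tool-dispatch-style output near-deterministic
        "max_tokens": 128,
    }
    return request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Say hello in five words.")
print(req.full_url)
# To actually send it (requires llama-server running):
#   with request.urlopen(req) as r:
#       print(json.loads(r.read())["choices"][0]["message"]["content"])
```

If that round-trip works, LocalCowork will find the same endpoint on localhost.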
LFM2 vs Qwen3.5 for Agent Tasks
Both model families target local deployment on consumer hardware. Here is how they compare for agent-specific use cases.
| Metric | LFM2-24B-A2B | Qwen3.5-9B | Qwen3.5-27B |
|---|---|---|---|
| RAM (Q4) | 14.5 GB | ~7 GB | ~18 GB |
| Decode (consumer CPU) | 112 tok/s | ~60 tok/s | ~30 tok/s |
| Single-step tool accuracy | 80%+ (curated) | — | — |
| MCP integration | ✅ Native | Manual | Manual |
| Context window | 32K | 262K | 262K |
| Multimodal | ❌ Text only | ✅ Vision | ✅ Vision |
| Architecture | Hybrid (GDN + MoE) | Dense | Dense |
LFM2 wins on raw inference speed and has turnkey MCP integration via LocalCowork. Qwen3.5 wins on context length, multimodal support, and quality-per-GB for general tasks. For a pure tool-calling agent workflow where latency matters, LFM2-24B-A2B is the faster choice.
Why This Matters for Privacy-Sensitive Workflows
The cloud-based alternative to LocalCowork — Claude Desktop, GPT-4 with tools, Gemini agents — sends your files, file paths, clipboard contents, and API key names to third-party servers. For developers working with credentials, medical records, legal documents, or financial data, that is a real exposure risk.
LocalCowork runs zero network requests after setup. Every tool call, every file read, every audit log entry stays on your machine. Liquid AI built this as an enterprise-grade privacy model from the ground up — not as an afterthought.
The r/LocalLLaMA community thread from March 4 put it plainly: models are getting smaller and faster, not larger. LFM2-24B-A2B decoding at 112 tok/s with 2B active parameters is a concrete proof point. The gap between local and cloud AI is closing faster than most expected (r/LocalLLaMA, March 2026).
FAQ
What RAM do I need to run LFM2-24B-A2B on a Mac?
You need at least 16 GB of unified memory, but 24 GB is recommended. The model uses ~14.5 GB in Q4_K_M quantization, leaving little room for system overhead on a 16 GB machine. MacBook Pro M4 Pro with 24 GB is the sweet spot.
Can I use Ollama instead of llama-server?
Not directly for LocalCowork — the app uses the llama-server API format. However, you can load the GGUF into LM Studio and use its local server endpoint as an alternative if you prefer a GUI.
How accurate is LFM2-24B-A2B at tool use?
On the curated 20-tool set tested by Liquid AI, single-step tool dispatch hits 80%+ accuracy. Multi-step chains (3-6 steps) complete end-to-end about 26% of the time. For best results, use it as a fast dispatcher in a human-supervised loop rather than a fully autonomous agent.
Does LFM2 support vision or images?
No. LFM2-24B-A2B is a text-only model with no vision encoder. If you need multimodal capabilities alongside tool use, look at Qwen3.5-9B or Qwen3.5-4B, both of which have native vision support.
Where can I find LocalCowork's source code?
The full source is at github.com/Liquid4All/cookbook. It's Apache-licensed and designed to be extended with custom MCP servers.
---
Published March 9, 2026. LFM2-24B-A2B and LocalCowork are available now. Have questions? Reach out on X/Twitter.