2026-03-09
LFM2-24B-A2B + LocalCowork: Run a Full AI Agent Locally on Your Mac (2026)
LFM2 scales from 350M to 24B with log-linear quality gains — every size stays fast
TL;DR: LFM2-24B-A2B (24B total / 2B active params) powers LocalCowork, a fully offline MCP agent with 75 tools. Runs at 14.5 GB RAM on Q4_K_M GGUF, 112 tok/s on AMD CPU, 293 tok/s on H100. Single-step tool accuracy hits 80%+ on the curated 20-tool set. No internet, no API key, no data egress. Requires 16+ GB unified memory on Apple Silicon.
What Is LFM2-24B-A2B?
LFM2 is Liquid AI's family of hybrid foundation models, built for on-device inference. The 24B-A2B variant is the flagship — a Sparse Mixture-of-Experts (MoE) model that activates only 2 billion of its 24 billion parameters per token.
The underlying architecture is not a standard Transformer. Liquid AI combines Gated Delta Networks (a linear attention variant) with sparse MoE routing. This hybrid approach dramatically reduces per-token compute while keeping the model's full knowledge base intact (HuggingFace model card, March 2026).
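Napkin math makes the MoE trade-off concrete. This is an illustrative sketch, not Liquid AI's internals: per-token decode compute scales with *active* parameters (the classic ~2 FLOPs per active weight), while RAM scales with *total* parameters (Q4_K_M averages roughly 4.85 bits per weight, which is an assumption here).

```python
# Back-of-envelope: why 2B-active-of-24B-total matters.
# Decode compute tracks ACTIVE params; RAM tracks TOTAL params.

TOTAL_PARAMS = 24e9       # every expert must stay resident in memory
ACTIVE_PARAMS = 2e9       # experts actually routed per token
Q4KM_BITS_PER_PARAM = 4.85  # rough average for Q4_K_M (assumption)

def decode_flops_per_token(active_params: float) -> float:
    """Classic estimate: ~2 FLOPs per active parameter per generated token."""
    return 2 * active_params

def weights_ram_gb(total_params: float, bits_per_param: float) -> float:
    """Weight storage only; KV cache and runtime overhead come on top."""
    return total_params * bits_per_param / 8 / 1e9

dense_flops = decode_flops_per_token(TOTAL_PARAMS)  # hypothetical dense 24B
moe_flops = decode_flops_per_token(ACTIVE_PARAMS)
print(f"Per-token compute vs dense 24B: {dense_flops / moe_flops:.0f}x less")
print(f"Weights RAM at ~{Q4KM_BITS_PER_PARAM} bits/param: "
      f"~{weights_ram_gb(TOTAL_PARAMS, Q4KM_BITS_PER_PARAM):.1f} GB")
```

The ~14.5 GB figure from the spec table falls out of the total-parameter side, while the 112 tok/s CPU decode falls out of the 12× smaller active-parameter side.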
| Spec | LFM2-24B-A2B | LFM2-8B-A1B |
|---|---|---|
| Total parameters | 24B | 8.3B |
| Active per token | ~2B | ~1.5B |
| RAM (Q4_K_M) | ~14.5 GB | ~6 GB |
| Decode speed (AMD CPU) | 112 tok/s | ~180 tok/s |
| Decode speed (H100) | 293 tok/s | ~400 tok/s |
| Serving support | llama.cpp, vLLM, SGLang | llama.cpp, vLLM, SGLang |
Day-one support for llama.cpp, vLLM, and SGLang means you can run it right now with Ollama or LM Studio — no waiting for framework updates.
What Is LocalCowork?
LocalCowork is an open-source desktop AI agent released alongside LFM2-24B-A2B on March 5, 2026. It runs fully offline using the Model Context Protocol (MCP) — the same protocol Claude and other agents use to connect to external tools, but here everything stays on your machine.
The system ships with 75 tools across 14 MCP servers. The production-ready demo focuses on a curated subset of 20 tools across 6 servers, each tested to achieve over 80% single-step accuracy (Liquid AI blog, March 2026).
What it can do out of the box:
- Security scanning — finds leaked API keys, AWS credentials, and personal data buried in your folders
- Audit trails — logs every tool call to a local file, generates compliance-ready reports
- Document processing — OCR, contract diffing, PDF generation
- File operations — list, read, search, write across the filesystem
- System info — disk usage, clipboard read/write, process inspection
- Cross-server chaining — clipboard output from one server feeds into another
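The audit-trail idea is simple enough to sketch: append every tool call as one JSON line to a local file. The field names below are illustrative, not LocalCowork's actual schema.

```python
import json
import time
from pathlib import Path

# Sketch of an append-only audit trail: one JSON object per tool call,
# written as a single line (JSONL). Schema is hypothetical.
AUDIT_LOG = Path("audit.jsonl")

def log_tool_call(server: str, tool: str, args: dict, result_summary: str) -> dict:
    """Append one audit entry and return it."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "server": server,
        "tool": tool,
        "args": args,
        "result": result_summary,
    }
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

entry = log_tool_call("filesystem", "read_file",
                      {"path": "~/Documents/contract.pdf"}, "ok, 14 pages")
print(entry["tool"])
```

Because the log is line-delimited JSON on local disk, a compliance report is just a `grep` or a ten-line script away — nothing leaves the machine.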
The code is open-source and available at github.com/Liquid4All/cookbook. You can add your own MCP servers or swap the model.
How Does It Perform on Apple Silicon?
Liquid AI tested LocalCowork on an Apple M4 Max with 36 GB unified memory. On M4 Pro with the Cactus inference engine (INT8 quantization), the model achieves 229 tokens/sec prefill and 27 tokens/sec decode (Liquid AI launch blog, February 2026). That is fast enough to feel interactive for agent tasks.
For single-step tool dispatch — the most critical metric for an agent — LFM2-24B-A2B hits 80%+ accuracy on the curated 20-tool set. Multi-step chains (3-6 steps) complete end-to-end 26% of the time. Liquid AI is clear about this: the model is best used as a fast, deterministic dispatcher in a guided loop, not a hands-off autopilot for long autonomous chains.
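The gap between 80%+ single-step accuracy and 26% end-to-end is roughly what you would predict from compounding, under the simplifying assumption that steps succeed independently:

```python
# If each step succeeds independently with probability p, an n-step chain
# succeeds end-to-end with probability p**n. A simplification -- real agent
# steps are not independent -- but it matches the reported numbers.
p = 0.80
for n in (1, 3, 6):
    print(f"{n}-step chain: {p**n:.0%}")
```

At six steps, 0.8^6 ≈ 0.26 — right on the reported 26% end-to-end rate, which is the mathematical case for a human-supervised loop that verifies each step instead of letting errors compound.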
Real-world comparison: The smaller LFM2-350M decodes at 255.7 tok/s vs Qwen3.5-0.8B at 83.4 tok/s — a 3.1× speed advantage for the LFM architecture, though note Qwen3.5-0.8B carries over twice the parameters, so part of that gap is scale rather than architecture (r/LocalLLaMA, March 2026).
Which Mac Can Run LFM2-24B-A2B?
At 14.5 GB in Q4_K_M, you need at least 16 GB unified memory. The 8 GB MacBook Air cannot run this model. Here's the full breakdown.
| Device | RAM | Can Run LFM2-24B-A2B? | Speed (est.) |
|---|---|---|---|
| MacBook Air M1/M2/M3 8GB | 8 GB | ❌ No | — |
| MacBook Air M3 16GB | 16 GB | ⚠️ Tight (system overhead) | ~20 tok/s |
| MacBook Pro M4 Pro 24GB | 24 GB | ✅ Yes | ~27 tok/s decode |
| MacBook Pro M4 Max 36GB | 36 GB | ✅ Yes | ~30+ tok/s |
| Mac Mini M4 Pro 24GB | 24 GB | ✅ Yes | ~25 tok/s decode |
| Mac Studio M4 Max 64GB | 64 GB | ✅ Yes (with headroom) | ~35+ tok/s |
For MacBook Air 16 GB users, system overhead (macOS plus your open apps) will leave less free memory than the ~14.5 GB the model needs, so expect swapping and degraded speed. The 8B-A1B variant is a better fit at ~6 GB and delivers similar tool-calling accuracy for simpler workflows.
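The table above reduces to a one-line budget check. The overhead and KV-cache reservations below are assumptions, not measured values:

```python
# Rough fit check: model weights + KV cache + system overhead vs unified RAM.
# SYSTEM_OVERHEAD_GB and KV_CACHE_GB are assumed round numbers.
MODEL_GB = {"LFM2-24B-A2B Q4_K_M": 14.5, "LFM2-8B-A1B Q4_K_M": 6.0}
SYSTEM_OVERHEAD_GB = 6.0   # macOS + typical apps (assumption)
KV_CACHE_GB = 1.5          # 32K context, quantized KV (assumption)

def fits(model_gb: float, total_ram_gb: float) -> bool:
    """True if the model plus reserves fits in unified memory."""
    return model_gb + KV_CACHE_GB + SYSTEM_OVERHEAD_GB <= total_ram_gb

for name, size in MODEL_GB.items():
    for ram in (16, 24):
        verdict = "fits" if fits(size, ram) else "tight/no"
        print(f"{name} on {ram} GB: {verdict}")
```

Under these assumptions the 24B model clears the bar at 24 GB but not at 16 GB, while the 8B variant fits comfortably on 16 GB — consistent with the table.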
How to Set It Up with llama.cpp
LocalCowork requires llama-server (from llama.cpp) rather than Ollama directly, but the model downloads from HuggingFace with standard tools.
Step 1: Install llama.cpp (for Apple Silicon)

```bash
brew install llama.cpp
```
Step 2: Download the GGUF

```bash
huggingface-cli download LiquidAI/LFM2-24B-A2B-GGUF \
  --include "LFM2-24B-A2B-Q4_K_M.gguf" \
  --local-dir ./lfm2-24b
```
Step 3: Start the server

```bash
llama-server \
  -m ./lfm2-24b/LFM2-24B-A2B-Q4_K_M.gguf \
  --flash-attn \
  -ngl 99 \
  --port 8080
```
Step 4: Clone and run LocalCowork

```bash
git clone https://github.com/Liquid4All/cookbook
cd cookbook/examples/localcowork
npm install && npm start
```
The app connects to llama-server on localhost and routes all tool calls through MCP — no internet connection required at any step.
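Before launching the full app, you can smoke-test the server yourself: llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so a minimal client is just an HTTP POST. This sketch builds the request with the standard library; the payload shape follows the OpenAI chat format.

```python
import json
from urllib import request

# Minimal client for llama-server's OpenAI-compatible chat endpoint.
# Run after Step 3; port 8080 matches the --port flag above.
URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt: str) -> request.Request:
    """Build (but do not send) a chat-completion POST request."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,   # keep tool-dispatch-style output near-deterministic
        "max_tokens": 128,
    }
    return request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Say hello in five words.")
print(req.full_url)
# To actually send it (requires llama-server running):
#   with request.urlopen(req) as r:
#       print(json.loads(r.read())["choices"][0]["message"]["content"])
```

If that round-trip works, LocalCowork will find the same endpoint on localhost.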
LFM2 vs Qwen3.5 for Agent Tasks
Both model families target local deployment on consumer hardware. Here is how they compare for agent-specific use cases.
| Metric | LFM2-24B-A2B | Qwen3.5-9B | Qwen3.5-27B |
|---|---|---|---|
| RAM (Q4) | 14.5 GB | ~7 GB | ~18 GB |
| Decode (consumer CPU) | 112 tok/s | ~60 tok/s | ~30 tok/s |
| Single-step tool accuracy | 80%+ (curated) | — | — |
| MCP integration | ✅ Native | Manual | Manual |
| Context window | 32K | 262K | 262K |
| Multimodal | ❌ Text only | ✅ Vision | ✅ Vision |
| Architecture | Hybrid (GDN + MoE) | Dense | Dense |
LFM2 wins on raw inference speed and has turnkey MCP integration via LocalCowork. Qwen3.5 wins on context length, multimodal support, and quality-per-GB for general tasks. For a pure tool-calling agent workflow where latency matters, LFM2-24B-A2B is the faster choice.
Why This Matters for Privacy-Sensitive Workflows
The cloud-based alternative to LocalCowork — Claude Desktop, GPT-4 with tools, Gemini agents — sends your files, file paths, clipboard contents, and API key names to third-party servers. For developers working with credentials, medical records, legal documents, or financial data, that is a real exposure risk.
LocalCowork runs zero network requests after setup. Every tool call, every file read, every audit log entry stays on your machine. Liquid AI built this as an enterprise-grade privacy model from the ground up — not as an afterthought.
The r/LocalLLaMA community thread from March 4 put it plainly: models are getting smaller and faster, not larger. LFM2-24B-A2B decoding at 112 tok/s with 2B active parameters is a concrete proof point. The gap between local and cloud AI is closing faster than most expected (r/LocalLLaMA, March 2026).
FAQ
What RAM do I need to run LFM2-24B-A2B on a Mac?
You need at least 16 GB of unified memory, but 24 GB is recommended. The model uses ~14.5 GB in Q4_K_M quantization, leaving little room for system overhead on a 16 GB machine. MacBook Pro M4 Pro with 24 GB is the sweet spot.
Can I use Ollama instead of llama-server?
Not directly for LocalCowork — the app uses the llama-server API format. However, you can load the GGUF into LM Studio and use its local server endpoint as an alternative if you prefer a GUI.
How accurate is LFM2-24B-A2B at tool use?
On the curated 20-tool set tested by Liquid AI, single-step tool dispatch hits 80%+ accuracy. Multi-step chains (3-6 steps) complete end-to-end about 26% of the time. For best results, use it as a fast dispatcher in a human-supervised loop rather than a fully autonomous agent.
Does LFM2 support vision or images?
No. LFM2-24B-A2B is a text-only model with no vision encoder. If you need multimodal capabilities alongside tool use, look at Qwen3.5-9B or Qwen3.5-4B, both of which have native vision support.
Where can I find LocalCowork's source code?
The full source is at github.com/Liquid4All/cookbook. It's Apache-licensed and designed to be extended with custom MCP servers.
---
Published March 9, 2026. LFM2-24B-A2B and LocalCowork are available now. Have questions? Reach out on X/Twitter.