Imagine running semantic search on your laptop: fast, lightweight, and cloud-free. That's exactly what LEANN, a local vector index for retrieval-augmented generation (RAG), promises to deliver. Developed at Berkeley Sky Lab, LEANN is privacy-first, 97% smaller than a conventional index, and compatible with Claude Code, Ollama, and GPT-OSS.
Conventional vector databases store an embedding for every chunk, which quickly balloons to over 100 GB when indexing emails, chat history, and code. Yet most queries touch only a tiny slice of the database. So the team behind LEANN asked: why store every single embedding?
LEANN introduces two ultra-lightweight backends: a graph-only mode that stores no embeddings at all, just a pruned HNSW graph, recomputing embeddings on the fly at query time; and a PQ+Rerank mode that compresses vectors with product quantization (PQ) and replaces heavy storage with lightweight recomputation over the candidate set. Both achieve massive storage savings with no meaningful drop in recall.
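To make the PQ+Rerank idea concrete, here is a minimal NumPy sketch, not LEANN's actual implementation: vectors are split into sub-vectors, each quantized to a small codebook, so the index only stores tiny integer codes; at query time a cheap approximate scan over the codes produces a shortlist, and exact embeddings are recomputed only for that shortlist before reranking. All names, dimensions, and parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus of "embeddings" (in practice these come from an embedding model).
dim, n_docs, n_subvectors, n_centroids = 8, 200, 4, 16
docs = rng.normal(size=(n_docs, dim)).astype(np.float32)
sub_dim = dim // n_subvectors

# Product quantization: one codebook per sub-vector. Real systems learn
# centroids with k-means; here we sample them from the data for brevity.
codebooks = np.stack([
    docs[rng.choice(n_docs, n_centroids, replace=False),
         s * sub_dim:(s + 1) * sub_dim]
    for s in range(n_subvectors)
])  # shape: (n_subvectors, n_centroids, sub_dim)

def pq_encode(x):
    """Compress a vector to one centroid index per sub-vector."""
    return np.array([
        np.argmin(np.linalg.norm(
            codebooks[s] - x[s * sub_dim:(s + 1) * sub_dim], axis=1))
        for s in range(n_subvectors)
    ])

# The "index" stores only these small integer codes, not the vectors.
codes = np.stack([pq_encode(d) for d in docs])

def pq_distance(query, code):
    """Approximate squared distance using only the compressed code."""
    return sum(
        np.linalg.norm(
            query[s * sub_dim:(s + 1) * sub_dim] - codebooks[s][code[s]]) ** 2
        for s in range(n_subvectors)
    )

def search(query, k=5, shortlist=20):
    # Stage 1: cheap approximate scan over PQ codes.
    approx = np.array([pq_distance(query, c) for c in codes])
    candidates = np.argsort(approx)[:shortlist]
    # Stage 2: recompute exact distances for the shortlist only, then rerank.
    exact = np.linalg.norm(docs[candidates] - query, axis=1)
    return candidates[np.argsort(exact)[:k]]

# A query that is a slightly perturbed copy of document 7.
query = docs[7] + rng.normal(scale=0.01, size=dim).astype(np.float32)
top = search(query)
```

The storage win comes from stage 1 touching only the integer codes; exact embeddings are needed only for the small candidate set, which is cheap to recompute on demand.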
LEANN supports semantic search over Apple Mail, the filesystem, Chrome and chat history, and codebases, making it a personal Jarvis that runs entirely locally. If you're interested in trying it out, giving feedback, or asking questions, head over to the GitHub repo.
What do you think about the potential of LEANN to revolutionize the way we approach semantic search?