Hey there, fellow machine learning enthusiasts! If you’re working on a Retrieval-Augmented Generation (RAG) pipeline, you might have stumbled upon a common issue: inconsistent results when using a hybrid approach for document retrieval. In this post, I’ll share my experience and seek your advice on how to improve the performance of a hybrid KNN + keyword matching retrieval in OpenSearch.
My setup uses OpenSearch for document retrieval, followed by an LLM-based reranker. The retriever combines KNN vector search (dense embeddings) with a multi-match keyword search (BM25) over the title, heading, and text fields. Both legs sit in a single bool query as should clauses, so candidates can come from either method, and the reranker then orders the merged list.
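For concreteness, here’s a stripped-down sketch of the retrieval call (the index name, field names, and client setup are simplified stand-ins for my actual config):

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

def hybrid_search(query_text, query_vector, size=200):
    """One bool query with two should legs: approximate k-NN plus BM25."""
    body = {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Dense leg: approximate k-NN over the embedding field
                    {"knn": {"embedding": {"vector": query_vector, "k": size}}},
                    # Lexical leg: BM25 multi_match over the text fields
                    {
                        "multi_match": {
                            "query": query_text,
                            "fields": ["title", "heading", "text"],
                        }
                    },
                ]
            }
        },
    }
    return client.search(index="docs", body=body)["hits"]["hits"]
```

Whatever this returns is the candidate pool the reranker gets to see.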
The problem: even when I pull hundreds of candidates, the results are hit or miss. Sometimes the right passage comes out on top; other times it’s buried deep in the list or missed entirely. That inconsistency makes the final answers unreliable.
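To separate retrieval misses from reranker misses, a quick check like the one below tells me whether the gold passage is even in the candidate pool before the LLM ever sees it (this assumes a small hand-labeled eval set of query/gold-passage pairs, which is hypothetical here, and reuses the hybrid_search sketch above):

```python
def candidate_recall(eval_set, search_fn, k=200):
    """Fraction of queries whose gold passage appears in the top-k candidates.

    eval_set: iterable of (query_text, query_vector, gold_passage_id) tuples.
    search_fn: a retriever like hybrid_search above.
    If this number is low, no amount of reranker prompting can fix it.
    """
    found = 0
    total = 0
    for query_text, query_vector, gold_id in eval_set:
        candidates = search_fn(query_text, query_vector, size=k)
        found += any(hit["_id"] == gold_id for hit in candidates)
        total += 1
    return found / total
```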
I’ve tried various tweaks: increasing the KNN k and the BM25 candidate count, adjusting the weights between the keyword and vector legs, and rewriting the reranker prompt. So far, none of it has made the results consistent.
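The weight adjustment in particular amounted to query-level boosts on the two should clauses, roughly like this (the boost values are just examples from my sweeps, not recommendations, and query_text/query_vector are placeholders):

```python
# Placeholder inputs; in the real pipeline these come from the user query
# and the embedding model.
query_text = "example user question"
query_vector = [0.0] * 768  # embedding dimension is a placeholder

# Same bool/should structure as before, now with a relative weight per leg.
weighted_query = {
    "bool": {
        "should": [
            {
                "knn": {
                    "embedding": {
                        "vector": query_vector,
                        "k": 200,
                        "boost": 0.7,  # down-weight the dense leg...
                    }
                }
            },
            {
                "multi_match": {
                    "query": query_text,
                    "fields": ["title^2", "heading", "text"],  # caret = per-field boost
                    "boost": 1.3,  # ...and up-weight BM25
                }
            },
        ]
    }
}
```

What nags me about this is that BM25 scores are unbounded while vector similarity scores live on a different scale entirely, so fixed boosts feel like a blunt instrument, and that may be part of why the tuning never settles.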
If you’ve faced a similar issue, I’d love to hear how you:

- tune OpenSearch for better recall with hybrid KNN + BM25 retrieval,
- balance lexical vs. vector scoring in a should query, and
- ensure the reranker consistently sees the correct passages in its candidate set.
Let’s work together to overcome this hit-or-miss issue and create a more reliable RAG pipeline!