Hey there, fellow machine learning enthusiasts! I ran into an interesting problem while working on a Retrieval-Augmented Generation (RAG) pipeline that uses OpenSearch for document retrieval and an LLM-based reranker. The retriever is hybrid: KNN vector search over dense embeddings plus BM25 multi-match keyword search on the title, heading, and text fields. Both sit in a single bool query as should clauses, so candidates can come from either method, and the combined set is then reranked with an LLM.
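For context, the retrieval query looks roughly like the sketch below (the index name, field names, and client setup are simplified placeholders for my actual config):

```python
from opensearchpy import OpenSearch

# Illustrative client setup; real host/auth details omitted.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

def hybrid_search(query_text, query_vector, k=100, size=200):
    """Hybrid retrieval: KNN over the embedding field plus BM25 multi_match,
    combined as should clauses in a single bool query."""
    body = {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    {
                        # Dense retrieval: KNN against the document embedding field
                        "knn": {
                            "embedding": {
                                "vector": query_vector,
                                "k": k,
                            }
                        }
                    },
                    {
                        # Lexical retrieval: BM25 multi_match over title/heading/text
                        "multi_match": {
                            "query": query_text,
                            "fields": ["title", "heading", "text"],
                        }
                    },
                ]
            }
        },
    }
    return client.search(index="docs", body=body)
```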
The issue I’m facing is that even when I pull hundreds of candidates, results are hit or miss: sometimes the right passage comes out on top, other times it’s buried deep in the ranking or missing from the candidate set entirely. That inconsistency makes the final answers unreliable.
I’ve tried a number of tweaks to improve things: increasing KNN k and the BM25 candidate counts, adjusting the weights between the keyword and vector clauses, tweaking the reranker prompt to focus on relevance, and reformulating queries for the keyword search.
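For reference, the reranking step is essentially pointwise scoring along these lines (the prompt wording and the `llm_score` helper are placeholders for my actual LLM call, not a specific API):

```python
def rerank(query, hits, top_n=10):
    """Score each retrieved passage with the LLM and keep the top_n."""
    scored = []
    for hit in hits:
        passage = hit["_source"]["text"]
        prompt = (
            "Rate how relevant the passage is to the query on a 0-10 scale. "
            "Reply with only the number.\n\n"
            f"Query: {query}\nPassage: {passage}"
        )
        # llm_score is a hypothetical wrapper around whatever chat-completion API is in use.
        score = float(llm_score(prompt))
        scored.append((score, hit))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in scored[:top_n]]
```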
If you’ve experienced similar issues with hybrid retrieval and reranking, I’d love to hear your advice on tuning OpenSearch for better recall, balancing lexical vs. vector scoring in a should query, ensuring the reranker consistently sees the correct passages, and improving reranker performance without full fine-tuning.
Let’s work together to make these models more consistent and reliable!