I work with environmental consulting reports, and I'm trying to figure out how to go beyond metadata analysis and pull deeper insights out of these complex documents. I've built a RAG (retrieval-augmented generation) app that ingests PDFs, generates custom metadata, and uses hybrid search with Elasticsearch, with embedding vectors for reranking. But I want more.
I have two pressing questions:
1. How to ask open-ended questions and ensure exhaustive document analysis?
I want to be able to ask questions that aren’t necessarily answerable via the metadata I’ve processed and stored. To achieve this, I’m considering setting an extremely high limit on chunk retrieval and iteratively checking each chunk to collect answers. But is this the most efficient approach?
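To make the idea concrete, here is a minimal sketch of the exhaustive scan: every chunk is shown the question, and only the chunks that yield an answer are kept, along with where they came from. All names here (`Chunk`, `Finding`, `scan_all_chunks`, `keyword_extractor`) are hypothetical and not part of my app. The `extract` callable is assumed to wrap an LLM call; the keyword version below is only a stand-in so the sketch runs without one.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Chunk:
    doc_id: str
    chunk_id: int
    text: str


@dataclass
class Finding:
    doc_id: str
    chunk_id: int
    answer: str


# Assumed signature: (question, chunk_text) -> extracted answer, or None when
# the chunk says nothing relevant. In a real pipeline this would be an LLM call.
Extractor = Callable[[str, str], Optional[str]]


def scan_all_chunks(
    question: str, chunks: Iterable[Chunk], extract: Extractor
) -> Iterator[Finding]:
    """Ask the question of every chunk and yield only the hits."""
    for chunk in chunks:
        answer = extract(question, chunk.text)
        if answer:
            yield Finding(chunk.doc_id, chunk.chunk_id, answer)


def _words(text: str) -> set:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,;:?!").lower() for w in text.split()}


def keyword_extractor(question: str, text: str) -> Optional[str]:
    """Toy stand-in for an LLM: a chunk 'answers' the question if it shares
    at least one word longer than four characters with it."""
    terms = {w for w in _words(question) if len(w) > 4}
    return text if terms & _words(text) else None


if __name__ == "__main__":
    demo = [
        Chunk("report-1", 0, "Groundwater samples exceeded arsenic limits."),
        Chunk("report-1", 1, "The site is accessed via a gravel road."),
    ]
    for finding in scan_all_chunks("Where was arsenic detected?", demo, keyword_extractor):
        print(finding)
```

The collected findings would then go to a final summarization step. Cost grows linearly with the number of chunks, since each one triggers a call, which is the efficiency concern behind my question.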
2. How to facilitate comprehensive analysis of multiple reports?
I have two reports on the same subject, written by different experts. I want to analyze where they agree and disagree. My idea is to identify ‘key claims’ as structured metadata during ingestion, then iterate through each key claim in each report, and hybrid search the other report to retrieve potentially relevant chunks. Finally, I’d classify the results as agree, disagree, or neutral, and ask a Large Language Model (LLM) to dedupe and summarize the findings.
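The comparison loop I'm describing could be sketched roughly as below. This is only an illustration under stated assumptions: `retrieve` uses plain token overlap as a stand-in for the Elasticsearch hybrid search, and `toy_classify` is a crude negation heuristic standing in for an LLM or NLI classifier. Every name in the block is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Literal

Label = Literal["agree", "disagree", "neutral"]


@dataclass
class Comparison:
    claim: str
    evidence: str
    label: Label


def tokens(text: str) -> set:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,;:?!").lower() for w in text.split()}


def retrieve(claim: str, chunks: List[str], k: int = 3) -> List[str]:
    """Stand-in for hybrid search: rank chunks by token overlap with the
    claim, drop chunks with no overlap, and keep the top k."""
    scored = [(len(tokens(claim) & tokens(c)), c) for c in chunks]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [c for _, c in ranked[:k]]


def compare_reports(
    claims: List[str],
    other_chunks: List[str],
    classify: Callable[[str, str], Label],
    k: int = 3,
) -> List[Comparison]:
    """For each key claim from one report, retrieve candidate chunks from the
    other report and label each claim/chunk pair."""
    results = []
    for claim in claims:
        for evidence in retrieve(claim, other_chunks, k):
            results.append(Comparison(claim, evidence, classify(claim, evidence)))
    return results


def toy_classify(claim: str, evidence: str) -> Label:
    """Toy heuristic, NOT a real classifier: little overlap means neutral;
    otherwise a 'not' on exactly one side means disagree."""
    if len(tokens(claim) & tokens(evidence)) < 3:
        return "neutral"
    negated = ("not" in tokens(claim)) != ("not" in tokens(evidence))
    return "disagree" if negated else "agree"
```

Running `compare_reports` once per direction (claims from report A against chunks of report B, then the reverse) covers "each key claim in each report". The combined list of `Comparison` records is what I would hand to the LLM for deduplication and summarization.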
I’d love to hear your thoughts on these approaches and any alternative methods you’d suggest.
By unlocking deeper insights from environmental reports, we can make more informed decisions and drive positive change. Let’s explore the possibilities together!