The Elephant in the Room: Making Large Language Models More Transparent

Large language models (LLMs) are being adopted across industries such as healthcare, legal services, and finance. However, their black-box nature raises significant concerns about compliance and accountability. Current retrieval-augmented generation (RAG) frameworks surface the top-k retrieved documents, but they do not reveal which parts of the output each document actually influenced. This lack of transparency is a major obstacle to wider adoption.

The challenge is to develop interpretability techniques that can reveal causal links between input and output without requiring access to model internals. This is especially difficult since most production systems use closed APIs.

My hypothesis is that we can build a black-box attribution pipeline for closed APIs that maps output sentences to supporting sources, flags hallucinations, and approximates model attention, all without access to model internals. The long-term vision is true token-level tracing with open-source or self-hosted models, once model internals can be inspected directly.
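To make the sentence-to-source mapping and hallucination flagging concrete, here is a minimal sketch. It assumes some text-embedding model is available through an embed_fn callable; the function names, the naive sentence splitter, and the 0.6 similarity threshold are placeholders of my own, not an existing implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence
import re

import numpy as np


@dataclass
class SentenceAttribution:
    sentence: str
    best_source_idx: int   # index into the retrieved documents
    similarity: float      # cosine similarity to the best-matching source
    flagged: bool          # True if no retrieved source appears to support the sentence


def attribute_sentences(
    answer: str,
    sources: Sequence[str],
    embed_fn: Callable[[Sequence[str]], np.ndarray],  # assumed: any embedding model/API, returns (n, d)
    threshold: float = 0.6,  # assumed cutoff; would need tuning per embedding model
) -> list[SentenceAttribution]:
    """Map each answer sentence to its most similar retrieved source.

    Sentences whose best similarity falls below `threshold` are flagged
    as potential hallucinations (unsupported by any retrieved document).
    """
    # Naive sentence splitting; a production pipeline would use a proper segmenter.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]

    # Embed answer sentences and source documents with the same model.
    sent_vecs = embed_fn(sentences)
    src_vecs = embed_fn(sources)

    # Cosine similarity between every sentence and every source.
    sent_norm = sent_vecs / np.linalg.norm(sent_vecs, axis=1, keepdims=True)
    src_norm = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    sims = sent_norm @ src_norm.T  # shape: (num_sentences, num_sources)

    results = []
    for i, sentence in enumerate(sentences):
        best = int(np.argmax(sims[i]))
        score = float(sims[i, best])
        results.append(
            SentenceAttribution(
                sentence=sentence,
                best_source_idx=best,
                similarity=score,
                flagged=score < threshold,
            )
        )
    return results
```

An embedding-similarity threshold is a crude hallucination signal; an NLI-style entailment check would likely be more reliable, but the overall shape of the pipeline stays the same.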

Concretely, the technical approach I’m exploring is a black-box attribution pipeline that works with closed APIs: sentence-level source mapping without internal access, attention approximation from API responses only, and hallucination detection and flagging.
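For the attention-approximation piece, one black-box option is a leave-one-document-out ablation: re-run the same prompt with each retrieved document removed and treat the change in the answer as a proxy for that document's influence. The sketch below assumes a generate_fn wrapping whatever closed API is in use and a similarity_fn such as embedding cosine similarity; both names are hypothetical.

```python
from typing import Callable, Sequence


def approximate_document_influence(
    question: str,
    documents: Sequence[str],
    generate_fn: Callable[[str, Sequence[str]], str],  # assumed wrapper around the closed LLM API
    similarity_fn: Callable[[str, str], float],        # assumed text-similarity measure (e.g. embedding cosine)
) -> list[float]:
    """Estimate per-document influence on the answer without model internals.

    For each retrieved document, regenerate the answer with that document
    removed and measure how much the answer changes. A large change suggests
    strong influence, which serves as a rough stand-in for attention weights.
    """
    # Baseline answer generated with the full retrieved context.
    baseline = generate_fn(question, documents)

    influences = []
    for i in range(len(documents)):
        # Drop one document and regenerate the answer.
        ablated_docs = [d for j, d in enumerate(documents) if j != i]
        ablated_answer = generate_fn(question, ablated_docs)

        # Influence = how much the answer changed without this document.
        influences.append(1.0 - similarity_fn(baseline, ablated_answer))

    return influences
```

The obvious cost is N+1 generation calls for N retrieved documents, so in practice this would be sampled or cached rather than run on every request.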

Is this the right approach? What am I missing? I’d love to hear your thoughts and feedback on this critical issue.
