Hey there! I’m a CS student working on a project to build an AI assistant for my university using RAG (Retrieval-Augmented Generation), possibly with agentic tools later on. The goal is a chatbot that helps students find answers to common university-related questions and can perform light actions, such as redirecting them to the right form.
But here’s the thing: I’m struggling to figure out what types of data I should collect and prepare to make this assistant useful, accurate, and robust. I plan to use LangChain or LlamaIndex + a vector store, but I want to hear from folks with experience in this kind of thing.
So, I’m reaching out for help. What kinds of data did you use for similar projects? How did you decide what to include or ignore? Any tips for formatting, chunking, or organizing the data early on? Any advice, pointers, or even just a nudge in the right direction would be awesome.
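To make the chunking question concrete, here is a minimal sketch of the kind of preprocessing I have in mind. It is plain Python, not the LangChain or LlamaIndex API, and everything in it is a placeholder I made up for illustration: the `Chunk` fields, the `chunk_text` function, the `category` label, the file name, and the chunk size and overlap values. Treat it as a rough starting point under those assumptions, not a recommended setup.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str      # the chunk content that would be embedded
    source: str    # where the text came from (URL or file name)
    category: str  # hypothetical topic label, e.g. "registration"
    index: int     # position of the chunk within its document


def chunk_text(text, source, category, max_words=120, overlap=20):
    """Split text into overlapping word windows and attach metadata.

    max_words and overlap are arbitrary illustrative defaults.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for i, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_words]
        chunks.append(Chunk(" ".join(window), source, category, i))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Example: a made-up FAQ page split into chunks with metadata attached.
page = "Students can register for courses through the portal. " * 40
for chunk in chunk_text(page, source="registration_faq.html", category="registration"):
    print(chunk.index, chunk.source, len(chunk.text.split()))
```

My thinking is that keeping `source` and `category` on every chunk would let the assistant cite where an answer came from and filter by topic at retrieval time, but I don’t know whether that is the right metadata to keep.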
Building a university assistant with RAG is a complex task, and I’m sure I’m not the first to attempt it. If you’ve worked on a similar project, I’d love to hear what worked for you and what didn’t.