LitBot: Document-Based Question Answering With RAG


In my current project, I am excited to be developing a chatbot that helps users interact with their documents. Users can ask the chatbot questions about their texts and receive answers, and the chatbot supports each answer with evidence from the relevant passages.

I am implementing this idea using Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs). The approach has two main components: query-based passage retrieval followed by reranking, and answer generation with a decoder model. I am eager to achieve the following goals in this project:

– The first goal I am currently working on is a prototype that runs on ordinary laptops with limited resources. It is implemented as microservices running in Docker containers, features a Gradio interface, and uses the LangChain and LangGraph libraries to build LLM-based workflows.

– Fine-tuning the retrieval component and evaluating the complete pipeline require question-answer-passage triples. Since only the documents themselves are available, I plan to generate these triples via data augmentation with LLMs. Some manual effort will also be required to ensure the quality of the generated data.

– If the number of generated and manually verified question-answer pairs turns out to be small, I will perform task augmentation by including similar datasets that share the same structure. This would allow me to apply one of the meta-learning approaches from the excellent lecture "Deep Multi-Task and Meta Learning" by Prof. Chelsea Finn at Stanford University.

– Another important aspect is selecting appropriate metrics for evaluating the generated answers. This is crucial because a generated answer may have the same meaning as the ground-truth answer while being formulated with different words.
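The retrieve-then-rerank stage described above can be illustrated with a minimal, dependency-free sketch. This is not the actual LangChain implementation: it uses bag-of-words cosine similarity for first-stage retrieval and query-term coverage as a cheap stand-in for a cross-encoder reranker, and the example passages and function names are my own illustration.

```python
import math
from collections import Counter

def bow_cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_and_rerank(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Stage 1: retrieve the top-k passages by bag-of-words cosine similarity.
    Stage 2: rerank the candidates by the fraction of query terms they contain
    (a toy substitute for a learned cross-encoder reranker)."""
    q_terms = query.lower().split()
    q_vec = Counter(q_terms)
    scored = sorted(passages,
                    key=lambda p: bow_cosine(q_vec, Counter(p.lower().split())),
                    reverse=True)
    candidates = scored[:k]

    def coverage(p: str) -> float:
        words = set(p.lower().split())
        return sum(t in words for t in q_terms) / len(q_terms)

    return sorted(candidates, key=coverage, reverse=True)

passages = [
    "RAG combines a retriever with a generator model.",
    "Docker containers isolate microservices from each other.",
    "The retriever selects passages relevant to the query.",
]
print(retrieve_and_rerank("how does the retriever select passages", passages, k=2))
```

In the real pipeline, dense embeddings replace the bag-of-words vectors and the generated answer is produced by an LLM conditioned on the reranked passages.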
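For the data-augmentation goal, the core mechanics are a prompt that asks an LLM to invent question-answer pairs for a passage and a parser that turns the model's response into triples. The sketch below stubs out the model call with a hand-written response; the prompt wording, function names, and JSON response format are all assumptions to be tuned for a concrete model.

```python
import json

def build_augmentation_prompt(passage: str, n_pairs: int = 2) -> str:
    """Prompt asking an LLM to write question-answer pairs for one passage.
    The exact wording is an assumption; adjust it for your model."""
    return (
        f"Read the passage below and write {n_pairs} question-answer pairs "
        "that can be answered from it alone. Respond as a JSON list of "
        'objects with keys "question" and "answer".\n\n'
        f"Passage:\n{passage}"
    )

def parse_triples(passage: str, llm_response: str) -> list[tuple[str, str, str]]:
    """Turn the model's JSON response into (question, answer, passage) triples.
    Malformed responses are skipped rather than crashing the pipeline."""
    try:
        pairs = json.loads(llm_response)
    except json.JSONDecodeError:
        return []
    return [
        (p["question"], p["answer"], passage)
        for p in pairs
        if isinstance(p, dict) and "question" in p and "answer" in p
    ]

# Simulated model output; in the real pipeline this comes from an LLM call.
fake_response = ('[{"question": "What does RAG stand for?", '
                 '"answer": "Retrieval-Augmented Generation"}]')
triples = parse_triples("RAG stands for Retrieval-Augmented Generation.", fake_response)
print(triples)
```

The triples produced this way would then go through the manual quality check mentioned above before being used for fine-tuning or evaluation.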
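On the evaluation point: a simple baseline that already gives partial credit for differently worded answers is SQuAD-style token-level F1. It is only a starting point (an embedding-based metric such as BERTScore handles true paraphrases better), but it is easy to implement and inspect:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a generated answer and the ground truth.
    Rewards word overlap even when the answers are not identical strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts shared tokens, respecting repetitions.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the retriever finds passages",
               "the retriever finds relevant passages"))
```

A metric like this still scores genuine paraphrases ("it costs ten euros" vs. "the price is 10 EUR") too low, which is exactly why choosing the evaluation metric deserves its own attention in this project.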


© 2025 – Anastasiya Nonenmacher ⋅ All rights reserved