💡 After working for almost a year on 𝗥𝗔𝗚 (𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 𝗔𝘂𝗴𝗺𝗲𝗻𝘁𝗲𝗱 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻), here are my thoughts:
1️⃣ Like any ML problem, you will never get good results on the first try. It's an iterative process.
2️⃣ A great deal depends on your chunking strategy. It determines whether a retrieved chunk actually contains the piece of information needed to answer the question.
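A minimal sketch of one common chunking strategy (fixed-size windows with overlap, so information at a boundary appears in two chunks). The function name and sizes are illustrative, not recommendations:

```python
# Hypothetical sketch: fixed-size word chunking with overlap.
# chunk_size / overlap are example values and should be tuned per corpus.
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-based chunks."""
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```

Other strategies (sentence-aware, heading-aware, semantic splitting) often work better for structured documents; this is just the baseline to iterate from.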
3️⃣ There is a trade-off between the maximum tokens your embedding model can take, the risk of losing context, the top-k documents you retrieve and pass to the LLM, and the context length of your LLM. These are, in effect, hyper-parameters you need to tune.
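The trade-off above can be sanity-checked as a simple token budget. This is a hypothetical helper with made-up numbers, not a real library API:

```python
# Hypothetical budget check: retrieved chunks plus prompt overhead and the
# reserved answer space must fit the LLM context window.
def fits_context(chunk_tokens, top_k, prompt_tokens, answer_tokens, context_window):
    """Return True if top_k chunks of chunk_tokens each fit the window."""
    used = chunk_tokens * top_k + prompt_tokens + answer_tokens
    return used <= context_window

# Example values only: 5 chunks of 512 tokens + 300 prompt + 500 answer vs 4096.
fits_context(512, 5, 300, 500, 4096)
```

Raising top-k or chunk size pushes against the window; shrinking chunks risks losing context. That tension is exactly what needs tuning.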
4️⃣ Most evaluation frameworks fall short of expectations. Keep an eye on the metric that matters to you. It may well be worth doing A/B testing with beta users.
5️⃣ Try different distance measures: cosine, Euclidean, and dot product, and see what works best for your case.
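For reference, the three measures in plain Python (a sketch; in practice your vector DB computes these). Note that for unit-normalized embeddings, cosine and dot product produce the same ranking:

```python
import math

# The three common measures over embedding vectors (plain-Python sketch).
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # Dot product scaled by both magnitudes; range [-1, 1].
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

Which one "works best" also depends on whether your embedding model was trained with normalized vectors, so check the model's documentation.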
6️⃣ If the output of RAG is fed synchronously to another system, keep an eye on latency. LLM inference and vector search together should stay within your SLA.
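A minimal way to keep an eye on this is to time each stage and compare the total against the SLA budget. Everything below is a stand-in sketch: `retrieve` and `generate` are hypothetical stubs for your actual vector search and LLM call:

```python
import time

# Hypothetical latency guard: measure each pipeline stage in milliseconds.
def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000  # elapsed ms

def retrieve(query):          # stub standing in for vector search
    return ["chunk-1", "chunk-2"]

def generate(query, chunks):  # stub standing in for LLM inference
    return "answer"

SLA_MS = 2000  # example budget, not a recommendation
chunks, t_search = timed(retrieve, "q")
answer, t_llm = timed(generate, "q", chunks)
within_sla = t_search + t_llm <= SLA_MS
```

In production you would export these per-stage timings to your monitoring system rather than checking a single boolean.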
7️⃣ Choose an appropriate refresh strategy for your vector DB if your knowledge base is continuously growing.
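One refresh strategy is incremental: re-embed only chunks whose content changed and delete ones that disappeared, instead of rebuilding the whole index. A sketch under assumed data shapes (`old_index` maps chunk id to content hash; `new_chunks` maps chunk id to text):

```python
import hashlib

# Hypothetical incremental-refresh planner for a growing knowledge base.
def plan_refresh(old_index, new_chunks):
    """Return (ids to re-embed/upsert, ids to delete from the vector DB)."""
    new_hashes = {
        cid: hashlib.sha256(text.encode()).hexdigest()
        for cid, text in new_chunks.items()
    }
    # Re-embed anything new or whose content hash changed.
    to_upsert = [cid for cid, h in new_hashes.items() if old_index.get(cid) != h]
    # Remove vectors for chunks that no longer exist in the source.
    to_delete = [cid for cid in old_index if cid not in new_hashes]
    return to_upsert, to_delete
```

Whether incremental upserts or periodic full rebuilds make sense depends on your DB's index type; some ANN indexes degrade after many in-place updates.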
8️⃣ Keep an eye on cost. If your problem can be solved by a simpler approach, adopt it. The analogy that comes to mind: "Do not bring a gun to a knife fight." 😄
What are your thoughts? What challenges have you faced in RAG projects?