Previous Flow: Message (User) -> Response (Model) -> ...
RAG Flow: Message (User) -> Retrieve Additional Context (Vector Database) -> Response (Model)

Why retrieve? A model hallucinates, and it can't read a mountain of documentation in one context run. Any specific project, person, institution, or group may have a body of documentation, books, reports, and data particular to them that they want their LLMs to be aware of, and models will never have infinite context. In an engineering or professional environment a person may want a reference to a primary source, and an LLM often needs more context to answer a question well.

A model has a few forms of memory available to it. One is the information retained during pre-training. Another is the immediate context of the message. Finally, there is external memory, akin to a person writing things down or taking notes.

The most common form of external memory today is a vector database, used to find the pieces of text most similar to the message. We implemented this with LlamaIndex's library on the cloud, which exposes a very simple API (a local sketch of the same flow is below).

The second form is a knowledge tree. A tree with clone nodes can represent anything a graph can, it allows O(log n) traversals, and it is arguably the most intuitive and natural way to express information: a tree of knowledge. It uses an agent's chain of thought (CoT) during traversal and construction, and it accounts for the model's limited short-term memory while still letting it work over a large amount of data (effectively unbounded if implemented correctly; see the traversal sketch after the retrieval example). We ran into the issue of maintaining the tree as it grows and implemented some partial mitigations. This is a popular idea; the closest approach we've seen is Arcus's.
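To make the RAG flow concrete, here is a minimal sketch of the retrieve-then-respond loop. The write-up used LlamaIndex's cloud offering; this sketch uses the open-source quickstart-style API instead, so module paths may differ across llama-index versions, and the "data" directory and the question are placeholders.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Embed the project's documents into a vector index (the external memory).
documents = SimpleDirectoryReader("data").load_data()  # "data/" is a placeholder
index = VectorStoreIndex.from_documents(documents)

# Each user message first retrieves the most similar chunks, which are
# injected into the model's context before it generates a response.
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What does the deployment report say about latency?")
print(response)

The key point is that retrieval happens per message: only the top-k most similar chunks enter the context, so the corpus itself can grow far beyond the context window.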
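And here is a sketch of the knowledge-tree idea under stated assumptions: node summaries, clone links, and the traversal policy are our illustration of the design described above, not the actual implementation, and the choose_child function is a stand-in for a real LLM/CoT call (faked here with keyword overlap so the sketch runs standalone).

from dataclasses import dataclass, field

@dataclass
class KnowledgeNode:
    """A node in the knowledge tree. Internal nodes hold a short summary of
    the subtree beneath them; leaves hold actual passages of text."""
    summary: str
    children: list["KnowledgeNode"] = field(default_factory=list)
    # Clone nodes let the tree encode graph-like cross-links, which is how a
    # tree with clones can represent anything a graph can.
    clone_of: "KnowledgeNode | None" = None

    def resolve(self) -> "KnowledgeNode":
        # Follow clone links so a traversal can cross into another subtree.
        return self.clone_of.resolve() if self.clone_of else self

def choose_child(question: str, children: list[KnowledgeNode]) -> KnowledgeNode:
    """Placeholder for an LLM call: show the model each child's summary and
    let its chain of thought pick the most relevant branch. Faked here with
    keyword overlap so the example is self-contained."""
    words = set(question.lower().split())
    return max(children, key=lambda c: len(words & set(c.summary.lower().split())))

def traverse(root: KnowledgeNode, question: str) -> str:
    """Descend one decision at a time. Each step puts only one level of
    summaries in the model's context, so the context stays small no matter
    how large the tree grows; a balanced tree needs about log(n) steps."""
    node = root.resolve()
    while node.children:
        node = choose_child(question, node.children).resolve()
    return node.summary

# Toy tree: a clone makes the deployment guide reachable from two branches.
deploy = KnowledgeNode("deployment guide for the inference server")
root = KnowledgeNode("project docs", [
    KnowledgeNode("engineering", [KnowledgeNode("vector database schema"), deploy]),
    KnowledgeNode("operations", [KnowledgeNode(deploy.summary, clone_of=deploy)]),
])
print(traverse(root, "how do I deploy the inference server?"))

This also shows where the maintenance problem bites: every insertion has to keep ancestor summaries accurate and the tree roughly balanced, which is the part we only partially solved.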