Extended Mind Transformers (EMTs) are a new approach to working with very large contexts and external data sources, developed by @KlettPhoebe, @thomasahle, and Normal's AI team. Inspired by the Extended Mind Thesis, we modify multi-head attention to directly query a vector database.
Our method outperforms Retrieval-Augmented Generation (RAG) on long-range retrieval tasks. Where RAG makes only one query to the vector database per prompt, EMTs make one query for every layer in the transformer. This is a bit slower, but results in much better performance.
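To make the per-layer retrieval idea concrete, here's a minimal sketch (PyTorch, single head) of attention that pulls in external key/value memories at every layer. The names and the top-k inner-product lookup standing in for a real vector-database query are ours for illustration; this is not Normal's implementation.

```python
# Hypothetical sketch of per-layer retrieval inside attention (illustrative only).
import torch
import torch.nn.functional as F

def extended_attention(q, k, v, memory_k, memory_v, top_k=8):
    """Attention over local keys/values, extended with the top-k external
    memories retrieved for each query (a stand-in for a vector-DB lookup).

    q, k, v:             (seq, dim) local queries/keys/values for one head
    memory_k, memory_v:  (num_memories, dim) external key/value store
    """
    # "Query the vector database": score every external memory against each
    # query and keep the top_k best matches per query.
    mem_scores = q @ memory_k.T                          # (seq, num_memories)
    top_scores, top_idx = mem_scores.topk(top_k, dim=-1)

    # Local attention scores over the prompt itself.
    local_scores = q @ k.T                               # (seq, seq)

    # Softmax jointly over local tokens and retrieved memories.
    scores = torch.cat([local_scores, top_scores], dim=-1) / k.shape[-1] ** 0.5
    weights = F.softmax(scores, dim=-1)

    local_out = weights[:, : k.shape[0]] @ v                               # (seq, dim)
    mem_out = (weights[:, k.shape[0]:].unsqueeze(-1) * memory_v[top_idx]).sum(1)
    return local_out + mem_out
```

Because the lookup happens inside attention, each layer can retrieve different memories for the same prompt, which is what distinguishes this from a single retrieval step bolted on before generation.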
By co-opting the LLM's underlying reasoning process, we expand its capabilities in a natural way, without an unnatural "query generation" step in human language. Try it for yourself on the @huggingface Hub: huggingface.co/normalcomputin…