Extended Mind Transformers (EMTs) are a new approach to working with very large contexts and external data sources, developed by @KlettPhoebe, @thomasahle, and Normal's AI team. Inspired by the Extended Mind Thesis, we modify multihead attention to directly query a vector database.
Our method outperforms Retrieval-Augmented Generation (RAG) on long-range retrieval tasks. Where RAG makes a single query to the vector database per prompt, EMTs query it once for every layer in the transformer. This is a bit slower, but results in much better performance.
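For intuition, here is a minimal single-head sketch of what "querying the vector database inside attention" could look like. This is a simplified illustration, not the EMT implementation: the external store is a plain tensor standing in for a vector database, and causal masking, multiple heads, and position handling are omitted.

```python
# Sketch: at each attention layer, every query token retrieves its top-k
# nearest external key/value pairs ("memories") and attends over them
# together with the local context. Illustrative only, not the EMT codebase.

import torch
import torch.nn.functional as F


def extended_attention(q, k, v, mem_k, mem_v, top_k=4):
    """Single-head attention over local keys/values plus retrieved memories.

    q:            (seq, d)   queries at this layer
    k, v:         (seq, d)   local keys/values
    mem_k, mem_v: (n_mem, d) external key/value memories (stand-in for a vector DB)
    """
    d = q.shape[-1]

    # Retrieve: for each query, select its top_k most similar external keys.
    sims = q @ mem_k.T                              # (seq, n_mem)
    idx = sims.topk(top_k, dim=-1).indices          # (seq, top_k)
    sel_k = mem_k[idx]                              # (seq, top_k, d)
    sel_v = mem_v[idx]                              # (seq, top_k, d)

    # Attention scores over local context and over retrieved memories.
    local_scores = (q @ k.T) / d**0.5                               # (seq, seq)
    mem_scores = torch.einsum("sd,skd->sk", q, sel_k) / d**0.5      # (seq, top_k)

    # One softmax over both, so memories compete with local tokens for attention.
    weights = F.softmax(torch.cat([local_scores, mem_scores], dim=-1), dim=-1)
    w_local, w_mem = weights[:, : k.shape[0]], weights[:, k.shape[0]:]

    return w_local @ v + torch.einsum("sk,skd->sd", w_mem, sel_v)


if __name__ == "__main__":
    torch.manual_seed(0)
    seq, d, n_mem = 8, 16, 128
    q, k, v = (torch.randn(seq, d) for _ in range(3))
    mem_k, mem_v = torch.randn(n_mem, d), torch.randn(n_mem, d)
    print(extended_attention(q, k, v, mem_k, mem_v).shape)  # torch.Size([8, 16])
```

Because the retrieval happens inside `extended_attention`, it naturally runs once per layer, which is where the per-layer querying described above comes from.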
@NormalComputing Is there a plausible reason why RAG does better for doc length of 2k?