Have you ever wondered how our brains process and store information? Researchers have drawn on the workings of associative memory to develop a novel approach to in-context learning, achieving notable improvements in efficiency and robustness. The key ingredient is the introduction of ‘residual attention streams’ in Transformers, which create information-flow pathways that better retain prior context.
The Associative Memory for In-Context Learning (AMICL) algorithm works in three steps: identifying the incomplete pattern in the current input, searching the context for similar complete patterns, and completing the pattern using the best match. This approach achieves near-perfect performance on classification tasks.
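To make the three steps concrete, here is a minimal sketch of that nearest-neighbour-style completion loop. It assumes patterns are represented as vectors and that similarity is measured by cosine similarity; neither detail is specified above, so treat this as an illustration of the idea rather than the paper's exact algorithm.

```python
import numpy as np

def amicl_complete(query, context):
    """Complete an incomplete pattern by lookup over complete patterns in context.

    query:   vector for the incomplete pattern (e.g. a cue whose label is missing)
    context: list of (cue, completion) pairs observed earlier in the prompt
    Returns the completion paired with the most similar cue.
    """
    # Step 1: the incomplete pattern is the query itself (its completion is unknown).
    # Step 2: score every complete pattern in context by similarity to the query.
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    scores = [cosine(query, cue) for cue, _ in context]

    # Step 3: complete the pattern using the best match's stored completion.
    best = int(np.argmax(scores))
    return context[best][1]


# Toy usage: two (cue, label) pairs in context; the query resembles the first cue.
ctx = [(np.array([1.0, 0.0]), np.array([0.0, 1.0])),
       (np.array([0.0, 1.0]), np.array([1.0, 0.0]))]
print(amicl_complete(np.array([0.9, 0.1]), ctx))  # -> completion of the first pair
```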
The residual attention streams, inspired by AMICL, create direct connections between attention head values across layers. This modification led to 24% faster convergence to 95% accuracy in two-layer Transformers on toy tasks, a 6-fold improvement on Indirect Object Identification tasks, and measurable gains even in 1B-parameter models.
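Below is a minimal PyTorch sketch of the core idea: carrying each head's value vectors forward so the next layer's values can reuse them, with no new parameters. The exact connection pattern, head pairing, and combination rule used in the paper are not specified here, so the plain element-wise addition and one-layer-back routing are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ValueStreamAttention(nn.Module):
    """Attention layer whose value vectors receive a residual stream
    from the corresponding heads of the previous layer (assumed form)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q, self.k, self.v = (nn.Linear(d_model, d_model) for _ in range(3))
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, prev_values=None):
        B, T, _ = x.shape
        split = lambda t: t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))

        # Residual attention stream: add the previous layer's per-head values
        # to this layer's values. Note this introduces no extra parameters.
        if prev_values is not None:
            v = v + prev_values

        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(out), v  # return values to feed the next layer's stream


# Toy two-layer usage: layer 2 receives layer 1's head values through the stream.
x = torch.randn(1, 5, 32)
layer1, layer2 = ValueStreamAttention(32, 4), ValueStreamAttention(32, 4)
h1, v1 = layer1(x)
h2, _ = layer2(x + h1, prev_values=v1)
```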
What’s exciting is that this approach enhances in-context learning efficiency and robustness without increasing the parameter count, making it computationally efficient. From a safety perspective, this means AI systems can more reliably understand and follow instructions from context rather than falling back on potentially problematic patterns from training data.
The parallels to biological memory systems are also fascinating: the hippocampus contains selective skip connections that may serve a computational function similar to AMICL and the architectural modification introduced here.
Possible future directions include parameterized residual streams, alternative attention head connection patterns, scaling to larger architectures, and applications beyond NLP.
If you’re interested in learning more, be sure to check out the paper and code linked below.