Integrating Memories in Deep Learning Models: Studies in Continual Learning and Task Generalization

Agrawal, Susmit

Abstract

Biological life forms remember the past to survive the future, flexibly weaving long-term knowledge with moment-to-moment context. This thesis argues that memory, viewed in this way, is the missing organisational principle in contemporary deep learning. It shows that a unified view of how and where information is structured, stored, protected, and retrieved can simultaneously (i) curb privacy-threatening memorisation in large language models, (ii) preserve interpretable concept structure in continual vision learners, (iii) biologically ground parameter-efficient neuromodulation, and (iv) deliver a single associative-memory-driven framework that handles class-incremental learning (CIL), domain generalisation (DG), and domain-incremental learning (DIL). We first develop an attribution-based analysis revealing that memorisation in transformers is concentrated in late attention blocks and can be excised with "short-circuit" interventions that leave reasoning intact. We then introduce MuCIL, which stores multimodal concept embeddings in a memory bank and maintains concept–class relations across experiences. Next, we show that Modern Hopfield and Predictive-Coding memories can recall LoRA adapters on demand, providing a neuro-inspired mechanism for task-specific modulation. Importantly, we show that associative memories can store modulatory signals with very high fidelity, potentially rivalling conventional digital memory when used to recall model weights. Finally, we present MIRA, which embeds Hopfield recall in every ViT layer to retrieve and compose adapters, unifying DG, CIL, and DIL in one architecture that attains state-of-the-art accuracy with minimal forgetting. Together, these four studies chart a continuum from unintended implicit memory to deliberately engineered explicit memory, laying out both theoretical bounds and practical designs for memory-centric deep learning.
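
To make the idea of recalling adapter weights from an associative memory concrete, the following is a minimal sketch of a single Modern-Hopfield-style retrieval step (a softmax-weighted readout over stored patterns). It is an illustration under stated assumptions, not the implementation used in the thesis: the names `hopfield_recall`, `task_keys`, and `adapters`, the inverse temperature `beta`, and the toy vectors standing in for flattened LoRA adapter weights are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_recall(keys, values, query, beta=8.0):
    """One retrieval step: weights = softmax(beta * K q), output = weights @ V.

    keys   : stored key patterns, one per task (hypothetical task descriptors)
    values : stored value patterns (here: toy stand-ins for flattened adapters)
    query  : possibly noisy cue used to address the memory
    beta   : inverse temperature; larger values give sharper, more exact recall
    """
    scores = [beta * sum(k * q for k, q in zip(key, query)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    recalled = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return recalled, weights


# Hypothetical toy memory: three task keys, each paired with a 4-dim "adapter".
task_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
adapters = [[0.5, -0.2, 0.1, 0.0], [-0.3, 0.4, 0.0, 0.2], [0.1, 0.1, -0.5, 0.3]]

# A noisy cue that is closest to the second key.
noisy_query = [0.1, 0.9, -0.05]
recalled, weights = hopfield_recall(task_keys, adapters, noisy_query, beta=20.0)
```

With a large `beta`, the softmax concentrates almost all weight on the best-matching key, so `recalled` is close to the stored adapter for that task; a smaller `beta` would instead blend several stored adapters.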