Google Research has introduced two new research papers, Titans and MIRAS, aimed at addressing a growing limitation in modern AI systems: handling very long stretches of information without slowing down or losing important context. Together, Titans and MIRAS focus on giving models a structured way to retain what matters over time, allowing them to follow extended documents, conversations, or data streams with greater continuity.
The Titans Architecture
Titans is a model family built around a Long-Term Memory module that actively learns as it processes data, guided by a “surprise metric.”
The surprise metric is an internal error flag, a mathematical way of signaling, “This is unexpected!” This signal measures the difference between what the model currently remembers and what the new incoming data is telling it. It signals when information is unexpected or important enough to be prioritized for long-term storage.
To make this effective, the architecture uses what’s known as momentum, a kind of sustained focus, to determine how much of the surrounding sequence it actually records. This ensures the model continues to prioritize relevant details that follow the initial flag, even if those subsequent details are not individually surprising.
Lastly, the Titans architecture uses an adaptive forgetting mechanism, a mathematical way of gradually clearing out old or less useful information. This ensures that as the model processes long sequences of data, it can let go of outdated details to make room for new, more relevant information.
By combining these three elements, the surprise metric (what to notice), momentum (how much to record), and adaptive forgetting (what to let go), the Titans architecture creates a memory system that stays sharp and relevant regardless of how much data it processes.
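As a rough illustration, the three elements can be sketched as a single update rule, where “surprise” is the gradient of a memory loss, momentum smooths it over subsequent steps, and a decay factor handles forgetting. The linear memory and all parameter names here are simplifying assumptions for clarity; the actual Titans memory module is a deep neural network:

```python
import numpy as np

def titans_style_update(memory, key, value, state, lr=0.1, momentum=0.9, decay=0.05):
    """One hedged sketch of a Titans-style memory update.

    memory: a weight matrix mapping keys to values (the long-term store)
    state:  the running momentum of past surprise
    The linear memory is an illustrative simplification; the paper
    uses a deep MLP as the memory module.
    """
    prediction = memory @ key             # what the memory currently recalls
    error = prediction - value            # "surprise": mismatch with new data
    grad = np.outer(error, key)           # gradient of 0.5*||M@k - v||^2 w.r.t. M
    state = momentum * state - lr * grad  # momentum: keep recording after a spike
    memory = (1 - decay) * memory + state # forgetting: gradually decay old content
    return memory, state
```

Applied repeatedly over a stream of (key, value) pairs, surprising inputs produce large gradients and get written strongly, while the decay term steadily fades information that is never reinforced.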
The MIRAS Framework
While Titans is a specific model family, MIRAS is a framework for designing sequence models. It reconceptualizes these architectures as associative memory, modules that learn to associate specific data points with one another using an internal objective that tells the memory module “how” to learn the relationship between different pieces of data.
To build a model within this framework, designers make four core choices:
- Memory Structure: The physical architecture of the memory itself, which can range from simple vectors to the deep MLP layers used in Titans.
- Attentional Bias: The specific internal objective that determines how the memory prioritizes and links incoming information.
- Memory Stability and Retention: The mechanism that balances learning new information with retaining the past state.
- Memory Algorithm: The learning method used to update the memory, such as the gradient descent methods that allow the model to learn at test time.
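To make the four choices concrete, they can be laid out as a configuration object. This is purely illustrative; the MIRAS paper defines these choices mathematically, not as a Python API, and all names below are assumptions for the sketch:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MirasDesign:
    """The four MIRAS design choices, framed as a config (illustrative only)."""
    memory_structure: str        # e.g. "vector", "matrix", "deep_mlp"
    attentional_bias: Callable   # internal objective: how memory scores new data
    retention: Callable          # balances new learning vs. keeping the past state
    memory_algorithm: str        # e.g. "gradient_descent_with_momentum"

# A Titans-like configuration expressed under this framing:
titans_like = MirasDesign(
    memory_structure="deep_mlp",
    attentional_bias=lambda pred, target: (pred - target) ** 2,       # squared-error surprise
    retention=lambda old, new, decay=0.05: (1 - decay) * old + new,   # weight decay
    memory_algorithm="gradient_descent_with_momentum",
)
```

Swapping out any one field, say a different attentional bias or retention rule, yields a different sequence model within the same framework, which is the paper's central point.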
The Problem: AI Can Process, But It Struggles To Remember
Modern AI models are effective at analyzing the information directly in front of them. The challenge begins as context grows very large. As documents, datasets, or conversations stretch longer, models face a tradeoff between preserving detail and keeping computational cost manageable.
Modern language models typically handle long context in one of two ways:
- Attention Window: The model revisits earlier text directly when needed, repeatedly looking back at prior tokens to decide what matters for the current step.
- State Compression: The model compresses what came before into a smaller internal summary so it can keep moving forward, trading detail for efficiency.
Both approaches work, but each begins to break down as inputs grow longer. With attention windows, repeatedly revisiting earlier material becomes increasingly computationally expensive, while with state compression, squeezing everything that came before into a fixed summary risks losing details that later turn out to matter.
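The tradeoff can be made concrete with a back-of-the-envelope cost sketch. The exact constants are assumptions, but the scaling behavior is the standard one for these two designs:

```python
def attention_cost(n_tokens):
    # Full attention compares every token against every earlier token:
    # total work grows roughly quadratically with sequence length.
    return n_tokens * n_tokens

def recurrent_state_cost(n_tokens, state_size=1024):
    # A compressed recurrent state does fixed work per token, so total
    # work grows linearly -- but everything seen so far must fit into
    # `state_size` numbers, no matter how long the input gets.
    return n_tokens * state_size
```

Doubling the input doubles the recurrent cost but quadruples the attention cost, while the recurrent state's capacity never grows at all, which is exactly the detail-versus-efficiency tension described above.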
The limitation is not scale or speed, it is memory. Current systems do not treat memory as something that can be deliberately managed during use. Instead, they rely on fixed architectural patterns, either scanning backward or compressing forward, without a structured way to decide what should be retained over long spans.
Titans and MIRAS approach that problem by treating memory as something models can actively manage rather than passively inherit from their architecture.
Why The Research Is Presented In Two Parts
Addressing this limitation requires more than a single technical change. One step is to show that models can actually manage memory differently in practice. Another is to develop a way to design such systems deliberately rather than treating each new architecture as a one-off solution.
The two papers reflect those needs:
- One introduces a concrete method for giving models a form of long-term memory.
- The other provides a framework for understanding and building models around that idea.
Titans: Adding A Form Of Long-Term Memory
Titans focuses on the practical side of the problem. It introduces an architecture that enables a model to accumulate information as it operates. Rather than repeatedly reprocessing earlier input or compressing everything into a small representation, the model can carry forward selected information over time.
Unlike traditional systems that use a simple, fixed-size summary, this module is a deep neural network that can capture much more complex and detailed information.
The goal is to make it possible to work with very long inputs without repeatedly scanning the past or losing key details. Titans is not presented as a replacement for existing model designs. It is an additional layer that can be combined with them, extending how they handle context rather than discarding what already works.
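The idea of a memory that is itself a deep network, trained while the model runs, can be sketched as follows. This toy two-layer MLP and its plain gradient step are assumptions for illustration, not the paper's actual architecture or training rule:

```python
import numpy as np

rng = np.random.default_rng(0)

class MLPMemory:
    """Toy deep memory module (illustrative sketch, not the Titans design).

    Instead of storing data in a buffer, it stores it in its weights:
    each incoming (key, value) pair triggers one gradient-descent step
    at inference time, so the network "memorizes" as data streams in.
    """

    def __init__(self, dim, hidden):
        self.W1 = rng.normal(0, 0.1, (hidden, dim))
        self.W2 = rng.normal(0, 0.1, (dim, hidden))

    def recall(self, key):
        return self.W2 @ np.tanh(self.W1 @ key)

    def write(self, key, value, lr=0.02):
        # One gradient step on 0.5 * ||recall(key) - value||^2
        h = np.tanh(self.W1 @ key)
        err = self.W2 @ h - value
        grad_W2 = np.outer(err, h)
        grad_h = self.W2.T @ err
        grad_W1 = np.outer(grad_h * (1 - h ** 2), key)
        self.W2 -= lr * grad_W2
        self.W1 -= lr * grad_W1
```

Because the store is a full neural network rather than a fixed-size vector, it can in principle hold richer, more structured associations than the simple compressed summaries described earlier.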
MIRAS: A Framework For Designing Memory-Driven Models
Where Titans introduces a specific mechanism, MIRAS steps back and looks at the broader design question. It treats sequence models as systems that store and update associations over time and proposes a structured way to think about how that memory should function.
Instead of viewing architectures as fundamentally different categories, MIRAS organizes them around a small set of design choices related to how information is stored, matched, updated, and retained.
MIRAS provides a way to interpret systems like Titans and develop new ones without starting from scratch.
Testing Whether This Approach Improves Long-Context Handling
To determine if this memory-based approach translates into a practical advantage, the researchers evaluated it against existing designs on tasks where context spans are extremely long.
In long-context evaluations, Titans scaled beyond 2 million tokens while maintaining higher retrieval accuracy than the baseline models tested. In the BABILong benchmark, which requires reasoning across facts buried in massive documents, Titans outperformed much larger models, including GPT-4, despite having significantly fewer parameters.
The MIRAS paper further demonstrates that this success is not limited to a single model. By testing several different systems built using its framework, the researchers showed that these design principles consistently produce high-performing results across different tasks.
Together, these evaluations show that structured, active memory enables models to maintain high precision across massive datasets without the usual trade-off in computational cost.
The Titans researchers explained their results:
“Our experimental evaluation on diverse tasks validate that Titans are more effective than Transformers and recent modern linear recurrent models, specifically for long context. That is, Titans can scale to larger than 2M context window size with better accuracy than baselines.”
The MIRAS researchers explain why MIRAS represents an advancement:
“In this paper, we present Miras, a general framework that explains the connection of online optimization and test time memorization. Miras framework can explain the role of several standard architectural choices in the literature (e.g., forget gate) and helps design next generation of architectures that are capable of managing the memory better.
Building upon our framework, we present three novel sequence models, each of which with its own (dis)advantages. Our experimental evaluations show that all these variants are more powerful than Transformers and linear RNNs, in various downstream tasks. In this work, we present a diverse set of variants using Miras.
In future, exploring these alternative architectures for different downstream tasks is an interesting future direction.”
Researchers’ Conclusions
The Titans paper (PDF) concludes that combining short-range processing with a dedicated long-term memory can improve how models handle extended inputs without relying solely on larger attention windows or more aggressive compression. It presents this as an additional capability that can be integrated with existing architectures rather than a replacement for them.
The MIRAS paper describes sequence models as memory-driven systems that can be designed and compared more systematically. Its framework is intended to guide how such models are constructed by making memory behavior an explicit design dimension.
Both papers treat memory as something models can manage deliberately: Titans by adding a mechanism that can store information during use, and MIRAS by laying out a framework for designing and comparing memory-driven models.
Google’s blog post explains what makes Titans and MIRAS important:
“The introduction of Titans and the MIRAS framework marks a significant advancement in sequence modeling. By employing deep neural networks as memory modules that learn to memorize as data is coming in, these approaches overcome the limitations of fixed-size recurrent states.
Furthermore, MIRAS provides a powerful theoretical unification, revealing the connection between online optimization, associative memory, and architectural design. By moving beyond the standard Euclidean paradigm, this research opens the door to a new generation of sequence models that combine the efficiency of RNNs with the expressive power needed for the era of long-context AI.”
Together, they demonstrate that the path to better long-context performance is not just about larger windows or bigger models, but about giving AI a structured way to manage what it remembers.
Featured Image by Shutterstock/AntonKhrupinArt

