🤖 Artificial Intelligence OpenAlex

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

📅 December 1, 2023 👤 Albert Gu, Tri Dao 📖 arXiv (Cornell University) 📊 1,001 citations

🤖 Plain-English Summary

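Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Mamba is an attention-free architecture built on selective state space models (SSMs), and its cost scales linearly with sequence length rather than quadratically. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics.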

🔑 Key Findings

  • Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language.
  • The authors identify a key weakness of these models: they cannot perform content-based reasoning. They propose several improvements to address it.
  • Making the SSM parameters functions of the input addresses this weakness on discrete modalities such as text, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token.
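The selection mechanism in the last bullet can be illustrated as a recurrent scan. The following is a minimal single-channel toy in Python, not the authors' code. It assumes a diagonal state matrix A with negative entries, a zero-order-hold rule for discretizing A, a simplified Euler rule (B_bar = delta * B) for B, and plain linear projections of the current token for the step size delta, B and C. The function and parameter names (selective_scan, w_delta, b_delta, w_B, w_C) are hypothetical. The paper instead describes a hardware-aware parallel algorithm for GPUs; this sequential loop only shows the arithmetic.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    """Numerically stable softplus: log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: Sequence[float],
    A: Sequence[float],
    w_delta: float,
    b_delta: float,
    w_B: Sequence[float],
    w_C: Sequence[float],
) -> List[float]:
    """Toy single-channel selective SSM scan (hypothetical names, assumed rules).

    A is a fixed diagonal state matrix (negative entries keep the state stable).
    The step size delta_t and the matrices B_t and C_t are all computed from the
    current token x_t, which is what makes the scan "selective".
    """
    n = len(A)
    h = [0.0] * n  # hidden state, one entry per diagonal element of A
    y: List[float] = []
    for x_t in x:
        delta_t = softplus(w_delta * x_t + b_delta)  # input-dependent step size
        B_t = [w * x_t for w in w_B]  # input-dependent input projection
        C_t = [w * x_t for w in w_C]  # input-dependent output projection
        for i in range(n):
            A_bar = math.exp(delta_t * A[i])  # zero-order-hold discretization of A
            B_bar = delta_t * B_t[i]  # simplified Euler discretization of B
            h[i] = A_bar * h[i] + B_bar * x_t  # propagate old state, write new input
        y.append(sum(c * s for c, s in zip(C_t, h)))  # read out through C_t
    return y


print(selective_scan([2.0, 1.0], A=[-1.0], w_delta=0.0, b_delta=0.0, w_B=[1.0], w_C=[1.0]))
```

With these toy parameters the call returns approximately [5.545, 2.079], that is 8 ln 2 and 3 ln 2. The role of delta_t is the point of the sketch: a large step drives A_bar toward 0, so the model resets its state and focuses on the current token, while a small step keeps A_bar near 1 and B_bar near 0, so the token is effectively ignored and the state persists. Each token costs O(N) work for state size N, so the whole scan is linear in sequence length.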

💡 Why This Matters

Mamba shows that an attention-free model can compete with Transformers on language modeling while scaling linearly with sequence length, which makes long inputs such as genomic sequences and audio cheaper to process.


📋 Article Details

Category 🤖 Artificial Intelligence
Published Dec 01, 2023
Venue arXiv (Cornell University), preprint
Authors Albert Gu, Tri Dao
DOI 10.48550/arxiv.2312.00752
Citations 1,001
Source OpenAlex
