Document details
Identifier

oai:arXiv.org:2410.03810

Subject
Computer Science - Machine Learnin... Computer Science - Artificial Inte... Computer Science - Computation and...
Authors
Ren, Ruifeng; Li, Zhicong; Liu, Yong
Category

Computer Science

Year

2024

Indexing date

9 October 2024

Keywords
dp sequence size transformers

Abstract

Transformers have been the cornerstone of current Large Language Models (LLMs); however, their inference overhead grows linearly with sequence length, which makes modeling long sequences challenging.

In this context, Mamba has gradually attracted attention because its state size stays constant during inference, and existing empirical results show that it can perform comparably to Transformers in sequence modeling while offering significant savings.
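The inference-cost contrast in the two paragraphs above can be made concrete with back-of-the-envelope accounting. This is a minimal sketch, not taken from the paper: it assumes a single layer, counts stored floating-point numbers only, and uses hypothetical names (`kv_cache_floats`, `recurrent_state_floats`, `memory_profile`, `d_model`, `d_state`).

```python
# Hypothetical toy accounting, not from the paper: floats stored per layer
# at each decoding step for softmax attention vs. a fixed-size recurrent state.
from typing import List, Tuple


def kv_cache_floats(step: int, d_model: int) -> int:
    # Softmax attention caches one key and one value vector per token seen so far.
    return 2 * step * d_model


def recurrent_state_floats(d_model: int, d_state: int) -> int:
    # An SSM / linear-attention style layer keeps a d_model x d_state state,
    # regardless of how many tokens have been processed.
    return d_model * d_state


def memory_profile(seq_len: int, d_model: int, d_state: int) -> Tuple[List[int], List[int]]:
    # Stored floats after each of the first seq_len tokens, for both layer types.
    kv = [kv_cache_floats(t, d_model) for t in range(1, seq_len + 1)]
    rec = [recurrent_state_floats(d_model, d_state) for _ in range(seq_len)]
    return kv, rec


if __name__ == "__main__":
    kv, rec = memory_profile(seq_len=4, d_model=8, d_state=16)
    print(kv)   # [16, 32, 48, 64]
    print(rec)  # [128, 128, 128, 128]
```

The cache grows by `2 * d_model` floats per token while the recurrent state stays fixed. Real Mamba layers also keep a small convolution buffer at inference time, which this sketch ignores.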

However, one may ask: can Mamba always enjoy the "free lunch"?

In this paper, we focus on analyzing the expressive ability of Mamba from a theoretical standpoint.

First, inspired by the connection between Mamba and linear attention, we investigate potential shortcomings of Mamba when performing the COPY operation.
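One standard way to see the link to linear attention: causal, unnormalized linear attention can be computed either as a sum over all past tokens or as a recurrence over a fixed-size matrix state. The sketch below checks that both forms agree on a tiny example. It is an illustration under stated assumptions, not the paper's construction: the function names are hypothetical, the normalizer is omitted, and Mamba's input-dependent (selective) gating of the state is not modeled.

```python
# Hypothetical sketch: causal, unnormalized linear attention in two equivalent forms.
from typing import List

Vector = List[float]


def linear_attention_parallel(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    # o_t = sum_{s <= t} (q_t . k_s) v_s : revisits every past token at each step.
    outputs = []
    for t, q in enumerate(qs):
        o = [0.0] * len(vs[0])
        for s in range(t + 1):
            score = sum(qi * ki for qi, ki in zip(q, ks[s]))
            for j in range(len(o)):
                o[j] += score * vs[s][j]
        outputs.append(o)
    return outputs


def linear_attention_recurrent(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    # S_t = S_{t-1} + k_t v_t^T ; o_t = S_t^T q_t : only a d_k x d_v state is kept.
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


if __name__ == "__main__":
    qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ks = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
    vs = [[1.0], [2.0], [3.0]]
    print(linear_attention_parallel(qs, ks, vs))   # [[1.0], [4.0], [8.0]]
    print(linear_attention_recurrent(qs, ks, vs))  # [[1.0], [4.0], [8.0]]
```

The recurrent form is what gives constant-size inference; the price is that everything the model will ever need about the prefix must fit in `state`.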

Our results indicate that Mamba with a constant size may encounter bottlenecks when handling COPY, whereas it can achieve perfect performance when its size scales linearly with sequence length.
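An informal counting argument, offered as intuition rather than as the paper's proof, suggests why a constant-size state can struggle with COPY. Assume (our notation, not the paper's) a state of $d$ numbers stored with $p$ bits each, a vocabulary $V$, and an input of length $n$ that must be reproduced exactly from the state alone. Distinct inputs must then map to distinct states:

```latex
\[
\underbrace{2^{dp}}_{\text{distinct states}} \;\ge\; \underbrace{|V|^{n}}_{\text{distinct inputs}}
\quad\Longleftrightarrow\quad
d \;\ge\; \frac{n \log_2 |V|}{p}.
\]
```

With $p$ and $|V|$ fixed, the required state size grows linearly in $n$, which is consistent with the direction of the result stated above.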

Based on this observation, we analyze Mamba's ability to tackle dynamic programming (DP) problems when equipped with Chain of Thought (CoT).

Our findings suggest that, to solve arbitrary DP problems, the total cost of Mamba is comparable to that of standard and efficient Transformers.

However, like efficient Transformers, Mamba can reduce overhead on DP problems with favorable properties such as locality.
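To illustrate what "locality" can mean for a DP (our example, not one analyzed in the paper): in the first recurrence below each entry depends only on a constant number of preceding entries, so a constant-size state suffices, whereas in the longest-increasing-subsequence recurrence each entry may depend on every earlier entry, so the whole table must be kept. Function names are hypothetical.

```python
# Hypothetical contrast between a local DP and a DP with long-range dependencies.
from typing import List


def local_dp(n: int) -> int:
    # dp[i] = dp[i-1] + dp[i-2]: only the last two entries are ever needed,
    # so memory stays constant as n grows.
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


def global_dp_lis(xs: List[int]) -> int:
    # dp[i] = 1 + max(dp[j] for j < i with xs[j] < xs[i]): every earlier entry
    # may be consulted, so the table grows with len(xs).
    dp: List[int] = []
    for i, x in enumerate(xs):
        best = max((dp[j] for j in range(i) if xs[j] < x), default=0)
        dp.append(best + 1)
    return max(dp, default=0)


if __name__ == "__main__":
    print(local_dp(10))                             # 55
    print(global_dp_lis([3, 1, 4, 1, 5, 9, 2, 6]))  # 4
```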

Our results contribute to a deeper understanding of Mamba.

Ren, Ruifeng; Li, Zhicong; Liu, Yong (2024). Can Mamba Always Enjoy the "Free Lunch"?
