Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders

ID

oai:arXiv.org:2411.02124

Topic

Computer Science - Machine Learnin... Computer Science - Artificial Inte...

Author

Ayonrinde, Kola

Year

2024

Listing date

11/13/2024

Keywords

token solve constraint matches tokens sparse choice features saes

Abstract

Sparse autoencoders (SAEs) are a promising approach to extracting features from neural networks, enabling model interpretability as well as causal interventions on model internals.

SAEs generate sparse feature representations using a sparsifying activation function that implicitly defines a set of token-feature matches.

We frame the token-feature matching as a resource allocation problem constrained by a total sparsity upper bound.
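One hedged way to write this allocation problem down is given here. The notation is an assumption introduced for illustration and is not taken from the abstract: $z_{ij}$ is the pre-activation (affinity) of feature $j$ on token $i$, $S_{ij} \in \{0,1\}$ marks whether the pair is matched, $n$ and $F$ are the numbers of tokens and features, and $B$ is the total sparsity budget. The objective of maximising total affinity is likewise an assumption.

$$
\max_{S \in \{0,1\}^{n \times F}} \; \sum_{i=1}^{n} \sum_{j=1}^{F} S_{ij}\, z_{ij}
\quad \text{subject to} \quad \sum_{i=1}^{n} \sum_{j=1}^{F} S_{ij} \le B .
$$

In this notation, a per-token limit reads $\sum_{j} S_{ij} \le k$ for every token $i$, and a per-feature limit reads $\sum_{i} S_{ij} \le m$ for every feature $j$.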

For example, TopK SAEs solve this allocation problem with the additional constraint that each token matches with at most $k$ features.

In TopK SAEs, the constraint of $k$ active features per token is the same for every token, even though some tokens are more difficult to reconstruct than others.
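The per-token constraint can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the function name `topk_mask` and the small affinity matrix are hypothetical, and the sketch selects exactly $k$ features per token from a matrix of pre-activations with one row per token.

```python
def topk_mask(affinities, k):
    """Return a 0/1 mask keeping the k largest entries in each row (token)."""
    mask = []
    for row in affinities:
        # Indices of the k largest pre-activations for this token.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(order[:k])
        mask.append([1 if j in keep else 0 for j in range(len(row))])
    return mask


# Hypothetical pre-activations: 3 tokens (rows) x 4 features (columns).
affinities = [
    [0.9, 0.8, 0.7, 0.1],
    [0.2, 0.1, 0.0, 0.6],
    [0.5, 0.3, 0.4, 0.05],
]

mask = topk_mask(affinities, k=2)
features_per_token = [sum(row) for row in mask]
print(mask)
print(features_per_token)
```

Every token ends up with the same number of active features, even though the first token has three strong affinities and the second has only one.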

To address this limitation, we propose two novel SAE variants, Feature Choice SAEs and Mutual Choice SAEs, which each allow for a variable number of active features per token.

Feature Choice SAEs solve the sparsity allocation problem under the additional constraint that each feature matches with at most $m$ tokens.
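The per-feature constraint can be illustrated by letting each feature pick its strongest tokens. The sketch below is an assumption-laden toy, not the paper's implementation: `feature_choice_mask` and the affinity values are hypothetical, and each feature simply takes the $m$ tokens with the largest pre-activations in its column.

```python
def feature_choice_mask(affinities, m):
    """Return a 0/1 mask where each feature (column) keeps its m strongest tokens."""
    n_tokens = len(affinities)
    n_features = len(affinities[0])
    mask = [[0] * n_features for _ in range(n_tokens)]
    for j in range(n_features):
        # Rank tokens by their pre-activation on feature j.
        order = sorted(range(n_tokens), key=lambda i: affinities[i][j], reverse=True)
        for i in order[:m]:
            mask[i][j] = 1
    return mask


# Hypothetical pre-activations: 3 tokens (rows) x 4 features (columns).
affinities = [
    [0.9, 0.8, 0.7, 0.1],
    [0.2, 0.1, 0.0, 0.6],
    [0.5, 0.3, 0.4, 0.05],
]

mask = feature_choice_mask(affinities, m=2)
features_per_token = [sum(row) for row in mask]
tokens_per_feature = [sum(mask[i][j] for i in range(len(mask))) for j in range(4)]
print(features_per_token)
print(tokens_per_feature)
```

Here the number of tokens per feature is fixed at $m$, while the number of active features per token varies from token to token.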

Mutual Choice SAEs solve the unrestricted allocation problem where the total sparsity budget can be allocated freely between tokens and features.
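As a rough sketch of the unrestricted case, one can keep the largest entries of the whole token-feature matrix up to a total budget. This assumes that the allocation is made by a single global selection over all pairs, which the abstract does not spell out; the name `mutual_choice_mask`, the `budget` parameter, and the affinity values are hypothetical.

```python
def mutual_choice_mask(affinities, budget):
    """Return a 0/1 mask keeping the `budget` largest entries of the whole matrix."""
    n_tokens = len(affinities)
    n_features = len(affinities[0])
    # Flatten to (value, token index, feature index) triples.
    pairs = [
        (affinities[i][j], i, j)
        for i in range(n_tokens)
        for j in range(n_features)
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    mask = [[0] * n_features for _ in range(n_tokens)]
    for _, i, j in pairs[:budget]:
        mask[i][j] = 1
    return mask


# Hypothetical pre-activations: 3 tokens (rows) x 4 features (columns).
affinities = [
    [0.9, 0.8, 0.7, 0.1],
    [0.2, 0.1, 0.0, 0.6],
    [0.5, 0.3, 0.4, 0.05],
]

mask = mutual_choice_mask(affinities, budget=6)
features_per_token = [sum(row) for row in mask]
tokens_per_feature = [sum(mask[i][j] for i in range(len(mask))) for j in range(4)]
print(features_per_token)
print(tokens_per_feature)
```

With only the total budget fixed, both the number of features per token and the number of tokens per feature are free to vary.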

Additionally, we introduce a new auxiliary loss function, $\mathtt{aux\_zipf\_loss}$, which generalises the $\mathtt{aux\_k\_loss}$ to mitigate dead and underutilised features.

Because they allocate computation adaptively, our methods yield SAEs with fewer dead features and improved reconstruction loss at equivalent sparsity levels.

More accurate and scalable feature extraction methods provide a path towards better understanding and more precise control of foundation models.

Comment: 10 pages (18 with appendices), 7 figures.

Preprint

Ayonrinde, Kola, 2024, Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
