Document details
Identifier

oai:arXiv.org:2410.13779

Subject
Computer Science - Computation and... Computer Science - Machine Learnin...
Author
Frydenlund, Arvid
Category

Computer Science

Year

2024

Date indexed

23/10/2024

Keywords
path-star models node task

Abstract

The recently introduced path-star task is a minimal task designed to exemplify limitations to the abilities of language models (Bachmann and Nagarajan, 2024).

It involves a path-star graph where multiple arms radiate from a single starting node and each node is unique.

Given the start node and a specified target node that ends an arm, the task is to generate the arm containing that target node.
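As an illustration of the task just described, the following is a minimal sketch, not the authors' implementation. The function names `make_path_star` and `solve`, the integer node labels, and the shuffled edge-list representation are all assumptions made for this example.

```python
import random


def make_path_star(num_arms, arm_length, seed=0):
    """Build a hypothetical path-star graph with unique integer node labels.

    Assumes 1 + num_arms * arm_length <= 1000 so labels can be sampled
    without repetition.
    """
    rng = random.Random(seed)
    labels = rng.sample(range(1000), 1 + num_arms * arm_length)
    start, rest = labels[0], labels[1:]
    # Each arm begins at the shared start node and then has its own nodes.
    arms = [
        [start] + rest[i * arm_length:(i + 1) * arm_length]
        for i in range(num_arms)
    ]
    # Directed edges (parent, child), shuffled so arm order is not given away.
    edges = [(arm[j], arm[j + 1]) for arm in arms for j in range(len(arm) - 1)]
    rng.shuffle(edges)
    return start, arms, edges


def solve(start, target, edges):
    """Return the arm from start to target by walking back from the target."""
    parent = {child: par for par, child in edges}
    path = [target]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


start, arms, edges = make_path_star(num_arms=3, arm_length=4, seed=1)
target = arms[2][-1]  # the leaf that ends the third arm
print(solve(start, target, edges))
```

Every node other than the start has exactly one parent, so the backward walk from the target is unambiguous. Generating the arm forward from the start, as a left-to-right model must, requires choosing the correct arm at the very first step.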

This is straightforward for a human but surprisingly difficult for language models, which did not outperform the random baseline.

The authors hypothesized this is due to a deficiency in teacher-forcing and the next-token prediction paradigm.

We demonstrate the task is learnable using teacher-forcing in alternative settings and that the issue is partially due to representation.

We introduce a regularization method using structured samples of the same graph but with differing target nodes, improving results across a variety of model types.

We provide RASP proofs showing the task is theoretically solvable.

Finally, we find settings where an encoder-only model can consistently solve the task.

Comment: EMNLP 2024 Main

Frydenlund, Arvid, 2024, The Mystery of the Pathological Path-star Task for Language Models

