Document detail
ID card

oai:arXiv.org:2410.11055

Subject
Computer Science - Computation and Language; Computer Science - Artificial Intelligence
Author
Yao, Jihan; Ding, Wenxuan; Feng, Shangbin; Wang, Lucy Lu; Tsvetkov, Yulia
Category

Computer Science

Year

2024

Listing date

23-10-2024

Keywords
answers, wrong-over-wrong, wrong, preferences
Description

In the absence of abundant reliable annotations for challenging tasks and contexts, how can we expand the frontier of LLM capabilities with potentially wrong answers?

We focus on two research questions: (1) Can LLMs generate reliable preferences among wrong options? And if so, (2) would alignment with such wrong-over-wrong preferences be helpful?

We employ methods based on self-consistency, token probabilities, and LLM-as-a-judge to elicit wrong-over-wrong preferences, and fine-tune language models with preference optimization approaches using these synthesized preferences.
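As an illustration of one of these elicitation strategies, the minimal Python sketch below ranks two wrong answers by length-normalized token log-probability and packages the result as a {prompt, chosen, rejected} preference pair. The function names and toy inputs are hypothetical stand-ins for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): rank two wrong answers by mean
# token log-probability and emit a wrong-over-wrong preference pair.
from typing import Dict, List


def mean_logprob(token_logprobs: List[float]) -> float:
    """Length-normalized confidence score for one candidate answer."""
    return sum(token_logprobs) / max(len(token_logprobs), 1)


def wrong_over_wrong_pair(
    prompt: str,
    answer_a: str,
    logprobs_a: List[float],
    answer_b: str,
    logprobs_b: List[float],
) -> Dict[str, str]:
    """Prefer the 'less wrong' answer, approximated by higher mean log-probability."""
    if mean_logprob(logprobs_a) >= mean_logprob(logprobs_b):
        chosen, rejected = answer_a, answer_b
    else:
        chosen, rejected = answer_b, answer_a
    # The {prompt, chosen, rejected} layout matches what common
    # preference-optimization (DPO-style) training pipelines consume.
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


if __name__ == "__main__":
    # Hypothetical per-token log-probabilities; in practice these would come
    # from the LLM that generated the candidate answers.
    pair = wrong_over_wrong_pair(
        prompt="What is 17 * 24?",
        answer_a="17 * 24 = 409",   # wrong, but close to 408
        logprobs_a=[-0.2, -0.3, -0.25],
        answer_b="17 * 24 = 340",   # wrong and far off
        logprobs_b=[-0.9, -1.1, -0.8],
    )
    print(pair)
```

Token probabilities are only one of the three signals mentioned above; self-consistency and LLM-as-a-judge scoring could slot into the same pair-construction step.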

Extensive experiments with seven LLMs and eight datasets demonstrate that (1) LLMs do have a preliminary capability to distinguish various shades of wrong, achieving up to 20.9% higher performance than random guessing; (2) alignment with wrong-over-wrong preferences helps LLMs produce less wrong and sometimes even outright correct answers, while improving overall model calibration.

Yao, Jihan; Ding, Wenxuan; Feng, Shangbin; Wang, Lucy Lu; Tsvetkov, Yulia, 2024, Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
