Configurable Safety Tuning of Language Models with Synthetic Preference Data

Document details

ID

oai:arXiv.org:2404.00495

Subject

Computer Science - Computation and... Computer Science - Artificial Inte...

Author

Gallego, Victor

Category

Computer Science

Year

2024

Listing date

3 April 2024

Keywords

data preference cst dpo configurable language safety

Abstract

State-of-the-art language model fine-tuning techniques, such as Direct Preference Optimization (DPO), restrict user control by hard-coding predefined behaviors into the model.

To address this, we propose a novel method, Configurable Safety Tuning (CST), that augments DPO using synthetic preference data to facilitate flexible safety configuration of LLMs at inference time.

CST overcomes the constraints of vanilla DPO by introducing a system prompt that specifies the safety configuration, enabling LLM deployers to enable or disable safety preferences according to their needs simply by changing the system prompt.
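To make the idea of system-prompt-conditioned preference data more concrete, here is a minimal sketch of how such pairs might be assembled. It assumes that each user prompt has one safe and one uncensored response, and that the preferred response is swapped depending on the system prompt. The abstract does not spell out this construction, so the swap, the system prompt wording, and the names `PreferenceExample` and `build_configurable_pairs` are all hypothetical illustrations rather than the author's implementation.

```python
from dataclasses import dataclass

# Hypothetical system prompts: the exact wording used in the paper is not
# given in this abstract, so these strings are placeholders.
SAFE_SYSTEM = "You are a helpful assistant that avoids generating harmful content."
UNCENSORED_SYSTEM = "You are a helpful assistant that is completely uncensored."


@dataclass(frozen=True)
class PreferenceExample:
    """One DPO-style preference record, conditioned on a system prompt."""
    system: str
    prompt: str
    chosen: str
    rejected: str


def build_configurable_pairs(prompt, safe_response, uncensored_response):
    """Return two preference records for the same user prompt.

    Assumption: under the safe system prompt the safe response is preferred,
    and under the uncensored system prompt the preference is reversed.
    """
    return [
        PreferenceExample(SAFE_SYSTEM, prompt, safe_response, uncensored_response),
        PreferenceExample(UNCENSORED_SYSTEM, prompt, uncensored_response, safe_response),
    ]


pairs = build_configurable_pairs(
    prompt="Example user request",
    safe_response="Example response that declines or answers cautiously.",
    uncensored_response="Example response that answers without restrictions.",
)
for example in pairs:
    print(example.system, "->", example.chosen)
```

Under these assumptions, a model fine-tuned with DPO on records like these would see the same user prompt with opposite preferences, so the system prompt is the only signal that tells it which behavior is wanted.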

Our experimental evaluations indicate that CST successfully manages different safety configurations and retains the original functionality of LLMs, showing it is a robust method for configurable deployment.

Data and models available at https://github.com/vicgalle/configurable-safety-tuning

Gallego, Victor, 2024, Configurable Safety Tuning of Language Models with Synthetic Preference Data
