Document detail
IDENTIFICATION

oai:arXiv.org:2410.23912

Topic
Computer Science - Artificial Intelligence; Computer Science - Machine Learning
Authors
Chang, Fu-Chieh; Lee, Yu-Ting; Shih, Hui-Ying; Tseng, Yi Hsuan; Wu, Pei-Yuan
Category

Computer Science

Año

2024

Citation date

16/4/2025

Keywords
cot framework star
Metrics

Abstract

The reasoning abilities of large language models (LLMs) have improved with chain-of-thought (CoT) prompting, allowing models to solve complex tasks stepwise.

However, training CoT capabilities requires detailed reasoning data, which is often scarce.

The self-taught reasoner (STaR) framework addresses this by using reinforcement learning to automatically generate reasoning steps, reducing reliance on human-labeled data.
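As a rough illustration of this loop (a minimal sketch under our own assumptions; generate_cot, final_answer, and finetune are hypothetical callables, not the authors' code), a STaR-style iteration samples rationales, keeps only those that reach the correct answer, and fine-tunes on them:

def star_iteration(model, dataset, generate_cot, final_answer, finetune,
                   num_rounds=3, samples_per_question=4):
    """Iteratively fine-tune `model` on its own correct reasoning traces."""
    for _ in range(num_rounds):
        accepted = []
        for question, gold_answer in dataset:
            for _ in range(samples_per_question):
                rationale = generate_cot(model, question)   # sample a chain-of-thought
                if final_answer(rationale) == gold_answer:  # correctness acts as the reward
                    accepted.append((question, rationale, gold_answer))
                    break
        model = finetune(model, accepted)  # reinforce the model on its accepted traces
    return model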

Although STaR and its variants have demonstrated empirical success, a theoretical foundation explaining these improvements is lacking.

This work provides a theoretical framework for understanding the effectiveness of reinforcement learning on CoT reasoning and STaR.

Our contributions are: (1) criteria for the quality of pre-trained models necessary to initiate effective reasoning improvement; (2) an analysis of policy improvement, showing why LLM reasoning improves iteratively with STaR; (3) conditions for convergence to an optimal reasoning policy; and (4) an examination of STaR's robustness, explaining how it can improve reasoning even when occasional incorrect steps are incorporated. This framework aims to bridge empirical findings with theoretical insights, advancing reinforcement learning approaches for reasoning in LLMs.
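In schematic notation (our own illustration, not necessarily the paper's formalism), each STaR round can be viewed as the policy update

\pi_{t+1} = \arg\max_{\pi} \; \mathbb{E}_{(x,y)\sim\mathcal{D},\; z\sim\pi_t(\cdot\mid x)} \big[ \mathbf{1}\{\hat{y}(z) = y\} \, \log \pi(z, y \mid x) \big],

where x is the question, y the reference answer, z a sampled chain-of-thought, and \hat{y}(z) the answer that z yields; the indicator plays the role of a binary reward, so iterating the update amounts to repeated policy improvement on self-generated correct traces.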

Chang, Fu-Chieh; Lee, Yu-Ting; Shih, Hui-Ying; Tseng, Yi Hsuan; Wu, Pei-Yuan, 2024, RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
