Document detail
ID

oai:arXiv.org:2403.18506

Topic
Computer Science - Machine Learning; Computer Science - Artificial Intelligence
Author
Kenneweg, Philip; Galli, Leonardo; Kenneweg, Tristan; Hammer, Barbara
Category

Computer Science

Year

2024

Listing date

4/3/2024

Keywords
optimizer, line search methods
Abstract

Recent work has shown that line search methods substantially improve the performance of traditional stochastic gradient descent methods on a variety of datasets and architectures [1], [2].
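For context, the sketch below shows what a single Armijo backtracking step looks like on plain SGD; it is a minimal illustration, not the procedure from [1] or [2]. The hyperparameters eta0, c, and beta, and the closure that re-evaluates the mini-batch loss, are illustrative assumptions.

```python
import torch

def sgd_armijo_step(params, closure, eta0=1.0, c=0.1, beta=0.5, max_backtracks=20):
    # closure() recomputes the mini-batch loss at the current parameters.
    loss = closure()
    loss.backward()
    grads = [p.grad.detach().clone() for p in params]
    f0 = loss.item()
    grad_sq = sum((g * g).sum().item() for g in grads)

    eta = eta0
    with torch.no_grad():
        for _ in range(max_backtracks):
            # Trial step along the negative gradient.
            for p, g in zip(params, grads):
                p.sub_(eta * g)
            # Armijo sufficient-decrease test:
            #   f(x - eta * g) <= f(x) - c * eta * ||g||^2
            if closure().item() <= f0 - c * eta * grad_sq:
                break  # accept this step size
            # Reject: undo the trial step and shrink the step size.
            for p, g in zip(params, grads):
                p.add_(eta * g)
            eta *= beta
    for p in params:
        p.grad = None
    return eta
```

One accepted step is taken per call; if no step size passes the sufficient-decrease test within max_backtracks halvings, the parameters are left unchanged.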

In this work, we extend line search methods to the popular Transformer architecture and to datasets from natural language processing.

More specifically, we combine the Armijo line search with the Adam optimizer and extend it by subdividing the network architecture into sensible units, performing the line search separately on each of these local units.
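Reading from the abstract alone, the per-unit idea might look like the sketch below: parameters are grouped into units (for example, one group per Transformer block), and a separate step size is accepted for each unit's update direction. The partitioning, the precomputed Adam directions, and the simplified acceptance test are all assumptions, not the authors' exact algorithm.

```python
import torch

def unitwise_armijo_step(units, closure, directions, c=0.1, beta=0.5,
                         eta0=1.0, max_backtracks=10):
    # units: list of parameter lists (e.g. one per Transformer block).
    # directions[i]: Adam update direction for unit i, assumed to be computed
    # beforehand from the current gradients and Adam moment estimates.
    f0 = closure().item()
    with torch.no_grad():
        for params, direction in zip(units, directions):
            dir_sq = sum((d * d).sum().item() for d in direction)
            eta = eta0
            for _ in range(max_backtracks):
                for p, d in zip(params, direction):
                    p.sub_(eta * d)  # trial step on this unit only
                f_trial = closure().item()
                # Simplified sufficient-decrease test; the paper's precise
                # acceptance condition is not given in the abstract.
                if f_trial <= f0 - c * eta * dir_sq:
                    f0 = f_trial  # accept and update the reference loss
                    break
                for p, d in zip(params, direction):
                    p.add_(eta * d)  # reject: undo the trial step
                eta *= beta
```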

Our optimization method outperforms the traditional Adam optimizer, yielding significant improvements for small datasets or small training budgets while performing as well as or better in the other cases tested.

Our work is publicly available as a Python package providing a hyperparameter-free PyTorch optimizer that is compatible with arbitrary network architectures.
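A hypothetical usage sketch: the package and optimizer names below are placeholders, since the listing does not name the released package, and the closure-based step mirrors the convention of torch.optim.LBFGS rather than a documented API.

```python
import torch
import torch.nn.functional as F

# Placeholder names: the listing does not name the released package, so
# linesearch_optim and AdamLS are hypothetical stand-ins.
from linesearch_optim import AdamLS  # hypothetical import

model = torch.nn.Linear(16, 4)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))

optimizer = AdamLS(model.parameters())  # hyperparameter-free: no learning rate

def closure():
    # Line-search optimizers typically re-evaluate the loss via a closure,
    # as torch.optim.LBFGS does.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return loss

optimizer.step(closure)
```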

Kenneweg, Philip; Galli, Leonardo; Kenneweg, Tristan; Hammer, Barbara (2024). Faster Convergence for Transformer Fine-tuning with Line Search Methods.
