Document detail
Record ID

oai:arXiv.org:2410.11654

Subject
Computer Science - Computation and...
Author
Vo, James
Category

Computer Science

Year

2024

Listing date

23-10-2024

Keywords
model upscaling, language models, TLI

Description

In this paper, we propose Transformer Layer Injection (TLI), a novel method for efficiently upscaling large language models (LLMs) while minimizing computational costs and maintaining model performance. Model scale is a key factor in enhancing the quality of machine learning models, and TLI addresses the challenge of scaling by reducing initial loss, minimizing fine-tuning requirements, and preserving model complexity. Our approach improves upon the conventional Depth Up-Scaling (DUS) technique by injecting new layers into every set of K layers, enabling hidden representations to pass through transformer blocks with minimal disruption. We compare TLI with existing approaches, including Mixture of Experts (MoE) and DUS, and validate its efficiency through experiments on small LLMs (Llama 3 1B, 3B, and 8B). Results show that TLI achieves better initialization, requires fewer training steps, and delivers superior accuracy on tasks such as KoBEST and KMCQA, with models performing effectively even without additional training. TLI is demonstrated to be both data-efficient and cost-effective, significantly outperforming existing methods. Its scalability and simplicity make it a promising solution for upscaling transformer-based models, with potential applications in scaling models from 10B to 405B parameters.
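
The abstract does not spell out how the injected layers are initialized. As a rough illustration, the minimal PyTorch sketch below inserts a copy of the preceding block after every K layers and zeroes its residual-branch output so the new block starts as an identity map. The Block class, the inject_layers helper, and the zero-initialization scheme are hypothetical assumptions for illustration, not the paper's actual implementation.

```python
import copy
import torch
from torch import nn

class Block(nn.Module):
    """Toy stand-in for a transformer decoder layer (hypothetical)."""
    def __init__(self, dim):
        super().__init__()
        self.attn_out = nn.Linear(dim, dim)  # stand-ins for attention/MLP projections
        self.mlp_out = nn.Linear(dim, dim)

    def forward(self, x):
        # Residual form: zeroing mlp_out turns the block into an identity map.
        return x + self.mlp_out(torch.relu(self.attn_out(x)))

def inject_layers(layers, k):
    """Insert an identity-initialized copy of the preceding layer after every
    K existing layers, so hidden states initially flow through the upscaled
    stack unchanged (in contrast to DUS, which stacks duplicated spans of
    consecutive layers)."""
    upscaled = nn.ModuleList()
    for i, layer in enumerate(layers, start=1):
        upscaled.append(layer)
        if i % k == 0:
            new_layer = copy.deepcopy(layer)
            nn.init.zeros_(new_layer.mlp_out.weight)  # output branch -> 0
            nn.init.zeros_(new_layer.mlp_out.bias)
            upscaled.append(new_layer)
    return upscaled

blocks = nn.ModuleList(Block(16) for _ in range(8))
upscaled = inject_layers(blocks, k=4)  # 8 layers -> 10 layers
x = torch.randn(2, 16)
assert torch.allclose(upscaled[4](x), x)  # injected block is a no-op at init
print(len(upscaled))  # 10
```

Zero-initializing the residual output is one common way to make an added block function-preserving at insertion time, which matches the abstract's claim that hidden representations pass through with minimal disruption; the paper's actual initialization may differ.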

Vo, James (2024). Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models. arXiv:2410.11654.
