oai:arXiv.org:2408.05097
Computer Science
2024
2024-08-14
Hyperbolic embeddings have demonstrated their effectiveness in capturing measures of uncertainty and hierarchical relationships across various deep-learning tasks, including image segmentation and active learning.
However, their application in modern vision-language models (VLMs) has been limited.
A notable exception is MERU, which leverages the hierarchical properties of hyperbolic space in the CLIP ViT-large model, comprising hundreds of millions of parameters.
In our work, we address the challenge of scaling multimodal hyperbolic models by orders of magnitude, to billions of parameters and correspondingly greater training complexity, using the BLIP-2 architecture.
Although hyperbolic embeddings offer potential insights into uncertainty not present in Euclidean embeddings, our analysis reveals that scaling these models is particularly difficult.
We propose a novel training strategy for a hyperbolic version of BLIP-2 that achieves performance comparable to its Euclidean counterpart, while maintaining stability throughout training and providing a meaningful indication of uncertainty with each embedding.
Comment: ECCV 2024 - Beyond Euclidean Workshop
Mandica, Paolo; Franco, Luca; Kallidromitis, Konstantinos; Petryk, Suzanne; Galasso, Fabio (2024). Hyperbolic Learning with Multimodal Large Language Models.
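To make the abstract's core idea concrete, the sketch below illustrates a MERU-style hyperbolic image-text similarity on the Lorentz (hyperboloid) model: encoder features are lifted from the tangent space at the origin via the exponential map, contrastive logits are negative geodesic distances, and the distance of an embedding from the origin serves as an uncertainty proxy. This is a minimal illustration under assumed conventions, not the paper's BLIP-2 training strategy; the function names (`exp_map_origin`, `lorentz_distance`), the curvature value `curv`, and the clamping constants are all illustrative choices.

```python
import torch

def exp_map_origin(v: torch.Tensor, curv: float = 1.0) -> torch.Tensor:
    """Lift tangent vectors at the hyperboloid origin into hyperbolic space.

    v: (batch, d) Euclidean encoder features, viewed as tangent vectors at the origin.
    Returns (batch, d + 1) points on the hyperboloid, time coordinate last.
    """
    sqrt_c = curv ** 0.5
    v_norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-7)
    space = torch.sinh(sqrt_c * v_norm) * v / (sqrt_c * v_norm)
    time = torch.cosh(sqrt_c * v_norm) / sqrt_c
    return torch.cat([space, time], dim=-1)

def lorentz_distance(x: torch.Tensor, y: torch.Tensor, curv: float = 1.0) -> torch.Tensor:
    """Pairwise geodesic distances between batches of points on the hyperboloid."""
    space_inner = x[..., :-1] @ y[..., :-1].transpose(-1, -2)   # (B, B)
    time_outer = x[..., -1:] * y[..., -1:].transpose(-1, -2)    # (B, B)
    lorentz_inner = space_inner - time_outer                    # <x, y>_L
    arg = (-curv * lorentz_inner).clamp_min(1.0 + 1e-6)         # numerical safety for acosh
    return torch.acosh(arg) / curv ** 0.5

# Toy usage: contrastive logits are negative geodesic distances. The norm of the
# tangent feature equals the embedding's distance from the origin, so a smaller
# norm (closer to the origin) can be read as a more generic / uncertain embedding.
img_feat, txt_feat = torch.randn(4, 256), torch.randn(4, 256)
img_hyp, txt_hyp = exp_map_origin(img_feat), exp_map_origin(txt_feat)
logits = -lorentz_distance(img_hyp, txt_hyp)       # (4, 4) image-text similarity matrix
uncertainty = 1.0 / img_feat.norm(dim=-1)          # illustrative uncertainty proxy
```

In this setup the similarity matrix would feed a standard symmetric contrastive loss; how the actual model stabilizes such training at billions of parameters is exactly what the proposed strategy addresses, and is not captured by this sketch.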