Document details
Identifier

oai:arXiv.org:2405.20648

Subject
Computer Science - Computer Vision...; Computer Science - Computation and...; Computer Science - Machine Learnin...
Author
Luo, Richard; Peng, Austin; Vasudev, Adithya; Jain, Rishabh
Category

Computer Science

Year

2024

Date indexed

30/10/2024

Keywords
shot captioning science information shotluck holmes efficient computer
Abstract

Video is an increasingly prominent and information-dense medium, yet it poses substantial challenges for language models.

A typical video consists of a sequence of shorter segments, or shots, that collectively form a coherent narrative.

Each shot is analogous to a word in a sentence, except that multiple streams of information (such as visual and auditory data) must be processed simultaneously.

Comprehending the entire video requires not only understanding the audio-visual information in each shot but also linking ideas across shots to form a larger, all-encompassing story.

Despite significant progress in the field, current works often overlook videos' more granular shot-by-shot semantic information.

In this project, we propose Shotluck Holmes, a family of efficient large language vision models (LLVMs) designed to improve video summarization and captioning.

By leveraging better pretraining and data collection strategies, we extend the abilities of existing small LLVMs from understanding a single picture to understanding a sequence of frames.

Specifically, we show that Shotluck Holmes outperforms state-of-the-art results on the Shot2Story video captioning and summarization tasks with significantly smaller and more computationally efficient models.

Luo, Richard; Peng, Austin; Vasudev, Adithya; Jain, Rishabh. 2024. Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization
