Document details
ID

oai:arXiv.org:2405.13951

Topic
Computer Science - Computer Vision...
Authors
Kothandaraman, Divya; Sohn, Kihyuk; Villegas, Ruben; Voigtlaender, Paul; Manocha, Dinesh; Babaeizadeh, Mohammad
Category

Computer Science

Year

2024

Listing date

29 May 2024

Keywords
concepts, multi-concept, video
Abstract

We present a method for multi-concept customization of pretrained text-to-video (T2V) models.

Intuitively, the multi-concept customized video can be derived from the (non-linear) intersection of the video manifolds of the individual concepts, which is not straightforward to find.

We hypothesize that sequential and controlled walking towards the intersection of the video manifolds, directed by text prompting, leads to the solution.

To do so, we generate the various concepts and their corresponding interactions, sequentially, in an autoregressive manner.

Our method can generate videos of multiple custom concepts (subjects, action and background), such as a teddy bear running towards a brown teapot, a dog playing violin, and a teddy bear swimming in the ocean.

We quantitatively evaluate our method using videoCLIP and DINO scores, in addition to human evaluation.

Videos for results presented in this paper can be found at https://github.com/divyakraman/MultiConceptVideo2024.

Comment: Paper accepted to the AI4CC Workshop at CVPR 2024

Kothandaraman, Divya; Sohn, Kihyuk; Villegas, Ruben; Voigtlaender, Paul; Manocha, Dinesh; Babaeizadeh, Mohammad (2024). Text Prompting for Multi-Concept Video Customization by Autoregressive Generation.
