Document detail
ID

oai:arXiv.org:2407.03648

Topic
Electrical Engineering and Systems... Computer Science - Sound
Author
Lan, Gael Le; Shi, Bowen; Ni, Zhaoheng; Srinivasan, Sidd; Kumar, Anurag; Ellis, Brian; Kant, David; Nagaraja, Varun; Chang, Ernie; Hsu, Wei-Ning; Shi, Yangyang; Chandra, Vikas
Category

Computer Science

Year

2024

Listing date

10/23/2024

Keywords
inversion latent music
Abstract

We introduce MelodyFlow, an efficient text-controllable high-fidelity music generation and editing model.

It operates on continuous latent representations from a low-frame-rate 48 kHz stereo variational autoencoder (VAE) codec.

Based on a diffusion transformer architecture trained on a flow-matching objective, the model can edit diverse high-quality stereo samples of variable duration using simple text descriptions.
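As a rough illustration of what a flow-matching objective looks like, the sketch below computes the regression loss for one sample. It assumes a straight-line path between a noise sample (t = 0) and a data sample (t = 1), which is one common formulation and not necessarily the exact one used by MelodyFlow. The function names, the three-element "latent", and the two stand-in models are all hypothetical.

```python
import random

def interpolate(noise, data, t):
    """Point on the straight path from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]

def target_velocity(noise, data):
    """Velocity of the straight path, constant in t: data - noise."""
    return [d - n for n, d in zip(noise, data)]

def flow_matching_loss(predict_velocity, noise, data, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(noise, data, t)
    pred = predict_velocity(x_t, t)
    target = target_velocity(noise, data)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)

rng = random.Random(0)
data = [0.5, -1.0, 2.0]                      # stand-in for one latent frame
noise = [rng.gauss(0.0, 1.0) for _ in data]  # Gaussian starting point
t = rng.random()                             # random time in [0, 1)

# Hypothetical models: an oracle that knows both endpoints, and a
# model that always predicts zero velocity.
oracle = lambda x_t, t: target_velocity(noise, data)
zero_model = lambda x_t, t: [0.0] * len(x_t)

loss_oracle = flow_matching_loss(oracle, noise, data, t)
loss_zero = flow_matching_loss(zero_model, noise, data, t)
```

In training, the model would be a neural network (here, a diffusion transformer conditioned on text), and the loss would be averaged over many data samples, noise draws, and time values.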

We adapt the ReNoise latent inversion method to flow matching and compare it with the original implementation and naive denoising diffusion implicit model (DDIM) inversion on a variety of music editing prompts.
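To make the idea of latent inversion concrete, the sketch below inverts a toy one-dimensional flow in two ways and measures how well each inversion reconstructs the input when the flow is run forward again. The naive variant reuses the velocity at the known point, similar in spirit to DDIM inversion. The second variant refines each step by fixed-point iteration, loosely in the spirit of ReNoise. This is a simplification under stated assumptions: the velocity field is a hand-written stand-in for a trained model, the step and iteration counts are arbitrary, and none of it reproduces the paper's actual adaptation.

```python
import math

def velocity(x, t):
    """Toy stand-in for a learned velocity field (hypothetical, scalar)."""
    return math.sin(2.0 * x) - 0.5 * x * (1.0 + t)

def generate(x0, steps):
    """Euler integration from t=0 (noise) to t=1 (data)."""
    dt = 1.0 / steps
    x = x0
    for k in range(steps):
        x = x + dt * velocity(x, k * dt)
    return x

def invert_naive(x1, steps):
    """Step backwards, evaluating the velocity at the known later point."""
    dt = 1.0 / steps
    x = x1
    for k in range(steps, 0, -1):
        x = x - dt * velocity(x, k * dt)
    return x

def invert_fixed_point(x1, steps, iters=10):
    """Step backwards, solving each Euler step for its own starting point."""
    dt = 1.0 / steps
    x = x1
    for k in range(steps, 0, -1):
        prev = x  # initial guess: the later point itself
        for _ in range(iters):
            # Seek prev such that prev + dt * v(prev, t_prev) == x.
            prev = x - dt * velocity(prev, (k - 1) * dt)
        x = prev
    return x

data = 1.0
steps = 10
err_naive = abs(generate(invert_naive(data, steps), steps) - data)
err_fixed = abs(generate(invert_fixed_point(data, steps), steps) - data)
```

Comparing `err_naive` with `err_fixed` shows why a more accurate inversion matters for editing: the closer the inverted latent reproduces the source when regenerated, the more of the original music can be preserved once the text prompt is changed.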

Our results indicate that our latent inversion outperforms both ReNoise and DDIM for zero-shot test-time text-guided editing on several objective metrics.

Subjective evaluations show a substantial improvement over the previous state of the art in music editing.

Code and model weights will be made publicly available.

Samples are available at https://melodyflow.github.io.

Lan, Gael Le; Shi, Bowen; Ni, Zhaoheng; Srinivasan, Sidd; Kumar, Anurag; Ellis, Brian; Kant, David; Nagaraja, Varun; Chang, Ernie; Hsu, Wei-Ning; Shi, Yangyang; Chandra, Vikas (2024). High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching.
