A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition

Document details

Identifier

oai:arXiv.org:2408.09491

Subject

Computer Science - Sound; Electrical Engineering and Systems...

Authors

Li, Yangze; Wang, Xiong; Cao, Songjun; Zhang, Yike; Ma, Long; Xie, Lei

Category

Computer Science

Year

2024

Listing date

21-08-2024

Keywords

decoding, audio-llm, speech, audio

Description

An audio-LLM adds the audio modality to a large language model (LLM), enabling a powerful LLM to recognize, understand, and generate audio.

However, when performing speech recognition in noisy environments, we observed hallucination and repetition issues in the audio-LLM, which lead to substitution and insertion errors.

To address these problems, this paper proposes a transcription prompt-based audio-LLM that introduces an ASR expert as a transcription tokenizer, together with a hybrid autoregressive (AR) and non-autoregressive (NAR) decoding approach.

Experiments on the 10k-hour WenetSpeech Mandarin corpus show that our approach reduces the character error rate (CER) by 12.2% and 9.6% relative on the Test_Net and Test_Meeting evaluation sets, respectively, compared with the baseline.

Notably, we reduce the decoding repetition rate on the evaluation set to zero, indicating that the decoding repetition problem has been fundamentally resolved.
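The relative CER figures quoted in the description can be read with the help of a small worked example. The sketch below is not the authors' evaluation code: it assumes the standard definition of CER (character-level edit distance divided by reference length) and the usual definition of relative reduction, and the function names `edit_distance`, `cer`, and `relative_reduction` are hypothetical. The absolute CER values in the usage note are made up for illustration, since this record does not report them.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters here)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative error reduction, e.g. 0.122 for a 12.2% relative drop."""
    return (baseline - improved) / baseline


# Hypothetical numbers (assumption, not taken from the paper):
# a baseline CER of 10.00% falling to 8.78% is a 12.2% relative reduction.
example = relative_reduction(10.0, 8.78)
```

Under these assumptions, a 12.2% relative reduction means the new CER is 87.8% of the baseline CER, not that the CER dropped by 12.2 absolute percentage points.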

Li, Yangze; Wang, Xiong; Cao, Songjun; Zhang, Yike; Ma, Long; Xie, Lei. 2024. A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition.
