Document detail
ID

oai:arXiv.org:2402.17735

Topic
Electrical Engineering and Systems... Computer Science - Sound
Author
Churchwell, Cameron; Morrison, Max; Pardo, Bryan
Category

Computer Science

Year

2024

Listing date

3/6/2024

Keywords
pronunciation, speech
Abstract

A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes).

PPGs are a popular representation in speech generation due to their ability to disentangle pronunciation features from speaker identity, allowing accurate reconstruction of pronunciation (e.g., voice conversion) and coarse-grained pronunciation editing (e.g., foreign accent conversion).

In this paper, we demonstrably improve the quality of PPGs to produce a state-of-the-art interpretable PPG representation.

We train an off-the-shelf speech synthesizer using our PPG representation and show that high-quality PPGs yield independent control over pitch and pronunciation.

We further demonstrate novel uses of PPGs, such as an acoustic pronunciation distance and fine-grained pronunciation control.
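
To make the definition concrete, here is a minimal sketch and not the authors' implementation. It treats a PPG as a list of frames, each a categorical distribution over a small hypothetical phoneme inventory, and computes one plausible pronunciation distance: the mean framewise Jensen-Shannon divergence between two time-aligned PPGs. The inventory, the logits, and the function names `softmax`, `js_divergence`, and `ppg_distance` are all assumptions made for illustration; the distance defined in the paper may differ.

```python
import math


def softmax(logits):
    """Turn one frame of raw scores into a categorical distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions, in [0, 1]."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b is a mixture, so b > 0 when a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def ppg_distance(ppg_a, ppg_b):
    """Mean framewise JS divergence between two time-aligned PPGs (assumed metric)."""
    if len(ppg_a) != len(ppg_b):
        raise ValueError("PPGs must have the same number of frames")
    return sum(js_divergence(p, q) for p, q in zip(ppg_a, ppg_b)) / len(ppg_a)


# Hypothetical 3-phoneme inventory and made-up logits for two frames.
PHONEMES = ["aa", "iy", "s"]
ppg = [softmax([4.0, 0.5, 0.1]), softmax([0.2, 0.3, 5.0])]
```

Under this sketch, a PPG compared with itself has distance 0, and two PPGs that place all probability on different phonemes in every frame have distance 1. Real PPGs of different durations would need time alignment before a framewise comparison like this applies.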

Comment: Accepted to ICASSP 2024 Workshop on Explainable Machine Learning for Speech and Audio

Churchwell, Cameron; Morrison, Max; Pardo, Bryan. 2024. High-Fidelity Neural Phonetic Posteriorgrams.
