Emotional speech generation by Text to Speech and Voice Conversion
Abstract
With recent advances in deep learning, generating realistic and natural speech has drawn increasing attention. Generating emotional speech in particular is essential for enhancing the diversity of synthesized utterances, and it can be applied in many fields such as the film industry, AI avatars, humanoids, and other human-machine interactive applications. This investigation aims to develop emotional Text-To-Speech (TTS) and emotional Voice Conversion (VC). Emotional TTS focuses on generating an emotional utterance from text input, while emotional VC aims at speech-to-speech emotional style transformation. In particular, the goal of this investigation is to develop emotion-injective TTS and VC models for unseen emotions. To this end, the investigation first develops a realistic multi-emotional dataset by emotional VC, whose performance is demonstrated with emotional TTS. Then, leveraging the dataset, it develops TTS and VC models that capture emotional traits directly from an input utterance in order to tackle unseen emotional speech generation.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 16, 2024
- Source ID: FA23862314098
Entities
People
- Hanseok Ko
Organizations
- Air Force Office of Scientific Research
- Korea University
- United States Air Force