Emotional speech generation by Text to Speech and Voice Conversion

Abstract

With recent advances in deep learning, generating realistic and natural speech is attracting growing attention. Within this area, emotional speech generation is essential for enhancing the diversity of synthesized utterances, and it can be applied in many fields, such as the film industry, AI avatars, humanoids, and other human-machine interactive applications. This investigation aims to develop emotional Text-To-Speech (TTS) and emotional Voice Conversion (VC). Emotional TTS focuses on generating an emotional utterance from text input, while emotional VC aims at speech-to-speech emotional style transformation. In particular, the goal of this investigation is to develop emotion-injective TTS and VC models for unseen emotions. To this end, the investigation first develops a realistic multi-emotional dataset using emotional VC and demonstrates its performance with emotional TTS. Then, leveraging the dataset, it develops TTS and VC models that directly capture emotional traits from an input utterance in order to tackle unseen emotional speech generation.

Document Details

Document Type: DoD Grant Award
Publication Date: Feb 16, 2024
Source ID: FA23862314098

Entities

People

Hanseok Ko

Organizations

Air Force Office of Scientific Research
Korea University
United States Air Force
