Emotional speech generation by Text to Speech and Voice Conversion

Abstract

With recent advances in deep learning applications, generating realistic and natural speech utterances has come into the spotlight. Generating emotional speech in particular is essential for enhancing the diversity of synthesized utterances, and it can be applied in many fields, such as the film industry, AI avatars, humanoids, and other human-machine interactive applications. This investigation aims to develop emotional Text-To-Speech (TTS) and emotional Voice Conversion (VC). Emotional TTS focuses on generating an emotional utterance from text input, while emotional VC aims at speech-to-speech emotional style transformation. In particular, the goal of this investigation is to develop emotion-injective TTS and VC models for unseen emotions. To this end, the investigation first develops a realistic multi-emotional dataset using emotional VC and demonstrates its performance using emotional TTS. Then, leveraging the dataset, it develops TTS and VC models that capture emotional traits directly from an input utterance in order to tackle unseen emotional speech generation.

Document Details

Document Type
DoD Grant Award
Publication Date
Feb 16, 2024
Source ID
FA23862314098

Entities

People

  • Hanseok Ko

Organizations

  • Air Force Office of Scientific Research
  • Korea University
  • United States Air Force

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Psychological Intervention/Treatment for Stress, Anxiety, PTSD, and Related Emotional and Cognitive Health Symptoms.
  • Speech Processing/Speech Recognition.

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks