Fostering positive team behaviors in human-machine teams through emotion processing: Adapting to the operator's state
Abstract
The team developed a software system that recognizes seven emotional categories while speech is being produced. It is suitable for use on cellular phones and online speech communication platforms. The methodology uses deep learning (DL), with speech signals represented as RGB images of speech spectrograms. This representation recasts the speech classification problem as an image classification task. As a result, the lengthy and data-costly training of a deep neural network from scratch can be replaced by the shorter and more data-efficient fine-tuning of an existing pre-trained image classification network (AlexNet). The speech emotion recognition (SER) results achieved with the fine-tuned AlexNet (FTAlexNet) showed an average accuracy of 80% on the Berlin Emotional Speech data. This result was found to be comparable with existing state-of-the-art techniques, but with significantly lower computational and data costs.
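The spectrogram-to-image step described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the function names, the 256-sample Hann window, the 80 dB dynamic range, the jet-like colour map, and the 227x227 output size are all assumptions chosen for illustration, and the record does not state the values actually used.

```python
import numpy as np


def spectrogram_db(signal, n_fft=256, hop=128):
    """Log-magnitude STFT spectrogram, shape (n_fft // 2 + 1, n_frames).

    The window length and hop size are illustrative assumptions.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20.0 * np.log10(magnitude + 1e-10)


def to_rgb_image(spec_db, size=227, dynamic_range_db=80.0):
    """Map a dB spectrogram to a (size, size, 3) uint8 RGB image.

    The colour map is a simple jet-like ramp and the resize is
    nearest-neighbour; both are assumptions, not the report's choices.
    """
    top = spec_db.max()
    norm = (spec_db - (top - dynamic_range_db)) / dynamic_range_db
    norm = np.clip(norm, 0.0, 1.0)
    norm = norm[::-1]  # put low frequencies at the bottom of the image

    # Nearest-neighbour resize to a square image.
    rows = np.arange(size) * norm.shape[0] // size
    cols = np.arange(size) * norm.shape[1] // size
    norm = norm[rows][:, cols]

    # Jet-like ramp: low energy maps to blue, high energy to red.
    red = np.clip(1.5 - np.abs(4.0 * norm - 3.0), 0.0, 1.0)
    green = np.clip(1.5 - np.abs(4.0 * norm - 2.0), 0.0, 1.0)
    blue = np.clip(1.5 - np.abs(4.0 * norm - 1.0), 0.0, 1.0)
    return (np.stack([red, green, blue], axis=-1) * 255).astype(np.uint8)


# Usage: a one-second 440 Hz tone stands in for a speech segment.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)
image = to_rgb_image(spectrogram_db(tone))
```

An image produced this way would then be passed to a pre-trained image classifier whose final layer has been replaced with a seven-class output and fine-tuned on labelled emotional speech. The input size should match whatever the chosen network expects.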
Document Details
- Document Type: Technical Report
- Publication Date: Aug 17, 2018
- Accession Number: AD1058296
Entities
People
- Margaret Lech
Organizations
- RMIT University