Speech Emotion Recognition (SER) Using the Bidirectional LSTM Method
Abstract
Emotions are part of being human and arise as a response to experienced events. Emotion analysis from speech, known as speech emotion recognition (SER), attracts many researchers because voice recognition systems can assist in criminal investigations, in monitoring and detecting potentially dangerous events, and in health care. This study therefore proposes an SER approach based on the Bidirectional Long Short-Term Memory (Bi-LSTM) model. The dataset was scraped from YouTube and labeled manually, and features were then extracted using Mel Frequency Cepstral Coefficients (MFCC). In the experiment, the Bi-LSTM method achieved an AUC ROC of 0.97 and an F1-score of 0.878. Based on these results, the proposed method predicted speech emotions better than the other methods it was compared against. The model also proved to be more precise in classifying human voices into four types of emotion: happy, sad, angry, and neutral.
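To illustrate what "bidirectional" means in this setting, the following is a minimal NumPy sketch of a Bi-LSTM forward pass over a sequence of MFCC frames. It is not the authors' implementation: the weights are random and untrained, the MFCC frames are random stand-in data, and the sizes (13 coefficients, 8 hidden units, 50 frames) and all function names are assumptions chosen only for illustration.

```python
import numpy as np

# The four emotion classes named in the abstract.
EMOTIONS = ["happy", "sad", "angry", "neutral"]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(n_in, n_hidden, rng):
    """Random, untrained weights for one LSTM direction (illustrative only)."""
    return {
        "W": rng.normal(0.0, 0.1, (n_in + n_hidden, 4 * n_hidden)),
        "b": np.zeros(4 * n_hidden),
    }


def lstm_last_state(frames, params):
    """Run one LSTM over frames of shape (T, n_in); return the final hidden state."""
    n_hidden = params["b"].size // 4
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x in frames:
        z = np.concatenate([x, h]) @ params["W"] + params["b"]
        i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def bilstm_predict(frames, fwd, bwd, W_out, b_out):
    """Summarize the sequence in both time directions, then apply softmax."""
    h_fwd = lstm_last_state(frames, fwd)        # reads first frame -> last frame
    h_bwd = lstm_last_state(frames[::-1], bwd)  # reads last frame -> first frame
    logits = np.concatenate([h_fwd, h_bwd]) @ W_out + b_out
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
n_mfcc, n_hidden, n_frames = 13, 8, 50               # assumed sizes
mfcc_frames = rng.normal(size=(n_frames, n_mfcc))    # stand-in for real MFCC features
fwd = init_lstm(n_mfcc, n_hidden, rng)
bwd = init_lstm(n_mfcc, n_hidden, rng)
W_out = rng.normal(0.0, 0.1, (2 * n_hidden, len(EMOTIONS)))
b_out = np.zeros(len(EMOTIONS))

probs = bilstm_predict(mfcc_frames, fwd, bwd, W_out, b_out)
for name, p in zip(EMOTIONS, probs):
    print(f"{name}: {p:.3f}")
```

Because the weights are untrained, the printed probabilities carry no meaning. The sketch only shows how a forward and a backward pass over the same frames are concatenated before classification; in practice the features would come from an audio library and the model from a deep learning framework.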