Improved Speech Emotion Recognition Using Boosting Ensemble of Class Specific Classifiers
Keywords:
Speech emotion recognition, Ensemble classifier, Feature extraction, CatBoost Classifier, SVM Classifier, Boosting Ensemble
Abstract
Speech emotion recognition (SER) systems enable machines to understand human emotions from speech. Their rising popularity has attracted wide research interest, and researchers have explored many techniques and methodologies for detecting emotions from speech. Machine learning (ML) has emerged as one of the most promising of these, and a range of ML approaches have been investigated. However, knowing which feature combination gives the best result remains a major challenge for SER systems, because the selected features affect the classification result. In this study, a general random forest-based emotion detection system is therefore built to categorize emotions from the provided features. For each emotion class, the class-specific best feature is identified using the F1 measure. The selected features are then combined using modern ensemble approaches. The features under study are MFCC, chroma, and Mel spectrogram; MFCC performed best of the three, with an accuracy of 63%, higher than that of the other features. The three features were then ensembled using the CatBoost algorithm, which gave an accuracy of 71.62%. The results indicate that MFCC performs better than the other features considered in the study, and that the ensemble outperforms the classifiers reported in the literature.
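The class-specific selection step can be illustrated with a minimal sketch. It assumes that one classifier has already been trained per feature type and that its predictions on a held-out set are available; the abstract does not spell out the exact procedure, so the function names, the emotion labels, and the toy predictions below are hypothetical and are not data from the study. For each emotion class, the sketch computes a one-vs-rest F1 score per feature and keeps the feature with the highest score.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for each label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def best_feature_per_class(y_true, preds_by_feature):
    """Map each emotion label to the feature whose predictions score the highest F1."""
    f1 = {name: per_class_f1(y_true, preds)
          for name, preds in preds_by_feature.items()}
    return {label: max(f1, key=lambda name: f1[name][label])
            for label in sorted(set(y_true))}


# Toy, made-up labels and per-feature predictions (illustration only).
y_true = ["angry", "angry", "happy", "happy", "sad", "sad"]
preds_by_feature = {
    "mfcc":   ["angry", "angry", "happy", "sad",   "sad",   "happy"],
    "chroma": ["angry", "happy", "happy", "happy", "angry", "happy"],
    "mel":    ["happy", "angry", "sad",   "happy", "sad",   "sad"],
}

selection = best_feature_per_class(y_true, preds_by_feature)
print(selection)
```

On this toy input each emotion ends up paired with a different feature, which is the situation that motivates combining the features in an ensemble instead of relying on a single one. The combination step itself (CatBoost in the study) is not shown here.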