Emotion Recognition Through Analysis of Speech – A Review

Authors

  • Rasim Atakan Poyraz, SRH University of Applied Sciences, Ernst-Reuter-Platz 10, 10587, Berlin, Germany
  • Prajyot Suvarna, SRH University of Applied Sciences, Ernst-Reuter-Platz 10, 10587, Berlin, Germany
  • Alexander I. Iliev, SRH University of Applied Sciences, Ernst-Reuter-Platz 10, 10587, Berlin, Germany; Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Acad. Georgi Bonchev Str., Block 8, 1113, Sofia, Bulgaria

DOI:

https://doi.org/10.55630/dipp.2024.14.21

Keywords:

Emotion Recognition, Decision Trees, Logistic Regression

Abstract

Feature extraction is a crucial step in recognizing emotion from speech, and there are several approaches to the emotion recognition task itself. In this paper, we present different feature extraction approaches as well as different models used to distinguish neutral speech samples from emotional ones. This research is instrumental for the digitization and preservation of cultural heritage, as it allows us to capture and analyze the emotional nuances in historical audio recordings, ensuring their accurate representation for future generations. We have selected two works that together cover four different methods for emotion recognition. In the first paper, by Jacob (2017), we look at decision trees and logistic regression. The decision tree attains an accuracy of 84.45% on the test set, whereas logistic regression achieves an accuracy of 66.85% after stepwise regression. These methods contribute to the digital archiving of cultural heritage by providing robust tools for analyzing and preserving the emotional content of spoken artifacts. In the second paper, by Bhatti et al. (2004), sequential forward selection (SFS) was used to build subsets of the given features and to assess the relevance of each subset. A general regression neural network (GRNN) was used to evaluate the classification accuracy, which reached 80.69%. As a complementary approach, a modular neural network trained on the same dataset achieved an accuracy of 83.31%. These techniques enhance our ability to maintain the integrity and emotional depth of cultural heritage recordings in digital archives.
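The selection loop attributed to Bhatti et al. (2004), SFS with a GRNN judging each candidate feature subset, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the kernel width `sigma = 0.5`, leave-one-out accuracy as the selection criterion, the fixed subset size `n_select`, and the three-feature toy data are all hypothetical choices made here. The GRNN follows Specht's (1991) formulation, a Gaussian-kernel-weighted average of the training targets, with class labels one-hot encoded so that the predicted class is the largest output component.

```python
# Illustrative sketch only: names, sigma, the LOO criterion and the toy data
# are assumptions, not taken from Bhatti et al. (2004).
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def grnn_class_scores(train_x: Matrix, train_y: Sequence[int], query: Sequence[float],
                      sigma: float, n_classes: int) -> List[float]:
    """GRNN output (Specht, 1991) for one query: a Gaussian-kernel-weighted
    average of the training targets, with labels encoded as one-hot vectors."""
    numer = [0.0] * n_classes
    denom = 0.0
    for x_i, y_i in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x_i, query))  # squared Euclidean distance
        w = math.exp(-d2 / (2.0 * sigma ** 2))               # pattern-layer activation
        numer[y_i] += w  # one-hot target: only the component for class y_i is 1
        denom += w
    if denom == 0.0:  # every kernel underflowed; return a flat output instead of dividing by zero
        return [1.0 / n_classes] * n_classes
    return [v / denom for v in numer]


def loo_accuracy(X: Matrix, y: Sequence[int], features: Sequence[int], sigma: float) -> float:
    """Leave-one-out accuracy of a GRNN restricted to the given feature indices."""
    n_classes = max(y) + 1
    projected = [[row[f] for f in features] for row in X]
    correct = 0
    for i in range(len(X)):
        train_x = projected[:i] + projected[i + 1:]
        train_y = list(y[:i]) + list(y[i + 1:])
        scores = grnn_class_scores(train_x, train_y, projected[i], sigma, n_classes)
        predicted = max(range(n_classes), key=scores.__getitem__)  # first class wins ties
        correct += int(predicted == y[i])
    return correct / len(X)


def sequential_forward_selection(X: Matrix, y: Sequence[int], n_select: int,
                                 sigma: float = 0.5) -> Tuple[List[int], List[float]]:
    """Greedy SFS: start from an empty subset and repeatedly add the feature
    whose inclusion gives the highest GRNN leave-one-out accuracy."""
    selected: List[int] = []
    history: List[float] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        best_feature, best_acc = remaining[0], -1.0
        for f in remaining:
            acc = loo_accuracy(X, y, selected + [f], sigma)
            if acc > best_acc:  # strict '>' keeps the lowest index on ties
                best_feature, best_acc = f, acc
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append(best_acc)
    return selected, history


# Hypothetical toy data: 20 samples, label 0 = neutral, 1 = emotional.
# Feature 0 separates the classes, feature 1 pairs each neutral sample with an
# emotional one at the same value, and feature 2 is constant.
X = [[(i % 2) * 5.0 + (i % 5) * 0.1, ((i // 2) % 10) * 0.5, 1.0] for i in range(20)]
y = [i % 2 for i in range(20)]

selected, history = sequential_forward_selection(X, y, n_select=2)
print(selected, history)  # → [0, 1] [1.0, 1.0]
```

On this toy data, feature 0 alone already gives perfect leave-one-out accuracy, so SFS picks it first; in the second round features 1 and 2 tie at 1.0 and the strict comparison keeps the lower index. A practical variant might instead stop adding features once the accuracy no longer improves.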

References

Bhatti, M. W., Wang, Y., & Guan, L. (2004). A neural network approach for human emotion recognition in speech. In 2004 IEEE International Symposium on Circuits and Systems (ISCAS), Vancouver, BC (pp. II-181). IEEE Xplore. https://doi.org/10.1109/ISCAS.2004.1329238

Boersma, P., & Weenink, D. (n.d.). Praat: Doing phonetics by computer. https://www.fon.hum.uva.nl/praat/

Ignatova, D. (2018). The effects of swimming on preschool children with spinal abnormalities. In R. Penkova (Ed.), 17th International BASOPED Conference "Traditions and innovations in the education of the Balkan countries" (pp. 207–212). Balkan Society for Pedagogy and Education.

Ignatova, D. (2021). Specificity of the motor potential for achieving Scholar Wellness. Trakia Journal of Sciences, 19(Suppl. 1), 867–873. https://doi.org/10.15547/tjs.2021.s.01.136

Ignatova, D. (2022). Study the influence of yoga specialised practices on the formation of correct body posture and corrections of spinal deformities. Smart Innovations in Recreative & Wellness Industry and Niche Tourism, 4(1-2), 17–22. https://scjournal.globalwaterhealth.org/wp-content/uploads/2023/01/p.17-21_Ignatova_UK_V.4_Is.1-2_2022.pdf

Ignatova, D. (2023a). Implementation of motor complexes based on specialized application system blaze-pod trainer. Strategies for Policy in Science and Education, 31(6), 653–667. https://doi.org/10.53656/str2023-6-6-imp

Ignatova, D. (2023b). Motor activity based on learning – contemporary trends in school wellness. Smart Innovations in Recreative & Wellness Industry and Niche Tourism, 5(1-2), 22–26. https://scjournal.globalwaterhealth.org/wp-content/uploads/2024/02/4.%E2%80%8CIGNATOVA__p.22-26-_V.5-Is.-1-2_2023.pdf

Ignatova, D. (2023c). Affirming wellness culture through innovative methodology related to Blaze-pod trainer system. Strategies for Policy in Science and Education, 31(2), 212–225. https://doi.org/10.53656/str2023-2-7-aff

Jacob, A. (2017). Modelling speech emotion recognition using logistic regression and decision trees. International Journal of Speech Technology, 20(4), 897–905. https://doi.org/10.1007/s10772-017-9457-6

Lausen, A., & Schacht, A. (2018). Gender differences in the recognition of vocal emotions. Frontiers in Psychology, 9, Article 882. https://doi.org/10.3389/fpsyg.2018.00882

Specht, D. F. (1991). A general regression neural network. IEEE Transactions on Neural Networks, 2(6), 568–576. https://doi.org/10.1109/72.97934

Published

2024-09-05

How to Cite

Poyraz, R. A., Suvarna, P., & Iliev, A. I. (2024). Emotion Recognition Through Analysis of Speech – A Review. Digital Presentation and Preservation of Cultural and Scientific Heritage, 14, 227–238. https://doi.org/10.55630/dipp.2024.14.21
