Cross-Cultural Emotion Recognition and Comparison Using Convolutional Neural Networks
Keywords: Emotion Recognition, Speech Analysis, Language Processing, Convolutional Neural Networks, Cultural Comparison
Abstract
This paper compares the expression of emotion across three cultures: Canadian French, Italian, and North American English. Speech samples were collected for each of the three languages under study. MFCC features were extracted and passed through a convolutional neural network to verify their significance for the task of emotion recognition from speech. Three separate systems were trained and tested, one per language, achieving accuracies of 71.10%, 79.07%, and 73.89%, respectively. The aim was to show that the feature vectors represent each emotion well. A comparison across emotion, gender, and language was then drawn, and it was observed that, apart from the neutral emotion, every emotion was expressed somewhat differently by each culture. Speech is one of the main vehicles for recognizing emotion and is an attractive area of study, with applications to presenting and preserving cultural and scientific heritage.
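The abstract describes extracting MFCC feature vectors from speech and feeding them to a convolutional neural network. As a rough illustration of the first stage only, below is a minimal NumPy sketch of MFCC extraction (framing, power spectrum, mel filterbank, log compression, DCT). All parameter values here (16 kHz sample rate, 512-sample frames, 13 coefficients) are illustrative assumptions, not the configuration actually used in the paper.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Simplified MFCC extraction: framing, FFT power spectrum,
    mel filterbank, log, DCT-II. A sketch, not production code."""
    # Frame the signal with a Hann window and take the power spectrum
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    power = np.array(frames)                      # (n_frames, n_fft//2 + 1)

    # Triangular mel filterbank spanning 0 .. sr/2
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)      # (n_frames, n_mels)

    # DCT-II decorrelates the log-mel energies; keep the first n_mfcc
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T                         # (n_frames, n_mfcc)

# One second of synthetic speech-like audio (a tone plus noise)
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
signal = np.sin(2 * np.pi * 220 * t) + 0.3 * np.random.randn(sr)

features = mfcc(signal, sr=sr)
print(features.shape)   # → (61, 13): 61 frames, 13 coefficients each
```

The resulting per-frame coefficient matrix is the kind of two-dimensional input (time by coefficient) that a convolutional network can consume directly, treating it much like an image.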