AI-Based Analysis of Student Frustration: Speech and Facial Expression Recognition

Authors

  • Akniyet Tokhtarov Department of Pedagogy and Psychology, I. Zhansugurov Zhetysu University, Taldykorgan, Kazakhstan https://orcid.org/0009-0008-4675-2204
  • Nurgul Toxanbayeva Department of General and Applied Psychology, Al-Farabi Kazakh National University, Almaty, Kazakhstan https://orcid.org/0009-0005-6495-1475
  • Meirambala Seisekenova Department of Economics and Service, I. Zhansugurov Zhetysu University, Taldykorgan, Kazakhstan https://orcid.org/0009-0001-9633-3082
  • Talgat Baidildinov Department of Special Pedagogy, Abay Kazakh National Pedagogical University, Almaty, Kazakhstan https://orcid.org/0009-0003-2357-8542
  • Dametken Baigozhanova Higher School of Information Technology and Engineering, Astana International University, Astana, Kazakhstan https://orcid.org/0009-0001-9310-3118
  • Asylbek Abden Department of Physical Culture and Primary Military Training, I. Zhansugurov Zhetysu University, Taldykorgan, Kazakhstan https://orcid.org/0009-0001-3225-2435

DOI:

https://doi.org/10.34190/ejel.23.2.4043

Keywords:

Frustration detection, Emotion recognition, Multimodal learning, Facial analysis, Speech emotion recognition, AI in education

Abstract

Frustration is a key affective state that shapes student engagement and learning outcomes. While mild frustration can promote persistence in problem-solving, prolonged frustration often leads to disengagement and reduced academic performance. In traditional learning environments, instructors rely on facial expressions, vocal cues, and behavioral indicators to identify frustration and provide timely support; such monitoring, however, becomes impractical in large or digital classrooms. Artificial intelligence (AI)-based emotion recognition offers a scalable alternative by automatically detecting frustration through facial and speech analysis, enabling adaptive interventions in real time. This study proposes a multimodal AI system that integrates facial expression recognition using a Convolutional Neural Network (CNN) and speech emotion recognition using a Transformer-based model. The system applies attention-based feature fusion, improving accuracy by assigning greater weight to the more informative modality. The model was trained on benchmark datasets, including DAiSEE, IEMOCAP, and RAVDESS, and evaluated in a real-world study involving 160 Kazakhstani university students in online and in-person learning sessions. AI-generated predictions were compared with instructor assessments to validate the system’s performance. Results indicate that the multimodal system outperforms unimodal approaches, achieving 85% accuracy, 83% precision, and 86% recall on benchmark data, and 84% accuracy and precision under real-world conditions. Comparative analysis reveals that speech-based cues are more informative than facial expressions, particularly when frustration is masked or internalized. The system is less effective at detecting subtle frustration, highlighting the need for greater contextual sensitivity. Although limitations remain, the results demonstrate the system’s potential for scalable deployment in classrooms and on online platforms. These findings support the integration of AI-driven frustration detection into adaptive learning platforms to help educators identify students at risk of disengagement. By enabling timely intervention and support, such tools can contribute to more responsive and inclusive educational environments. Future research should explore cultural variation in emotional expression and long-term effects on learning outcomes.
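To illustrate the attention-based fusion described above, the sketch below combines a CNN facial embedding with a Transformer speech embedding using learned softmax attention weights, so the classifier can lean on whichever stream is more informative for a given sample. This is a minimal illustration only, not the authors' published implementation: the feature dimensions (512 for the face stream, 768 for the speech stream), the binary frustrated/not-frustrated output, and the PyTorch framing are all assumptions for the sake of the example.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Attention-weighted fusion of face and speech embeddings.

    Hypothetical sketch: dimensions and the gating scheme are
    illustrative assumptions, not the paper's architecture.
    """
    def __init__(self, face_dim=512, speech_dim=768, fused_dim=256, n_classes=2):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.face_proj = nn.Linear(face_dim, fused_dim)
        self.speech_proj = nn.Linear(speech_dim, fused_dim)
        # One scalar score per modality; a softmax over the two scores
        # yields the attention weights that favor the stronger stream.
        self.attn = nn.Linear(fused_dim, 1)
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, face_feat, speech_feat):
        # face_feat: (B, face_dim) pooled CNN features;
        # speech_feat: (B, speech_dim) pooled Transformer features.
        streams = torch.stack(
            [torch.tanh(self.face_proj(face_feat)),
             torch.tanh(self.speech_proj(speech_feat))],
            dim=1)                                          # (B, 2, fused_dim)
        weights = torch.softmax(self.attn(streams), dim=1)  # (B, 2, 1)
        fused = (weights * streams).sum(dim=1)              # (B, fused_dim)
        return self.classifier(fused), weights.squeeze(-1)

# Usage with random tensors standing in for real encoder outputs.
model = AttentionFusion()
logits, modality_weights = model(torch.randn(4, 512), torch.randn(4, 768))
```

A side benefit of this formulation is interpretability: averaging the returned modality weights over a validation set shows which stream the model relies on, which is one way such a system could surface the kind of speech-over-face finding the abstract reports.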

Published

23 Jun 2025

Section

Articles
