Quantile-based e-Learning Student Engagement Classification
DOI:
https://doi.org/10.34190/ejel.24.3.4678
Keywords:
Learning analytics, Student engagement, Quantile classification, Cross-dataset validation, Random forest, Educational data mining
Abstract
Classifying student engagement accurately is critical for timely academic intervention; however, most existing approaches rely on arbitrarily defined thresholds that lack statistical grounding and are difficult to transfer across institutional contexts. This limitation reduces the practical applicability of engagement analytics in diverse educational settings. This study evaluates a quantile-based engagement classification framework across two contrasting datasets to assess its validity, transferability, and consistency of predictive features. Unlike threshold-based approaches, the proposed framework derives engagement categories directly from dataset-specific interaction distributions. The Open University Learning Analytics Dataset (OULAD) represents large-scale fully online learning, while the Unistudium dataset reflects a smaller blended learning context. The two datasets differ substantially in size and delivery mode, with a student ratio of approximately 17.4 to 1. This contrast provides a rigorous basis for assessing method transferability. Engagement categories (passive, moderate, and active) are derived using dataset-specific quartile thresholds (Q1 and Q3). This strategy adapts automatically to local interaction distributions and avoids manual parameter tuning. Five temporal behavioural features were extracted, including active days, unique actions, and learning consistency. Random Forest was employed as the proposed model, while a Decision Tree classifier was included as a baseline for comparative evaluation. The results indicate that the proposed framework remains effective across different educational contexts. In the OULAD dataset, the model achieved an accuracy of 92.04% with a Cohen’s κ of 0.87. In the Unistudium dataset, accuracy reached 72.50% with a Cohen’s κ of 0.59. Although performance differed between datasets, variance remained low. 
Feature importance analysis revealed strong consistency across contexts, with a Spearman rank correlation of 0.90. Active days and unique actions were the most influential predictors in both cases. Random Forest also outperformed the Decision Tree baseline on both datasets. These findings support e-learning practice by offering institutions a statistically grounded and automated method for engagement classification. The approach removes the need for arbitrary thresholds and reduces operational overhead in analytics deployment. From a research perspective, the study establishes realistic performance benchmarks for engagement analytics at different institutional scales, demonstrates the applicability of quantile-based engagement classification across heterogeneous datasets, and confirms that key behavioural engagement indicators transfer reliably across online and blended learning environments.
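The quartile-based labelling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `classify_engagement`, the input shape (a mapping from student id to a total interaction count), the use of the "inclusive" quantile method, and the treatment of values exactly equal to Q1 or Q3 as moderate are all assumptions made for illustration.

```python
from statistics import quantiles


def classify_engagement(interactions):
    """Label each student as passive, moderate, or active.

    `interactions` maps a student id to a total interaction count.
    This input shape is hypothetical and not taken from the paper.
    """
    # Q1 and Q3 come from this dataset's own distribution, so the
    # cut points adapt to each institution without manual tuning.
    # Assumption: the "inclusive" interpolation method is used.
    q1, _median, q3 = quantiles(interactions.values(), n=4, method="inclusive")

    labels = {}
    for student, count in interactions.items():
        if count < q1:
            labels[student] = "passive"
        elif count > q3:
            labels[student] = "active"
        else:
            # Assumption: values equal to Q1 or Q3 count as moderate.
            labels[student] = "moderate"
    return labels
```

Calling the function separately on each dataset yields dataset-specific thresholds, which is the property the abstract attributes to the framework; the resulting labels could then serve as targets for a classifier trained on the behavioural features.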
License
Copyright (c) 2026 Aditya Galih Sulaksono, Syaad Patmanthara, Harits Ar Rosyid

This work is licensed under a Creative Commons Attribution 4.0 International License.
Open Access Publishing
The Electronic Journal of e-Learning operates an Open Access Policy. This means that users can read, download, copy, distribute, print, search, or link to the full texts of articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, is that authors control the integrity of their work, which should be properly acknowledged and cited.