Predictive Model to Analyse Real and Synthetic Data for Learners' Performance Prediction Using Regression Techniques
DOI:
https://doi.org/10.24059/olj.v29i1.4390
Keywords:
Learners Performance Prediction, Educational Data Analytics, Predictive Models, Privacy Preservation, Synthetic Data Generation, Regression Analysis
Abstract
Predicting learner performance accurately is critical in educational systems because it provides a basis for tailored interventions and instruction. Big data analytics makes it possible to apply Machine Learning (ML) techniques to this task. However, the availability of real-world data is often limited by privacy concerns, which has prompted a shift towards synthetic data generation. This study presents an empirical comparison of real, synthetic, and mixed (real + synthetic) data sets for forecasting learner performance, using a range of regression-based ML algorithms, including Random Forest, Gradient Boosting, XGBoost, K-Nearest Neighbors, and Support Vector Regression. Our methodology comprises generating synthetic data with a generative model and then applying these algorithms to each data set. The models are evaluated using precision metrics to assess their predictive accuracy. The results show that synthetic data can rival real data in predictive capability, with combined data sets achieving up to 87.76% accuracy, underscoring the efficacy of hybrid data approaches. These findings support the use of synthetic data as a practical substitute where access to real data is limited, fostering advancements in educational technology and ML.
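The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the generative model or the exact metrics, so this sketch assumes a smoothed bootstrap (resampling rows and adding Gaussian noise) as a stand-in generator, uses a hand-written k-nearest-neighbor regressor as one representative of the listed algorithms, and reports MAE and R² as assumed metrics. The toy "real" data set, its column meanings (study hours, attendance, final score), and all function names are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def make_real_rows(n):
    """Hypothetical 'real' data: (study_hours, attendance_pct, final_score)."""
    rows = []
    for _ in range(n):
        hours = random.uniform(0, 10)
        attendance = random.uniform(50, 100)
        score = 30 + 4 * hours + 0.3 * attendance + random.gauss(0, 3)
        rows.append((hours, attendance, score))
    return rows


def synthesize(rows, n, bandwidth=0.2):
    """Stand-in generator (assumption): smoothed bootstrap.

    Resample a real row, then jitter every column with Gaussian noise
    whose scale is a fraction of that column's standard deviation.
    """
    stds = [statistics.stdev(col) for col in zip(*rows)]
    return [
        tuple(v + random.gauss(0, bandwidth * s)
              for v, s in zip(random.choice(rows), stds))
        for _ in range(n)
    ]


def fit_knn(train, k=5):
    """Return a k-nearest-neighbor regressor fitted on `train` rows."""
    features = [row[:-1] for row in train]
    # Scale each feature by its std so no column dominates the distance.
    scales = [statistics.stdev(col) for col in zip(*features)]

    def predict(x):
        def sq_dist(row):
            return sum(((a - b) / s) ** 2
                       for a, b, s in zip(row[:-1], x, scales))
        nearest = sorted(train, key=sq_dist)[:k]
        return statistics.fmean(row[-1] for row in nearest)

    return predict


def evaluate(train, test):
    """Train on `train`, score on `test`; return (MAE, R^2)."""
    predict = fit_knn(train)
    y_true = [row[-1] for row in test]
    y_pred = [predict(row[:-1]) for row in test]
    mae = statistics.fmean(abs(t - p) for t, p in zip(y_true, y_pred))
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return mae, 1 - ss_res / ss_tot


real = make_real_rows(300)
train_real, test_real = real[:200], real[200:]  # test set is always real data
train_synth = synthesize(train_real, 200)

results = {
    "real": evaluate(train_real, test_real),
    "synthetic": evaluate(train_synth, test_real),
    "mixed": evaluate(train_real + train_synth, test_real),
}
for name, (mae, r2) in results.items():
    print(f"{name:9s}  MAE={mae:.2f}  R2={r2:.3f}")
```

Each training set is scored on the same held-out real rows, so the three results are directly comparable. Swapping in a different generator or regressor only requires replacing `synthesize` or `fit_knn`.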
License
Copyright (c) 2024 SHABNAM ARA S.J, Tanuja R, Manjula S.H

This work is licensed under a Creative Commons Attribution 4.0 International License.
As a condition of publication, the author agrees to apply the Creative Commons – Attribution International 4.0 (CC-BY) License to OLJ articles. See: https://creativecommons.org/licenses/by/4.0/.
This license allows anyone to reproduce OLJ articles at no cost and without further permission, as long as they attribute the author and the journal. This permission includes printing, sharing, and other forms of distribution.
Author(s) hold copyright in their work and retain publishing rights without restrictions.

