Identifying At-Risk Online Learners by Psychological Variables Using Machine Learning Techniques

Hsiang-yu Chien, Oi-Man Kwok, Yu-Chen Yeh, Noelle Wall Sweany, Eunkyeng Baek, William Alex McIntosh


The purpose of this study was to develop a machine learning model that predicts online learners’ learning outcomes. To build the model, we measured students’ motivation, learning tendencies, online learning-motivated attention, and supportive learning behaviors, along with final test scores. A total of 225 college students who were taking online courses participated. Longitudinal data were collected over three semesters (T1, T2, and T3). T3 was used as training data because it had the largest sample size of the three data waves. To analyze the data, two approaches were applied: (a) stepwise logistic regression and (b) random forest (RF). Results showed that RF used fewer items and predicted final grades more accurately than stepwise logistic regression in this small sample. Furthermore, RF selected four items that might potentially be used to identify at-risk learners even before they enroll in an online course.
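As a rough illustration of the random forest side of this comparison, the Python sketch below trains a small forest on the largest of three synthetic waves, checks it against the other two waves, and ranks items by total Gini decrease. It is not the authors' analysis: the data, the per-wave sample sizes, the number of items, the rule tying items 0 and 3 to the at-risk label, and every function name are assumptions made for illustration, and the stepwise logistic regression comparator is omitted.

```python
import random

def make_wave(n, rng, n_items=8):
    """Synthetic wave (assumption): Likert items 1-5; items 0 and 3 drive risk."""
    rows = []
    for _ in range(n):
        x = [rng.randint(1, 5) for _ in range(n_items)]
        at_risk = 1 if x[0] + x[3] <= 5 else 0
        if rng.random() < 0.05:  # small amount of label noise
            at_risk = 1 - at_risk
        rows.append((x, at_risk))
    return rows

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(rows, rng, mtry, depth, importance):
    """Grow one tree; each node considers a random subset of `mtry` items."""
    labels = [y for _, y in rows]
    majority = int(sum(labels) * 2 >= len(labels))
    if depth == 0 or gini(labels) == 0.0:
        return majority  # leaf: predicted class
    best = None
    for f in rng.sample(range(len(rows[0][0])), mtry):
        for t in range(1, 5):  # candidate thresholds on a 1-5 scale
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority
    score, f, t, left, right = best
    # Accumulate the size-weighted impurity decrease credited to item f.
    importance[f] += len(rows) * gini(labels) - score
    return (f, t,
            build_tree(left, rng, mtry, depth - 1, importance),
            build_tree(right, rng, mtry, depth - 1, importance))

def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are ints."""
    while isinstance(node, tuple):
        f, t, low, high = node
        node = low if x[f] <= t else high
    return node

rng = random.Random(0)
# Hypothetical wave sizes that sum to 225; the real per-wave counts are not given.
waves = {"T1": make_wave(60, rng), "T2": make_wave(70, rng), "T3": make_wave(95, rng)}
train_name = max(waves, key=lambda name: len(waves[name]))  # largest wave
train = waves[train_name]
test = [row for name, rows in waves.items() if name != train_name for row in rows]

n_items = len(train[0][0])
importance = [0.0] * n_items
forest = []
for _ in range(100):
    boot = [rng.choice(train) for _ in train]  # bootstrap sample of the training wave
    forest.append(build_tree(boot, rng, mtry=3, depth=3, importance=importance))

def predict_forest(x):
    """Majority vote across all trees (1 = at risk)."""
    votes = sum(predict_tree(tree, x) for tree in forest)
    return int(votes * 2 > len(forest))

accuracy = sum(predict_forest(x) == y for x, y in test) / len(test)
top_items = sorted(range(n_items), key=lambda f: importance[f], reverse=True)[:4]
print(train_name, round(accuracy, 2), top_items)
```

Ranking items by accumulated impurity decrease is one common way a forest can surface a short list of screening items; the cut-off of four here simply mirrors the number of items mentioned above and is not derived from the synthetic data.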


Keywords: machine learning, random forest, online learning, at-risk online learners, stepwise regression, logistic regression





Copyright (c) 2020 Oi-Man Kwok, Yu-Chen Yeh, Noelle Wall Sweany, Eunkyeng Baek, William Alex McIntosh
