Predicting Students’ Academic Performance in Virtual Learning Environment Using Pearson Correlation Coefficient

Authors

  • F. O. Adelodun, Department of Computer Science, Lead City University, Ibadan, Nigeria
  • W. Sakpere, Department of Computer Science, Lead City University, Ibadan, Nigeria
  • K. F. Famurewa, Department of Computer Science, Lead City University, Ibadan, Nigeria
  • Y. J. Oguns, Computer Studies Department, The Polytechnic Ibadan, Nigeria

Keywords:

Feature Selection, Students’ Academic Performance, VLE, Pearson Correlation Coefficient

Abstract

Feature selection involves selecting the most relevant features from a dataset during the prediction process. The choice of feature selection method greatly influences how accurate, understandable, and effective predictive models are. The ability to predict students' academic success or struggle in a Virtual Learning Environment (VLE) remains limited, and substantially more students drop out of online courses than out of traditional courses [1,2]. This study compared two approaches: in the first, machine learning models were trained and tested on features selected from the dataset; in the second, the models were trained and tested on all features in the dataset, without feature selection. The Pearson Correlation Coefficient (PCC) was used as the feature selection method. The two approaches were compared in terms of their impact on the performance of the machine learning algorithms. The study was carried out using nine classification models: Logistic Regression, K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Random Forest, Gradient Boosting, XGBoost, LightGBM, an MLP classifier (neural network), and Naïve Bayes. The results showed that Logistic Regression achieved the highest mean accuracy, 0.7333, with feature selection, and a lower mean accuracy of 0.7188 when all features were used in the prediction process. For Random Forest, the mean accuracy was 0.6813 without feature selection and rose to 0.7333 when PCC was applied to select the features for prediction, indicating that a feature selection method such as PCC is important for improving model performance.
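
The PCC selection step described in the abstract can be illustrated with a minimal sketch. It assumes that a feature is kept when the absolute value of its Pearson correlation with the pass/fail label meets a cut-off. The abstract does not state the cut-off or the dataset's columns, so the threshold of 0.2, the feature names `vle_clicks` and `toggle`, and the toy values below are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to score
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.2):
    """Keep the features whose |r| with the target meets the threshold.

    `columns` maps a feature name to its list of values. The threshold is
    an illustrative assumption, not a value reported by the study.
    """
    scores = {name: pearson(values, target) for name, values in columns.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores


# Toy data with hypothetical feature names (not the study's dataset).
passed = [0, 0, 0, 0, 1, 1, 1, 1]          # pass/fail label per student
columns = {
    "vle_clicks": [1, 2, 3, 4, 5, 6, 7, 8],    # rises with the label
    "toggle": [1, -1, 1, -1, 1, -1, 1, -1],    # unrelated to the label
}

selected, scores = select_features(columns, passed)
print(selected)  # ['vle_clicks']
for name, r in scores.items():
    print(f"{name}: r = {r:+.3f}")
```

In the first approach the classifiers would then be trained only on the columns named in `selected`, while the second approach would use every column. A signed threshold or a top-k ranking would be equally plausible readings of PCC-based selection.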

Published

2025-03-07