Predicting Students’ Academic Performance in Virtual Learning Environment Using Pearson Correlation Coefficient
Keywords:
Feature Selection, Students’ Academic Performance, VLE, Pearson Correlation Coefficient
Abstract
Feature selection is the process of choosing the most relevant features from a dataset for use in prediction. The
choice of feature selection method strongly influences the accuracy, interpretability, and effectiveness of
predictive models. Research on predicting students' academic success or struggle in a Virtual Learning Environment
(VLE) remains limited. Substantially more students drop out of online courses than of traditional courses [1,2].
The study followed two approaches: the first trained and tested machine learning models on features selected from
the dataset, and the second trained and tested the same models on all features in the dataset, without feature
selection. The Pearson Correlation Coefficient (PCC) feature selection method was used to select the features for
the first approach. The two approaches were compared in terms of their impact on the performance of the machine
learning algorithms.
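The abstract does not say exactly how PCC was applied, so the following is only a minimal sketch under stated assumptions. It computes Pearson's r = cov(x, y) / (σ_x σ_y) between each feature column and a numerically encoded outcome (for a binary pass/fail target this is the point-biserial correlation) and keeps the features whose absolute correlation reaches a cutoff. The function names `pearson_r` and `select_by_pcc`, the toy feature names, and the 0.1 threshold are hypothetical illustrations, not the study's settings; the study may also have used correlations between features to drop redundant ones.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and squared deviations (the 1/n factors cancel in r).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Constant column: r is undefined, so treat it as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_by_pcc(
    features: Dict[str, List[float]],
    target: List[float],
    threshold: float = 0.1,  # hypothetical cutoff, not taken from the study
) -> List[str]:
    """Keep features with |r| >= threshold against the target, strongest first."""
    scores = {name: abs(pearson_r(col, target)) for name, col in features.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    # sorted() is stable, so ties keep their original column order.
    return sorted(kept, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    # Toy, made-up columns for illustration only (not the study's data).
    toy_features = {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [1.0, -1.0, -1.0, 1.0],
        "c": [4.0, 3.0, 2.0, 1.0],
    }
    toy_target = [0.0, 0.0, 1.0, 1.0]  # e.g. 0 = fail, 1 = pass
    print(select_by_pcc(toy_features, toy_target))  # → ['a', 'c']
```

Here `b` has zero correlation with the target and is dropped, while `a` and `c` are both kept even though one rises and the other falls with the outcome, because only the magnitude of r is compared. The "all features" approach would simply skip this filtering step and pass every column to the models.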
The study was carried out using nine classification models: Logistic Regression, K-Nearest Neighbour (KNN),
Support Vector Machine (SVM), Random Forest, Gradient Boosting, XGBoost, LightGBM, MLP Classifier (neural
network), and Naïve Bayes. Logistic Regression achieved the highest mean accuracy, 0.7333, with feature
selection, and a lower mean accuracy of 0.7188 when all features were used in the prediction process. Without
feature selection, the mean accuracy of Random Forest was 0.6813; when PCC feature selection was applied to
choose the features for prediction, its mean accuracy increased to 0.7333. These results indicate that a feature
selection method such as PCC is important for improving model performance.
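A quick calculation on the reported figures makes the size of the effect concrete. The snippet below only restates the mean accuracies given in this abstract and computes the absolute and relative change from using all features to using PCC-selected features; the helper name `accuracy_change` is hypothetical.

```python
from typing import Dict, Tuple

# Mean accuracies exactly as reported in the abstract.
REPORTED: Dict[str, Dict[str, float]] = {
    "Logistic Regression": {"all_features": 0.7188, "pcc_selected": 0.7333},
    "Random Forest": {"all_features": 0.6813, "pcc_selected": 0.7333},
}


def accuracy_change(all_features: float, pcc_selected: float) -> Tuple[float, float]:
    """Return (absolute change, relative change in percent) when moving to PCC-selected features."""
    delta = pcc_selected - all_features
    return delta, 100.0 * delta / all_features


if __name__ == "__main__":
    for model, acc in REPORTED.items():
        delta, pct = accuracy_change(acc["all_features"], acc["pcc_selected"])
        print(f"{model}: {delta:+.4f} ({pct:+.2f}% relative)")
```

Run as a script, it reports a gain of about 0.0145 (roughly 2.0% relative) for Logistic Regression and about 0.0520 (roughly 7.6% relative) for Random Forest, so on these figures PCC selection helped Random Forest considerably more than Logistic Regression.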