Comparative Performance Evaluation of Random Forest on Web-based Attacks

Authors

  • UJISCITR Editor
  • Oluwaseye Abayomi Adeyemi, Department of Health Sciences, Bamidele Olumilua University of Education, Science and Technology, Ikere, Ekiti State
  • Azeez Ajani Waheed, Department of Mathematics, Lead City University, Ibadan, Oyo State
  • Ogunsanwo Olajide Damilola, Department of Computer Science, Lead City University, Ibadan, Oyo State

Keywords:

Web-Based Attacks, Random Forest, Correlation Feature Selection, Machine Learning

Abstract

Most common online attack methods are thoroughly researched and documented. Countries, corporations, individuals, and critical infrastructure that depend on information technology for daily operations have suffered financial losses, loss of personal information, and economic harm as a result of web-based intrusions. However, foreseeing an attack before it happens can aid in its prevention. This research proposes a predictive model for web-based attacks and compares the performance of Random Forest with and without feature selection, with the aim of protecting the availability, integrity, and confidentiality of networks, computer systems, and their data. The CIC-Bell-IDS2017 dataset, which includes both typical and contemporary intrusion attacks, served as the raw data source for the proposed model. The predictive models were built in Jupyter Notebook, a Python-based programming environment launched through Anaconda Navigator. Performance evaluation and comparative analysis showed that applying big data analytics (feature scaling and feature selection) to the dataset improved the models' prediction accuracy, yielding a potential intrusion detection system. Both cases achieved strong results, with precision of 97% and 98% for the two sets and model development times of 35 seconds for the raw set and 15 seconds for the reduced set; development time is an important factor when deploying machine learning models in a real-time setting. According to the comparison, Random Forest is more computationally expensive than the Correlation Feature Selection-based classifier but has higher predictive accuracy. Both methods work well, and each has advantages and disadvantages. The use of big data analytics (PySpark) was found to improve the performance of the machine learning models, resulting in a better intrusion detection system.

Published

2024-06-11