Comparative Analysis of Machine Learning Models for Fraud Detection in Imbalanced Credit Card Transaction Datasets
Keywords:
Credit card fraud, machine learning, imbalanced datasets, SMOTE, Random Forest, XGBoost, Random Sampling
Abstract
Detecting fraud in imbalanced datasets is a major challenge in financial domains, particularly for credit card transactions. This paper presents a comparative analysis of four popular machine learning models (Logistic Regression, Decision Trees, Random Forest, and XGBoost) applied to real and simulated fraud datasets. Data preprocessing techniques, including SMOTE and random sampling, were employed to address class imbalance. The results indicate that ensemble models, particularly Random Forest and XGBoost, outperformed traditional models, achieving near-perfect F1-scores (0.999) and accuracies (0.999) on both datasets. These findings provide insight into model effectiveness in fraud detection tasks and offer a foundation for developing robust, adaptive fraud detection systems.
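As an illustration of two ideas named above, the following minimal sketch shows random oversampling of the minority class and the computation of accuracy and F1-score on a heavily imbalanced toy label set. It is not the pipeline used in this study: the function names (`random_oversample`, `f1_score`, `accuracy`), the toy data, and the 998:2 class ratio are assumptions chosen for illustration, and random oversampling is shown in place of SMOTE because it needs only the standard library.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class.
    (Hypothetical helper; normally applied to the training split only.)"""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class.
    Returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed): 998 legitimate transactions (0) and 2 frauds (1).
X = [[i] for i in range(1000)]
y = [0] * 998 + [1] * 2

# After oversampling, both classes have 998 rows.
X_bal, y_bal = random_oversample(X, y)

# A trivial classifier that never predicts fraud.
always_legit = [0] * len(y)
acc = accuracy(y, always_legit)
f1 = f1_score(y, always_legit)
```

On this toy data the trivial classifier reaches high accuracy while its F1-score on the fraud class is zero, which is one reason the two metrics are usually read together on imbalanced data.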