Application of Machine Learning Algorithms in Predicting the Toxicity of Chemical Compounds for Safer Pharmaceuticals
Keywords:
Machine learning, toxicity prediction, chemical compounds, pharmaceutical safety, drug development, ethical testing, predictive modeling, computational toxicology

Abstract
The development of safe pharmaceuticals requires accurate and efficient prediction of chemical toxicity to minimize adverse health risks and reduce reliance on costly and ethically challenging animal testing. This study investigates the application of three machine learning (ML) algorithms—Random Forest (RF), Support Vector Machine (SVM), and Linear Regression (LR)—for predicting the toxicity of aromatic chemical compounds. A dataset of 11,001 compounds was curated, preprocessed, and analyzed using molecular descriptors such as molecular weight, lipophilicity, and polar surface area. Model performance was evaluated using accuracy, precision, recall, F1-score, and specificity. Results showed that the Linear Regression model performed poorly, with accuracy around 52%, indicating limited suitability for toxicity classification. The SVM model achieved substantially better results, with an accuracy of 80%, demonstrating its effectiveness in capturing nonlinear structure–toxicity relationships. Notably, the Random Forest model outperformed both, achieving perfect classification performance (100%) across all metrics, with zero false positives and false negatives. Feature importance analysis revealed that descriptors such as Topological Polar Surface Area and Molecular Fractional Polar Surface Area were key contributors to toxicity prediction. The findings demonstrate that Random Forest is a robust and interpretable tool for early toxicity screening, offering both predictive accuracy and insight into the molecular features driving toxicity. Integrating ML models into pharmaceutical research pipelines can accelerate drug discovery, reduce costs, and help meet ethical imperatives by minimizing animal testing. Future work should focus on external validation, hybrid model development, and explainable AI techniques to enhance generalizability and regulatory acceptance.
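To make the workflow summarized above concrete, the sketch below shows one plausible way to train and evaluate Random Forest and SVM classifiers on molecular descriptors using scikit-learn. It is not the authors' code: the synthetic data, descriptor stand-ins, and model hyperparameters are illustrative assumptions, and the study's actual dataset of 11,001 aromatic compounds is not reproduced here.

```python
# Minimal sketch (not the study's pipeline): training toxicity classifiers on
# molecular descriptors and reporting the metrics named in the abstract.
# Synthetic data stands in for descriptors such as molecular weight,
# lipophilicity (LogP), and topological polar surface area (TPSA).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Labels: 1 = toxic, 0 = non-toxic (placeholder data, not the curated dataset).
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM (RBF kernel)": SVC(kernel="rbf", C=1.0, gamma="scale"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name}: acc={accuracy_score(y_test, pred):.2f} "
          f"prec={precision_score(y_test, pred):.2f} "
          f"rec={recall_score(y_test, pred):.2f} "
          f"f1={f1_score(y_test, pred):.2f}")

# Feature importance from the Random Forest, analogous to the analysis that
# highlighted TPSA-related descriptors; indices map to the descriptor columns.
rf = models["Random Forest"]
print("RF feature importances:", np.round(rf.feature_importances_, 3))
```

In practice, descriptor matrices of this kind are typically computed from compound structures (e.g., SMILES) with a cheminformatics toolkit before being passed to the classifiers; that step is omitted here for brevity.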