A comparative machine learning framework for breast cancer diagnosis: Benchmarking algorithms and emphasizing model interpretability
DOI:
https://doi.org/10.5281/zenodo.18108591
Keywords:
glmnet, Benign, Malignant, Feature Importance, Predictive Modeling
Abstract
This study evaluated four machine learning models for the binary classification of breast cancer cases: regularized logistic regression (GLMNET), random forest (RF), extreme gradient boosting (XGB), and support vector machine (SVM). The dataset comprised 357 benign (62.7%) and 212 malignant (37.3%) samples. Models were trained and evaluated using repeated cross-validation, with performance assessed by area under the ROC curve (AUC), sensitivity, specificity, and accuracy. GLMNET achieved the best performance, with the highest cross-validation AUC (0.992) and a strong balance between sensitivity (0.982) and specificity (0.935). On the independent test set, GLMNET demonstrated excellent discrimination (AUC = 0.998), with an accuracy of 98.2% (95% CI: 93.8–99.8), a sensitivity of 98.6%, a specificity of 97.6%, and a Kappa of 0.962, indicating near-perfect agreement. Feature importance analysis identified PC02, PC01, and PC04 as the most influential predictors. These results suggest that GLMNET provides robust and highly accurate classification, making it a suitable model for breast cancer prediction on this dataset.
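For readers who want to see how the reported test-set metrics relate to one another, the following is a minimal Python sketch that computes accuracy, sensitivity, specificity, and Cohen's Kappa from a 2x2 confusion matrix. The abstract does not report the confusion matrix itself, so the counts below are hypothetical: they assume a test set of 113 samples (71 benign, 42 malignant) with benign treated as the positive class, chosen only because they are consistent with the reported percentages. The function name `binary_metrics` is also illustrative and not taken from the study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute summary metrics for a 2x2 confusion matrix.

    tp, fn: positive-class samples predicted correctly / incorrectly
    fp, tn: negative-class samples predicted incorrectly / correctly
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n          # observed agreement
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate

    # Agreement expected by chance, from the row and column marginals
    expected = (
        ((tp + fn) / n) * ((tp + fp) / n)
        + ((fp + tn) / n) * ((fn + tn) / n)
    )
    # Cohen's Kappa: agreement beyond chance, scaled to its maximum
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# ASSUMPTION: hypothetical counts, not reported in the abstract.
# Positive class = benign (70 of 71 correct); negative = malignant (41 of 42).
metrics = binary_metrics(tp=70, fn=1, fp=1, tn=41)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, the four values round to the figures quoted in the abstract (0.982, 0.986, 0.976, and 0.962), which illustrates that Kappa here is a function of the same table as the other three metrics rather than an independent measurement.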
License
Copyright (c) 2025 Technoscience Journal for Community Development in Africa

This work is licensed under a Creative Commons Attribution 4.0 International License.