Improving Breast Cancer Classification with Adaptive Synthetic Sampling, Feature Selection, and Hyperparameter Optimization
DOI:
https://doi.org/10.14500/aro.12386
Keywords:
Adaptive Synthetic Sampling, Arctic Puffin Optimisation, Breast Cancer Detection, Feature Selection, Hyperparameter Optimization, Machine Learning
Abstract
Breast cancer is a major global health concern, and persistent limitations in detection accuracy highlight the need for accurate and efficient diagnostic solutions. This study presents an enhanced machine learning framework that improves breast cancer classification by addressing three key limitations: class imbalance, irrelevant features, and suboptimal hyperparameters. Adaptive synthetic sampling (ADASYN) was used to balance the class distribution; feature selection techniques, including univariate selection and recursive feature elimination, improved feature relevance; and arctic puffin optimization (APO) was applied for hyperparameter tuning. Multiple classifiers were evaluated on the Wisconsin Diagnostic Breast Cancer dataset. The random forest (RF) classifier combined with ADASYN and optimized using APO achieved 99.53% accuracy, 100% precision, 99.07% recall, and a 99.53% F1-score, with only one misclassification out of 569 samples. Although the framework does not modify the ADASYN or RF algorithms themselves, it significantly enhances diagnostic performance and serves as a robust foundation for clinical decision support systems.
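The four reported metrics all follow from a single confusion matrix, so they can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the abstract does not publish the confusion matrix, so the counts used here (106 true positives, 0 false positives, 1 false negative, 107 true negatives) are hypothetical values chosen because they reproduce the reported percentages, and the function name `classification_metrics` is likewise an assumption. These counts sum to 214 evaluated samples; a single error across all 569 samples would instead give roughly 99.82% accuracy, so the reported figures appear to describe a smaller evaluation split whose size the abstract does not state.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators so the helper is safe on degenerate inputs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (NOT taken from the paper): one false negative,
# no false positives, chosen to match the percentages in the abstract.
metrics = classification_metrics(tp=106, fp=0, fn=1, tn=107)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With zero false positives, precision is exactly 100%, and the single false negative lowers recall and F1 slightly, which is the pattern the abstract reports.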
License
Copyright (c) 2026 Hayder N. Jasim, Wesam M. Jasim, Mohammed S. Ibrahim

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Accepted 2026-02-01
Published 2026-03-15

ARO Journal is a scientific, peer-reviewed, periodical, and diamond OAJ that has no APC or ASC.