An Interpretable Artificial Intelligence Model for Chronic Kidney Disease Diagnosis Using Decision Trees
Abstract
Chronic Kidney Disease (CKD) is a progressive condition that, without early intervention, can lead to serious health complications and increased mortality. Traditional diagnostic methods rely on extensive laboratory testing, which can be time-consuming and inaccessible in resource-limited settings. In this study, we investigate a machine learning model based on Decision Trees (DT) combined with Feature Selection (FS) for diagnosing CKD from publicly available data. The CKD dataset, obtained from the UCI Machine Learning Repository, consists of medical and demographic attributes. The SelectKBest method was used to identify the ten most relevant features, which were then used to train the DT model. Cross-validation was used to assess the model's robustness: the model achieved a mean accuracy of 97.50% across folds and 98.75% on the independent test set. These results suggest that the DT model with FS is a promising diagnostic tool for CKD, offering high accuracy together with interpretable, clinically relevant results. This approach could help healthcare professionals diagnose CKD earlier, enabling timely intervention and better use of resources.
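The workflow summarized above (SelectKBest feature selection, a Decision Tree classifier, cross-validation, and a held-out test set) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code. So that it runs offline, it substitutes a synthetic dataset from make_classification for the UCI CKD data. The score function (f_classif), the 80/20 split, the ten stratified folds, the random seeds, and the attr_* feature names are all assumptions. The real CKD data contain missing values and nominal attributes, which would need imputation and encoding before this step.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the UCI CKD data (assumption: 400 records, 24 attributes).
X, y = make_classification(n_samples=400, n_features=24, n_informative=8,
                           random_state=0)
feature_names = [f"attr_{i}" for i in range(X.shape[1])]  # hypothetical names

# Hold out 20% as an independent test set (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Feature selection sits inside the pipeline, so each CV fold re-fits it
# on that fold's training data only.
model = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=10)),  # keep 10 features
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# Cross-validation on the training portion (10 folds is an assumption).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
print(f"Mean CV accuracy: {cv_scores.mean():.4f}")

# Refit on all training data, then evaluate once on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")

# Print the learned if/then rules using the names of the selected features.
mask = model.named_steps["select"].get_support()
selected = [name for name, keep in zip(feature_names, mask) if keep]
print(export_text(model.named_steps["tree"], feature_names=selected))
```

Placing SelectKBest inside the Pipeline means the ten features are re-selected on each training fold. The cross-validation estimate is therefore not inflated by choosing features on data the fold later evaluates. The accuracies printed here come from synthetic data and are not comparable to the figures reported above. The export_text output shows the threshold rules that make a Decision Tree directly interpretable.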
References
World Health Organization, “Global Health Observatory: Chronic Kidney Disease.” Accessed: Nov. 06, 2024. [Online]. Available: https://www.who.int/
A. Schieppati and G. Remuzzi, “Chronic renal diseases as a public health problem: epidemiology, social, and economic implications,” Kidney Int., vol. 68, pp. S7–S10, 2005.
National Institute of Diabetes and Digestive and Kidney Diseases, “Tests & Diagnosis of Chronic Kidney Disease (CKD).” Accessed: Nov. 06, 2024. [Online]. Available: https://www.niddk.nih.gov/
Johns Hopkins Medicine, “Chronic Kidney Disease: Diagnosis and Treatment Options.” Accessed: Nov. 06, 2024. [Online]. Available: https://www.hopkinsmedicine.org/
A. Levin and P. E. Stevens, “Early detection of CKD: the benefits, limitations and effects on prognosis,” Nat. Rev. Nephrol., vol. 7, no. 8, pp. 446–457, 2011.
C. Delrue, S. De Bruyne, and M. M. Speeckaert, “Application of machine learning in chronic kidney disease: current status and future prospects,” Biomedicines, vol. 12, no. 3, p. 568, 2024.
C. Rudin, “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead,” Nat. Mach. Intell., vol. 1, no. 5, pp. 206–215, 2019.
F. Doshi-Velez and B. Kim, “Towards a rigorous science of interpretable machine learning,” arXiv preprint arXiv:1702.08608, 2017.
J. R. Quinlan, “Induction of decision trees,” Mach. Learn., vol. 1, no. 1, pp. 81–106, Mar. 1986, doi: 10.1007/BF00116251.
L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001, doi: 10.1023/A:1010933404324.
J. R. Quinlan, C4.5: Programs for Machine Learning. Elsevier, 2014.
O. Z. Maimon and L. Rokach, Data Mining with Decision Trees: Theory and Applications, vol. 81. World Scientific, 2014.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Routledge, 2017, doi: 10.1201/9781315139470.
B. Letham, C. Rudin, T. H. McCormick, and D. Madigan, “Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model,” 2015.
K. Bache and M. Lichman, “UCI Machine Learning Repository.” [Online]. Available: http://archive.ics.uci.edu/ml
A. Sheta, W. El-Ashmawi, and A. Baareh, “Heart Disease Diagnosis Using Decision Trees with Feature Selection Method,” The International Arab Journal of Information Technology, vol. 21, no. 3, 2024, doi: 10.34028/iajit/21/3/7.
F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Nov. 2011.
I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Mach. Learn. Res., vol. 3, pp. 1157–1182, Mar. 2003.