Session Information
Session Type: Poster Session C
Session Time: 10:30AM-12:30PM
Background/Purpose: Systemic lupus erythematosus (SLE) is a heterogeneous autoimmune disease, and 4-28% of patients develop end-stage renal disease (ESRD). Accurate identification of these patients is essential for research and resource allocation. Case identification typically relies either on administrative claims or diagnostic code data, or on manual review of large datasets; these approaches can be inaccurate and time-consuming. This study aims to develop a machine learning model that efficiently and accurately identifies patients with SLE and ESRD in large datasets, and to evaluate the impact of incorporating social determinants of health (SDOH) data on model performance.
Methods: We used the NIH “All of Us” dataset, which includes diverse demographic and SDOH data, to create a cohort of patients with SLE. Clinical and SDOH features were used to build and evaluate two types of machine learning model (Random Forest and XGBoost). The dataset was split into training (80%) and test (20%) sets. Model performance was assessed using accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). SHAP values were used to interpret model predictions.
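The evaluation steps named in the Methods can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the function names (`stratified_split`, `classification_metrics`, `roc_auc`) are hypothetical, and the use of stratification in the 80/20 split is an assumption, since the abstract does not say how the split was drawn.

```python
import random


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly the same class ratio in both.

    Stratification is an assumption here; the abstract only reports 80%/20%.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = ESRD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice, `y_pred` would be the thresholded predictions and `scores` the predicted probabilities of a fitted classifier on the held-out 20%. The pairwise AUC above is quadratic in cohort size, which is acceptable for a test set of a few hundred patients.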
Results: The study cohort included 1101 patients with SLE, of whom 65 had ESRD. XGBoost models outperformed Random Forest models, achieving AUCs of 0.93 (with SDOH) and 0.94 (without SDOH), compared with 0.92 (with SDOH) and 0.89 (without SDOH) for Random Forest. Incorporating SDOH data improved the precision and F1-scores of the XGBoost models, although AUC was marginally lower (0.93 vs 0.94). Feature importance rankings from the XGBoost model with SDOH indicated that SDOH variables, particularly those related to healthcare utilization and costs, were among the top 20 features influencing model predictions.
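The reported cohort counts make the degree of class imbalance easy to quantify. The short calculation below is illustrative arithmetic based only on the two numbers given (1101 patients, 65 with ESRD) and is not an analysis from the abstract. It shows why accuracy alone can mislead here: a trivial rule that labels every patient as non-ESRD is right about 94% of the time while identifying no cases.

```python
n_total = 1101          # patients with SLE in the cohort (reported)
n_esrd = 65             # patients with ESRD (reported)
n_non_esrd = n_total - n_esrd

prevalence = n_esrd / n_total
# Accuracy of a trivial "always predict non-ESRD" rule; its recall is 0.
baseline_accuracy = n_non_esrd / n_total
# Negative-to-positive ratio, sometimes used as a positive-class weight.
# Whether any reweighting was applied in the study is not stated.
neg_to_pos_ratio = n_non_esrd / n_esrd

print(f"ESRD prevalence:        {prevalence:.1%}")         # 5.9%
print(f"Always-negative accuracy: {baseline_accuracy:.1%}")  # 94.1%
print(f"Negative:positive ratio: {neg_to_pos_ratio:.1f}")   # 15.9
```

For this reason, precision, recall, and F1-score for the ESRD class are more informative than accuracy when comparing the models.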
Conclusion: XGBoost models outperformed Random Forest models in identifying patients with SLE and ESRD. The inclusion of SDOH data enhanced model precision and real-world interpretability. This study supports the feasibility of using machine learning for automated case identification in large datasets, although addressing class imbalance and model generalizability remains crucial for future work.
To cite this abstract in AMA style:
Felix M, Osmani L. Automated Case Identification of Patients with End-Stage Renal Disease and Systemic Lupus Erythematosus Using Machine Learning [abstract]. Arthritis Rheumatol. 2024; 76 (suppl 9). https://acrabstracts.org/abstract/automated-case-identification-of-patients-with-end-stage-renal-disease-and-systemic-lupus-erythematosus-using-machine-learning/. Accessed .