Session Information
Session Type: Poster Session A
Session Time: 10:30AM-12:30PM
Background/Purpose: Identifying patients with eosinophilic granulomatosis with polyangiitis (EGPA) in claims databases such as the French National Health Data System (SNDS) is a major challenge for real-world studies. Although a specific ICD-10 diagnosis code (M301) exists, patient identification may be incomplete because (i) the disease does not always require hospitalization and (ii) long-term disease (LTD) status is mostly recorded under the broader code M30, which also covers other vasculitides. This study aimed to develop and evaluate a machine learning algorithm to identify EGPA patients in the SNDS.
Methods: Adults (>18 years) with an M301 hospitalization or an M30/M300/M301/M302/M303/M308 LTD in 2010-2019 were extracted from the SNDS. Among them, patients were labeled as (1) EGPA if they had an M301 code and no M300/M302/M303/M308/M317 code, or (2) non-EGPA if they had an M300/M302/M303/M308 LTD, or an M30 LTD together with an M300/M302/M303/M308/M317 hospitalization, and no M301 code. These two groups formed the labeled data used for model training and testing. All other patients were considered unlabeled. Potentially clinically relevant variables to predict an EGPA diagnosis and differentiate it from other vasculitides were defined with experts and included medical visits, treatments, comorbidities and hospitalizations. Variables were pre-selected using univariate and multivariate analyses. Several supervised machine learning techniques were then implemented. For each combination, 4-fold cross-validation was performed and repeated 100 times, and average metrics (accuracy, sensitivity, specificity, and precision) were calculated. As a final step, the model with the highest precision and sensitivity was chosen among the five final models and was tested on EGPA patients from the French Vasculitis Study Group (FVSG) registry linked to the SNDS.
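The labeling rules described in the Methods can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' implementation: the `Patient` class, the `label` function and the field names are hypothetical, and the sketch assumes that "any code" means a code recorded either as a hospitalization diagnosis or as an LTD.

```python
from dataclasses import dataclass, field

# ICD-10 codes for other vasculitides named in the Methods.
OTHER_VASCULITIS = {"M300", "M302", "M303", "M308", "M317"}
# Subset that can appear as a specific (non-EGPA) LTD code.
OTHER_VASCULITIS_LTD = {"M300", "M302", "M303", "M308"}


@dataclass
class Patient:
    """Hypothetical container for the codes observed in 2010-2019."""
    hosp_codes: set = field(default_factory=set)  # hospitalization codes
    ltd_codes: set = field(default_factory=set)   # long-term disease codes


def label(p: Patient) -> str:
    """Return 'EGPA', 'non-EGPA' or 'unlabeled' for an included patient."""
    # Assumption: "any code" covers both hospitalization and LTD codes.
    all_codes = p.hosp_codes | p.ltd_codes
    has_egpa = "M301" in all_codes
    has_other = bool(all_codes & OTHER_VASCULITIS)

    # Rule (1): an M301 code and no code for another vasculitis.
    if has_egpa and not has_other:
        return "EGPA"

    # Rule (2): a specific non-EGPA LTD, or a broad M30 LTD combined with
    # a hospitalization for another vasculitis, and no M301 code at all.
    specific_other_ltd = bool(p.ltd_codes & OTHER_VASCULITIS_LTD)
    broad_ltd_other_hosp = "M30" in p.ltd_codes and bool(
        p.hosp_codes & OTHER_VASCULITIS
    )
    if not has_egpa and (specific_other_ltd or broad_ltd_other_hosp):
        return "non-EGPA"

    # Everything else (e.g. M30 LTD only, or conflicting codes).
    return "unlabeled"
```

Under this reading, a patient with only a broad M30 LTD, or with both M301 and another vasculitis code, falls into the unlabeled group that the trained model later classifies.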
Results: Of 8,756 individuals with an M301 hospitalization or an M30/M300/M301/M302/M303/M308 LTD in 2010-2019, 2,243 (25.6%) were labeled EGPA, 1,223 (14.0%) were labeled non-EGPA, and 5,290 were unlabeled. Thirteen variables were selected and included in the supervised models, with the most significant predictive factors being glucocorticoid use, asthma and pulmonologist visits. All five supervised models performed well [metric (min-max): precision (0.79-0.82); sensitivity (0.79-0.81); specificity (0.71-0.77); accuracy (0.80-0.83)]. The model finally chosen was a random forest model which, when applied to the 5,290 unlabeled patients, identified 1,930 EGPA patients, with a precision of 0.82, a sensitivity of 0.81, a specificity of 0.75 and an accuracy of 0.83. The estimated prevalence and incidence of EGPA in France were 47.5 and 2.8 per million inhabitants, respectively, in 2019. Of 49 incident EGPA patients from the FVSG registry, all but one (98%) were predicted as EGPA, confirming the robustness of the model.
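For readers less familiar with the four reported metrics, the sketch below shows how they derive from a confusion matrix and re-checks the cohort proportions quoted in the Results. The function name `classification_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; the abstract does not report the underlying counts. Only the cohort sizes (8,756; 2,243; 1,223; 5,290) and the registry figures (48 of 49) come from the text.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard definitions, with EGPA as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on true EGPA patients
        "specificity": tn / (tn + fp),   # recall on non-EGPA patients
        "precision": tp / (tp + fp),     # share of predicted EGPA that is EGPA
    }


# Hypothetical counts, NOT taken from the abstract.
example = classification_metrics(tp=80, fp=25, tn=75, fn=20)

# Cohort arithmetic using the figures reported in the Results.
total, egpa, non_egpa = 8756, 2243, 1223
unlabeled = total - egpa - non_egpa              # patients left unlabeled
pct_egpa = round(100 * egpa / total, 1)          # share labeled EGPA
pct_non_egpa = round(100 * non_egpa / total, 1)  # share labeled non-EGPA
pct_registry = round(100 * 48 / 49)              # registry patients predicted EGPA
```

The cohort figures computed this way agree with the percentages and the unlabeled count stated in the Results.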
Conclusion: This machine learning-based approach shows promising results for identifying adults with EGPA using SNDS data. The algorithm developed could serve as a valuable tool for real-world studies based on comprehensive databases.
To cite this abstract in AMA style:
Terrier B, Tauty S, Salhi A, Bugnard F, Benard S, Cottin V, Taillé C, Puéchal X. Supervised machine learning algorithm to identify patients with eosinophilic granulomatosis with polyangiitis in France [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/supervised-machine-learning-algorithm-to-identify-patients-with-eosinophilic-granulomatosis-with-polyangiitis-in-france/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/supervised-machine-learning-algorithm-to-identify-patients-with-eosinophilic-granulomatosis-with-polyangiitis-in-france/