Session Information
Session Type: Poster Session A
Session Time: 10:30AM-12:30PM
Background/Purpose: Patients with rheumatoid arthritis (RA) often experience clinically significant delays in diagnosis (Sørensen et al., 2015; Raza et al., 2011). RA can present similarly to other types of inflammatory arthritis and can therefore be challenging for a primary care physician (PCP) to recognize (Saraiva et al., 2023). Indeed, the longest delay occurs between initial presentation to the PCP and evaluation by rheumatology (Barhamian et al., 2017; Stack et al., 2019). Digital tools such as machine learning algorithms have the potential to help physicians identify patients with undiagnosed RA earlier in the course of disease. In this study we describe the development and validation of a novel machine learning model for identifying patients in the community who may be at risk of having undiagnosed RA.
Methods: Patients from the community population (n=395,918 patients) at Mayo Clinic between 2012 and 2022 were split into training and validation sets. Cases with RA and controls with no evidence of RA were identified in both sets. Prediction dates for model training and evaluation were set at six-month intervals, on the 1st of January and the 1st of July of each year. Cases were assigned to the prediction date directly preceding autoantibody testing before their first diagnosis of RA. This design was chosen to expose the model to information preceding clinical suspicion for disease. Controls were randomly assigned to prediction dates based on data eligibility. A gradient-boosted trees algorithm was trained using electronic medical record (EMR) data documented during the two years prior to each patient’s prediction date. Input features included information from the structured data (age, sex, diagnosis codes, medication prescriptions, and laboratory results) and symptoms and signs that were extracted from clinical notes by natural language processing (NLP). The model was then evaluated on the validation set, and area under the curve (AUC) was used to assess the model’s ability to discriminate between new cases of RA and controls.
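The prediction-date rule described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and two details are assumptions made here, namely that "directly preceding" means strictly before the test date and that the two-year window is half-open (it includes its start date and excludes the prediction date itself).

```python
from datetime import date


def preceding_prediction_date(event: date) -> date:
    """Return the latest 1 January or 1 July strictly before `event`.

    Assumption: a test that falls exactly on a prediction date is assigned
    to the previous prediction date, so no same-day data can leak in.
    """
    january = date(event.year, 1, 1)
    july = date(event.year, 7, 1)
    if event > july:
        return july
    if event > january:
        return january
    # The event is on 1 January, so step back to the previous July.
    return date(event.year - 1, 7, 1)


def lookback_window(prediction_date: date, years: int = 2) -> tuple[date, date]:
    """Return (start, end) of the feature window before a prediction date.

    Assumption: records dated start <= d < end are eligible as features.
    Prediction dates are always 1 January or 1 July, so shifting the year
    never lands on an invalid calendar day.
    """
    start = prediction_date.replace(year=prediction_date.year - years)
    return start, prediction_date


# Hypothetical case whose autoantibody test was ordered on 15 March 2019.
pred = preceding_prediction_date(date(2019, 3, 15))
window = lookback_window(pred)
```

Under these assumptions, the hypothetical case above is assigned to 1 January 2019, and features are drawn from records dated from 1 January 2017 up to, but not including, that date.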
Results: The validation set included 145 patients with RA (108 females; mean age 55.7, standard deviation [SD] 16.3) and 17,702 control patients (9,758 females; mean age 49.3, SD 17.0). The AUC on the validation set was 77.9% (fig 1). Symptoms and signs documented in clinical notes and diagnosis codes were important predictive features, including arthritis, pain and swelling in various joints, enthesopathies, and synovitis (fig 2). Additional contributing features included elevated inflammatory markers and glucocorticoid and NSAID use.
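As a reading aid for the AUC figure, the sketch below computes AUC in its pairwise form: the fraction of case-control pairs in which the case receives the higher model score, with ties counted as half. The function name and the toy labels and scores are illustrative assumptions, not study data, and the abstract does not state how the authors computed AUC.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: probability that a randomly chosen case
    outscores a randomly chosen control, counting ties as one half."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy example (made-up scores): 2 cases and 3 controls give 6 pairs.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 5 of 6 pairs ranked correctly
```

Read this way, an AUC of 77.9% means that a randomly chosen case in the validation set receives a higher score than a randomly chosen control about 78% of the time. With 145 cases and 17,702 controls there are 2,566,790 such pairs, so the quadratic loop is still practical at this scale.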
Conclusion: The model displayed good performance in its ability to discriminate between cases of RA and controls (AUC 77.9% on the validation set). Implementation of the model may help PCPs identify undiagnosed RA in the primary care population using existing data from the EMR. Shortening time to diagnosis could help patients receive treatment earlier and reduce downstream sequelae from untreated disease. Features from both structured and unstructured data contributed to model performance. The important contribution of features extracted by NLP from clinical documents suggests that further improvements in model performance may come from refined NLP techniques.
To cite this abstract in AMA style:
Dreyfuss M, Jenudi Y, Riesel D, Ramni O, Underberger D, Getz B, Steinberg-Koch S, White D, Myasoedova E. A Machine Learning Model for the Early Identification of Rheumatoid Arthritis: Development and Validation [abstract]. Arthritis Rheumatol. 2024; 76 (suppl 9). https://acrabstracts.org/abstract/a-machine-learning-model-for-the-early-identification-of-rheumatoid-arthritis-development-and-validation/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/a-machine-learning-model-for-the-early-identification-of-rheumatoid-arthritis-development-and-validation/