Session Information
Session Type: Poster Session B
Session Time: 10:30AM-12:30PM
Background/Purpose: Systemic lupus erythematosus (SLE) is a severe and heterogeneous autoimmune disease. SLE is often preceded by a stage with milder symptoms and positive antinuclear antibody (ANA+) tests, called Incomplete Lupus Erythematosus (ILE). Yet only 10-25% of ILE patients progress to SLE. Deriving biomarkers that can forecast ILE-to-SLE progression is important for early diagnosis and intervention.
Methods: We used electronic health record (EHR) data from the TriNetX Research Network to derive clinical risk markers that can forecast disease progression from ILE to SLE. ANA+ patients were identified by a recorded positive ANA test result (titer ≥ 1:80). We defined cases (or controls) as ANA+ individuals who did (or did not) receive an SLE diagnosis within 3 months to 5 years after their first ANA+ test. The 3-month gap between the ANA+ test and the SLE diagnosis makes it unlikely that ANA+ individuals already had SLE at the time of the test. To minimize the impact of censored data, we included only patients with clinical data available for at least 2 years before and 5 years after the ANA+ test. We used diagnosis, procedure, medication, and laboratory/vital records dated before the ANA+ test as clinical predictors to train machine learning models to predict SLE diagnosis after an ANA+ test result. The models were trained on patients in the Research Network subset of TriNetX (10,897 controls and 1,485 cases) and tested on patients in the Diamond Network subset (4,391 controls and 1,145 cases).
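The case/control definition above can be illustrated with a minimal sketch. The function name, the argument names, and the day-count approximations of "3 months", "2 years", and "5 years" are assumptions made for illustration and do not come from the abstract. How patients with an SLE diagnosis recorded before the 3-month mark were handled is not stated; this sketch excludes them, which is also an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed day-count approximations; the abstract states the windows
# only in months and years.
MIN_GAP = timedelta(days=90)        # about 3 months after the first ANA+ test
MAX_GAP = timedelta(days=5 * 365)   # about 5 years after the first ANA+ test
LOOKBACK = timedelta(days=2 * 365)  # about 2 years of records before the test


def label_patient(first_ana_pos: date,
                  first_record: date,
                  last_record: date,
                  sle_diagnosis: Optional[date]) -> Optional[str]:
    """Return 'case', 'control', or None when the patient is excluded."""
    # Inclusion: records at least 2 years before and 5 years after the test.
    if first_record > first_ana_pos - LOOKBACK:
        return None
    if last_record < first_ana_pos + MAX_GAP:
        return None

    if sle_diagnosis is None:
        return "control"

    gap = sle_diagnosis - first_ana_pos
    if gap < MIN_GAP:
        # Diagnosis before the 3-month mark: excluded here (assumption).
        return None
    if gap <= MAX_GAP:
        return "case"
    # Diagnosis only after the 5-year window: no SLE diagnosis inside
    # the window, so this sketch labels the patient a control.
    return "control"


ana = date(2015, 1, 1)
print(label_patient(ana, date(2012, 1, 1), date(2021, 1, 1), date(2016, 6, 1)))
print(label_patient(ana, date(2012, 1, 1), date(2021, 1, 1), None))
```

Under these assumptions the first call returns a case (diagnosis about 17 months after the test) and the second a control (no SLE diagnosis on record).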
Results: A simple linear regression model of individual clinical predictors (controlling for age, sex, and race) identified EHR codes for immune system disorders, peripheral opioid receptor antagonists, and basophil counts as marginally significant predictors of future SLE diagnosis, potentially indicating early immune system involvement. Using the marginally significant predictors, we then trained a gradient boosting model (GBM), which yielded an area under the receiver operating characteristic curve of 0.75 in the training dataset (10-fold cross-validation) and 0.70 in the testing dataset. The GBM explained 4.32% of the variance, a 77% improvement over individual marginally significant EHR codes, which explained on average 2.44% of the variance (range 2.18-2.51%). Additionally, the predicted value from the GBM can be used to stratify ANA+ patients by risk of progressing to SLE. Specifically, patients with GBM predicted values in the top 5th percentile have a more than 2.65-fold higher risk of progression from ILE to SLE (54.64%) than individuals with predicted values in the 50th percentile (20.59%).
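The two summary ratios reported above follow directly from the stated percentages. The short check below is only an arithmetic illustration; the helper names are hypothetical and the inputs are the figures quoted in the Results.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


def risk_ratio(risk_a: float, risk_b: float) -> float:
    """Ratio of two risks expressed in the same units."""
    return risk_a / risk_b


# Figures quoted in the Results (percent).
gbm_variance = 4.32           # variance explained by the GBM
mean_single_variance = 2.44   # mean variance explained by a single EHR code
top_risk = 54.64              # progression risk, top 5th percentile
median_risk = 20.59           # progression risk, 50th percentile

print(f"{relative_improvement(gbm_variance, mean_single_variance):.1%}")  # 77.0%
print(f"{risk_ratio(top_risk, median_risk):.2f}")                         # 2.65
```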
Conclusion: Machine learning models using EHR data have significant potential for identifying ANA+ patients who are at high risk of progressing to SLE. Advantages of the EHR include widespread availability of data that does not incur additional costs for collection. Given the high demand for and limited supply of rheumatology expertise, an approach to triaging ANA positive patients by risk of developing SLE would have a significant impact on optimizing care in early stages of SLE when remittive or curative therapy is most likely to be effective.
To cite this abstract in AMA style:
Markus H, Khunsriraksakul C, Foulke G, Carrel L, Olsen N, Liu D. Utilizing Electronic Health Records to Identify Clinical Features of ANA-Positive Patients Imparting High Risk for Progression to Systemic Lupus Erythematosus [abstract]. Arthritis Rheumatol. 2024; 76 (suppl 9). https://acrabstracts.org/abstract/utilizing-electronic-health-records-to-identify-clinical-features-of-ana-positive-patients-imparting-high-risk-for-progression-to-systemic-lupus-erythematosus/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/utilizing-electronic-health-records-to-identify-clinical-features-of-ana-positive-patients-imparting-high-risk-for-progression-to-systemic-lupus-erythematosus/