Session Information
Session Type: Abstract Session
Session Time: 4:00PM-5:30PM
Background/Purpose: Real-world data, including electronic health records (EHRs), are a promising resource for learning to optimize treatment strategies for rheumatoid arthritis (RA). A major challenge in leveraging real-world data in rheumatology is the lack of standardized collection of disease activity measures. Previous studies had limited success inferring disease activity from administrative claims and EHR data. This study aimed to assess the accuracy of inferring disease activity, as measured by the Disease Activity Score in 28 joints with CRP (DAS28-CRP), using both structured EHR data and narrative data extracted from notes with natural language processing (NLP).
Methods: We studied RA patients from a single-center registry linked with EHR data. The structured data included RA-related diagnosis and procedure codes, medication prescriptions, and laboratory test encounters and values. The NLP data included mentions of RA and disease activity concepts. Models were trained on DAS28-CRP values obtained during in-person study visits from the registry. For each visit, structured and NLP data were extracted from EHR encounters within 24 weeks. In 80% of the visits, we fit separate random forest models to predict the continuous DAS28-CRP value and the binary disease activity status, categorized as remission/low disease activity (LDA; DAS28-CRP ≤3.2) vs moderate/high disease activity (DAS28-CRP >3.2). We validated the predictions in the remaining 20%. To assess the accuracy of predicting DAS28-CRP values, we estimated the mean absolute error (MAE; lower values indicate lower error) and the percentage of predictions within 0.6 (the reported measurement error for DAS28-CRP) and within 1.2 (the minimal clinically important difference [MCID]) of the observed values. For LDA status, we calculated the area under the curve (AUC). Observed values and probabilities were plotted against predicted values and mean predicted probabilities in deciles to further assess prediction performance. We identified influential EHR features for predictions using Gini impurity. These analyses were repeated with and without NLP data. In a sample of 67 visits, we benchmarked the model against manual chart review for inferring LDA, using LDA defined by DAS28-CRP as the reference.
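The accuracy measures described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy DAS28-CRP values, and the toy predicted probabilities are all hypothetical, and the AUC is computed with the standard pairwise (rank-based) definition, which may differ from the software the study used. Only the thresholds (3.2, 0.6, 1.2) come from the abstract.

```python
from statistics import mean

# Thresholds taken from the abstract.
LDA_THRESHOLD = 3.2      # DAS28-CRP <= 3.2 means remission/low disease activity
MEASUREMENT_ERROR = 0.6  # reported measurement error for DAS28-CRP
MCID = 1.2               # minimal clinically important difference


def mean_absolute_error(observed, predicted):
    """Average absolute gap between observed and predicted scores."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def fraction_within(observed, predicted, tolerance):
    """Share of predictions whose absolute error is at most `tolerance`."""
    hits = sum(abs(o - p) <= tolerance for o, p in zip(observed, predicted))
    return hits / len(observed)


def is_lda(das28_crp):
    """Binary status: True for remission/low disease activity."""
    return das28_crp <= LDA_THRESHOLD


def auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This is the pairwise definition of the
    area under the ROC curve; it is quadratic in sample size, which is
    fine for a small illustration.
    """
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation data (made up for illustration only).
observed = [2.1, 3.0, 4.5, 5.2, 2.8]         # DAS28-CRP at registry visits
predicted = [2.5, 3.9, 4.0, 3.8, 2.9]        # model-predicted DAS28-CRP
lda_probability = [0.9, 0.4, 0.5, 0.2, 0.7]  # model-predicted P(LDA)

print("MAE:", mean_absolute_error(observed, predicted))
print("within 0.6:", fraction_within(observed, predicted, MEASUREMENT_ERROR))
print("within 1.2:", fraction_within(observed, predicted, MCID))
print("AUC for LDA:", auc([is_lda(v) for v in observed], lda_probability))
```

In practice these measures would be computed only on the held-out 20% of visits, so that they reflect performance on data the models did not see during fitting.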
Results: We identified 4,883 visits with a DAS28-CRP score among 1,059 patients. The mean age at first visit was 60.5 years; 83.6% of patients were female and 89.4% were White. The MAE for DAS28-CRP values was 0.778, with 84% and 44% of absolute errors within 1.2 (MCID) and 0.6 (measurement error), respectively. The AUC for LDA was 0.781 (Figure 1). Incorporating NLP data consistently improved prediction performance (Table 1). Features with the highest importance included CRP and ESR values, age, receiving a CRP test, and NLP mentions of disease activity and glucocorticoids. The model incorporating NLP data achieved a higher AUC than manual chart review.
Conclusion: Inferring disease activity with EHR data collected from routine care, particularly with the addition of data from narrative notes, achieved moderate accuracy against prospectively collected DAS28-CRP measures. Further work is needed to validate whether these inferred disease activity measures can be applied to reliably assess response to treatment in observational data.
To cite this abstract in AMA style:
Cheng D, Weisenfeld D, Dahal K, Liu Q, Ayakulangara Panickan V, Jeffway M, Seyok T, McDermott G, Weinblatt M, Shadick N, Cai T, Liao K. Inferring Disease Activity Scores and Low Disease Activity at Registry Visits Based on Structured and Narrative Data from Electronic Health Records [abstract]. Arthritis Rheumatol. 2023; 75 (suppl 9). https://acrabstracts.org/abstract/inferring-disease-activity-scores-and-low-disease-activity-at-registry-visits-based-on-structured-and-narrative-data-from-electronic-health-records/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/inferring-disease-activity-scores-and-low-disease-activity-at-registry-visits-based-on-structured-and-narrative-data-from-electronic-health-records/