Date: Sunday, November 8, 2015
Session Type: ACR Concurrent Abstract Session
Session Time: 2:30PM-4:00PM
Background/Purpose: Systemic lupus erythematosus
(SLE) can be difficult to study given the relatively low prevalence of the
disease. While we are now collecting a multitude of data in the electronic
health record (EHR) that can be used to answer important research questions,
there are no validated algorithms for SLE case-identification. In this study, we
aimed to establish an algorithm to identify individuals with SLE from an EHR.
We compared algorithms using structured terminologies for SLE with those based
on novel machine learning (ML) algorithms that are capable of analyzing
clinical free text in conjunction with structured data.
Methods: We created a data repository of “possible
SLE” by extracting structured EHR data and clinical notes for patients who
either had a relevant ICD-9 code OR positive dsDNA or Smith antibodies OR a
mention of “SLE” or “lupus” in the text of clinical notes. We identified 18,357
individuals meeting these criteria, and selected 300 patients for review: 150
randomly and 150 enriched for positive serologies (to capture enough true SLE for
ML). Via chart review performed by domain experts, we identified definite SLE
cases as patients with ≥4 documented American College of Rheumatology (ACR)
criteria. Next, we calculated the test characteristics for various definitions
of SLE using only structured data. Finally, we compared these with a series of supervised
ML algorithms based on support vector machines (SVMs) which used text features
extracted from clinical notes in addition to structured fields. All SVM
algorithms were trained and validated using 10-fold cross-validation.
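The inclusion rule for the "possible SLE" repository described above can be sketched as a simple OR over three signals. This is a minimal illustration, not the study's code: the `Patient` fields, the `RELEVANT_ICD9` set (the abstract does not list which codes counted as relevant; 710.0 is used here only because it appears in the Results), and the text pattern are all assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical record layout; the abstract does not describe the schema.
    icd9_codes: set = field(default_factory=set)
    dsdna_positive: bool = False
    smith_positive: bool = False
    notes: str = ""  # concatenated clinical note text


# Assumption: the set of "relevant" ICD-9 codes is not given in the abstract.
RELEVANT_ICD9 = frozenset({"710.0"})

# Assumption: a case-insensitive whole-word match for "SLE" or "lupus".
MENTION = re.compile(r"\b(?:SLE|lupus)\b", re.IGNORECASE)


def is_possible_sle(p: Patient) -> bool:
    """Return True if any one of the three inclusion signals is present."""
    has_code = bool(p.icd9_codes & RELEVANT_ICD9)
    has_serology = p.dsdna_positive or p.smith_positive
    has_mention = MENTION.search(p.notes) is not None
    return has_code or has_serology or has_mention
```

Because the rule is a disjunction, it is deliberately broad: it favors capturing every candidate for chart review over precision.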
Results: 121 of 300 patients reviewed met ACR
criteria for SLE. The test characteristics of both the structured and
supervised ML algorithms are shown in the Table. While a single ICD-9 code for
710.0 was nearly 100% sensitive for a diagnosis of SLE, additional criteria
including an SLE-related medication and any positive serology increased the
specificity with minimal loss in sensitivity (94%). ML algorithms slightly
outperformed traditional definitions. The text features extracted from
clinical notes that were strong predictors of SLE are graphically represented
in the Figure.
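The trade-off between a single-code definition and a stricter composite definition can be illustrated by computing sensitivity and specificity against chart review. The sketch below uses six synthetic records invented purely for illustration (they are not study data), and it assumes the composite definition requires the code AND a medication AND a positive serology; the abstract does not spell out the exact combination.

```python
def sens_spec(predicted, truth):
    """Return (sensitivity, specificity) for paired boolean sequences."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true case and one true non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic toy records (hypothetical field names, NOT study data):
# (has ICD-9 710.0, on SLE-related medication, any positive serology, chart-review SLE)
records = [
    (True, True, True, True),
    (True, True, True, True),
    (True, False, True, True),
    (True, False, False, False),
    (True, True, False, False),
    (False, False, False, False),
]

truth = [r[3] for r in records]
single_code = [r[0] for r in records]
# Assumed composite: code AND medication AND positive serology.
composite = [r[0] and r[1] and r[2] for r in records]

single_result = sens_spec(single_code, truth)
composite_result = sens_spec(composite, truth)
```

On these toy records the single-code rule catches every true case but flags two non-cases, while the composite rule flags no non-cases and misses one true case, which mirrors the direction of the trade-off reported above.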
Conclusion: In an EHR-based data repository, a single
ICD-9 code (710.0) was highly sensitive for SLE. ML algorithms processed a multitude
of structured and unstructured EHR data, allowing increased specificity with
minimal loss in sensitivity. These findings should be validated in other
cohorts to lay the foundation for multi-institutional lupus research and
creation of large national registries.
To cite this abstract in AMA style: Murray SG, Tonner C, Marafino BJ, Haserodt S, Schmajuk G, Yazdany J. Automated Case Identification of Lupus from an Electronic Health Record Using Novel Informatics Approaches [abstract]. Arthritis Rheumatol. 2015; 67 (suppl 10). https://acrabstracts.org/abstract/automated-case-identification-of-lupus-from-an-electronic-health-record-using-novel-informatics-approaches/. Accessed February 21, 2020.