Application of Text Mining Methods to Identify Lupus Nephritis from Electronic Health Records

Milena Gianfrancesco¹, Suzanne Tamang², Gabriela Schmajuk³ and Jinoos Yazdany⁴, ¹University of California, San Francisco, San Francisco, CA, ²Stanford Center for Population Health Sciences, Redwood City, CA, ³University of California, San Francisco, Atherton, CA, ⁴UCSF, San Francisco, CA

Meeting: ACR Convergence 2020

Keywords: informatics, Nephritis, Systemic lupus erythematosus (SLE)

Session Information

Date: Friday, November 6, 2020

Title: SLE – Diagnosis, Manifestations, & Outcomes Poster I: Clinical Manifestations

Session Type: Poster Session A

Session Time: 9:00AM-11:00AM

Background/Purpose: Lupus nephritis (LN) is a frequent complication of SLE and is associated with higher morbidity and mortality. Accurate estimates of the prevalence of LN in the population remain limited because this information cannot be captured reliably through structured data fields, such as ICD codes, in electronic health records (EHR). We developed a text mining pipeline to extract information on LN and its class from clinical notes in the EHR of a large, racially/ethnically diverse university health system.

Methods: Individuals with a single diagnosis code for SLE in the EHR between June 2012 and February 2019 were included. All available clinical notes (including physician ambulatory and hospital progress reports, and biopsy reports) were extracted and annotated using an open source clinical text-mining tool, the Clinical Event Recognizer (CLEVER). CLEVER is a hybrid information extraction approach that combines rule-based (semantic) and statistical components to annotate and extract information from millions of clinical notes efficiently. We built a custom dictionary that included “lupus nephritis,” “class,” and various associated terms referring to nephritis. Performance of the text-mining tool in identifying LN was assessed by calculating the sensitivity and specificity against a gold-standard subset of SLE patients for whom ACR criteria were assessed through chart review. We also compared our findings to a published algorithm in which LN is identified using structured data only.
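The dictionary-based annotation step can be illustrated with a minimal Python sketch. This is not CLEVER's implementation or API: the term list, the cue list, the function name `tag_mentions`, and the sentence-level cue rule are all assumptions made for illustration. The cue phrases are the ones quoted in the Results; the abstract does not list the full dictionary.

```python
import re

# Hypothetical dictionary entries; the abstract does not publish the full term list.
TARGET_TERMS = ["lupus nephritis"]
# Cue phrases quoted in the Results; treating them as sentence-level
# negation/uncertainty markers is a simplifying assumption.
CUES = ["no evidence", "unclear", "possible", "to be evaluated"]
# LN class as a Roman numeral I-VI following the word "class".
CLASS_PATTERN = re.compile(r"\bclass\s+(VI|IV|V|III|II|I)\b", re.IGNORECASE)


def _contains(phrases, text):
    """True if any phrase occurs in text as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases)


def tag_mentions(note):
    """Tag each sentence that mentions a target term as positive or negative."""
    mentions = []
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        lowered = sentence.lower()
        if not _contains(TARGET_TERMS, lowered):
            continue  # sentence does not mention LN at all
        class_match = CLASS_PATTERN.search(sentence)
        mentions.append({
            "sentence": sentence,
            "polarity": "negative" if _contains(CUES, lowered) else "positive",
            "ln_class": class_match.group(1).upper() if class_match else None,
        })
    return mentions


# Invented example note, not taken from the study data.
example_note = (
    "Biopsy confirmed lupus nephritis class IV. "
    "No evidence of lupus nephritis on repeat labs. "
    "Follow up in 3 months."
)
example_mentions = tag_mentions(example_note)
```

A patient could then be counted as having LN if any note contains a positive mention, which mirrors how positive mentions are aggregated to unique individuals in the Results. A production tool would also need to handle abbreviations, section context, and cue scope within a sentence.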

Results: We included 2,782 SLE patients; these patients had a total of 614,683 clinical notes. Most patients were female (87%), with a mean age of 46.9 years (± 17.9), and the sample was racially/ethnically diverse (Table 1). A total of 18,354 positive and 9,293 negative mentions of LN were detected using CLEVER. Positive mentions were captured for 848 unique individuals with SLE, indicating that 30% of our SLE population had LN, similar to previously published estimates. When compared to a gold-standard set of chart-reviewed cases (n=152), our text mining tool detected LN with 96% sensitivity and 94% specificity. Compared to a previous algorithm of LN detection based on structured data fields (such as ICD codes) alone, our text-mining pipeline identified 631 additional cases of LN that would have otherwise not been captured (Table 2). Chart review of notes for a random sample of 50 cases from these 631 additional cases indicated that 86% were true positives. Thirty-seven cases that were not tagged by CLEVER, but positive according to the structured data algorithm, were found by chart review to be true negatives, with phrases such as “unclear,” “possible,” “to be evaluated,” or “no evidence.” Additionally, of those with LN, CLEVER was able to detect the specific class of LN (I-VI) in 415 (50%) cases.
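For readers who want to reproduce this kind of evaluation, the sketch below shows how sensitivity and specificity are computed from paired tool and chart-review labels, and how the cohort proportion follows from the reported counts. The function name and the toy labels are assumptions for illustration; the abstract does not report the underlying confusion matrix for the 152 chart-reviewed cases.

```python
def sensitivity_specificity(predicted, gold):
    """Return (sensitivity, specificity) for paired binary labels (1 = LN, 0 = no LN)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == 1 and g == 1)  # tool and chart review agree on LN
    fn = sum(1 for p, g in pairs if p == 0 and g == 1)  # LN missed by the tool
    tn = sum(1 for p, g in pairs if p == 0 and g == 0)  # both agree on no LN
    fp = sum(1 for p, g in pairs if p == 1 and g == 0)  # tool flags LN, chart review does not
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("gold standard must contain both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only; these are not the study's chart-review data.
toy_predicted = [1, 1, 0, 0, 1]
toy_gold = [1, 1, 1, 0, 0]
sens, spec = sensitivity_specificity(toy_predicted, toy_gold)

# Share of the SLE cohort with at least one positive LN mention,
# using the counts reported in the Results (848 of 2,782 patients).
ln_fraction = 848 / 2782
```

Sensitivity is the fraction of chart-confirmed LN cases that the tool flags, and specificity is the fraction of chart-confirmed non-LN cases that it leaves unflagged.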

Conclusion: We developed the first text-mining strategy to extract SLE LN status and class from clinical notes. Evaluation on clinical notes from the EHRs of other institutions is ongoing to examine the generalizability of this algorithm. Further refinement of the pipeline will allow us to determine factors associated with this important disease outcome.

Disclosure: M. Gianfrancesco, None; S. Tamang, None; G. Schmajuk, None; J. Yazdany, Eli Lilly, 5, Astra Zeneca, 5.

To cite this abstract in AMA style:

Gianfrancesco M, Tamang S, Schmajuk G, Yazdany J. Application of Text Mining Methods to Identify Lupus Nephritis from Electronic Health Records [abstract]. Arthritis Rheumatol. 2020; 72 (suppl 10). https://acrabstracts.org/abstract/application-of-text-mining-methods-to-identify-lupus-nephritis-from-electronic-health-records/. Accessed .

