ACR Meeting Abstracts

Abstract Number: 0328

Natural Language Processing to Identify Lupus Nephritis Phenotype in Electronic Health Records

Yu Deng1, Jennifer Pacheco1, Anh Chung1, Chengsheng Mao1, Joshua Smith2, Juan Zhao1, Wei-Qi Wei2, April Barnado3, Chunhua Weng4, Cong Liu4, Adam Gordon1, Jingzhi Yu1, Yacob Tedla1, Abel Kho1, Rosalind Ramsey-Goldman1, Theresa Walunas1 and Yuan Luo1, 1Northwestern University, Chicago, IL, 2Vanderbilt University Medical Center, Nashville, TN, 3Vanderbilt University Medical Center, Nashville, TN, 4Columbia University, New York, NY

Meeting: ACR Convergence 2021

Keywords: computational phenotyping, electronic health records, Lupus nephritis, natural language processing, Systemic lupus erythematosus (SLE)

Session Information

Date: Saturday, November 6, 2021

Title: SLE – Diagnosis, Manifestations, & Outcomes Poster I: Diagnosis (0323–0356)

Session Type: Poster Session A

Session Time: 8:30AM-10:30AM

Background/Purpose: Lupus nephritis (LN) is a major disease manifestation of systemic lupus erythematosus (SLE) that leads to organ damage and increased mortality. LN is a key component of the SLE classification criteria, so accurately identifying it in electronic health records (EHRs) would add value to observational studies and clinical trials. However, information related to LN, e.g., kidney biopsy findings, is usually present in clinical notes rather than as structured data. In this study, we developed algorithms to identify LN with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW). We hypothesized that NLP algorithms that include information from the clinical notes would outperform a baseline algorithm that uses structured data only.

Methods: We identified 472 patients with SLE from the Chicago Lupus Database who also had at least four encounters in the NMEDW. We developed four algorithms: a rule-based algorithm using only structured data (the baseline) and three NLP algorithms based on L2-regularized logistic regression. In the first NLP algorithm (Full-MetaMap-binary), we used the presence or absence of every MetaMap-extracted concept unique identifier (CUI) as a feature. In the second NLP algorithm (Full-MetaMap-count), we used the same CUIs as features but took each CUI's number of occurrences as the feature value. In the third NLP algorithm (MetaMap-mixed), we used a mixture of features from structured data, regular expression (regex) concepts, and a curated list of CUIs related to LN. We evaluated all four algorithms in an internal validation dataset based on F-measure, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). We further validated the baseline algorithm and the best-performing NLP algorithm on an external dataset from Vanderbilt University Medical Center (VUMC).
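The difference between the binary and count feature schemes described in Methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the patient IDs, CUI strings, and labels below are made-up placeholders, the concept extraction step (MetaMap) is assumed to have already run, and the small gradient-descent fit (with arbitrarily chosen `lam`, `lr`, and `epochs`) is a stand-in for whichever L2-regularized logistic regression implementation was actually used.

```python
import math
from collections import Counter

# Hypothetical toy data: CUIs that a concept extractor might emit per patient.
# The CUI strings are placeholders, not real UMLS identifiers.
patients = {
    "p1": ["C0000001", "C0000002", "C0000001"],
    "p2": ["C0000003"],
    "p3": ["C0000001", "C0000001", "C0000002"],
    "p4": ["C0000003", "C0000004"],
}
labels = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}  # 1 = LN, 0 = no LN (made up)

# Feature space: one column per distinct CUI seen in the data.
vocab = sorted({cui for cuis in patients.values() for cui in cuis})

def featurize(cuis, mode):
    """Return a feature vector over `vocab` in 'binary' or 'count' mode."""
    counts = Counter(cuis)
    if mode == "binary":
        return [1.0 if counts[c] > 0 else 0.0 for c in vocab]
    if mode == "count":
        return [float(counts[c]) for c in vocab]
    raise ValueError(f"unknown mode: {mode}")

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_l2_logreg(X, y, lam=0.1, lr=0.5, epochs=500):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The bias term is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Classify as LN (1) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

ids = sorted(patients)
y = [labels[i] for i in ids]
X_binary = [featurize(patients[i], "binary") for i in ids]
X_count = [featurize(patients[i], "count") for i in ids]

w, b = fit_l2_logreg(X_binary, y)
preds = [predict(w, b, x) for x in X_binary]
```

For patient `p1`, the binary vector marks each CUI as present or absent, while the count vector records that the first CUI appeared twice. A mixed model in the spirit of MetaMap-mixed would concatenate further columns (structured-data fields, regex matches) onto a curated subset of these CUI columns.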

Results: In the NMEDW internal validation dataset, the Full-MetaMap-binary, Full-MetaMap-count, and MetaMap-mixed models achieved F-measures of 0.72, 0.71, and 0.79, respectively, compared with 0.41 for the baseline model (see Table 1). In the external validation dataset, our best-performing NLP model (MetaMap-mixed) improved the F-measure from 0.62 for the structured-data-only algorithm to 0.96.

Table 1. Algorithm performance

Dataset                      Algorithm            Sensitivity  Specificity  PPV   NPV   F-measure
NMEDW (internal validation)  Baseline             0.43         0.6          0.39  0.64  0.41
NMEDW (internal validation)  Full-MetaMap-binary  0.63         0.93         0.85  0.81  0.72
NMEDW (internal validation)  Full-MetaMap-count   0.6          0.95         0.88  0.8   0.71
NMEDW (internal validation)  MetaMap-mixed        0.74         0.92         0.84  0.86  0.79
VUMC (external validation)   Baseline             0.92         0.61         0.46  0.96  0.62
VUMC (external validation)   MetaMap-mixed        1            0.97         0.93  1     0.96
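The abstract does not state which F-measure was used. Assuming it is the balanced F1 score (the harmonic mean of PPV and sensitivity), the short sketch below recomputes it from the rounded PPV and sensitivity values in Table 1. The function and variable names are illustrative only.

```python
def f_measure(ppv, sensitivity):
    """Balanced F-measure (F1): harmonic mean of PPV and sensitivity."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)

# (dataset, algorithm, sensitivity, PPV, reported F-measure) from Table 1.
table1 = [
    ("NMEDW", "Baseline", 0.43, 0.39, 0.41),
    ("NMEDW", "Full-MetaMap-binary", 0.63, 0.85, 0.72),
    ("NMEDW", "Full-MetaMap-count", 0.60, 0.88, 0.71),
    ("NMEDW", "MetaMap-mixed", 0.74, 0.84, 0.79),
    ("VUMC", "Baseline", 0.92, 0.46, 0.62),
    ("VUMC", "MetaMap-mixed", 1.00, 0.93, 0.96),
]

# Absolute gap between the recomputed and the reported F-measure per row.
gaps = {}
for dataset, algorithm, sens, ppv, reported in table1:
    recomputed = f_measure(ppv, sens)
    gaps[(dataset, algorithm)] = abs(recomputed - reported)
    print(f"{dataset:6s} {algorithm:20s} recomputed={recomputed:.3f} reported={reported:.2f}")

max_gap = max(gaps.values())
```

Under that assumption, every recomputed value lies within 0.01 of the reported one. Small differences are expected because the inputs are themselves rounded to two decimals.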

Conclusion: We developed three NLP models and compared them with a structured-data-only algorithm for identifying LN in EHRs. The best-performing NLP algorithm, which incorporates structured data, CUIs, and regex concepts, improved the F-measure in both the internal and external validation datasets. NLP algorithms can serve as powerful tools to accurately identify LN in EHRs for clinical research.


Disclosures: Y. Deng, None; J. Pacheco, None; A. Chung, None; C. Mao, None; J. Smith, None; J. Zhao, None; W. Wei, None; A. Barnado, None; C. Weng, None; C. Liu, None; A. Gordon, None; J. Yu, None; Y. Tedla, None; A. Kho, Datavant, 1, 7, 11; R. Ramsey-Goldman, None; T. Walunas, None; Y. Luo, None.

To cite this abstract in AMA style:

Deng Y, Pacheco J, Chung A, Mao C, Smith J, Zhao J, Wei W, Barnado A, Weng C, Liu C, Gordon A, Yu J, Tedla Y, Kho A, Ramsey-Goldman R, Walunas T, Luo Y. Natural Language Processing to Identify Lupus Nephritis Phenotype in Electronic Health Records [abstract]. Arthritis Rheumatol. 2021; 73 (suppl 9). https://acrabstracts.org/abstract/natural-language-processing-to-identify-lupus-nephritis-phenotype-in-electronic-health-records/. Accessed .