ACR Meeting Abstracts

Abstract Number: 0566

Natural Language Processing of Electronic Health Record Notes Captures Forced Vital Capacity in Rheumatoid Arthritis-Associated Interstitial Lung Disease

Punyasha Roul1, Yangyuna Yang1, Daniel Hershberger1, Ted Mikuls1, Jorge Rojas2, Jeffrey Curtis3, Joshua Baker4, Brian Sauer5 and Bryant England1

1University of Nebraska Medical Center, Omaha, NE; 2VA Puget Sound, Seattle, WA; 3Division of Clinical Immunology and Rheumatology, Department of Medicine, Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL; 4University of Pennsylvania, Philadelphia, PA; 5University of Utah, Salt Lake City, UT

Meeting: ACR Convergence 2021

Keywords: informatics, interstitial lung disease, rheumatoid arthritis

Session Information

Date: Sunday, November 7, 2021

Title: Epidemiology & Public Health Poster II: Inflammatory Arthritis – RA, SpA, & Gout (0560–0593)

Session Type: Poster Session B

Session Time: 8:30AM-10:30AM

Background/Purpose: Rheumatoid arthritis-associated interstitial lung disease (RA-ILD) has a poor long-term prognosis, including premature mortality. Longitudinal monitoring of patients in clinical practice and clinical trials includes routine measurement of forced vital capacity (FVC) from pulmonary function tests (PFTs). However, the availability of PFT data in real-world datasets is highly variable, limiting examination of this key outcome in long-term observational studies. Natural language processing (NLP), an artificial intelligence method, has been used to transform unstructured text from the electronic health record (EHR) into structured clinical data. We aimed to develop an NLP program to capture FVC values from EHR notes.

Methods: We identified patients in the Veterans Health Administration (VA) with RA-ILD between 2000 and 2020 by ICD-9/10 codes for RA and ILD, based on previously validated RA-ILD algorithms. We developed an NLP program in Microsoft SQL Server to capture FVC values from all available EHR notes, based on an existing program for FEV1 values (Akgun et al. PLoS ONE, 2020). We identified FVC string patterns and extracted numeric values in proximity to these strings. Subsequently, we performed several processing steps to account for variability in note type and structure, related PFT output (e.g., the FEV1/FVC ratio), and values copied across multiple notes. Dates were assigned to FVC values by cross-referencing EHR note dates with the most recent date accompanying CPT codes for PFTs. FVC values derived by the NLP program were compared to observed FVC values recorded directly from PFT equipment and available as structured data in the VA Corporate Data Warehouse (CDW). These structured values represent only a subset of all PFTs completed in the VA, owing to PFT compatibility.
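The extraction step described above (matching an FVC string, taking a nearby numeric value, skipping the FEV1/FVC ratio, and dropping values copied across notes) can be illustrated with a minimal Python sketch. This is not the authors' SQL Server program: the regular expression, the plausibility range of 0.5 to 8.0 L, the function names, and the record fields `patient_id`, `pft_date`, and `fvc` are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: "FVC" followed within 20 non-digit characters by a
# value such as 3.12. The lookbehind rejects "FEV1/FVC" (the ratio) and
# cases where "FVC" is the tail of a longer word.
FVC_PATTERN = re.compile(
    r"(?<![/\w])FVC\b[^\d\n]{0,20}?(\d\.\d{1,2})\s*(?:L\b|liters?\b)?",
    re.IGNORECASE,
)


def extract_fvc(note_text, low=0.5, high=8.0):
    """Return FVC values (litres) found in one note, within an assumed plausible range."""
    values = []
    for match in FVC_PATTERN.finditer(note_text):
        value = float(match.group(1))
        if low <= value <= high:
            values.append(value)
    return values


def dedupe_copied(records):
    """Keep the first occurrence of each (patient_id, pft_date, fvc) triple.

    Approximates the handling of values copied forward across notes.
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["patient_id"], rec["pft_date"], rec["fvc"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


note = "PFT today: FVC 3.12 L (78% predicted), FEV1 2.40 L, FEV1/FVC 0.77"
fvc_values = extract_fvc(note)
```

A real note corpus would need many more patterns (percent-predicted columns, tabular PFT output, pre- and post-bronchodilator values), which is presumably what the "several processing steps" in the abstract address.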

Results: We developed and tested the NLP program in a cohort of 7,485 patients with RA-ILD. In the VA CDW, there were 6,002 FVC values for 1,843 unique patients. The NLP program increased the yield of FVC values by >2.6-fold, extracting 15,983 FVC values for 4,849 patients. Among 3,037 date-matched FVC values from NLP and CDW, mean (SD) FVC was 3.0 (0.9) L from both sources, and 80% of NLP values were within 0.1 L of CDW values. The mean difference in FVC between NLP and CDW values was 0.03 L, with no systematic bias in NLP-derived FVC values seen on the Bland-Altman plot (Figure 1). NLP and CDW FVC values were strongly correlated (Figure 2, r = 0.89, p < 0.001). A total of 3,325 RA-ILD patients had at least two FVC values captured by NLP. Comparing the first and last FVC values, there was a mean decline in FVC of -0.30 (SD 0.86) L over a mean follow-up of 5.2 (SD 4.6) years (p < 0.001 by paired t-test) (Table 1). Similar changes in FVC (mean -0.29 [SD 0.66] L) were observed using CDW data, but this data source captured fewer RA-ILD patients (n = 1,181).
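The yield figures and the first-versus-last comparison reported above can be sketched in Python as follows. The counts are taken from the abstract; the function `paired_change` and the toy FVC lists are hypothetical and only show how a mean paired change, its SD, and a paired t statistic would be computed, not the authors' analysis code.

```python
import math
import statistics

# Yield of NLP relative to structured CDW data (counts from the abstract).
fold_values = 15983 / 6002    # about 2.66-fold more FVC values
fold_patients = 4849 / 1843   # about 2.63-fold more patients with a value


def paired_change(first, last):
    """Mean change (last - first), its sample SD, and the paired t statistic."""
    diffs = [b - a for a, b in zip(first, last)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, sd_diff, t_stat


# Toy first and last FVC values (litres) for three hypothetical patients.
first_fvc = [3.0, 3.2, 2.8]
last_fvc = [2.7, 3.0, 2.4]
mean_diff, sd_diff, t_stat = paired_change(first_fvc, last_fvc)
```

A negative `mean_diff` corresponds to a decline in FVC between the first and last measurement; a p-value would additionally require the t distribution with n - 1 degrees of freedom.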

Conclusion: NLP of EHR notes substantially increases the capture of longitudinal FVC values among patients with RA-ILD. These values are highly accurate compared to the gold standard of direct output from PFT equipment, which may not always be available in structured format. Use of this NLP program can facilitate clinical and epidemiologic research studies in RA-ILD by capturing longitudinal changes in one of the most critical outcome measures in RA-ILD.

Figure 1. Bland-Altman plot comparing absolute differences in FVC values with mean FVC values.
Abbreviations: CDW, Corporate Data Warehouse; FVC, forced vital capacity; NLP, natural language processing; SD, standard deviation
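
The agreement statistics behind a Bland-Altman comparison can be sketched as below. This is an illustrative sketch, not the authors' code: it assumes signed differences (NLP minus CDW), the conventional 95% limits of agreement at the bias plus or minus 1.96 SD, and hypothetical function names and toy data.

```python
import statistics


def bland_altman(nlp, cdw):
    """Return (bias, lower limit, upper limit) for date-matched value pairs."""
    diffs = [a - b for a, b in zip(nlp, cdw)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def fraction_within(nlp, cdw, tol=0.1):
    """Share of pairs whose absolute difference is at most tol litres."""
    hits = sum(1 for a, b in zip(nlp, cdw) if abs(a - b) <= tol)
    return hits / len(nlp)


# Toy date-matched FVC pairs (litres); values are made up for illustration.
nlp_fvc = [3.0, 2.5, 4.0, 3.5]
cdw_fvc = [3.0, 2.45, 3.5, 3.5]
bias, lower, upper = bland_altman(nlp_fvc, cdw_fvc)
share = fraction_within(nlp_fvc, cdw_fvc)
```

On the plot itself, each pair is drawn at x equal to the mean of the two values and y equal to their difference, with horizontal lines at the bias and the two limits.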

Figure 2. Scatter plot demonstrating the correlation of forced vital capacity values obtained from the natural language processing program (NLP) to values obtained from pulmonary function test equipment (VA CDW).
Abbreviations: CDW, Corporate Data Warehouse; FVC, forced vital capacity; NLP, natural language processing


Disclosures: P. Roul, None; Y. Yang, None; D. Hershberger, None; T. Mikuls, Gilead Sciences, 2, Horizon, 2, 5, Pfizer Inc, 2, Sanofi, 2, Bristol-Myers Squibb, 2; J. Rojas, None; J. Curtis, AbbVie, 2, Amgen, 2, 5, Bristol-Myers Squibb, 2, Janssen, 2, Eli Lilly, 2, Myriad, 2, Pfizer Inc, 2, 5, Roche/Genentech, 2, UCB, 2, CorEvitas, 2, 5, Crescendo Bio, 5; J. Baker, Bristol-Myers Squibb, 2, Pfizer, 2; B. Sauer, None; B. England, Boehringer-Ingelheim, 2.

To cite this abstract in AMA style:

Roul P, Yang Y, Hershberger D, Mikuls T, Rojas J, Curtis J, Baker J, Sauer B, England B. Natural Language Processing of Electronic Health Record Notes Captures Forced Vital Capacity in Rheumatoid Arthritis-Associated Interstitial Lung Disease [abstract]. Arthritis Rheumatol. 2021; 73 (suppl 9). https://acrabstracts.org/abstract/natural-language-processing-of-electronic-health-record-notes-captures-forced-vital-capacity-in-rheumatoid-arthritis-associated-interstitial-lung-disease/. Accessed .