ACR Meeting Abstracts

Abstract Number: 1310

Naturalized Language Processing Based Extraction of Rheumatoid Arthritis Disease Activity Measures from the Electronic Health Record

Elizabeth Park1, Iram Kamdar2, Reid Weisberg1, Andy Nguyen1, Joan Bathon3, Jon Giles4, Chunhua Weng5 and Elana Bernstein1, 1Columbia University Irving Medical Center, New York, NY, 2Columbia University Data Science Institute, New York, NY, 3Columbia University, NEW YORK, NY, 4Cedars-Sinai Medical Center, New York, NY, 5Columbia University Department of Biomedical Informatics, New York, NY

Meeting: ACR Convergence 2025

Keywords: Disease Activity, rheumatoid arthritis

Session Information

Date: Monday, October 27, 2025

Title: (1306–1346) Rheumatoid Arthritis – Diagnosis, Manifestations, and Outcomes Poster II

Session Type: Poster Session B

Session Time: 10:30AM-12:30PM

Background/Purpose: A treat-to-target approach in rheumatoid arthritis (RA) requires intense monitoring of RA disease activity with measures such as the Clinical Disease Activity Index (CDAI). Outside of clinical studies, CDAI may be sparsely recorded in real-world settings because local electronic health record (EHR) systems lack systematically integrated structured entry forms for CDAI (e.g., a homunculus) and because documentation practices vary among clinicians. We hypothesize that most clinicians document CDAI in unstructured free text. Two previous publications described successful natural language processing (NLP) pipelines for extracting CDAI, but these were limited to community practices and the Veterans Administration. We explored extracting CDAI scores from the EHR of a large tertiary academic center with a heterogeneous RA patient population using novel NLP techniques.

Methods: The New York Presbyterian/Columbia University Medical Center Clinical Data Warehouse (NYP/CUMC CDW) contains data on over 4.5 million patients, stored in structured and unstructured formats. From a pre-selected group of RA patients in the CDW, a random sample (20%) of free-text notes recorded in rheumatology outpatient practices within one EHR system (EPIC) between 2020 and 2024 was drawn. A list of terms covering CDAI, its key components, and plausible variations in phrasing was generated with expert curation (EP) (Table 1). These keywords, along with serostatus (seropositivity), were extracted using an automated information extraction pipeline engineered through large language model (LLM) prompts submitted to Chat GPT-4 Education, a HIPAA-compliant, CUMC-approved resource (Figure 1).
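
To make the keyword-plus-value extraction task concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract describes LLM prompts, whereas this sketch swaps in a plain regular expression as a simple stand-in. The pattern, the phrasing variants, and the names `CDAI_PATTERN` and `extract_cdai` are assumptions made for illustration; the curated term list in Table 1 is not reproduced here. The 0 to 76 plausibility check follows from the standard CDAI definition (28 tender joints, 28 swollen joints, and two 0 to 10 global assessments).

```python
import re
from typing import Optional

# Hypothetical phrasing variants; the authors' curated list (Table 1) is not shown in the abstract.
CDAI_PATTERN = re.compile(
    r"\b(?:CDAI|clinical\s+disease\s+activity\s+index)\b"  # the abbreviation or its long form
    r"[^\d\n]{0,20}?"                                       # short non-numeric filler, e.g. ": " or " score of "
    r"(\d{1,2}(?:\.\d)?)"                                   # the value, e.g. 7 or 12.5
    r"(?![\d/])",                                           # reject longer numbers and date-like strings
    re.IGNORECASE,
)


def extract_cdai(note: str) -> Optional[float]:
    """Return the first plausible CDAI value (0 to 76) mentioned in a note, or None."""
    for match in CDAI_PATTERN.finditer(note):
        value = float(match.group(1))
        if 0 <= value <= 76:  # CDAI cannot exceed 28 + 28 + 10 + 10
            return value
    return None
```

A call such as `extract_cdai("CDAI: 7")` yields a numeric score, while a note that mentions CDAI without a value yields `None`. A rule-based pattern like this will miss phrasing it has not seen, which is one reason an LLM-based approach may be attractive for heterogeneous notes.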

Results: A total of 1,983 unique RA patients (one note per patient closest to July 1, 2024) were analyzed; 768 (38.7%) were seropositive, 562 (28.3%) were seronegative, and 649 (32.7%) did not have serostatus recorded. The term “CDAI” and its variations were extracted from 173 (8.72%) patients with a median value of 7 (range 0-43). Of the 173 with CDAI recorded, 59 (34.1%) were in remission, 51 (29.4%) had low activity, 31 (17.9%) had moderate activity, and 31 (17.9%) had high activity. Of those with CDAI recorded, 137 (79.1%) were seropositive and 35 (20.2%) were seronegative (Table 2). An expert (EP) performed a chart review of a random 20% sample of the extracted CDAI scores, resulting in a precision of 0.97, recall of 0.97, and F1 score of 0.97.
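
The category breakdown and the validation metrics above can be illustrated with a short Python sketch. The abstract does not state which cutoffs were used, so the thresholds below are an assumption based on the conventional CDAI categories (remission at or below 2.8, low up to 10, moderate up to 22, high above 22). The function names and the counts in the usage note are hypothetical and are not the study's data.

```python
from typing import Tuple


def cdai_category(score: float) -> str:
    """Map a CDAI score to a disease activity category (conventional cutoffs, assumed)."""
    if score <= 2.8:
        return "remission"
    if score <= 10:
        return "low"
    if score <= 22:
        return "moderate"
    return "high"


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, `cdai_category(7)` places the reported median score in the low activity band under these assumed cutoffs. With made-up counts of 8 true positives, 2 false positives, and 2 false negatives, `precision_recall_f1(8, 2, 2)` gives precision, recall, and F1 of about 0.8 each; when precision and recall are equal, F1 equals the same value, which is consistent with the three matching figures reported from the chart review.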

Conclusion: We demonstrated the feasibility and accuracy of a Chat GPT-4 supported NLP/LLM pipeline to extract CDAI scores from a large academic EHR. At our institution, CDAI appears to be sparsely documented (<10%) in the sampled portion of notes. Our next steps include: 1) Refining the RA cohort by chart-validating those without serostatus (i.e., confirming whether they are actual RA cases); 2) Integrating analysis of historical EHRs used prior to EPIC (before 2020) to perform longitudinal extraction of CDAI scores; and 3) Exploring portability of this pipeline to other academic institutions, with the goal of external validation.

Supporting image 1: CDAI/Serostatus Terminology Extraction

Supporting image 2: Summarized CDAI Extraction

Supporting image 3: Automated Extraction Pipeline


Disclosures: E. Park: Amgen, 2, Boehringer Ingelheim, 2, Synthekine, 2; I. Kamdar: None; R. Weisberg: None; A. Nguyen: None; J. Bathon: AbbVie/Abbott, 2, Merck, 2, Ono Pharma, 2; J. Giles: None; C. Weng: None; E. Bernstein: AstraZeneca, 5, aTYR, 5, Boehringer-Ingelheim, 2, 5, Bristol-Myers Squibb(BMS), 5, Cabaletta Bio, 5, Synthekine, 2.

To cite this abstract in AMA style:

Park E, Kamdar I, Weisberg R, Nguyen A, Bathon J, Giles J, Weng C, Bernstein E. Naturalized Language Processing Based Extraction of Rheumatoid Arthritis Disease Activity Measures from the Electronic Health Record [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/naturalized-language-processing-based-extraction-of-rheumatoid-arthritis-disease-activity-measures-from-the-electronic-health-record/. Accessed .




© Copyright 2025 American College of Rheumatology