ACR Meeting Abstracts

Abstract Number: 0469

Evaluating Large Language Models for Automated Joint Involvement Analysis in Rheumatoid Arthritis

Xingyi Liu¹, Sunghwan Sohn¹ and Cynthia Crowson², ¹Mayo Clinic, Rochester, MN, ²Mayo Clinic, Stewartville, MN

Meeting: ACR Convergence 2025

Keywords: Diagnostic criteria, rheumatoid arthritis

Session Information

Date: Sunday, October 26, 2025

Title: (0430–0469) Rheumatoid Arthritis – Diagnosis, Manifestations, and Outcomes Poster I

Session Type: Poster Session A

Session Time: 10:30AM-12:30PM

Background/Purpose: In rheumatoid arthritis (RA), delaying initiation of treatment for 12 weeks or longer may lead to permanent joint damage and make remission harder to achieve. It is therefore essential to develop strategies that improve early detection of RA in the primary care setting. Joint involvement (“synovitis”) is a core domain of the 2010 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) classification criteria for RA. Manual extraction of joint counts from unstructured clinical notes is time-consuming and labor-intensive. We evaluated Llama 3.1, a state-of-the-art large language model (LLM), for automated identification, quantification, and dating of swollen/tender joints documented in electronic health record (EHR) notes.

Methods: We selected 200 patients meeting RA criteria by manual chart review and 200 non-RA controls. All clinic notes from 1990-01-01 to 2024-05-22 were processed by the LLM, which was prompted to act as an expert rheumatologist and given the same information used during manual chart review (including definitions of large and small joints and the exclusion criteria for certain joints) to extract counts of swollen/tender large and small joints. These counts were then mapped to the five joint involvement levels defined by the 2010 ACR/EULAR criteria (1 large joint = 0 points; 2–10 large joints = 1 point; 1–3 small joints = 2 points; 4–10 small joints = 3 points; >10 joints with at least one small joint = 5 points). Model-derived results were compared to the manual gold standard: we computed true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), accuracy, precision, recall, and F1-score for each level. For correctly identified cases (TP), we calculated the distribution of days between the LLM’s first detected date and the manual review date of joint involvement.
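The mapping from joint counts to joint involvement points described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `joint_involvement_points` is hypothetical, and two edge cases the abstract does not spell out (no involved joints, and more than 10 large joints with no small joint) are handled by assumption and flagged in the comments.

```python
def joint_involvement_points(large: int, small: int) -> int:
    """Map swollen/tender joint counts to 2010 ACR/EULAR joint involvement points.

    Thresholds follow the five levels listed in the Methods. Two cases are
    not spelled out there and are assumptions of this sketch: zero involved
    joints score 0, and more than 10 large joints with no small joint score 1.
    """
    if large < 0 or small < 0:
        raise ValueError("joint counts must be non-negative")
    if large + small > 10 and small >= 1:
        return 5  # >10 joints with at least one small joint
    if small >= 4:
        return 3  # 4-10 small joints
    if small >= 1:
        return 2  # 1-3 small joints
    if large >= 2:
        return 1  # 2-10 large joints (assumed to also cover >10 large only)
    return 0      # 1 large joint (assumed to also cover no involved joints)


# Hypothetical (large, small) counts, not taken from the study data.
examples = [(1, 0), (4, 0), (2, 3), (0, 6), (8, 3)]
scores = [joint_involvement_points(large, small) for large, small in examples]
print(scores)  # prints [0, 1, 2, 3, 5]
```

The checks run from the highest level down, so a note documenting both large and small joints is assigned the highest level its counts satisfy.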

Results: The LLM achieved level-specific accuracies ranging from 0.66 (4–10 small joints) to 0.78 (1–3 small joints), with F1-scores ranging from 0.32 (11+ joints) to 0.87 (1–3 small joints) (Table). When we compared the LLM’s first detected date with the manual review date among true positives, the median absolute difference was zero days for all levels except the 11+ joint level (0.5 days). At the 25th percentile, the LLM’s first detected date was 187.75 days, 23.5 days, and 152 days earlier than the manual review date for 1 large joint, 2–10 large joints, and 1–3 small joints, respectively; detection timing was identical for 4–10 small joints and 11+ joints. At the 75th percentile, the LLM trailed manual review by 8.5 days for 1 large joint, 14 days for 1–3 small joints, 57 days for 4–10 small joints, and 366.5 days for 11+ joints, with 2–10 large joints detected on the same date (Table).
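One way to summarize detection timing as quartiles of a signed day difference is sketched below. The sign convention (LLM date minus manual date, so negative values mean the LLM detected involvement earlier), the helper names, and the linearly interpolated percentile method are all assumptions; the abstract does not state how its percentiles were computed. The dates are hypothetical.

```python
from datetime import date
from statistics import quantiles


def signed_day_differences(pairs):
    """Return LLM-minus-manual differences in days for (llm_date, manual_date) pairs.

    Negative values mean the LLM's first detected date precedes the manual one.
    """
    return [(llm - manual).days for llm, manual in pairs]


def timing_quartiles(diffs):
    """Return (25th percentile, median, 75th percentile) of the differences.

    method="inclusive" interpolates linearly between order statistics; this
    choice is an assumption of the sketch.
    """
    q1, q2, q3 = quantiles(diffs, n=4, method="inclusive")
    return q1, q2, q3


# Hypothetical true-positive cases, not study data.
pairs = [
    (date(2015, 1, 1), date(2015, 1, 31)),   # LLM 30 days earlier
    (date(2016, 3, 1), date(2016, 3, 11)),   # LLM 10 days earlier
    (date(2017, 6, 5), date(2017, 6, 5)),    # same day
    (date(2018, 2, 6), date(2018, 2, 1)),    # LLM 5 days later
    (date(2019, 5, 21), date(2019, 5, 1)),   # LLM 20 days later
]
diffs = signed_day_differences(pairs)
q1, median, q3 = timing_quartiles(diffs)
print(diffs)           # prints [-30, -10, 0, 5, 20]
print(q1, median, q3)  # prints -10.0 0.0 5.0
```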

Conclusion: LLMs demonstrate potential to extract and classify joint involvement levels from unstructured EHR notes, matching manual review in both classification performance and temporal accuracy. These results support the promise of LLMs, especially with continued refinement, to accelerate and automate RA phenotyping.

Table. Performance Metrics of LLM for Joint Involvement Analysis.
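The per-level metrics named in the Methods (TP, TN, FP, FN, accuracy, precision, recall, F1-score) can be computed as in the sketch below. It assumes each patient carries one boolean flag per level for the manual review and one for the LLM; the abstract does not describe how per-level labels were derived, so that framing and the function name `binary_metrics` are hypothetical. The flags are illustrative and do not reproduce the table.

```python
def binary_metrics(gold: list[bool], pred: list[bool]) -> dict[str, float]:
    """Compute confusion counts and summary metrics for one involvement level."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))
    tn = sum(not g and not p for g, p in zip(gold, pred))
    fp = sum(not g and p for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(gold) if gold else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


# Hypothetical flags for a single level: manual review (gold) vs. LLM (pred).
gold = [True, True, True, False, False, False, True, False]
pred = [True, True, False, False, True, False, True, False]
metrics = binary_metrics(gold, pred)
print(metrics["tp"], metrics["tn"], metrics["fp"], metrics["fn"])  # prints 3 3 1 1
```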


Disclosures: X. Liu: None; S. Sohn: None; C. Crowson: None.

To cite this abstract in AMA style:

Liu X, Sohn S, Crowson C. Evaluating Large Language Models for Automated Joint Involvement Analysis in Rheumatoid Arthritis [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/evaluating-large-language-models-for-automated-joint-involvement-analysis-in-rheumatoid-arthritis/. Accessed .
© Copyright 2025 American College of Rheumatology