ACR Meeting Abstracts

Abstract Number: 0461

Right Diagnoses For the Wrong Reasons: Limitations of Current Large-Language Model based Agentic Frameworks for Screening of Rheumatoid Arthritis

Sarthak Verma1, Avarna Agarwal1, Umakanta Maharana2, Murari Mandal3, Prasanta Padhan1, Prakashini Mruthyanjaya4 and Sakir Ahmed1, 1Department of Clinical Immunology and Rheumatology, Kalinga Institute of Medical Sciences, Bhubaneswar, Orissa, India, 2School of Computer Engineering, KIIT University, Bhubaneswar, India, 3School of Computer Engineering, KIIT University, Bhubaneswar, Orissa, India, 4Department of Clinical Immunology and Rheumatology, Kalinga Institute of Medical Sciences, Bhubaneswar, India

Meeting: ACR Convergence 2025

Keywords: rheumatoid arthritis

Session Information

Date: Sunday, October 26, 2025

Title: (0430–0469) Rheumatoid Arthritis – Diagnosis, Manifestations, and Outcomes Poster I

Session Type: Poster Session A

Session Time: 10:30AM-12:30PM

Background/Purpose: In India, rheumatoid arthritis (RA) is often diagnosed late. Large language models (LLMs), a form of artificial intelligence (AI), can be deployed on widely available mobile phones, offering a potential screening solution. We explored several LLM-based agentic frameworks to identify the most effective model for SARA (Screening Agent for RA) and to examine how these models arrive at a diagnosis of RA.

Methods: We developed PreRAID (Pre-Screening Rheumatoid Arthritis Information Database) from consenting patients with joint pain, classifying each patient as RA or not RA according to the treating physician's diagnosis. The dataset was split into 280 cases for the knowledge base (KB) and 70 for testing. A Neo4j vector database enabled embedding-based retrieval. Six LLMs were evaluated: four closed-source (OpenAI o1, OpenAI o3 mini, Gemini 2.5 Pro, Gemini 2.0 Flash) and two open-source (QwQ, Deepseek R1 70B). Each model was tested under three agentic configurations: (1) a single agent without KB access; (2) a single agent with retrieval-augmented generation (RAG); and (3) a dual-agent setup, in which the first agent generated a diagnosis with reasoning and a second agent validated it [Figure 1]. Each model-configuration pair was evaluated on 50 new cases. The models were also asked to explain the reasons behind each diagnosis; these explanations were independently assessed by two fellows and one consultant rheumatologist using a four-point Likert scale.
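The three agentic configurations described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `LLM` callable type, the prompt wording, the in-memory list standing in for the Neo4j vector database, and all function names are assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for any model client: prompt text in, answer text out.
LLM = Callable[[str], str]
# Hypothetical in-memory knowledge base: (embedding, case description) pairs.
KnowledgeBase = List[Tuple[Sequence[float], str]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: Sequence[float], kb: KnowledgeBase, k: int = 2) -> List[str]:
    """Return the k knowledge-base cases most similar to the query embedding."""
    ranked = sorted(kb, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def single_agent(llm: LLM, case: str) -> str:
    """Configuration 1: one agent, no knowledge-base access."""
    return llm(f"Case: {case}\nIs this RA or not RA? Explain your reasoning.")


def single_agent_rag(llm: LLM, case: str, case_vec: Sequence[float],
                     kb: KnowledgeBase, k: int = 2) -> str:
    """Configuration 2: one agent whose prompt is augmented with retrieved cases."""
    context = "\n".join(retrieve(case_vec, kb, k))
    return llm(f"Similar cases:\n{context}\nCase: {case}\n"
               "Is this RA or not RA? Explain your reasoning.")


def dual_agent(first: LLM, second: LLM, case: str, case_vec: Sequence[float],
               kb: KnowledgeBase, k: int = 2) -> str:
    """Configuration 3: the first agent drafts a diagnosis, the second validates it."""
    draft = single_agent_rag(first, case, case_vec, kb, k)
    return second(f"Case: {case}\nProposed diagnosis and reasoning:\n{draft}\n"
                  "Validate or correct this answer.")
```

Any real client can be wrapped as an `LLM` callable; in a deployed system the `retrieve` step would be a query against the vector database rather than a sort over a Python list.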

Results: The PreRAID dataset included 84% RA-confirmed cases and 16% controls. Deepseek R1 showed the highest accuracy (82%) in the single-agent with KB setting, followed by o1 and o3 mini (80% each). Accuracy dropped in the two-agent setup, most notably in Gemini 2.5 Pro (37%) and Gemini 2.0 Flash (40%) [Figure 2]. Despite moderate to high accuracy, Gemini models underperformed in reasoning when agentic complexity increased. Reasoning quality was suboptimal across all models. Gemini 2.0 Flash (36/50) and Deepseek R1 (28/50) had the most correct justifications, while QwQ and Gemini 2.5 Pro scored the lowest (6 and 10, respectively). Many outputs showed minor or major reasoning flaws, indicating a disconnect between diagnostic accuracy and reasoning integrity.
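To make the gap between accuracy and reasoning concrete, the reported counts of correct justifications can be converted to percentages and set against a reported accuracy figure. The sketch below uses only numbers stated in the Results; the variable and function names are illustrative, and the abstract does not say which configuration the justification counts come from, so the comparison with Deepseek R1's 82% accuracy is indicative only.

```python
# Correct justifications out of 50 cases, as reported in the Results.
CORRECT_JUSTIFICATIONS = {
    "Gemini 2.0 Flash": 36,
    "Deepseek R1": 28,
    "Gemini 2.5 Pro": 10,
    "QwQ": 6,
}
CASES_PER_MODEL = 50


def percent(count: int, total: int) -> int:
    """Whole-number percentage; exact here because 100 / 50 is an integer."""
    return count * 100 // total


justification_pct = {
    model: percent(count, CASES_PER_MODEL)
    for model, count in CORRECT_JUSTIFICATIONS.items()
}

# Reported best accuracy for Deepseek R1 (single agent with KB).
# Assumption: the justification counts may come from a different configuration.
DEEPSEEK_ACCURACY_PCT = 82
gap_points = DEEPSEEK_ACCURACY_PCT - justification_pct["Deepseek R1"]

for model, pct in justification_pct.items():
    print(f"{model}: {pct}% correct justifications")
print(f"Deepseek R1 accuracy minus justification rate: {gap_points} points")
```

Under these figures, the most accurate model gave a correct justification in 56% of cases, 26 percentage points below its best reported accuracy.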

Conclusion: Deepseek R1 achieved the highest diagnostic accuracy (82%) in the single-agent with KB configuration. However, reasoning quality across all models was suboptimal, with many correct diagnoses supported by flawed reasoning. The dual-agent setup, intended to enhance reasoning, led to decreased diagnostic performance. Although we built a robust system (SARA) for screening, the lack of proper reasoning by the LLMs precludes its use as a diagnostic tool. Future work should prioritize explainable AI approaches so that clinicians can trust the tool.

Figure 1: Agentic framework. (A) Workflow of the SARA framework and (B) illustration of the sequential steps involved in diagnosing RA with a single agent without knowledge base (KB), a single agent with KB, and two agents with KB.

Figure 2: Validation of the different models. (A) Accuracies of the different LLMs under different agentic configurations and (B) accuracy of the reasoning provided by the different models, as validated by clinicians.


Disclosures: S. Verma: None; A. Agarwal: None; U. Maharana: None; M. Mandal: None; P. Padhan: None; P. Mruthyanjaya: None; S. Ahmed: Alkem, 6, Cipla, 6, Dr Reddy's, 6, Ipca, 6, Janssen, 1, Pfizer, 6, Sun Pharma, 6, Torrent, 6.

To cite this abstract in AMA style:

Verma S, Agarwal A, Maharana U, Mandal M, Padhan P, Mruthyanjaya P, Ahmed S. Right Diagnoses For the Wrong Reasons: Limitations of Current Large-Language Model based Agentic Frameworks for Screening of Rheumatoid Arthritis [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/right-diagnoses-for-the-wrong-reasons-limitations-of-current-large-language-model-based-agentic-frameworks-for-screening-of-rheumatoid-arthritis/. Accessed .
© Copyright 2025 American College of Rheumatology