ACR Meeting Abstracts

Abstract Number: 0826

Impact of Large Language Models on Diagnostic Reasoning of Medical Students in Rheumatology: A Randomized Trial

Anna Roemer¹, Nadine Schlicker², Anna Kernder³, Benedikt Albe¹, Juliana Hack⁴, Martin Hirsch⁵, Sebastian Kuhn¹ and Johannes Knitza⁶

¹Institute for Digital Medicine, University Hospital of Giessen and Marburg, Philipps University Marburg, Marburg, Germany
²Institute for Artificial Intelligence in Medicine, University Hospital Giessen and Marburg, Marburg, Germany
³Department of Rheumatology, Rheumazentrum Ruhrgebiet, Herne, Germany
⁴Center for Orthopaedics and Trauma Surgery, University Hospital Giessen and Marburg, Marburg, Germany
⁵Institute for Artificial Intelligence in Medicine, University Hospital Giessen and Marburg, Philipps University Marburg, Marburg, Germany
⁶Institute for Digital Medicine, University Hospital Gießen-Marburg, Philipps University, Marburg, Germany

Meeting: ACR Convergence 2025

Keywords: Education, Health Services Research, practice guidelines, quality of care, Randomized Trial

Session Information

Date: Sunday, October 26, 2025

Title: Abstracts: Professional Education (0825–0830)

Session Type: Abstract Session

Session Time: 3:15PM-3:30PM

Background/Purpose: Although they are not certified as medical devices, large language models (LLMs) such as ChatGPT-4 can provide rapid support for diagnostic reasoning and may facilitate scalable upskilling of healthcare professionals and laypersons alike. This trial evaluated the impact of LLM assistance on the diagnostic reasoning of medical students solving rheumatology case vignettes.

Methods: In this randomized controlled trial (RCT; NCT06748170), 68 medical students were allocated between January 7 and March 30, 2025, either to an intervention group (IG), with access to ChatGPT-4o in addition to conventional diagnostic resources, or to a control group (CG), with access to conventional resources only. Participants in the IG first generated diagnostic suggestions using conventional resources and then revised them after consulting the LLM. All participants completed three rheumatology case vignettes, covering granulomatosis with polyangiitis (GPA), rheumatoid arthritis (RA), and systemic lupus erythematosus (SLE), that were originally published in the ACR’s online learning center. For each vignette, participants had to name a top diagnosis and could optionally list further suggestions, up to five diagnoses in total. Two board-certified rheumatologists independently reviewed the suggested diagnoses in a blinded fashion. Diagnostic accuracy was additionally quantified as a cumulative diagnostic score, awarding two points for each correct diagnosis and one point for each plausible alternative. Time to case completion and diagnostic confidence (0–10 scale) were also recorded.
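The scoring rule described in the Methods can be expressed as a short Python sketch. It assumes, since the abstract does not spell this out, that the reviewers label every listed suggestion (top diagnosis first, at most five in total) as "correct", "plausible", or "incorrect" and that the points are summed over the whole list; the label strings and function names are hypothetical.

```python
from typing import Sequence

# Hypothetical reviewer labels mapped to the points described in the Methods.
POINTS = {"correct": 2, "plausible": 1, "incorrect": 0}
MAX_SUGGESTIONS = 5


def validate(labels: Sequence[str]) -> None:
    """Check that a case has a top diagnosis, at most five entries, and known labels."""
    if not 1 <= len(labels) <= MAX_SUGGESTIONS:
        raise ValueError("need a top diagnosis and at most five suggestions")
    unknown = set(labels) - set(POINTS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")


def case_score(labels: Sequence[str]) -> int:
    """Cumulative diagnostic score for one vignette (assumed: sum over all suggestions)."""
    validate(labels)
    return sum(POINTS[label] for label in labels)


def top1_correct(labels: Sequence[str]) -> bool:
    """True if the first-ranked (top) diagnosis was rated correct."""
    validate(labels)
    return labels[0] == "correct"


def top5_correct(labels: Sequence[str]) -> bool:
    """True if any of the (at most five) suggestions was rated correct."""
    validate(labels)
    return "correct" in labels
```

For example, a hypothetical case rated `["plausible", "correct", "incorrect"]` scores 1 + 2 + 0 = 3 and counts as a top-five hit but not a top-one hit. Summing over the list is consistent with the reported median scores above 2, but the exact rubric may differ from the authors'.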

Results: The mean (SD) age was 24.8 (2.6) years, and 62% (42/68) of participants were female. Prior use of LLMs was reported by 96% (65/68) of students. Interrater agreement was substantial (Cohen’s κ = 0.79). Students in the IG identified the correct top diagnosis significantly more often than those in the CG (77.5%, mean 2.3/3 [SD 0.8] vs. 32.4%, mean 1.0/3 [SD 0.8]; independent-samples t test, t = 7.3, P < 0.001) and were also more likely to include the correct diagnosis among their top five suggestions (91.2%, mean 2.7/3 [SD 0.5] vs. 47.1%, mean 1.4/3 [SD 0.7]; independent-samples t test, t = 8.5, P < 0.001) (Figure 1). The standalone performance of the LLM exceeded that of students using conventional resources: it listed the correct diagnosis first in 71.6% of cases and within the top five in 72.5%. Median diagnostic scores per case were 4 (IQR, 3–5) in the IG, 2 (IQR, 1–3) in the CG, and 5 (IQR, 3.3–6) for the LLM. Median time to case completion was 498 seconds (IQR, 371–609) in the IG and 253 seconds (IQR, 175–395) in the CG. Within the IG, consulting the LLM significantly increased the proportion of correct top diagnoses from 46.1% (mean 1.4/3 [SD 0.7]) to 77.5% (mean 2.3/3 [SD 0.8]; paired t test, t = 7.1, P < 0.001) (Figure 2). Diagnostic confidence in the IG also increased significantly after LLM use (mean 5.2/10 [SD 1.5] to 7.0/10 [SD 1.3]; paired t test, t = –9.4, P < 0.001). Among IG participants, 97% (33/34) reported that they would use the LLM again for diagnostic support, and 91% (31/34) found it easy to use.
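The three statistics named in the Results (Cohen’s κ for interrater agreement, an independent-samples t test between groups, and a paired t test for the before/after comparison within the IG) can be sketched with the Python standard library. This is a minimal illustration under assumptions: the abstract does not say whether a pooled-variance (Student) or Welch test was used, so the pooled form is shown, and the function names and example data are hypothetical; they do not reproduce the trial’s values.

```python
import math
import statistics
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater1: Sequence[Hashable], rater2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater1)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def independent_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Student's t statistic for two independent groups, using the pooled variance."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se


def paired_t(before: Sequence[float], after: Sequence[float]) -> float:
    """Paired t statistic computed on the differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [y - x for x, y in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    print(round(cohens_kappa(["c", "c", "i", "i"], ["c", "i", "i", "i"]), 2))  # 0.5
    print(round(paired_t([1, 2, 3, 4], [2, 4, 5, 6]), 2))  # 7.0
```

The sign of a paired t statistic depends only on which measurement is subtracted from which, so swapping `before` and `after` flips it; p-values would additionally require the t distribution (for example from SciPy), which this stdlib sketch omits.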

Conclusion: To our knowledge, this is the first RCT to evaluate the diagnostic assistive potential of LLMs in rheumatology. Providing medical students with access to an LLM significantly improved diagnostic accuracy compared to conventional resources alone. Further research is warranted to determine how LLMs can be implemented to safely empower healthcare professionals as well as patients.

Supporting image 1: Proportion of correct diagnoses ranked first or within the top five suggestions

Supporting image 2: Distribution of top diagnosis categories in the intervention group across study phases


Disclosures: A. Roemer: None; N. Schlicker: None; A. Kernder: None; B. Albe: None; J. Hack: None; M. Hirsch: None; S. Kuhn: None; J. Knitza: GAIA, 2, Vila Health, 12,, 2.

To cite this abstract in AMA style:

Roemer A, Schlicker N, Kernder A, Albe B, Hack J, Hirsch M, Kuhn S, Knitza J. Impact of Large Language Models on Diagnostic Reasoning of Medical Students in Rheumatology: A Randomized Trial [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/impact-of-large-language-models-on-diagnostic-reasoning-of-medical-students-in-rheumatology-a-randomized-trial/. Accessed .

Embargo Policy

All abstracts accepted to ACR Convergence are under media embargo once the ACR has notified presenters of their abstract’s acceptance. They may be presented at other meetings or published as manuscripts after this time but should not be discussed in non-scholarly venues or outlets. The following embargo policies are strictly enforced by the ACR.

Accepted abstracts are made available to the public online in advance of the meeting and are published in a special online supplement of our scientific journal, Arthritis & Rheumatology. Information contained in those abstracts may not be released until the abstracts appear online. In an exception to the media embargo, academic institutions, private organizations, and companies with products whose value may be influenced by information contained in an abstract may issue a press release to coincide with the availability of an ACR abstract on the ACR website. However, the ACR continues to require that information that goes beyond that contained in the abstract (e.g., discussion of the abstract done as part of editorial news coverage) is under media embargo until 10:00 AM CT on October 25. Journalists with access to embargoed information cannot release articles or editorial news coverage before this time. Editorial news coverage is considered original articles/videos developed by employed journalists to report facts, commentary, and subject matter expert quotes in a narrative form using a variety of sources (e.g., research, announcements, press releases, events, etc.).

Violation of this policy may result in the abstract being withdrawn from the meeting and other measures deemed appropriate. Authors are responsible for notifying colleagues, institutions, communications firms, and all other stakeholders related to the development or promotion of the abstract about this policy. If you have questions about the ACR abstract embargo policy, please contact ACR abstracts staff at [email protected].

© Copyright 2025 American College of Rheumatology