ACR Meeting Abstracts

Abstract Number: 0530

Evaluating ChatGPT’s Performance in Diagnosing Low Back Pain: A Comparison with Clinicians and Impact of Prompted Specialties

Annika Nack1, Xabier Michelena Vegas2, Pol Maymó-Paituvi3, Cristina Calomarde-Gómez4, David Lobo5, Asier García-Alija6, Raquel Ugena-García4, Maria Aparicio1, Paola Vidal Montal7 and Diego Benavent8, 1Hospital Germans Trias i Pujol, Barcelona, Spain, 2Hospital Universitari Vall Hebron, Barcelona, Spain, 3Hospital Universitari de Bellvitge, L'Hospitalet de Llobregat, Spain, 4Hospital Universitari Germans Trias i Pujol, Badalona, Spain, 5Doctor Josep Trueta University Hospital, Girona, Catalonia, Spain, 6Sant Pau Hospital, Barcelona, Catalonia, Spain, 7Bellvitge University Hospital, Barcelona, Spain, 8Hospital Universitari de Bellvitge, Madrid, Spain

Meeting: ACR Convergence 2025

Keywords: Back pain, Diagnostic criteria, pain, spondyloarthritis, Spondyloarthropathies

Session Information

Date: Sunday, October 26, 2025

Title: (0522–0553) Spondyloarthritis Including Psoriatic Arthritis – Diagnosis, Manifestations, & Outcomes Poster I

Session Type: Poster Session A

Session Time: 10:30 AM–12:30 PM

Background/Purpose: Low back pain (LBP) is a multifactorial condition managed by various specialists. AI chatbots such as ChatGPT may help clinicians identify probable diagnoses. Because query phrasing can influence outputs, we hypothesized that ChatGPT’s responses may vary depending on the specialty stated in the prompt. We aimed to evaluate whether ChatGPT’s diagnostic output changes when it simulates different specialties in LBP assessment, and to compare its diagnostic accuracy with that of clinicians.

Methods: A total of 10 clinical cases related to LBP were included from official public exams for rheumatologists in Spain, designed to assess expertise for permanent specialist positions. These comprised 5 cases of rheumatologic diseases and 5 cases representing other causes. The exercise was conducted in December 2024 using ChatGPT (GPT-4o). Ten clinicians with at least 5 years of experience in managing rheumatic and musculoskeletal diseases (RMDs) participated in the study. Each clinician first answered every question independently; at a later stage, each participant posed the same cases to ChatGPT, prompting it to simulate five specialties (Rheumatology, Neurology, Internal Medicine, Rehabilitation, and Orthopedics). The gold standard was the official answer listed as the diagnosis for each exam question. Diagnostic performance was evaluated using precision (percentage of cases where the top diagnosis matched the gold standard) and sensitivity (percentage of cases where the gold standard was included in the top three probable diagnoses). The time taken to answer all 10 clinical cases was recorded both for clinicians alone and for clinicians using ChatGPT, starting when the case was reviewed and stopping when three differential diagnoses and the most probable diagnosis were finalized.
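
As an illustration of the scoring scheme described above, the minimal sketch below computes per-participant precision (top diagnosis matches the gold standard) and sensitivity (gold standard appears among the top three diagnoses). It assumes diagnoses have already been standardized into categories; the function names and example data are hypothetical, not taken from the study.

```python
# Minimal sketch of the scoring scheme described in the Methods section.
# Assumes free-text diagnoses have already been standardized into categories.
# All names and example data here are illustrative, not from the study.

def score_participant(answers, gold_standard):
    """Return (precision %, sensitivity %) for one participant.

    answers: dict mapping case id -> ordered list of up to three
             standardized diagnoses (most probable first).
    gold_standard: dict mapping case id -> official exam diagnosis.
    """
    n = len(gold_standard)
    top1 = sum(1 for case, gold in gold_standard.items()
               if answers[case][:1] == [gold])   # top diagnosis matches gold
    top3 = sum(1 for case, gold in gold_standard.items()
               if gold in answers[case][:3])     # gold within top three
    return 100 * top1 / n, 100 * top3 / n


# Toy usage with two hypothetical cases:
gold = {1: "axial spondyloarthritis", 2: "vertebral fracture"}
answers = {
    1: ["axial spondyloarthritis", "mechanical low back pain", "sacroiliitis"],
    2: ["osteoporosis", "vertebral fracture", "lumbar disc herniation"],
}
print(score_participant(answers, gold))  # -> (50.0, 100.0)
```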

Results: In total, 528 free-text diagnoses were generated and standardized into 39 diagnostic categories. The percentage of correct answers for each participant and each prompted specialty is shown in Figure 1. Median precision ranged from 70% to 80% across the five specialties simulated by ChatGPT, and median sensitivity ranged from 80% to 90%. Statistical analysis revealed no significant differences in precision (p = 0.80) or sensitivity (p = 0.68) between the specialties simulated by ChatGPT, indicating consistent performance regardless of the prompted specialty. For clinicians, median precision was 60% and median sensitivity was 80%. Compared with clinicians, ChatGPT had significantly higher diagnostic precision (median 75% vs. 60%, p < 0.001) and significantly higher sensitivity (median 85% vs. 80%, p = 0.02). The mean time taken by participants to complete the task was 12.35 ± 5.62 minutes, compared with 2.33 ± 0.03 minutes for ChatGPT (p < 0.01).
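
The abstract reports p-values for the across-specialty and ChatGPT-versus-clinician comparisons but does not name the statistical tests used. Purely as a hedged illustration, a common nonparametric approach would be a Kruskal-Wallis test across the five simulated specialties and a Mann-Whitney U test for ChatGPT versus clinicians; the sketch below uses scipy.stats with placeholder scores, not the study's data.

```python
# Hedged sketch: the abstract does not state which tests were used.
# A plausible nonparametric analysis over per-participant precision scores;
# every number below is a placeholder, not data from the study.
from scipy import stats

# Hypothetical per-participant precision (%) under each prompted specialty.
precision_by_specialty = {
    "Rheumatology":      [70, 80, 80, 70, 90, 70, 80, 70, 80, 80],
    "Neurology":         [70, 70, 80, 80, 70, 80, 70, 90, 70, 80],
    "Internal Medicine": [80, 70, 70, 80, 80, 70, 80, 70, 70, 80],
    "Rehabilitation":    [70, 80, 70, 70, 80, 80, 70, 80, 80, 70],
    "Orthopedics":       [80, 70, 80, 70, 70, 80, 80, 70, 80, 70],
}

# Across-specialty comparison (consistency of ChatGPT's performance).
h, p_specialties = stats.kruskal(*precision_by_specialty.values())
print(f"Kruskal-Wallis across specialties: p = {p_specialties:.2f}")

# ChatGPT vs. clinicians on the same metric.
clinicians = [60, 50, 60, 70, 60, 60, 50, 70, 60, 60]  # placeholder
chatgpt = [v for scores in precision_by_specialty.values() for v in scores]
u, p_vs_clinicians = stats.mannwhitneyu(chatgpt, clinicians)
print(f"Mann-Whitney U, ChatGPT vs clinicians: p = {p_vs_clinicians:.3f}")
```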

Conclusion: ChatGPT provides consistent diagnostic performance across simulated specialties, unaffected by the prompt’s semantic framing. It may outperform clinicians in both diagnostic precision and sensitivity, highlighting its potential as a valuable complementary tool for generating fast, accurate and comprehensive differential diagnoses in cases of low back pain. Further research is needed to explore its application in clinical practice and its ability to enhance diagnostic workflows.

Figure 1. Percentage of correct answers for each participant and each prompted specialty.


Disclosures: A. Nack: None; X. Michelena Vegas: None; P. Maymó-Paituvi: None; C. Calomarde-Gómez: None; D. Lobo: None; A. García-Alija: None; R. Ugena-García: None; M. Aparicio: None; P. Vidal Montal: None; D. Benavent: AbbVie/Abbott, 2, 6, Eli Lilly, 6, Janssen, 6, Novartis, 5, 6, Pfizer, 6, Savana, 7, UCB, 2, 6.

To cite this abstract in AMA style:

Nack A, Michelena Vegas X, Maymó-Paituvi P, Calomarde-Gómez C, Lobo D, García-Alija A, Ugena-García R, Aparicio M, Vidal Montal P, Benavent D. Evaluating ChatGPT’s Performance in Diagnosing Low Back Pain: A Comparison with Clinicians and Impact of Prompted Specialties [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/evaluating-chatgpts-performance-in-diagnosing-low-back-pain-a-comparison-with-clinicians-and-impact-of-prompted-specialties/. Accessed .