ACR Meeting Abstracts

Abstract Number: 1306

A Comparative Analysis of ChatGPT-4, Microsoft’s Bing and Google’s Bard at Answering Rheumatology Clinical Questions

Pitchaporn Yingchoncharoen1, Nattanicha Chaisrimaneepan2, Watsachon Pangkanon1 and Jerapas Thongpiya1, 1Texas Tech Health Sciences Center, Lubbock, TX, 2Texas Tech University Health Sciences Center, Lubbock, TX

Meeting: ACR Convergence 2024

Keywords: comparative effectiveness, decision analysis, medical education

Session Information

Date: Sunday, November 17, 2024

Title: Professional Education Poster

Session Type: Poster Session B

Session Time: 10:30 AM-12:30 PM

Background/Purpose: The role of artificial intelligence (AI) is rapidly expanding in medicine, with both patients and providers using it more frequently. There is a paucity of data assessing how accurately AI chat models (also known as large language models, or LLMs) answer clinical questions, and few comparisons of accuracy across different LLMs. This study aimed to investigate the ability of three LLMs (ChatGPT-4, Bing, and Bard) to answer rheumatology clinical questions accurately and to compare the quality of their responses.

Methods: Rheumatology questions from the rheumatology section of MKSAP-19 were entered into three LLMs (OpenAI’s ChatGPT-4, Microsoft’s Bing, and Google’s Bard) between April 26 and May 3, 2024. Each question was posed twice: once with its multiple-choice answer options included and once in an open-ended format (prompted as “answer this question”). The presence of laboratory values, imaging findings, and pathology findings in the question stem was noted. Each generated response was graded as correct, incorrect, or unable to answer. Statistical analysis was performed to compare the number of correct responses per LLM and per question category, and subgroup analysis was conducted by question type. Data were reported as the proportion of correct answers per LLM and per question category (%); the threshold for significance was set at p < 0.05.
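
A minimal sketch of how the per-model, per-category accuracy described above could be tabulated. The abstract does not include code, so the table layout, column names, and grading labels below are illustrative assumptions.

```python
# Hypothetical scoring table: one row per question, per model, per prompt format,
# graded as correct / incorrect / unable to answer.
import pandas as pd

responses = pd.DataFrame(
    [
        (1, "diagnosis", "ChatGPT-4", "multiple-choice", "correct"),
        (1, "diagnosis", "Bing", "multiple-choice", "incorrect"),
        (1, "diagnosis", "Bard", "multiple-choice", "unable to answer"),
        # ... remaining rows: 60 questions x 3 models x 2 prompt formats
    ],
    columns=["question_id", "category", "model", "format", "grade"],
)

# Proportion of correct answers per LLM, prompt format, and question category,
# expressed as percentages as in the abstract.
accuracy = (
    responses.assign(correct=responses["grade"].eq("correct"))
    .groupby(["model", "format", "category"])["correct"]
    .mean()
    .mul(100)
)
print(accuracy)
```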

Results: A total of 60 questions were used to query the AI models: 21 on diagnosis, 3 on epidemiology/risk factors, 11 on investigations, 16 on management, and 9 on integrating diagnosis and management. Of these, 53.33% required laboratory interpretation and 15% included imaging findings. When questions were posed with multiple-choice options, ChatGPT-4 answered 54 correctly (90%), Bing 39 (65%), and Bard 32 (53.33%) (p < 0.001). When questions were posed in an open-ended format, ChatGPT-4 answered 56 correctly (93.34%), Bing 37 (61.67%), and Bard 24 (40%) (p < 0.001). ChatGPT-4 and Bing attempted to answer 100% of the questions, whereas Bard was unable to answer 21.67% of multiple-choice questions and 38.33% of open-ended questions. All three LLMs provided a rationale along with each answer, as well as counseling where appropriate.
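
The abstract reports p < 0.001 for the between-model differences but does not name the statistical test used. As a plausible reconstruction only, a chi-square test of independence on the published multiple-choice counts (54, 39, and 32 correct out of 60 each) yields a p-value consistent with that threshold:

```python
# Chi-square test of independence on the reported multiple-choice results.
# This is an assumption about the analysis; the abstract does not specify the test.
from scipy.stats import chi2_contingency

correct = {"ChatGPT-4": 54, "Bing": 39, "Bard": 32}  # correct answers out of 60 questions
table = [[c, 60 - c] for c in correct.values()]      # rows: model; columns: correct, not correct

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.2g}")  # p is well below 0.001
```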

Conclusion: LLMs, particularly ChatGPT-4, demonstrate a promising ability to answer clinical questions in rheumatology. However, LLMs are dynamic systems whose accuracy may change over time and therefore requires ongoing validation; some models showed notable limitations in accuracy and should be used with caution by healthcare providers, medical trainees, and patients.

Supporting image 1

Table 1. Accuracy of provided answers for multiple-choice questions for different large language models

Supporting image 2

Table 2. Accuracy of provided answers for open-ended questions for different large language models


Disclosures: P. Yingchoncharoen: None; N. Chaisrimaneepan: None; W. Pangkanon: None; J. Thongpiya: None.

To cite this abstract in AMA style:

Yingchoncharoen P, Chaisrimaneepan N, Pangkanon W, Thongpiya J. A Comparative Analysis of ChatGPT-4, Microsoft’s Bing and Google’s Bard at Answering Rheumatology Clinical Questions [abstract]. Arthritis Rheumatol. 2024; 76 (suppl 9). https://acrabstracts.org/abstract/a-comparative-analysis-of-chatgpt-4-microsofts-bing-and-googles-bard-at-answering-rheumatology-clinical-questions/. Accessed .


