ACR Meeting Abstracts
Abstract Number: 2070

Augmenting Medical Education: An Evaluation of GPT-4 and ChatGPT in Answering Rheumatology Questions from the Spanish Medical Licensing Examination

Alfredo Madrid García1, Zulema Rosales2, Dalifer Freites2, Inés Pérez Sancristóbal3, Benjamin Fernandez3 and Luis Rodríguez Rodríguez3, 1Fundación Investigación Biomédica Hospital Clínico San Carlos, Madrid, Spain, 2Hospital Clínico San Carlos, Madrid, Spain, 3Hospital Clínico San Carlos, Madrid, Spain

Meeting: ACR Convergence 2023

Keywords: Education, patient, informatics

Session Information

Date: Tuesday, November 14, 2023

Title: (2061–2088) Professional Education Poster

Session Type: Poster Session C

Session Time: 9:00 AM-11:00 AM

Background/Purpose: The emergence of Large Language Models (LLMs) with remarkable performance, such as GPT-4 and ChatGPT, has led to unprecedented uptake among the general population. One of the most promising and widely studied applications of these models is education. Their ability to understand and generate human-like text creates many opportunities for enhancing educational practices and outcomes. The objectives of this study were to assess the success rate of GPT-4 and ChatGPT in answering rheumatology questions from the entrance examination for specialized medical training in Spain (MIR), and to evaluate the medical reasoning followed by ChatGPT and GPT-4.

Methods: Rheumatology questions from the MIR exams published from 2010 onwards were selected and used as prompts. Each question presents a clinical case or a factual query, followed by four to five response options with a single correct answer. Questions containing images were excluded.
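To make this setup concrete, the sketch below shows how a single multiple-choice question could be submitted to a chat model via the OpenAI Python client. This is a minimal illustration, not the authors' code: the prompt wording, the ask_model helper, and the example question are assumptions, since the abstract does not describe the exact prompting procedure.

    from openai import OpenAI  # assumes the official "openai" Python package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask_model(model: str, question: str, options: list[str]) -> str:
        """Send one exam question and return the model's raw answer text."""
        numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
        prompt = (
            f"{question}\n{numbered}\n"
            "Answer with the number of the single correct option."
        )
        response = client.chat.completions.create(
            model=model,  # e.g. "gpt-4"
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # deterministic output makes grading reproducible
        )
        return response.choices[0].message.content

    # Hypothetical usage with a made-up question:
    # ask_model("gpt-4", "Which joint is most typically affected in gout?",
    #           ["Knee", "First metatarsophalangeal joint", "Shoulder", "Hip"])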

Official responses were compared with those provided by GPT-4 and ChatGPT to estimate each chatbot's accuracy. Differences between the LLMs were evaluated using McNemar's test.
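A minimal sketch of that paired comparison follows, using the McNemar implementation from statsmodels. The 2×2 cell counts are placeholders chosen only to be consistent with the marginal totals reported in the Results (99/106 correct for GPT-4, 69/106 for ChatGPT); the true discordant counts are not stated in the abstract.

    from statsmodels.stats.contingency_tables import mcnemar

    # Paired outcomes over the same set of 106 questions:
    #                    ChatGPT correct   ChatGPT wrong
    # GPT-4 correct            a                 b
    # GPT-4 wrong              c                 d
    # Placeholder cells consistent with the reported marginals (a+b=99, a+c=69).
    table = [[65, 34],
             [4, 3]]

    result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs
    print(f"p-value = {result.pvalue:.2e}")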

A rheumatologist with experience preparing students for the MIR exam assessed the correctness of the medical reasoning given by each LLM for each question, using a 5-point Likert scale, where 5 indicates entirely correct reasoning. The influence of the chatbot model on the medical reasoning score was analyzed using McNemar's test. The influence of the exam year, type of question (clinical case vs. factual), patient's gender in the clinical case questions, and pathology addressed was assessed using logistic regression models, after dichotomizing the Likert scale into two groups: low score (i.e., 1, 2, 3) vs. high score (i.e., 4, 5).
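The dichotomization and regression step might look like the sketch below, using the statsmodels formula API. Only the 1-3 vs. 4-5 split and the predictors named above come from the abstract; the DataFrame columns and toy values are illustrative, and only question type is shown as a predictor for brevity.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Toy data: one row per question, with the rheumatologist's Likert score.
    df = pd.DataFrame({
        "likert": [5, 4, 2, 5, 3, 4, 1, 5],
        "q_type": ["clinical", "factual", "clinical", "clinical",
                   "factual", "clinical", "factual", "clinical"],
    })

    # Dichotomize the 5-point scale: 1-3 = low score (0), 4-5 = high score (1).
    df["high_score"] = (df["likert"] >= 4).astype(int)

    # Logistic regression of the dichotomized score on question type; the
    # study fit analogous models for exam year, patient gender, and pathology.
    model = smf.logit("high_score ~ C(q_type)", data=df).fit()
    print(model.summary())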

Results: After applying the inclusion criteria, 106 questions remained, comprising 36 (34%) factual queries and 70 (66%) clinical case questions. GPT-4 correctly answered 99 questions (93.4%) and ChatGPT 69 (65.1%). Most of the questions GPT-4 failed were factual queries (5/7, 71.4%). The proportion of correct answers was significantly higher for GPT-4 (p=1.2×10⁻⁷), and its clinical reasoning score was also higher (p=4.7×10⁻⁵). For the other variables, there were no statistically significant differences in the clinical reasoning score of GPT-4/ChatGPT across the categories studied. Figures 1, 2 and 3 show the scores given to each question by exam year, question type, and disease.

Conclusion: GPT-4 showed a significantly higher accuracy and clinical reasoning score than ChatGPT.

No significant differences in clinical reasoning correctness were observed with respect to the type of question or the condition evaluated.

GPT-4 could become a valuable asset in medical education, although its precise role remains to be defined.

Supporting image 1: GPT-4 medical reasoning score of the questions, by exam year.

Supporting image 2: GPT-4 medical reasoning score of the questions, by question type.

Supporting image 3: GPT-4 medical reasoning score of the questions, by disease.


Disclosures: A. Madrid García: None; Z. Rosales: None; D. Freites: None; I. Pérez Sancristóbal: None; B. Fernandez: None; L. Rodríguez Rodríguez: None.

To cite this abstract in AMA style:

Madrid García A, Rosales Z, Freites D, Pérez Sancristóbal I, Fernandez B, Rodríguez Rodríguez L. Augmenting Medical Education: An Evaluation of GPT-4 and ChatGPT in Answering Rheumatology Questions from the Spanish Medical Licensing Examination [abstract]. Arthritis Rheumatol. 2023; 75 (suppl 9). https://acrabstracts.org/abstract/augmenting-medical-education-an-evaluation-of-gpt-4-and-chatgpt-in-answering-rheumatology-questions-from-the-spanish-medical-licensing-examination/.