ACR Meeting Abstracts

Abstract Number: 0828

Two Common AI Models Create Poor Rheumatology Board Style Questions

Catherine Deffendall1, Narender Annapureddy2, Kevin Byram2, Erin Chew2 and Tyler Reese3, 1Vanderbilt University, Nashville, TN, 2Vanderbilt University Medical Center, Nashville, TN, 3Vanderbilt University Medical Center, Madison, TN

Meeting: ACR Convergence 2025

Keywords: education, medical

Session Information

Date: Sunday, October 26, 2025

Title: Abstracts: Professional Education (0825–0830)

Session Type: Abstract Session

Session Time: 3:45–4:00 PM

Background/Purpose: Rheumatology fellows frequently prepare for board examinations using case-based, multiple-choice questions. However, few resources offer enough questions for adequate preparation, and those currently available can be costly. In this pilot study, we aimed to evaluate whether artificial intelligence (AI)-generated questions are suitable for board preparation.

Methods: We provided 5 identical prompts to the free versions of both Gemini and ChatGPT. Each prompt requested one board-style question based on an uploaded PDF of ACR guidelines covering glucocorticoid-induced osteoporosis, vaccination, interstitial lung disease, rheumatoid arthritis, and perioperative management. The following is an example of one such prompt: “Using the 2023 American College of Rheumatology clinical practice guidelines for Interstitial Lung Disease create one rheumatology board style question using reasonable alternatives as detractors”. The AI-generated questions were arranged in two separate forms, labeled Form 1 (Gemini) and Form 2 (ChatGPT), to blind reviewers to which AI model created each question. Ten rating metrics covering both accuracy and technical aspects of question writing were developed by an experienced question writer: 8 items rated on a 5-point Likert scale (ranging from strongly disagree to strongly agree), a yes/no item for whether the question was negatively worded, a 10-point scale for overall acceptability of the question, and a preference choice between the Gemini and ChatGPT questions. Three of the authors, board-certified rheumatologists engaged in medical education, independently rated the questions on paper surveys. For each AI-generated question, the ratings of all three raters were averaged, and these per-question averages were used to calculate overall averages for each AI model. Assuming nonparametric data, the Mann-Whitney U test was used to compare averages between models.
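The per-question averaging and nonparametric comparison described above can be sketched in a few lines of Python. The rating values below are invented placeholders, not the study's data; the Mann-Whitney U statistic is computed in pure Python with average ranks for ties, which is the standard formulation of the test the authors cite:

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for sample a vs. sample b (ties get average ranks)."""
    combined = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # find the run of tied values starting at position i
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg_rank
        i = j + 1
    rank_sum_a = sum(ranks[: len(a)])
    # U1 = R1 - n1(n1+1)/2
    return rank_sum_a - len(a) * (len(a) + 1) / 2

# Hypothetical per-question averages across the three raters (illustrative only)
gemini_avgs = [3.7, 4.0, 3.3, 4.2, 3.4]
chatgpt_avgs = [3.3, 3.8, 3.1, 3.6, 2.9]

print(mann_whitney_u(gemini_avgs, chatgpt_avgs))  # → 19.5
```

In practice one would use `scipy.stats.mannwhitneyu`, which also returns a p-value; the pure-Python version above only illustrates how the U statistic itself is derived from ranks.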

Results: Gemini and ChatGPT each created 5 questions with 4–5 answer choices per question. Attending rheumatologists rated the questions lowest on whether the question was factually correct (Combined 3.53 vs Gemini 3.72 vs ChatGPT 3.34) and whether the wrong answer choices represented commonly mistaken choices in clinical practice (Combined 3.87 vs Gemini 3.86 vs ChatGPT 3.88). Technical aspects of question writing were rated highest, including whether the question was negatively worded (100% No) and whether the question was grammatically correct (Combined 4.65 vs Gemini 4.68 vs ChatGPT 4.62). The overall score out of 10 was low for both models (Combined 5.5 vs Gemini 5.72 vs ChatGPT 5.28).

Conclusion: While both AI platforms created 10 grammatically correct board-style questions and were rated highly on technical aspects of question writing, rheumatology attendings rated them poorly on factual correctness, relevant detractors, and overall score. Learners should exercise caution when using AI to study, reviewing the original text or having experts assess questions for accuracy and validity. Improvements might be achieved by refining prompts and iteratively modifying the questions; however, that was not pursued in the current study.

Supporting image 1


Disclosures: C. Deffendall: None; N. Annapureddy: AstraZeneca, 1, GlaxoSmithKline (GSK), 2; K. Byram: None; E. Chew: None; T. Reese: None.

To cite this abstract in AMA style:

Deffendall C, Annapureddy N, Byram K, Chew E, Reese T. Two Common AI Models Create Poor Rheumatology Board Style Questions [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/two-common-ai-models-create-poor-rheumatology-board-style-questions/. Accessed .