ACR Meeting Abstracts


Abstract Number: 1252

Can LLMs Categorize Patient Priorities Like Humans? Comparing AI and Human Coders in Arthritis Nominal Group Discussions

Melissa Mannion1, Bryce Thornton1, Bella Mehta2, Ronan O'Beirne1, Emily Smitherman1, Livie Timmerman3, Shilpa Venkatachalam4, Jeffrey Curtis1 and John Osborne1, 1University of Alabama at Birmingham, Birmingham, AL, 2Hospital for Special Surgery, Weill Cornell Medicine, Jersey City, NJ, 3University of Alabama at Birmingham, Gardendale, AL, 4Global Healthy Living Foundation, New York, NY

Meeting: ACR Convergence 2025

Keywords: health behaviors, Juvenile idiopathic arthritis, Pediatric rheumatology

Session Information

Date: Monday, October 27, 2025

Title: (1248–1271) Patient Outcomes, Preferences, & Attitudes Poster II

Session Type: Poster Session B

Session Time: 10:30AM-12:30PM

Background/Purpose: Identifying the informational needs of individuals with inflammatory arthritis is critical to enhancing communication and supporting shared decision making among patients, caregivers, and providers. However, qualitative methods require significant human effort, time, and specialized software. Our goal was to determine how closely large language models (LLMs) could reproduce human thematic categorization, using human-labeled data from 2 nominal group technique (NGT) studies of individuals with inflammatory arthritis.

Methods: We utilized results from 2 separate NGT studies on the informational needs of individuals with inflammatory arthritis. The first identified information patients want prior to medication changes for juvenile idiopathic arthritis (JIA), including 2 virtual parent groups, 1 virtual adolescent group, and 1 asynchronous adolescent group. The second explored information priorities for adults living with inflammatory arthritis, collected over 6 virtual groups. Qualitative responses to the discussion questions were collected and ranked by participants. Responses were manually assigned themes by at least 2 members of the research team (gold standard). Six LLMs, including advanced proprietary models and open-source alternatives, were zero-shot prompted to assign a single best-fit theme from the human-generated codebook to each participant-generated statement. Model-assigned themes were compared to those assigned independently by 2 human coders. A match was defined as exact agreement between the model’s theme and the human-coded themes. Performance was evaluated using accuracy, macro F1 score, Cohen’s kappa, and Krippendorff alpha relative to each coder.
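As an illustration of the agreement metrics named above, the following is a minimal sketch of accuracy, macro F1, and Cohen's kappa for single-label theme assignments. It is not the authors' evaluation code: the function names and the theme labels in the example are hypothetical, macro F1 is averaged over the union of themes used by either rater (an assumption about how unseen themes are handled), and Krippendorff's alpha is omitted for brevity.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of statements where the two theme assignments match exactly."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-theme F1, over every theme used by either rater."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on nominal labels."""
    n = len(rater_a)
    observed = accuracy(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if each rater assigned themes independently
    # according to their own marginal theme frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(rater_a) | set(rater_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical theme labels for six statements (not data from the study).
human = ["side effects", "cost", "side effects", "efficacy", "cost", "efficacy"]
model = ["side effects", "cost", "efficacy", "efficacy", "cost", "side effects"]

print(f"accuracy = {accuracy(human, model):.2f}")
print(f"macro F1 = {macro_f1(human, model):.2f}")
print(f"kappa    = {cohen_kappa(human, model):.2f}")
```

In this toy example the raters agree on 4 of 6 statements, yet kappa is lower than raw accuracy because part of that agreement is expected by chance. The same functions could be applied to a model against each human coder in turn, and to the two human coders against each other.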

Results: The adolescent groups had 12 human-labeled and 12 LLM-assigned primary themes; the parent groups had 10 human-labeled and 11 LLM-assigned themes; the adult groups had 8 of each (Table 1). Model accuracy of theme assignment ranged from 65–82% for adolescent statements, 54–95% for parent statements, and 70–100% for adult statements (Table 2). Model agreement with human coders, measured by Cohen’s kappa, ranged from 0.70–0.80 for adolescent statements, 0.47–0.94 for parent statements, and 0.64–1.00 for adult statements. Human interrater Cohen’s kappa was 0.57 for adolescents, 0.59 for parents, and 0.74 for adults (Table 2).

Conclusion: LLMs can reliably approximate human coding of qualitative themes in NGT data, and in some cases showed stronger agreement with individual coders than the coders had with each other. Human- and model-assigned themes were often similar, with strong agreement across datasets. Models performed best when statements expressed clear, explicit themes but faced challenges with ambiguity or overlapping concepts. Performance was strongest for advanced reasoning models but remained robust across open-source and earlier-generation models. These findings support LLMs as scalable tools for thematic analysis in qualitative research. Future directions include refining prompts to assess statement salience and exploring LLM-guided theme discovery without predefined codebooks for flexible, data-driven identification of emerging concepts.

Supporting image 1. Table 1: Human- and large language model (LLM)-assigned themes by participant group in 2 nominal group studies of patient informational needs

Supporting image 2. Table 2: LLM-Human Interrater Reliability Metrics in Thematic Coding of NGT Responses

Supporting image 3. Figure 1: Workflow for Theme Assignment and Agreement Evaluation Between Human Coders and LLMs.


Disclosures: M. Mannion: None; B. Thornton: None; B. Mehta: Amgen, 1, Horizon, 1; R. O'Beirne: None; E. Smitherman: None; L. Timmerman: None; S. Venkatachalam: None; J. Curtis: AbbVie, 2, 5, Amgen, 2, 5, Bendcare, 5, Bristol-Myers Squibb (BMS), 5, Corrona, 2, 5, Crescendo, 2, 5, Eli Lilly, 2, 5, FASTER, 2, 4, Genentech, 2, 5, GlaxoSmithKline (GSK), 2, 5, Janssen, 2, 5, Moderna, 2, 5, Novartis, 2, 5, Pfizer, 2, 5, Roche, 2, 5, Sanofi, 2, 5, UCB, 2, 5; J. Osborne: None.

To cite this abstract in AMA style:

Mannion M, Thornton B, Mehta B, O'Beirne R, Smitherman E, Timmerman L, Venkatachalam S, Curtis J, Osborne J. Can LLMs Categorize Patient Priorities Like Humans? Comparing AI and Human Coders in Arthritis Nominal Group Discussions [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/can-llms-categorize-patient-priorities-like-humans-comparing-ai-and-human-coders-in-arthritis-nominal-group-discussions/. Accessed .






© Copyright 2025 American College of Rheumatology