Optimizing the Clinical Application of Rheumatology Guidelines Using Large Language Models: A Retrieval-Augmented Generation Framework Integrating EULAR and ACR Recommendations

Alfredo Madrid¹, Diego Benavent², Chamaida Plasencia-Rodríguez³, Zulema Rosales-Rosado⁴, Beatriz Merino-Barbancho⁵ and DALIFER FREITES⁶, ¹Roche, Global Data Science and Analytics, Madrid, Spain, ²Hospital Universitari de Bellvitge, Madrid, Spain, ³Hospital Universitario La Paz, Madrid, Spain, ⁴Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos (IdISSC), Madrid, Spain, ⁵Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Avenida Complutense, Madrid, Spain, ⁶Rheumatology Service, San Carlos Clinical Hospital, Madrid, Spain

Meeting: ACR Convergence 2025

Keywords: informatics, Managed Care

Session Information

Date: Sunday, October 26, 2025

Title: (0175–0198) Health Services Research Poster I

Session Type: Poster Session A

Session Time: 10:30AM-12:30PM

Background/Purpose: Timely access to current rheumatology guidelines at the point of care presents a significant challenge. Large Language Models (LLMs) offer potential solutions, but their propensity for “hallucinations” raises safety concerns. The primary objective of this study was to develop and evaluate a novel Retrieval-Augmented Generation (RAG) system, the first of its kind specifically for adult rheumatology, integrating European Alliance of Associations for Rheumatology (EULAR) and American College of Rheumatology (ACR) guidelines to provide clinicians with timely, evidence-based recommendations.

Methods: Seventy-four clinically relevant EULAR and ACR management guidelines for adult rheumatology were selected and processed. A RAG system was implemented using the LangChain framework, the voyage-3 embedding model, and a Qdrant vector database (Figure 1). For evaluation, 740 guideline-specific questions were generated. Answers were produced by an LLM (ChatGPT-o3-mini) with context retrieval (RAG) and without it (baseline). Performance was assessed by an LLM-as-a-judge (Gemini 2.0 Flash), which rated each answer on a 5-point Likert scale across five dimensions (relevance, factual accuracy, safety, completeness, conciseness) and indicated which of the two answers it preferred. Wilcoxon signed-rank and binomial tests were used for statistical analysis. Two blinded rheumatologists independently validated a random 15% sample of questions.
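The retrieve-then-generate pattern described in Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: a toy bag-of-words similarity stands in for the voyage-3 embeddings and the Qdrant search, the chunk texts are placeholders rather than guideline content, and every name here (embed, cosine, retrieve, build_prompt, CHUNKS) is hypothetical.

```python
import math
from collections import Counter

# Placeholder chunks; a real system would index processed guideline text.
CHUNKS = [
    "placeholder chunk about gout flare management",
    "placeholder chunk about rheumatoid arthritis treatment targets",
    "placeholder chunk about osteoporosis screening intervals",
]

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Assemble a prompt that asks the LLM to answer from the retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )

question = "What is the rheumatoid arthritis treatment target"
top_chunks = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, top_chunks)
```

In the baseline condition, the question would be sent to the LLM on its own; in the RAG condition, the prompt carries the retrieved context as well.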

Results: The LLM-as-a-judge evaluation showed that the RAG system significantly outperformed the baseline system across all criteria (p < 0.001). The LLM-as-a-judge preferred the RAG system in 92.8% of comparisons (p < 0.001; Table 1). Manual evaluation by rheumatologists confirmed these findings, with significant improvements in accuracy, safety, and completeness for the RAG system (p < 0.001), which was preferred in 71.2%-74.8% of comparisons (p < 0.001; Table 2).
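For readers unfamiliar with how a preference rate is tested, the following Python sketch shows an exact two-sided binomial test against a null of no preference (p = 0.5). It assumes each comparison yields a clear winner with no ties, which the abstract does not state. The function name and the counts in the usage line are hypothetical illustrations, not the study's data.

```python
from math import comb

def binomial_test_two_sided(successes, trials, p_null=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null hypothesis.
    """
    pmf = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)

# Hypothetical example: one answer preferred in 18 of 20 comparisons.
p_value = binomial_test_two_sided(18, 20)
```

A small p-value indicates that a split this lopsided would be unlikely if neither system were truly preferred.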

Conclusion: This study successfully developed and validated a RAG system integrating EULAR and ACR guidelines for adult rheumatology. The system significantly enhances the quality and reliability of LLM-generated answers, providing a robust foundation for AI-driven clinical decision support tools. Such tools have the potential to improve guideline adherence and evidence-based practice in rheumatology by offering clinicians rapid, context-aware access to recommendations.

Figure 1: Walkthrough of the entire process, from initial creation to final evaluation, of the proposed RAG architecture.

Table 1: LLM-as-a-judge evaluation results.

Table 2: Manual evaluation results.

Disclosures: A. Madrid: Roche, 3; D. Benavent: AbbVie/Abbott, 2, 6, Eli Lilly, 6, Janssen, 6, Novartis, 5, 6, Pfizer, 6, Savana, 7, UCB, 2, 6; C. Plasencia-Rodríguez: AbbVie/Abbott, 5, 6, Eli Lilly, 6, Novartis, 6, Pfizer, 5, 6, UCB, 6; Z. Rosales-Rosado: None; B. Merino-Barbancho: None; D. FREITES: None.

To cite this abstract in AMA style:

Madrid A, Benavent D, Plasencia-Rodríguez C, Rosales-Rosado Z, Merino-Barbancho B, FREITES D. Optimizing the Clinical Application of Rheumatology Guidelines Using Large Language Models: A Retrieval-Augmented Generation Framework Integrating EULAR and ACR Recommendations [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/optimizing-the-clinical-application-of-rheumatology-guidelines-using-large-language-models-a-retrieval-augmented-generation-framework-integrating-eular-and-acr-recommendations/. Accessed .
