Session Information
Date: Sunday, October 26, 2025
Title: (0430–0469) Rheumatoid Arthritis – Diagnosis, Manifestations, and Outcomes Poster I
Session Type: Poster Session A
Session Time: 10:30AM-12:30PM
Background/Purpose: Although Large Language Models (LLMs) have been successfully used to analyze data from Electronic Health Records (EHRs), their implementation in rheumatology has been very limited. Most information on the disease activity of patients with rheumatologic conditions is recorded in free-text medical notes rather than as structured data. Recently, new LLMs have been released that can be deployed locally without transmitting information to external servers (Privacy-Preserving Large Language Models, PP-LLMs). We aimed to test how accurately these LLMs can identify flares of rheumatic diseases from EHR notes.
Methods: For the pilot study, discharge notes of hospitalized patients with a known history of Rheumatoid Arthritis (RA) were retrieved from the de-identified MIMIC-IV dataset via Google’s BigQuery API. A specialized prompt was developed using prompt engineering techniques such as chain-of-thought and few-shot prompting, based on the OMERACT workgroup recommendations on flare identification. Next, the QwQ 32B-parameter reasoning model with 8-bit quantization was loaded locally through the Ollama engine, which uses the llama.cpp framework, and was given the discharge notes for analysis. Manual chart review was performed to verify whether each note was correctly classified as describing a flare or not.
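The local inference step described above could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the model tag, the prompt wording, the two few-shot examples (invented, not MIMIC-IV text), the JSON output schema, and the helper names `build_prompt`, `parse_answer`, and `classify_note` are all assumptions. The OMERACT-based criteria text is not given in the abstract, so it is left as a placeholder. The request uses Ollama's local `/api/generate` endpoint, so the note never leaves the machine.

```python
import json
import urllib.request

# Everything below is an illustrative assumption, not the study's actual setup.
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwq:32b-q8_0"  # assumed tag for an 8-bit QwQ 32B build

# Placeholder: the study's OMERACT-based criteria are not given in the abstract.
FLARE_CRITERIA = "<flare criteria text goes here>"

# Invented few-shot examples (hypothetical, not taken from MIMIC-IV).
FEW_SHOT = [
    ("RA on methotrexate. Five days of worsening bilateral wrist swelling and "
     "prolonged morning stiffness; prednisone dose increased.", True),
    ("History of RA, joints quiet on exam. Admitted for COPD flare.", False),
]


def build_prompt(note: str) -> str:
    """Assemble a few-shot, chain-of-thought prompt for one discharge note."""
    parts = [
        "You review hospital discharge notes of patients with rheumatoid arthritis.",
        f"Flare criteria: {FLARE_CRITERIA}",
        "Think step by step, then finish with one JSON object with keys "
        '"flare" (true/false), "confidence" and "evidence".',
    ]
    for i, (example, label) in enumerate(FEW_SHOT, start=1):
        answer = json.dumps({"flare": label})
        parts.append(f"Example {i}: {example}\nAnswer: {answer}")
    parts.append(f"Note to classify:\n{note}")
    return "\n\n".join(parts)


def parse_answer(raw: str) -> dict:
    """Extract the final JSON object, skipping any inline <think> reasoning."""
    final = raw.rsplit("</think>", 1)[-1]  # unchanged if no reasoning tag is present
    start, end = final.find("{"), final.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(final[start:end + 1])


def classify_note(note: str, url: str = OLLAMA_URL, model: str = MODEL_TAG) -> dict:
    """Send one note to a locally running Ollama server and parse the reply."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(note), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=600) as response:
        body = json.load(response)
    return parse_answer(body["response"])
```

With an Ollama server running and the model pulled, `classify_note(note_text)` would return a dictionary such as `{"flare": ..., "confidence": ..., "evidence": ...}`, provided the model follows the requested output format. Each result would still need to be checked against manual chart review, as in the study.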
Results: Forty-five discharge summaries containing the word “flare” were randomly selected, of which 9 described an RA flare and 36 did not. Without any fine-tuning or human feedback, the LLM correctly classified 43 of 45 cases, for an accuracy of 95.6% (Sensitivity/Recall: 100% (9/9), Precision/Positive Predictive Value: 81.8% (9/11), Specificity: 94.4% (34/36), F1 Score: 0.90). Detailed outputs were generated for each case stating whether a flare was identified, the confidence level, and the key evidence supporting flare identification. For the 2 cases that the LLM misclassified, both of which were false positives, the reasoning output highlighted the same uncertainties that were identified during manual chart review.
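The reported metrics can be recomputed from the confusion-matrix counts implied by the fractions above: 9 true positives and 0 false negatives (sensitivity 9/9), 2 false positives (precision 9/11), and 34 true negatives (specificity 34/36). The sketch below is only an arithmetic check; the `ConfusionMatrix` class and its property names are illustrative and not part of the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix: manual chart review is the reference standard."""
    tp: int  # flare on review, flare per LLM
    fp: int  # no flare on review, flare per LLM
    fn: int  # flare on review, no flare per LLM
    tn: int  # no flare on review, no flare per LLM

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def sensitivity(self) -> float:  # also called recall
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:  # also called positive predictive value
        return self.tp / (self.tp + self.fp)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in terms of counts.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Counts derived from the fractions reported in the Results.
reported = ConfusionMatrix(tp=9, fp=2, fn=0, tn=34)

for name in ("accuracy", "sensitivity", "specificity", "precision", "f1"):
    print(f"{name}: {getattr(reported, name):.3f}")
```

Because only 9 of the 45 notes described a flare, precision is the metric most sensitive to the two false positives, which is why it is noticeably lower than accuracy or specificity.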
Conclusion: To our knowledge, our work represents the first successful use of local LLMs to identify RA flares from discharge summaries of hospitalized patients. Furthermore, these models can be adapted to other rheumatologic conditions with simple modification of the prompts.
Figure 1: Overview of the Study
Table 1: Confusion Matrix. Columns represent the classification results by the large language model, rows represent the classification based on manual chart review
Table 2: Model Evaluation Metrics
To cite this abstract in AMA style:
Koulas I, Tsaftaridis N, Gkionis M, Paternoster G, Jariwala S, Loupasakis K. Implementing Artificial Intelligence to Identify Rheumatoid Arthritis Flares Using Electronic Medical Records Processed with Privacy-Preserving Large Language Models: A Pilot Study [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/implementing-artificial-intelligence-to-identify-rheumatoid-arthritis-flares-using-electronic-medical-records-processed-with-privacy-preserving-large-language-models-a-pilot-study/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/implementing-artificial-intelligence-to-identify-rheumatoid-arthritis-flares-using-electronic-medical-records-processed-with-privacy-preserving-large-language-models-a-pilot-study/