Can Large Language Models Support Clinical Decision-Making in Atypical SLE? A Comparative Analysis

Beste Acar¹, Berkay Aktas¹, Oguzhan Omer Kizilkaya¹, Zekayi Kutlubay² and Serdal Ugurlu³, ¹Cerrahpasa Faculty of Medicine, Istanbul University-Cerrahpasa, Istanbul, Turkey, ²Istanbul University-Cerrahpasa, Cerrahpasa Faculty of Medicine, Department of Dermatology, Istanbul, Turkey, ³Istanbul University-Cerrahpasa, Department of Internal Medicine, Division of Rheumatology, Istanbul, Turkey

Meeting: ACR Convergence 2025

Keywords: Diagnostic criteria, Disease-Modifying Antirheumatic Drugs (DMARDs), informatics, macrophage activation syndrome, Systemic lupus erythematosus (SLE)

Session Information

Date: Monday, October 27, 2025

Title: (1467–1516) Systemic Lupus Erythematosus – Diagnosis, Manifestations, & Outcomes Poster II

Session Type: Poster Session B

Session Time: 10:30AM-12:30PM

Background/Purpose: This study evaluates the contribution of large language models (LLMs) to clinical decision-making in atypical presentations of systemic lupus erythematosus (SLE), focusing on the models' ability to assist with diagnosis, further investigations, and treatment planning. Diagnostic and treatment recommendations from the LLMs were compared with those of the treating physicians, and the relative strengths of each LLM were assessed. To our knowledge, this is the first study to assess LLMs in both the diagnosis and the management of atypical SLE. The models evaluated were Claude 3 Opus, GPT-4o Mini, and Gemini 2.0 Flash.

Methods: A PubMed search identified case reports published after April 2024, a cutoff intended to ensure that the cases were not part of the LLMs' training data. Of 95 studies reviewed, 24 cases from 23 studies were included; all involved atypical SLE presentations that did not meet established classification criteria. Exclusion criteria were a prior SLE diagnosis, insufficient diagnostic information, or an unavailable full text. Clinical data, including symptoms, laboratory and imaging findings, and medical history, were extracted, standardized, and entered into each LLM to obtain a diagnosis, further investigations, and a treatment plan. The models operated without web access and relied entirely on their internal knowledge. Antinuclear antibody (ANA) testing was given priority in the analysis because of its high sensitivity, and the timing and frequency of ANA requests were evaluated as indicators of SLE suspicion. The models' ability to identify SLE and the rank of SLE in their differential diagnoses were assessed. Initial and long-term treatment recommendations were compared with those of the treating physicians.
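To make the evaluated endpoints concrete, the following is a minimal Python sketch of how one model answer could be scored for whether SLE appears in the differential diagnosis, the rank SLE receives, and the position of the first ANA request. The abstract does not describe its scoring procedure; the `ModelResponse` structure, the `first_position` and `score_case` helpers, and the regular-expression matching are hypothetical choices made for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical matching rules; whole-word patterns avoid false hits such as "banana" for ANA.
SLE_PATTERN = re.compile(r"\bSLE\b|systemic lupus erythematosus", re.IGNORECASE)
ANA_PATTERN = re.compile(r"\bANA\b|antinuclear antibod", re.IGNORECASE)


@dataclass
class ModelResponse:
    """Hypothetical parsed answer from one LLM for one case."""
    differentials: List[str]   # differential diagnoses, most likely first
    investigations: List[str]  # further tests, in the order the model requested them


def first_position(items: List[str], pattern: re.Pattern) -> Optional[int]:
    """Return the 1-based position of the first item matching `pattern`, or None."""
    for position, item in enumerate(items, start=1):
        if pattern.search(item):
            return position
    return None


def score_case(resp: ModelResponse) -> Dict[str, object]:
    """Score one answer: was SLE suggested, at what rank, and how early was ANA requested."""
    sle_rank = first_position(resp.differentials, SLE_PATTERN)
    ana_position = first_position(resp.investigations, ANA_PATTERN)
    return {
        "sle_suggested": sle_rank is not None,
        "sle_rank": sle_rank,
        "ana_requested": ana_position is not None,
        "ana_position": ana_position,
    }


# Invented example answer, not taken from the study.
resp = ModelResponse(
    differentials=["Aicardi-Goutières syndrome", "Systemic lupus erythematosus", "Evans syndrome"],
    investigations=["Complete blood count", "ANA with reflex to anti-dsDNA", "Brain MRI"],
)
print(score_case(resp))
# {'sle_suggested': True, 'sle_rank': 2, 'ana_requested': True, 'ana_position': 2}
```

Word-boundary patterns keep terms such as "anaemia" from being counted as an ANA request; in practice, manual review of each answer may be more reliable than pattern matching.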

Results: All LLMs suggested SLE as a differential diagnosis in every case. Gemini most often ranked SLE first and Claude requested ANA most frequently, but neither difference was statistically significant. GPT, however, requested ANA significantly earlier (p = 0.011). Table 1 compares the models. Initial treatment recommendations were similar across models, all of which suggested corticosteroids as first-line therapy. Long-term treatment plans varied: Claude aligned most closely with the treating physicians' long-term plans, while GPT often proposed guideline-consistent alternatives such as mycophenolate mofetil (MMF) or hydroxychloroquine (HCQ). In one case, all LLMs recommended HCQ despite a contraindication (optic nerve atrophy), highlighting limitations in their safety considerations. GPT and Gemini more frequently recalled rare differentials such as Aicardi-Goutières syndrome and Evans syndrome, whereas Claude more often assigned a direct SLE diagnosis with fewer alternatives. The lack of a non-SLE control group limited the assessment of false positives and diagnostic specificity, raising concerns about potential overdiagnosis, particularly given GPT's early ANA prioritization.
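The abstract reports that GPT requested ANA significantly earlier (p = 0.011) but does not name the statistical test. Because the same cases were given to all three models, a repeated-measures rank test such as the Friedman test is one plausible choice; the sketch below assumes that choice and is not the authors' method. Each row holds, for one case, the position at which each model requested ANA (lower means earlier). The tie-corrected Friedman statistic is compared with a chi-square distribution with k - 1 = 2 degrees of freedom, whose survival function is exactly exp(-x/2). The function name `friedman_test_three` and the toy numbers are invented for illustration and are not the study's data.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values within one case (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def friedman_test_three(blocks: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected Friedman test for exactly three paired groups (e.g. three LLMs).

    blocks: one row per case, one column per model.
    Returns (statistic, p_value) using the chi-square approximation with 2 df.
    """
    b, k = len(blocks), 3
    if any(len(row) != k for row in blocks):
        raise ValueError("each case must have exactly three values")
    ranked = [average_ranks(row) for row in blocks]
    rank_sums = [sum(row[j] for row in ranked) for j in range(k)]
    a1 = sum(r * r for row in ranked for r in row)  # sum of squared ranks
    c1 = b * k * (k + 1) ** 2 / 4                  # correction term
    if math.isclose(a1, c1):
        # Every case is fully tied, so the ranks carry no evidence of a difference.
        return 0.0, 1.0
    expected = b * (k + 1) / 2
    stat = (k - 1) * sum((r - expected) ** 2 for r in rank_sums) / (a1 - c1)
    p_value = math.exp(-stat / 2)  # chi-square survival function for 2 df
    return stat, p_value


# Invented toy data: ANA request position per case for [GPT, Claude, Gemini].
positions = [[1, 3, 2], [1, 2, 3], [1, 3, 2], [1, 2, 2], [2, 3, 1]]
stat, p = friedman_test_three(positions)
print(stat, round(p, 4))  # 6.0 0.0498
```

An omnibus Friedman test only indicates that the three models differ; attributing the difference to GPT would additionally require pairwise post-hoc comparisons (for example, Wilcoxon signed-rank tests with a multiplicity correction). How to code a case in which a model never requested ANA is a further assumption left open here.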

Conclusion: LLMs effectively prioritized ANA testing, supporting early consideration of SLE in complex cases. They showed strong diagnostic performance and made appropriate initial treatment suggestions. However, long-term management recommendations varied, and safety considerations were limited. The lack of a non-SLE control group limits specificity analysis and highlights the need for further research to improve clinical use.

Table 1. Performance Comparison of LLMs in Diagnostic and Treatment Decisions for Atypical SLE Cases

Disclosures: B. Acar: None; B. Aktas: None; O. Kizilkaya: None; Z. Kutlubay: None; S. Ugurlu: None.

To cite this abstract in AMA style:

Acar B, Aktas B, Kizilkaya O, Kutlubay Z, Ugurlu S. Can Large Language Models Support Clinical Decision-Making in Atypical SLE? A Comparative Analysis [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/can-large-language-models-support-clinical-decision-making-in-atypical-sle-a-comparative-analysis/. Accessed .


ACR Meeting Abstracts - https://acrabstracts.org/abstract/can-large-language-models-support-clinical-decision-making-in-atypical-sle-a-comparative-analysis/