Session Information
Date: Tuesday, October 28, 2025
Title: (2227–2264) Rheumatoid Arthritis – Diagnosis, Manifestations, and Outcomes Poster III
Session Type: Poster Session C
Session Time: 10:30AM-12:30PM
Background/Purpose: Clinical notes contain a wealth of information that is rarely used in predictive modeling because unstructured data are difficult to work with. As artificial intelligence (AI) advances and multimodal models become necessary to discern complex clinical relationships, the data encoded in clinical notes become increasingly important to use. We aimed to apply a generative AI model, the Jina-embeddings-v3 (Jina) model, to the medical records of patients with rheumatoid arthritis (RA) and to show its ability to extract relevant phenotypic features of RA, testing the hypothesis that the model can predict RA before the clinical diagnosis is made.
Methods: The study included 4100 patients with an RA diagnosis between 2000 and 2024 (mean age 61.1 [13.4] years; 2870 [70%] female; 2450 [59.7%] RF and/or CCP antibody positive), as well as approximately 80,000 persons without RA who had notes from the same time period. RA was defined as 2 ICD-9/10 codes at least 30 days apart, and each case was confirmed by manual record review. The notes were arranged in chronological order. Jina was chosen as the embedding model to fine-tune given its excellent performance relative to its small size of 570 million parameters. The model was hosted in a Google Cloud container, which also hosted a separate storage bucket for all clinical notes extracted for the manually curated dataset of patients with and without RA. The Jina model was fine-tuned on 70% of these notes, and the remaining notes were held out for testing and validation. Fine-tuning was performed on the equivalent of 8 H100 graphics processing units (GPUs) over the course of 12 hours.
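The code-based part of the case definition in Methods (2 ICD codes at least 30 days apart) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `meets_code_criterion`, the constant `MIN_GAP_DAYS`, and the example dates are hypothetical, and the manual record review that confirmed each case is not modeled.

```python
from datetime import date

MIN_GAP_DAYS = 30  # gap taken from the case definition; the name is hypothetical


def meets_code_criterion(code_dates):
    """Return True if at least two RA code dates are 30 or more days apart."""
    if len(code_dates) < 2:
        return False
    # Some pair is at least MIN_GAP_DAYS apart exactly when the earliest
    # and latest dates are at least that far apart.
    return (max(code_dates) - min(code_dates)).days >= MIN_GAP_DAYS


# Hypothetical code dates for one patient (not study data).
example_dates = [date(2015, 3, 1), date(2015, 3, 10), date(2015, 4, 2)]
print(meets_code_criterion(example_dates))
```

Comparing only the earliest and latest dates avoids checking every pair of codes while giving the same answer.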
Results: A total of 7.8 million notes were used for fine-tuning: 1.2 million from patients with RA and 6.6 million from persons without RA. These included all physician provider notes from any specialty, as well as allied health notes such as nursing notes and clinical communications, as summarized in Table 1. After training, the model was tested on 3.2 million notes (2.6 million from persons without RA and 650,000 from patients with RA). The mean duration of available follow-up before RA diagnosis was 8.7 years (SD 11 years). The model distinguished patients with RA from those without RA with an average precision of 0.8 up to 12 months before the RA diagnosis date.
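For readers unfamiliar with the metric, average precision summarizes a ranked list of predictions as the mean of the precision values at the rank of each true positive. The abstract does not say how the metric was computed or at what level (note or patient), so the sketch below shows only the standard definition. The labels and scores are hypothetical toy values, not study data.

```python
def average_precision(labels, scores):
    """Mean of precision@k taken at the rank k of each positive example."""
    # Rank examples from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` items
    if hits == 0:
        raise ValueError("average precision is undefined without positive labels")
    return precision_sum / hits


# Hypothetical toy example: 1 = RA, 0 = no RA.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(toy_labels, toy_scores), 3))  # prints 0.833
```

In the toy example the two positives sit at ranks 1 and 3, giving precisions of 1/1 and 2/3, which average to about 0.833.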
Conclusion: We showed that a generative AI model (i.e., the fine-tuned Jina embedding model) can predict RA onset months before the clinical diagnosis, based on clinical notes. Such predictions could be used to alert a non-rheumatology provider to a concern for RA in advance, enabling earlier referral to a rheumatologist. Work is ongoing to benchmark this model against a manually curated set of phenotypic characteristics, specifically with respect to the model's ability to encode clinically relevant information that can be used in different clinical predictive models.
To cite this abstract in AMA style:
Ayanian S, Rezaei S, Darveaux D, Blasi M, Myasoedova E. Generative AI model identifies patients with Rheumatoid Arthritis (RA) months prior the diagnosis date: results from a large real-world RA cohort [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/generative-ai-model-identifies-patients-with-rheumatoid-arthritis-ra-months-prior-the-diagnosis-date-results-from-a-large-real-world-ra-cohort/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/generative-ai-model-identifies-patients-with-rheumatoid-arthritis-ra-months-prior-the-diagnosis-date-results-from-a-large-real-world-ra-cohort/