Session Information
Session Type: Poster Session A
Session Time: 10:30AM-12:30PM
Background/Purpose: Applying artificial intelligence (AI) methods to genomic data for clinical outcome prediction in rheumatoid arthritis (RA) is an area of growing research. Suboptimal prediction accuracy and challenges with replication often preclude the clinical applicability of machine learning models. Generative AI, including large language models (LLMs), can help overcome these challenges but has not been applied to genomic data with clinical applications (1,2). We aimed to develop and test a novel nucleotide transformer (NT) using complete exomes to optimize the use of exomic data for individualized outcome prediction in RA.
Methods: An LLM based on the transformer architecture was developed using exomic sequences from a large exome repository of 58,000 people as part of a collaboration between a US academic institution and a US genomic repository. The NT is designed with a token size that accommodates entire exomes rather than SNPs, and it can identify variations from the human reference genome (HRG), pick up long-range interactions, evaluate HLA subtypes, and connect these data to clinical outcomes. The NT was created de novo using state-of-the-art cluster-scale computing with performance equivalent to 600 graphics processing units, and was tested on patients with RA using an active registry of 5,985 patients with probable RA based on ICD-9/10 codes for RA and ≥1 claim for methotrexate between 1/1/1998 and 12/31/2023. All patient records were manually reviewed to confirm RA status and to abstract clinical characteristics. From a subset of RA patients with available exome data (n=375), data from 100 patients were used for training of the NT and data from 275 for fine-tuning and downstream task predictions.
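The registry screening rule described above (an RA diagnosis code plus at least one methotrexate claim in the study window) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the record layout, the function name, and the specific code prefixes (ICD-9 714.0, ICD-10 M05 and M06) are assumptions, since the abstract does not list the codes used, and applying the date window to the methotrexate claim alone is one possible reading of the criteria.

```python
from dataclasses import dataclass, field
from datetime import date

# Study window stated in the abstract.
WINDOW_START = date(1998, 1, 1)
WINDOW_END = date(2023, 12, 31)

# Assumption: the abstract does not list the codes; these are common RA
# prefixes (ICD-9 714.0, ICD-10 M05 and M06) used here for illustration.
RA_CODE_PREFIXES = ("714.0", "M05", "M06")


@dataclass
class PatientRecord:
    """Hypothetical record layout, not the registry's actual schema."""
    patient_id: str
    icd_codes: list[str] = field(default_factory=list)
    methotrexate_claim_dates: list[date] = field(default_factory=list)


def is_probable_ra(record: PatientRecord) -> bool:
    """Return True if the record has an RA code and >=1 methotrexate claim in the window."""
    has_ra_code = any(code.startswith(RA_CODE_PREFIXES) for code in record.icd_codes)
    has_mtx_claim = any(
        WINDOW_START <= claim <= WINDOW_END
        for claim in record.methotrexate_claim_dates
    )
    return has_ra_code and has_mtx_claim


records = [
    PatientRecord("p1", ["M05.79"], [date(2010, 5, 1)]),   # RA code + claim in window
    PatientRecord("p2", ["M06.9"], []),                    # RA code, no claim
    PatientRecord("p3", ["E11.9"], [date(2015, 3, 2)]),    # claim, no RA code
    PatientRecord("p4", ["714.0"], [date(1997, 12, 31)]),  # claim outside window
]
probable_ra_ids = [r.patient_id for r in records if is_probable_ra(r)]
```

A screen of this kind only flags probable RA; as stated above, every record was then manually reviewed to confirm RA status.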
Results: We tested the performance of our NT, trained on 58,000 people, against the best of four existing NTs (1). As a proof of concept, our NT outperformed the existing NT, achieving 86.7% accuracy in identifying the next nucleotide on upstream evaluation. The downstream evaluation showed 96.3% accuracy in identifying splicing sites, 72% for regulatory sites, and 53% for chromatin, outperforming the existing NT. The second version of our NT, incorporating data from 450 patients (100 patients with RA and 350 controls), is currently in training, and RA-specific downstream evaluation is expected to be completed by 7/1/2024. The final version of the NT and its RA-specific evaluation will incorporate all 58,000 people. The results of RA-specific clinical outcomes evaluations are expected by 8/15/2024, and we plan to present them during the conference.
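For readers unfamiliar with the upstream metric, next-nucleotide accuracy is simply the fraction of positions at which the predicted nucleotide matches the observed one. The sketch below is illustrative only: the function name and the toy sequences are hypothetical, and it assumes one prediction per position, which may differ from how the authors tokenized and scored their model.

```python
def next_nucleotide_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted nucleotide equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed sequences must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy example (made-up sequences): 5 of 6 positions agree.
accuracy = next_nucleotide_accuracy("ACGTAC", "ACGTTC")
```

Under this definition, the reported 86.7% would mean that roughly 87 of every 100 evaluated positions were predicted correctly.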
Conclusion: This is the largest and most comprehensive effort to create a novel NT, and the first NT to be clinically tested and applied to RA. This approach is transformational for the use of genomic data in RA, as it opens opportunities for identifying complex genomic interactions and associations with clinical outcomes. This is the first building block of an effort to create a foundational multimodal model for outcome prediction in RA.
References:
1. Dalla-Torre et al. bioRxiv 2023.01.11.523679.
2. Nguyen et al. arXiv 2023:2306.15794v2.
To cite this abstract in AMA style:
Ayanian S, Osborne C, Blasi M, Darveaux D, Klee E, Myasoedova E. Constructing and Using a Novel Nucleotide Transformer for Patients with Rheumatoid Arthritis [abstract]. Arthritis Rheumatol. 2024; 76 (suppl 9). https://acrabstracts.org/abstract/constructing-and-using-a-novel-nucleotide-transformer-for-patients-with-rheumatoid-arthritis/. Accessed .