ACR Meeting Abstracts

Abstract Number: 1823

Identification and Prediction of Systemic Sclerosis Intrinsic Subtypes Using Semi-Supervised and Supervised Learning on Gene Expression Data of Multiple Cohorts

Zhiyun Gong1, Rezvan Parvizi2, Helen Jarnagin1, Haobin Chen3, Madeline Morrisson4, Tammara Wood5, Monique Hinchcliff6 and Michael Whitfield2, 1Dartmouth College, Lebanon, NH, 2Geisel School of Medicine at Dartmouth, Hanover, NH, 3Dartmouth, Lebanon, NH, 4Geisel School of Medicine at Dartmouth College, Hanover, NH, 5Dartmouth, Hanover, NH, 6Yale School of Medicine, Westport, CT

Meeting: ACR Convergence 2024

Keywords: Bioinformatics, Gene Expression, genomics, skin, Systemic sclerosis

Session Information

Date: Monday, November 18, 2024

Title: Systemic Sclerosis & Related Disorders – Basic Science Poster II

Session Type: Poster Session C

Session Time: 10:30AM-12:30PM

Background/Purpose: Systemic Sclerosis (SSc) is a molecularly heterogeneous disease. Distinct subtypes of patients have been identified based on gene expression in skin. In this study, we re-processed genome-wide transcriptomic data of skin biopsies from multiple independent cohorts to generate the largest integrated discovery dataset to date. Semi-supervised clustering was performed to identify SSc intrinsic subtypes and these labels were then used to develop a new robust supervised learning model to predict subtypes in skin.

Methods: Gene expression data from three cohorts (GSE9285, GSE32413, and GSE59787) representing 293 paired forearm and back skin samples from 37 healthy and 137 SSc individuals were processed using a consistent bioinformatic pipeline. Samples were first clustered using constrained k-means, followed by unsupervised k-means clustering on the most heterogeneous group to refine sample groupings. Using the final labels, we developed a set of binary Logistic Regression models with Gene Set Variation Analysis (GSVA) scores to predict Inflammatory, Normal-like, and Fibroproliferative subtypes on new skin samples.
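The first clustering step can be illustrated with a minimal sketch. It assumes that "constrained k-means" refers to the seeded variant, in which samples with a known label stay fixed in their cluster and only unlabeled samples are reassigned. The abstract does not give the exact algorithm, so the function name, the toy coordinates, and the seed labels below are all hypothetical.

```python
import math


def constrained_kmeans(points, seed_labels, k, n_iter=50):
    """Seeded k-means: samples with a known label never change cluster.

    points      : list of equal-length numeric tuples, one per sample
    seed_labels : dict {sample index: cluster id in range(k)}; assumed to
                  contain at least one seed for every cluster
    Returns a list with one cluster id per sample.
    """
    dim = len(points[0])

    def centroid(members):
        # Coordinate-wise mean of the listed samples.
        return tuple(
            sum(points[i][d] for i in members) / len(members) for d in range(dim)
        )

    # Initialise each centroid from the seed samples of that cluster.
    centroids = [
        centroid([i for i, c in seed_labels.items() if c == j]) for j in range(k)
    ]
    labels = [seed_labels.get(i, -1) for i in range(len(points))]

    for _ in range(n_iter):
        new_labels = []
        for i, p in enumerate(points):
            if i in seed_labels:
                # Constraint: a seeded sample keeps its given label.
                new_labels.append(seed_labels[i])
            else:
                # Unlabeled sample: assign to the nearest centroid.
                new_labels.append(
                    min(range(k), key=lambda j: math.dist(p, centroids[j]))
                )
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        centroids = [
            centroid([i for i, c in enumerate(labels) if c == j]) for j in range(k)
        ]
    return labels


# Toy 2-D data standing in for expression profiles (hypothetical values).
toy_points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
toy_seeds = {0: 0, 2: 1}  # two samples with labels from earlier publications
print(constrained_kmeans(toy_points, toy_seeds, k=2))
```

In this sketch the seeds play the role of samples whose subtype is known from earlier publications. Cluster membership of the remaining samples is driven by distance to the current centroids.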

Results: We identified 5 intrinsic molecular subtypes of SSc through semi-supervised clustering and comparison to the original publications (Figure 1A-C). In addition to the previously reported inflammatory, fibroproliferative, and normal-like subtypes, we found an inflammatory-fibroproliferative group and an intermediate group between inflammatory and normal-like, which may represent a transitional subtype. Analysis across these groups shows that gene markers of fibrosis, such as COMP and SFRP4, are highest in inflammatory patients, decrease across the transitional group, and reach their lowest levels in normal-like patients. For classification, we used the original 3 subtype labels because of limited sample numbers. Data were split with 20% of samples held out for testing, and 4-fold cross-validation (CV) was performed for each subtype with GSVA scores as features. The best models were chosen based on test performance; they showed AUROCs of 0.92, 0.91, and 0.93 for the inflammatory, normal-like, and fibroproliferative subtypes, respectively (Figure 2A). The final subtype assignment for each patient was made by taking the highest predicted probability among the three binary models. The final three-class model shows good precision and recall for each subtype (Figure 2B-C). Additionally, for each subtype we determined the most predictive gene sets and their importance (Figure 2D-F). Using 16 gene sets in total, the final model can identify samples in additional independent cohorts that are highly similar to the corresponding reference for each subtype (Figure 3A-C).
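The two evaluation steps described above, a per-subtype AUROC and a final call taken from the most confident binary model, can be sketched as follows. This is an illustration only: the probabilities and true labels are made-up values rather than study data, and the helper names `auroc` and `assign_subtype` are hypothetical.

```python
def auroc(y_true, scores):
    """Rank-based AUROC: the probability that a randomly chosen positive
    sample scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def assign_subtype(probabilities):
    """Return the subtype whose binary model gives the highest probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical outputs of the three binary models for four held-out samples.
predicted = [
    {"Inflammatory": 0.91, "Normal-like": 0.05, "Fibroproliferative": 0.30},
    {"Inflammatory": 0.10, "Normal-like": 0.85, "Fibroproliferative": 0.20},
    {"Inflammatory": 0.40, "Normal-like": 0.15, "Fibroproliferative": 0.70},
    {"Inflammatory": 0.35, "Normal-like": 0.60, "Fibroproliferative": 0.25},
]
truth = ["Inflammatory", "Normal-like", "Fibroproliferative", "Normal-like"]

# Final three-class calls: take the most confident binary model per sample.
calls = [assign_subtype(p) for p in predicted]
print(calls)

# One-vs-rest AUROC for each subtype, computed from its binary model alone.
for subtype in ("Inflammatory", "Normal-like", "Fibroproliferative"):
    y = [1 if t == subtype else 0 for t in truth]
    s = [p[subtype] for p in predicted]
    print(subtype, auroc(y, s))
```

Taking the maximum across independently trained binary models treats their probabilities as comparable. That is a simplification in this sketch, since separately fitted models are not guaranteed to be calibrated against one another.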

Conclusion: These results extend our previously published subtyping results and support the existence of major gene expression subtypes. Additionally, we identified two intermediate subgroups. The classification model for skin samples, validated across multiple cohorts, shows the robustness of our approach and its potential to enhance patient stratification, leading to personalized treatment in SSc in the future.

Supporting image 1

Figure 1. SSc intrinsic subtypes identified by semi-supervised learning. A) PCA plot of all samples colored by the four clusters identified by constrained k-means. B) Sub-clustering of the mixed group by k-means clustering (k=3). Differential expression (DE) analyses were performed on each of the subclusters against the three subtypes named in the broad clustering step. Patients in the mix_0 cluster shared some gene markers with both the inflammatory and fibroproliferative subtypes, so we named this cluster “inflammatory-fibroproliferative”. Mix_1 had gene expression profiles similar to the fibroproliferative subtype, so we merged the two. In the mix_2 cluster, a group of genes was upregulated compared to normal-like and downregulated compared to inflammatory. We therefore hypothesized that this is an intermediate subtype on the inflammatory-to-normal-like transition spectrum. C) PCA plot of all samples with integrated subtype calls from both levels of clustering. D) Contingency tables comparing new subtype labels to those in the original publications. 82% of inflammatory, 80% of normal-like, and 64% of fibroproliferative samples stayed in the same subtype or were assigned to a relevant newly defined intermediate type, demonstrating concordance with prior analyses of individual datasets.

Supporting image 2

Figure 2. Predictors and their importance for each subtype and integrated multiclass performance. A) Receiver operating characteristic (ROC) curve for each subtype and a micro-average ROC curve. B-C) Multi-class performance metrics and confusion matrix on the holdout set. Final calls were made by taking the prediction with the highest probability from the binary models. D) GSVA features and their importance in the binary logistic regression models for the Inflammatory subtype. The Inflammatory subtype is highly associated with upregulation of genes in the Plasma cell, T cell, IFN, B cell, and IL17 Complex pathways, as well as downregulation of genes related to unsaturated acid metabolism and mitochondrial large ribosomes. E) The Normal-like subtype can be predicted by upregulated Propionate metabolism and downregulated B cell, plasma cell, and endothelial cell pathways. F) GSVA pathways imported from public sources have very limited predictive power for the Fibroproliferative subtype. Through differential expression analysis, we found that these patients have a list of down-regulated genes that are not well represented in the public gene sets. We therefore created a new gene set, “Fibroproliferative down-regulated genes”.

Supporting image 3

Figure 3. A) The enrichment scores of the gene sets used in the classification models for each subtype. Immune-related pathways were highly upregulated in the Inflammatory subtype, slightly elevated in the Fibroproliferative subtype, and down-regulated in the Normal-like subtype. Lipid and fatty acid metabolism pathways were upregulated in the Normal-like subtype. B) The classification models were applied to seven additional independent test cohorts of DNA microarray SSc skin samples. Samples of each predicted subtype in these datasets showed medium to high similarity to the corresponding training samples of the same subtype. C) Similarities between predicted subtypes in the ASSET cohort using skin RNA-seq data. This preliminary result shows that, with GSVA scores as features, the models can also generalize to RNA-seq data, especially for the Inflammatory and Normal-like subtypes.
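The abstract does not state how similarity between newly predicted samples and the training samples was measured. As one possible illustration only, the sketch below assumes Pearson correlation between a sample's gene set scores and the mean (centroid) profile of each training subtype. All scores and helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def centroid(profiles):
    """Mean score per gene set across a list of sample profiles."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def similarity_to_references(sample, training_by_subtype):
    """Correlate one sample with the centroid of each training subtype."""
    return {
        subtype: pearson(sample, centroid(profiles))
        for subtype, profiles in training_by_subtype.items()
    }


# Hypothetical GSVA-like scores for three gene sets in the training data.
training = {
    "Inflammatory": [[2.0, 1.5, -1.0], [1.8, 1.7, -0.8]],
    "Normal-like": [[-1.0, -1.2, 1.5], [-0.8, -1.0, 1.3]],
}
new_sample = [1.5, 1.2, -0.7]  # a sample from a hypothetical external cohort

for subtype, r in similarity_to_references(new_sample, training).items():
    print(subtype, round(r, 3))
```

A sample predicted as Inflammatory would be expected to correlate more strongly with the Inflammatory centroid than with the others. Other similarity measures could serve the same purpose.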


Disclosures: Z. Gong: None; R. Parvizi: None; H. Jarnagin: None; H. Chen: None; M. Morrisson: None; T. Wood: None; M. Hinchcliff: AbbVie/Abbott, 2, Boehringer Ingelheim, 5, Kadmon, 5; M. Whitfield: Abbvie, 6, Boehringer Ingelheim, 1, 2, Bristol-Myers Squibb, 2, 5, Celdara Medical, LLC, 5, 8, 9, 10, UCB Biopharma, 2, 5.

To cite this abstract in AMA style:

Gong Z, Parvizi R, Jarnagin H, Chen H, Morrisson M, Wood T, Hinchcliff M, Whitfield M. Identification and Prediction of Systemic Sclerosis Intrinsic Subtypes Using Semi-Supervised and Supervised Learning on Gene Expression Data of Multiple Cohorts [abstract]. Arthritis Rheumatol. 2024; 76 (suppl 9). https://acrabstracts.org/abstract/identification-and-prediction-of-systemic-sclerosis-intrinsic-subtypes-using-semi-supervised-and-supervised-learning-on-gene-expression-data-of-multiple-cohorts/. Accessed .