Machine Learning Coupled with Patient Reported Outcome Data to Classify & Predict RA Disease Activity

Jeffrey R Curtis¹, Fenglong Xie², Shelly Kafka³ and Shawn Black³, ¹Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, ²University of Alabama at Birmingham, Birmingham, AL, ³Janssen Scientific Affairs, LLC, Horsham, PA

Meeting: ACR Convergence 2020

Keywords: Disease Activity, rheumatoid arthritis

Session Information

Date: Monday, November 9, 2020

Title: RA – Diagnosis, Manifestations, & Outcomes Poster IV: Lifespan of a Disease

Session Type: Poster Session D

Session Time: 9:00AM-11:00AM

Background/Purpose: Patient reported outcome (PRO) data have assumed increasing importance in the care of rheumatoid arthritis (RA) patients. However, physician-derived disease activity measures such as the CDAI remain the most-accepted metrics to assess RA. The possibility that newer, longitudinal PRO data might serve as a proxy for the CDAI has not been evaluated.

Methods: Using data from the Comparative and Pragmatic Study of Golimumab IV vs. Infliximab (AWARE), we evaluated RA patients initiating one of these two therapies who started in moderate or high disease activity and either remained under observation through 6 months or stopped due to lack of efficacy. The prediction targets were CDAI and CDAI disease activity category at month 6. Candidate predictors included baseline CDAI, baseline PROs including those from the NIH PROMIS system (e.g. Pain Interference, Fatigue, Physical Function, Sleep Disturbance), and follow-up PROs at months 1 and 6 (±30 days). Data were randomly partitioned into training (2/3) and test (1/3) datasets. Multiple machine learning (ML) methods (e.g. gradient boosting [XGBoost], random forests [RF], elastic net regularization [ER], support vector machine [SVM]) were used both to predict CDAI and to classify CDAI disease activity category (remission/low vs. moderate/high). Feature selection was conducted using the R package mlr3, and hyper-parameter tuning was conducted using a random search method. Model performance was evaluated by comparing cross-validated error across the ML approaches using both training and test data.
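
The random partition and the "classification based on regression" step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R/mlr3 pipeline: the function names, the fixed seed, the use of 391 records (the number of patients analyzed), and the example predictions are all hypothetical. The four-level CDAI cut points (remission ≤2.8, low ≤10, moderate ≤22, high >22) are the standard published thresholds; the abstract itself only states CDAI ≤10 for remission or low disease activity.

```python
import random

# Standard CDAI cut points (assumed here; the abstract only cites CDAI <= 10).
def cdai_category(cdai):
    """Map a CDAI score to its four-level disease activity category."""
    if cdai <= 2.8:
        return "remission"
    if cdai <= 10:
        return "low"
    if cdai <= 22:
        return "moderate"
    return "high"

def is_remission_or_low(cdai):
    """Binary target for classification: remission/low vs. moderate/high."""
    return cdai <= 10

def split_train_test(records, train_fraction=2 / 3, seed=0):
    """Randomly partition records into training and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical patient IDs standing in for the analyzed cohort.
patient_ids = list(range(391))
train_ids, test_ids = split_train_test(patient_ids)
print(len(train_ids), len(test_ids))

# Classification "based on regression": threshold a predicted CDAI value.
predicted_cdai = [1.5, 9.8, 14.2, 30.0]  # hypothetical model outputs
print([cdai_category(x) for x in predicted_cdai])
print([is_remission_or_low(x) for x in predicted_cdai])
```

In this framing, any regression model that outputs a continuous CDAI estimate can be turned into a classifier by applying the same ≤10 threshold to its predictions, while "direct classification" would instead train a model on the binary label itself.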

Results: A total of 391 AWARE patients were analyzed. Of these, the distribution of disease activity by CDAI at month 6 was remission (4.9%), low (26.6%), moderate (31.4%), and high (37.1%). In univariate analysis examining outcomes at 6 months (Table 1), and depending on which modeling method was used, the most important features included pain intensity, PROMIS measures (social participation, pain interference, pain intensity, and physical function), baseline CDAI, and age. Among all ML methods, random forest performed best. To classify low disease activity (LDA)/remission vs. moderate/high based on regression, accuracy ranged from 0.69 (XGBoost) to 0.80 (RF) (Table 2, left). SVM, ER, and RF had high specificity, ranging from 0.93 for SVM to 0.99 for RF, but low sensitivity, ranging from 0.26 for RF to 0.38 for ER. XGBoost had adequate sensitivity (0.65) and specificity (0.71). Predicted vs. observed CDAI (Figure) showed that some patients had higher observed than predicted CDAI at month 6. Direct classification generated somewhat similar or lower model performance (Table 2, right).
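
For readers less familiar with the metrics reported above, the following Python sketch shows how accuracy, sensitivity, and specificity are computed from observed and predicted categories. The data are invented for illustration and are not from AWARE. The abstract does not state which category was treated as the positive class; this sketch assumes remission/low (CDAI ≤10) is positive.

```python
def classification_metrics(observed, predicted):
    """Return accuracy, sensitivity, and specificity for boolean labels.

    True is treated as the positive class (an assumption of this sketch).
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)
    tn = sum(1 for o, p in pairs if not o and not p)
    fp = sum(1 for o, p in pairs if not o and p)
    fn = sum(1 for o, p in pairs if o and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Hypothetical observed and predicted month-6 CDAI values for ten patients.
observed_cdai = [2.0, 8.5, 9.0, 12.0, 15.5, 18.0, 24.0, 30.0, 35.0, 41.0]
predicted_cdai = [6.0, 12.5, 11.0, 14.0, 13.0, 20.0, 22.5, 9.5, 28.0, 33.0]

# Positive class (assumed): remission or low disease activity, CDAI <= 10.
observed = [x <= 10 for x in observed_cdai]
predicted = [x <= 10 for x in predicted_cdai]

metrics = classification_metrics(observed, predicted)
print(metrics)
```

In this invented example, accuracy is 0.70 even though only one of the three patients truly in remission or low disease activity is identified. The same pattern of reasonable accuracy with high specificity and low sensitivity can arise whenever the positive class is the minority.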

Conclusion: Machine learning methods coupled with longitudinal PRO data appear useful and can achieve 80-90% accuracy to classify RA disease activity among patients starting a new biologic. This approach has promise for real-world evidence generation in the common circumstance where physician-derived disease activity data are not available yet PRO measures are.

Table 1: Factors associated with attaining remission or low disease activity (CDAI ≤10) at visit 3 (month 6)

Table 2: Model performance in Test data

Figure: Predicted vs. Observed CDAI at visit 3 (month 6) by RandomForest

Disclosure: J. Curtis, AbbVie, 2, 5, Amgen, 2, 5, Bristol-Myers Squibb, 2, 5, Corrona, 2, 5, Janssen, 2, 5, Lilly, 2, 5, Myriad, 2, 5, Pfizer, 2, 5, Regeneron, 2, 5, Roche, 2, 5, UCB, 2, 5, Gilead Sciences, Inc., 5, Sanofi, 5; F. Xie, None; S. Kafka, Janssen Scientific Affairs, LLC, 1, 3; S. Black, Janssen Scientific Affairs, LLC, 1, 3.

To cite this abstract in AMA style:

Curtis J, Xie F, Kafka S, Black S. Machine Learning Coupled with Patient Reported Outcome Data to Classify & Predict RA Disease Activity [abstract]. Arthritis Rheumatol. 2020; 72 (suppl 10). https://acrabstracts.org/abstract/machine-learning-coupled-with-patient-reported-outcome-data-to-classify-predict-ra-disease-activity/. Accessed .

ACR Meeting Abstracts - https://acrabstracts.org/abstract/machine-learning-coupled-with-patient-reported-outcome-data-to-classify-predict-ra-disease-activity/