Date: Monday, November 9, 2020
Session Type: Poster Session D
Session Time: 9:00AM-11:00AM
Background/Purpose: Patient-reported outcome (PRO) data have assumed increasing importance in the care of rheumatoid arthritis (RA) patients. However, physician-derived disease activity measures such as the Clinical Disease Activity Index (CDAI) remain the most widely accepted metrics for assessing RA. The possibility that newer, longitudinal PROs might serve as a proxy for the CDAI has not been evaluated.
Methods: Using data from the Comparative and Pragmatic Study of Golimumab IV vs. Infliximab (AWARE), we evaluated RA patients initiating one of these two therapies who started in moderate or high disease activity and either remained under observation through 6 months or stopped due to lack of efficacy. The prediction targets were CDAI and CDAI disease activity category at Month 6. Candidate predictors included baseline CDAI, baseline PROs including those from the NIH PROMIS system (e.g. Pain Interference, Fatigue, Physical Function, Sleep Disturbance), and follow-up PROs at months 1 and 6 (±30 days). Data were randomly partitioned into training (2/3) and test (1/3) datasets. Multiple machine learning (ML) methods (e.g. gradient boosting: XGBoost; random forests: RF; elastic net regularization: ER; support vector machine: SVM) were used both to predict CDAI and to classify CDAI disease activity category (remission/low vs. moderate/high). Feature selection was conducted using the R package mlr3, and hyper-parameter tuning was conducted using a random search method. Model performance was evaluated by comparing cross-validated error across the ML approaches in both the training and test data.
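The outcome definition and the data partition described above can be illustrated with a minimal Python sketch. It is not the authors' code (the abstract reports using the R package mlr3). The cut-points are the standard published CDAI thresholds, and the function names, the fixed seed, and the rounding of the training-set size are assumptions made only for illustration.

```python
import random

# Standard CDAI cut-points (assumed here; the abstract does not restate them):
# remission <= 2.8, low <= 10, moderate <= 22, high > 22.
def cdai_category(cdai):
    """Map a CDAI score to its four-level disease activity category."""
    if cdai <= 2.8:
        return "remission"
    if cdai <= 10:
        return "low"
    if cdai <= 22:
        return "moderate"
    return "high"

def binary_category(cdai):
    """Collapse the four categories into the two-level classification target."""
    return "remission/low" if cdai <= 10 else "moderate/high"

def split_train_test(records, train_fraction=2 / 3, seed=0):
    """Randomly partition records into training and test sets (hypothetical helper)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

if __name__ == "__main__":
    # Placeholder patient identifiers, not real study data.
    train, test = split_train_test(range(391))
    print(len(train), len(test))
    print(cdai_category(15.0), binary_category(15.0))
```

With 391 records and a 2/3 training fraction, this sketch assigns 261 records to training and 130 to testing; the actual split sizes used in the study are not reported in the abstract.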
Results: A total of 391 AWARE patients were analyzed. Of these, the distribution of disease activity by CDAI at month 6 was remission (4.9%), low (26.6%), moderate (31.4%), and high (37.1%). In univariate analysis examining outcomes at 6 months (Table 1), and depending on which modeling method was used, the most important features included pain intensity, PROMIS measures (social participation, pain interference, pain intensity, and physical function), baseline CDAI, and age. Among all ML methods, random forest performed best. For classifying low disease activity (LDA)/remission vs. moderate/high disease activity based on regression, accuracy ranged from 0.69 (XGBoost) to 0.80 (RF) (Table 2, left). SVM, ER, and RF had high specificity, ranging from 0.93 for SVM to 0.99 for RF, but low sensitivity, ranging from 0.26 for RF to 0.38 for ER. XGBoost had adequate sensitivity (0.65) and specificity (0.71). A plot of predicted vs. observed CDAI (Figure) showed that some patients had a higher observed than predicted CDAI at month 6. Direct classification yielded similar or somewhat lower model performance (Table 2, right).
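As a rough illustration of how classification "based on regression" and the reported accuracy, sensitivity, and specificity relate, the sketch below thresholds a predicted CDAI at 10 and scores the result against the observed category. The abstract does not state which category was treated as the positive class, so treating moderate/high as positive is an assumption, as are the function names and the toy values (which are not study data).

```python
def is_moderate_or_high(cdai_values, threshold=10.0):
    """Flag CDAI values above the low/moderate cut-point (assumed positive class)."""
    return [value > threshold for value in cdai_values]

def binary_metrics(observed, predicted):
    """Return accuracy, sensitivity, and specificity for paired boolean labels."""
    tp = sum(1 for o, p in zip(observed, predicted) if o and p)
    tn = sum(1 for o, p in zip(observed, predicted) if not o and not p)
    fp = sum(1 for o, p in zip(observed, predicted) if not o and p)
    fn = sum(1 for o, p in zip(observed, predicted) if o and not p)

    def ratio(numerator, denominator):
        # Undefined metrics (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # positives correctly identified
        "specificity": ratio(tn, tn + fp),   # negatives correctly identified
    }

if __name__ == "__main__":
    # Toy month-6 values for illustration only.
    observed_cdai = [4, 8, 12, 25, 30, 9]
    predicted_cdai = [5, 11, 9, 20, 28, 7]
    metrics = binary_metrics(
        is_moderate_or_high(observed_cdai),
        is_moderate_or_high(predicted_cdai),
    )
    print(metrics)
```

Under this convention, a model with high specificity but low sensitivity rarely mislabels remission/low patients as moderate/high but misses many patients who truly are in moderate/high disease activity; the interpretation reverses if remission/low is taken as the positive class.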
Conclusion: Machine learning methods coupled with longitudinal PRO data appear useful and can achieve 80-90% accuracy in classifying RA disease activity among patients starting a new biologic. This approach has promise for real-world evidence generation in the common circumstance where physician-derived disease activity data are not available but PRO measures are.
To cite this abstract in AMA style: Curtis J, Xie F, Kafka S, Black S. Machine Learning Coupled with Patient Reported Outcome Data to Classify & Predict RA Disease Activity [abstract]. Arthritis Rheumatol. 2020; 72 (suppl 10). https://acrabstracts.org/abstract/machine-learning-coupled-with-patient-reported-outcome-data-to-classify-predict-ra-disease-activity/. Accessed December 2, 2020.