Session Information
Session Type: ACR Poster Session A
Session Time: 9:00AM-11:00AM
Background/Purpose: Administrative claims and electronic health record (EHR) data are commonly used to assess outcomes in rheumatoid arthritis (RA). However, direct measures of functional status are typically not available in these data sources to control for confounding in comparative effectiveness research.
Methods: Corrona registry data linked to Medicare claims (2006-2014) were used to build a claims-based classifier of disability, as measured by the HAQ. Eligible patients had RA as confirmed by their Corrona rheumatologist and ≥1 year of prior coverage. Demographics, socioeconomic factors, comorbidities, healthcare utilization, and medications from claims data were included as predictors.
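The eligibility rule above can be sketched as a simple cohort filter. This is a minimal illustration only: the `Patient` record and its field names (`ra_per_rheumatologist`, `coverage_start`, `index_date`) are hypothetical, since the abstract does not describe how the linked registry and claims data were structured, and "≥1 year" is approximated here as 365 days.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Patient:
    # Hypothetical fields; the abstract does not describe the linked data layout.
    patient_id: str
    ra_per_rheumatologist: bool  # RA confirmed by the Corrona rheumatologist
    coverage_start: date         # start of continuous Medicare coverage
    index_date: date             # Corrona visit at which HAQ was measured


def has_prior_coverage(p: Patient, min_days: int = 365) -> bool:
    """True if coverage began at least min_days before the index date.
    Assumption: one year is approximated as 365 days (leap days ignored)."""
    return (p.index_date - p.coverage_start).days >= min_days


def eligible(patients):
    """Keep patients with rheumatologist-confirmed RA and >=1 year of prior coverage."""
    return [p for p in patients if p.ra_per_rheumatologist and has_prior_coverage(p)]
```

For example, a patient covered from 2010-01-01 with a 2011-06-01 index visit passes the filter, while one whose coverage began only five months before the visit does not.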
In separate analyses, HAQ was classified dichotomously (<1, ≥1), as 3 categories (0–<0.5, ≥0.5–<1.5, and ≥1.5–3), and as a continuous variable whose predicted values were converted to the corresponding HAQ category. Generalized logistic regression (GenLogit) was fit with LASSO for variable selection, and its results were compared with those of machine learning methods including RandomForests (a forest of 2000 trees). Separate models were also run to classify each of the 8 HAQ subdomains, and the subdomain predictions were then summed to form the composite HAQ score. Misclassification rates were compared, and the area under the receiver operating characteristic curve (AUROC) was reported.
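The HAQ cut points described above, and the step of combining subdomain predictions into a composite score, can be sketched as follows. This is an illustrative sketch, not the authors' code; the function names are hypothetical. One assumption is flagged explicitly: the standard HAQ Disability Index divides the summed category scores by the number of categories scored, so the composite here divides the sum of the 8 subdomain scores by 8 to stay on the 0–3 scale.

```python
def haq_2cat(score: float) -> str:
    """Dichotomous HAQ: 'low' (<1) or 'high' (>=1)."""
    return "low" if score < 1.0 else "high"


def haq_3cat(score: float) -> str:
    """Three HAQ categories: low (0-<0.5), moderate (>=0.5-<1.5), high (>=1.5-3).
    Continuous model predictions outside 0-3 fall into the nearest end category."""
    if score < 0.5:
        return "low"
    if score < 1.5:
        return "moderate"
    return "high"


def composite_haq(subdomain_scores) -> float:
    """Combine 8 predicted subdomain scores (each 0-3) into a composite HAQ.
    Assumption: standard HAQ-DI scoring, i.e. the sum divided by 8."""
    if len(subdomain_scores) != 8:
        raise ValueError("expected 8 HAQ subdomain scores")
    return sum(subdomain_scores) / 8
```

For instance, subdomain scores of `[1, 1, 2, 0, 1, 1, 0, 1]` give a composite of 0.875, which is "low" under the dichotomous cut point but "moderate" under the 3 category scheme.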
Results: A total of 2,788 RA patients were eligible. Dichotomized, 52% (n=1,448) had low and 48% (n=1,340) had high HAQ; in 3 categories, 887 had low, 1,109 moderate, and 792 high HAQ. In univariable analysis, higher HAQ was associated with older age, being disabled (per Medicare), rural residence, greater comorbidity burden, and higher healthcare utilization.
Variables selected by the different methods were similar (Table). In the 2 category HAQ models, overall misclassification was 29% (RandomForests), 28%, and 38% (LASSO), with an AUROC of 0.84. In the 3 category HAQ models, RandomForests yielded misclassification of ~48%, which did not meaningfully differ across the 3 HAQ categories, whereas misclassification of the GenLogit model varied widely by HAQ category. When misclassification did occur in the 3 category analysis, patients were usually 1 category off; more extreme misclassification (classifying low HAQ patients as high, or vice versa) was uncommon (<8%). The median (IQR) difference between observed and predicted HAQ was 0.00 (-0.45, 0.41) units. Ongoing work is refining these models to reduce the misclassification rate and validating the approach.
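The performance summaries reported above (overall and per-category misclassification, the share of "two categories off" errors, AUROC, and the median and IQR of observed minus predicted HAQ) can be computed as in the sketch below. The function names are hypothetical, and two choices are assumptions rather than details from the abstract: extreme misclassification is expressed as a share of all patients, and quartiles use Python's default `statistics.quantiles` method. AUROC is computed as the Mann-Whitney probability that a randomly chosen high-HAQ patient receives a higher predicted score than a randomly chosen low-HAQ patient.

```python
from statistics import median, quantiles

ORDER = {"low": 0, "moderate": 1, "high": 2}


def misclassification_rate(observed, predicted):
    """Share of patients whose predicted category differs from the observed one."""
    return sum(o != p for o, p in zip(observed, predicted)) / len(observed)


def rate_by_category(observed, predicted):
    """Misclassification rate within each observed HAQ category."""
    rates = {}
    for cat in ORDER:
        pairs = [(o, p) for o, p in zip(observed, predicted) if o == cat]
        if pairs:
            rates[cat] = sum(o != p for o, p in pairs) / len(pairs)
    return rates


def extreme_rate(observed, predicted):
    """Share of all patients classified two categories off (low <-> high).
    Assumption: the denominator is all patients, not only misclassified ones."""
    off_by_two = sum(abs(ORDER[o] - ORDER[p]) == 2 for o, p in zip(observed, predicted))
    return off_by_two / len(observed)


def auroc(labels, scores):
    """AUROC as the probability that a random positive (label 1) outscores a
    random negative (label 0); ties count one half. O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def diff_summary(observed_haq, predicted_haq):
    """Median and (Q1, Q3) of the observed - predicted HAQ differences."""
    diffs = [o - p for o, p in zip(observed_haq, predicted_haq)]
    q1, _, q3 = quantiles(diffs, n=4)
    return median(diffs), (q1, q3)
```

With observed categories `["low", "low", "moderate", "high", "high"]` and predictions `["low", "moderate", "moderate", "high", "low"]`, two of five patients are misclassified and one of them is two categories off.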
Conclusion: Results from this preliminary analysis suggest that administrative claims and EHR data might be useful to classify RA-related disability as measured by the HAQ with reasonable accuracy. Larger datasets and richer information in EHR data will likely improve the accuracy of these methods.
Table: Key variables from administrative claims data selected by RandomForests and by generalized logistic regression with LASSO*
Variables: Age; Number of rheumatology visits; Number of AHRQ CCS comorbidities; Number of unique medications (any type); Number of outpatient physician visits; Baseline steroid use; Median household income; Elixhauser comorbidity index; Wheelchair; Disabled; Sex
RandomForests: all 11 variables
Generalized logistic regression with LASSO: 7 of the 11 variables
*Results shown for 3 category HAQ models
To cite this abstract in AMA style:
Curtis JR, Yun H, Etzel CJ, Yang S, Chen L. Use of Machine Learning and Traditional Statistical Methods to Classify RA-Related Disability Using Administrative Claims Data [abstract]. Arthritis Rheumatol. 2017; 69 (suppl 10). https://acrabstracts.org/abstract/use-of-machine-learning-and-traditional-statistical-methods-to-classify-ra-related-disability-using-administrative-claims-data/. Accessed .
2017 ACR/ARHP Annual Meeting
ACR Meeting Abstracts - https://acrabstracts.org/abstract/use-of-machine-learning-and-traditional-statistical-methods-to-classify-ra-related-disability-using-administrative-claims-data/