Session Information
Date: Saturday, November 7, 2020
Title: Epidemiology & Public Health Poster II: OA, Osteoporosis, & Other Rheumatic Disease
Session Type: Poster Session B
Session Time: 9:00AM-11:00AM
Background/Purpose: In 2016, MarketScan data no longer included information about inpatient mortality, compromising the ability to study fatal hospitalization events. Using data through 2015 when mortality remained available, we developed an algorithm to accurately identify in-hospital mortality using coverage patterns, proximate healthcare claims diagnoses, and corresponding information for family members.
Methods: We selected the latest hospital claim in 2011-2015 MarketScan data for each individual. Hospitalizations with discharge status of (20, 40, 41) were defined as death, and (21, 87, missing) as alive. Predictors included age, sex, coverage disenrollment post hospitalization, diagnosis codes and timing of last submitted claim, and corresponding information from family members linked by family ID (to confirm family members remained enrolled). Individual predictors were optimized using the c (concordance) index. Datasets were split into Training (80% random sample of hospitalizations, 2011-2013); Test1 (80% in 2014-2015) and Test2 (remaining 20%, 2011-2015). Machine learning (ML) methods included decision tree (DT), random forest (RF), elastic-net regularization (ER) and XGBoost methods. Hyper-parameters were tuned using a random search method with 10-fold cross-validation across a preset range. was used to assess model performance. Each model was validated in Test1 and Test2 datasets separately. Sensitivity, specificity, positive predictive value (PPV) and accuracy were calculated in all datasets.
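As an illustration only, the outcome definition and the four reported performance measures can be sketched in Python. This is a minimal sketch and not the authors' code: the function names `label_death` and `classification_metrics` are hypothetical, discharge status is assumed to be stored as a string, and returning `None` for codes not listed in the Methods is an assumption, since the abstract does not say how other codes were handled.

```python
from typing import Dict, Optional, Sequence

# Discharge status codes as listed in the Methods.
DEATH_CODES = {"20", "40", "41"}
ALIVE_CODES = {"21", "87"}


def label_death(discharge_status: Optional[str]) -> Optional[int]:
    """Return 1 for death, 0 for alive, None for codes the abstract does not list."""
    if discharge_status is None or discharge_status.strip() == "":
        return 0  # missing status is defined as alive in the Methods
    code = discharge_status.strip()
    if code in DEATH_CODES:
        return 1
    if code in ALIVE_CODES:
        return 0
    return None  # assumption: unlisted codes are left unlabeled


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and accuracy from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios are reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }
```

For example, `classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])` counts one true positive, one false negative, two true negatives and one false positive.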
Results: 1,307,532 hospitalizations were selected among patients; mean age was 47.4 (standard deviation 25.9) years and 43.6% were male. Training data included 727,887 hospitalizations, 23.1% ending in death. Disenrollment ending within 30 days (Figure 1), and last claim within 90 days were optimized as single predictors (Figure 2). Models using different ML methods performed well in all datasets, albeit with a trend toward lower performance measures with decision tree methods. In test datasets, PPV and accuracy were as high as 0.91 and 0.95, respectively.
Conclusion: We derived and validated an algorithm for identifying in-hospital mortality in MarketScan claims data that performed well, with >90% accuracy. This fills a key gap for health outcomes research with MarketScan data.
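The cut-point optimization for single predictors can be illustrated with a short Python sketch. It assumes, hypothetically, that each candidate cut-point turns a time interval (for example, days from discharge to disenrollment) into a binary flag and that the cut-point with the highest c index is kept; the abstract does not describe the exact procedure. The names `c_index` and `best_cut_point` are illustrative, `None` is assumed to mean the event was never observed, and the pairwise loop is only practical for small samples.

```python
from typing import List, Optional, Sequence, Tuple


def c_index(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance: share of (death, alive) pairs where the death has the higher score.

    Ties count as half-concordant. Quadratic in sample size, so for illustration only.
    """
    cases = [s for s, o in zip(scores, outcomes) if o == 1]
    controls = [s for s, o in zip(scores, outcomes) if o == 0]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def best_cut_point(
    days: Sequence[Optional[float]],
    outcomes: Sequence[int],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Return (cut_point, c_index) for the candidate with the highest c index.

    The first candidate reaching the maximum is kept when several tie.
    """
    best: Optional[Tuple[float, float]] = None
    for cut in candidates:
        # Flag is 1 when the event occurred within `cut` days of discharge.
        flags: List[int] = [1 if d is not None and d <= cut else 0 for d in days]
        c = c_index(flags, outcomes)
        if best is None or c > best[1]:
            best = (cut, c)
    if best is None:
        raise ValueError("no candidate cut-points supplied")
    return best
```

With a binary flag, the c index reduces to the average of sensitivity and specificity, so this search favors cut-points that balance the two.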
Figure 1: Optimization of cut-point for disenrollment
Figure 2: Optimization of interval from hospital discharge date to last claim
Table: Model performance to classify in-hospital mortality in Training and Test datasets
To cite this abstract in AMA style:
Xie F, Zhao H, Yun H, Bernatsky S, Curtis J. Developing an Algorithm for Identifying Mortality in MarketScan Claims Data Using Machine Learning [abstract]. Arthritis Rheumatol. 2020; 72 (suppl 10). https://acrabstracts.org/abstract/developing-an-algorithm-for-identifying-mortality-in-marketscan-claims-data-using-machine-learning/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/developing-an-algorithm-for-identifying-mortality-in-marketscan-claims-data-using-machine-learning/