Session Information
Date: Wednesday, November 8, 2017
Title: Health Services Research II: Methods and Technology in Care and Research
Session Type: ACR Concurrent Abstract Session
Session Time: 9:00AM-10:30AM
Background/Purpose: Varicella zoster virus (VZV) infections can cause significant morbidity in immunosuppressed hosts. However, no methods exist to systematically identify which patients with rheumatic diseases are at highest risk for VZV, information that is critical for implementing preventive strategies such as vaccination or antiviral prophylaxis. Machine learning methods that combine large amounts of information from across the electronic health record (EHR) are increasingly being explored in healthcare. In this study, we derived and compared machine learning algorithms to classify the development of VZV using health-system-wide EHR data.
Methods: We used EHR data on over 800,000 patients from a university-based health system, 2012-2016. We identified incident VZV using a combination of ICD-10 codes (B02.xx) and a text string processing algorithm (terms: "zoster" and/or "shingles"). All structured data (immunizations, vitals, allergies, medications, laboratories, insurance, encounters, providers, demographics) and unstructured data (i.e., text from clinical notes) recorded before the VZV event were used. A sample of 201 patients was selected and chart-reviewed to validate case status (n=100 cases, 101 controls). We used a supervised approach to identify predictors of VZV and compared the performance of 6 machine learning algorithms: logistic regression, elastic net, random forests, support vector machine, generalized boosted models, and naïve Bayes. Datasets restricted to information from the 1, 3, 6, 12, and 18 months before the index date were each evaluated using repeated k-fold cross-validation.
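The case-finding, lookback-window, and validation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract says cases were found by a "combination" of codes and note text, and this sketch assumes either signal (logical OR) flags a candidate for chart review; a "month" is approximated as 30 days; and the fold count (k=5) and number of repeats (3) are placeholders, since the abstract does not report them. The names `is_candidate_case`, `in_lookback`, and `repeated_kfold` are hypothetical, and the splits are plain shuffled (unstratified) k-fold.

```python
import random
import re
from datetime import date, timedelta

# --- Case finding (hypothetical rule) ---------------------------------------
# B02 is the ICD-10 category for zoster. ASSUMPTION: either a B02 code OR a note
# mentioning zoster/shingles flags a candidate, which chart review then checks.
ZOSTER_TERMS = re.compile(r"\b(zoster|shingles)\b", re.IGNORECASE)


def is_candidate_case(icd_codes, notes):
    """Flag a candidate VZV case from a patient's ICD-10 codes and note texts."""
    has_code = any(code.upper().startswith("B02") for code in icd_codes)
    has_text = any(ZOSTER_TERMS.search(note) for note in notes)
    return has_code or has_text


# --- Lookback window ---------------------------------------------------------
# Only data recorded BEFORE the index (VZV) date are kept.
# ASSUMPTION: one "month" is approximated as 30 days.
def in_lookback(event_date, index_date, months):
    """True if event_date lies within `months` before index_date (index excluded)."""
    start = index_date - timedelta(days=30 * months)
    return start <= event_date < index_date


# --- Repeated k-fold cross-validation ---------------------------------------
# ASSUMPTION: k=5 and repeats=3 are placeholders; the abstract does not report them.
def repeated_kfold(n, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
        for i in range(k):
            test = sorted(folds[i])
            train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
            yield train, test


# Usage: keep only a patient's events from the 3 months before the index date.
index = date(2016, 6, 1)
events = [date(2016, 5, 20), date(2015, 12, 1), date(2016, 6, 1)]
window = [d for d in events if in_lookback(d, index, 3)]
print(window)  # [datetime.date(2016, 5, 20)]
```

Each model would then be fit on every `train` split and scored on the matching `test` split, with metrics averaged across all folds and repeats; rerunning this for each lookback length (1, 3, 6, 12, 18 months) gives one comparison per window.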
Results: Preliminary results indicate that generalized boosted models based on 3 months of data prior to VZV outperformed all other algorithms (AUC 0.85; accuracy 0.80; Kappa 0.60) (Table 1). Random forest models also performed well (AUC 0.81; accuracy 0.72) but had lower reliability (Kappa 0.44). Logistic regression and naïve Bayes models performed worst (AUC 0.58 and 0.50, respectively). Top variables associated with VZV included sociodemographic (age, sex, race), clinical (blood pressure, BMI, medications), and health care utilization (number of encounters) factors.
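The AUC values reported above summarize how well each model ranks cases above controls. A minimal sketch of the pairwise (Mann-Whitney) definition of ROC AUC follows; the abstract does not say what software or estimator the authors used, so treat this as an illustration of the metric only. The function name `auc_mann_whitney` is hypothetical. An AUC of 0.50, as reported for naïve Bayes, is what a model gets when its scores do not rank cases above controls at all, for example when every patient receives the same score.

```python
def auc_mann_whitney(scores, labels):
    """ROC AUC as the probability that a randomly chosen case (label 1) receives a
    higher score than a randomly chosen control (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:          # compare every case with every control
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc_mann_whitney([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
print(auc_mann_whitney([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.5
```

The double loop is quadratic, which is fine for a sample of 201 patients; larger datasets would typically use a rank-based formula instead.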
Conclusion: Generalized boosted models outperformed other algorithms in identifying VZV in a large university health system, and algorithms using the 3 months of data before infection performed best. Refining the algorithms with a larger sample and additional data should help produce a highly accurate classification algorithm for VZV that can inform clinical decision making in real time. This proof-of-concept study highlights the promise of leveraging all the data available through the EHR to flag patients who may be at risk for adverse drug events or medical complications before they occur.
Table 1. Algorithm performance results using 3 months of electronic medical record data (n=201)
Algorithm | AUC | Accuracy | Reliability (Kappa) | F-score | Sensitivity | Specificity | PPV | NPV
Logistic Regression | 0.58 | 0.58 | 0.16 | 0.46 | 0.36 | 0.80 | 0.64 | 0.56
Elastic Net | 0.70 | 0.70 | 0.40 | 0.75 | 0.88 | 0.52 | 0.65 | 0.81
Random Forest | 0.81 | 0.72 | 0.44 | 0.72 | 0.72 | 0.72 | 0.72 | 0.72
Support Vector Machine | 0.70 | 0.58 | 0.16 | 0.57 | 0.56 | 0.60 | 0.58 | 0.50
Generalized Boosted Models | 0.85 | 0.80 | 0.60 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80
Naïve Bayes | 0.50 | 0.50 | 0.00 | 0.67 | 1.00 | 0.00 | 0.50 | 0.00
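Apart from AUC, every column in Table 1 can be derived from a 2x2 confusion matrix. The abstract does not report the underlying counts, so the sketch below uses hypothetical balanced test sets of 25 cases and 25 controls purely to show the arithmetic; these are not study data. It assumes "Reliability" is Cohen's kappa (as the Results text indicates) and that undefined ratios such as NPV with no negative predictions are reported as 0.00. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Table 1 style metrics from a 2x2 confusion matrix (cases = positive class)."""

    def ratio(a, b):
        # ASSUMPTION: undefined ratios (x/0) are reported as 0.0
        return a / b if b else 0.0

    n = tp + fp + fn + tn
    sens = ratio(tp, tp + fn)   # sensitivity (recall)
    spec = ratio(tn, tn + fp)   # specificity
    ppv = ratio(tp, tp + fp)    # positive predictive value (precision)
    npv = ratio(tn, tn + fn)    # negative predictive value
    acc = ratio(tp + tn, n)
    f_score = ratio(2 * ppv * sens, ppv + sens)
    # Cohen's kappa: observed agreement corrected for the agreement expected
    # by chance given the row and column totals
    p_chance = ratio((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    kappa = ratio(acc - p_chance, 1 - p_chance)
    return {
        "accuracy": acc, "kappa": kappa, "f_score": f_score,
        "sensitivity": sens, "specificity": spec, "ppv": ppv, "npv": npv,
    }


# Hypothetical balanced test sets, (tp, fp, fn, tn); NOT study data.
examples = {
    "all_positive": (25, 25, 0, 0),
    "tp20_fp5_fn5_tn20": (20, 5, 5, 20),
}
for name, counts in examples.items():
    m = binary_metrics(*counts)
    print(name, {k: round(v, 2) for k, v in m.items()})
```

Predicting every patient as a case gives sensitivity 1.00, specificity 0.00, accuracy 0.50, kappa 0.00, and F-score 0.67, the same pattern as the naïve Bayes row. The 20/5/5/20 matrix gives 0.80 on every ratio metric with kappa 0.60, like the generalized boosted models row, and 18/7/7/18 gives 0.72 with kappa 0.44, like the random forest row. On a balanced set, chance agreement is 0.50, so kappa equals (accuracy - 0.50) / 0.50.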
To cite this abstract in AMA style:
Gianfrancesco M, Schmajuk G, Murray S, Ludwig D, Hannun A, Avati A, Tamang S, Yazdany J. Performance of Machine Learning Methods Using Electronic Medical Records to Predict Varicella Zoster Virus Infection [abstract]. Arthritis Rheumatol. 2017; 69 (suppl 10). https://acrabstracts.org/abstract/performance-of-machine-learning-methods-using-electronic-medical-records-to-predict-varicella-zoster-virus-infection/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/performance-of-machine-learning-methods-using-electronic-medical-records-to-predict-varicella-zoster-virus-infection/