Improving the Efficiency of Clinical Trial Recruitment Using Electronic Health Record Data, Natural Language Processing, and Machine Learning

Tianrun Cai¹, Fiona Cai², Kumar Dahal³, Chuan Hong⁴ and Katherine Liao¹, ¹Brigham and Women's Hospital, Boston, ²Stuyvesant High School, New York, ³Brigham and Women's Hospital, Boston, MA, ⁴Harvard Medical School, Boston, MA

Meeting: 2019 ACR/ARP Annual Meeting

Keywords: clinical trials, Electronic Health Record, recruitment and rheumatoid arthritis (RA)

Session Information

Date: Wednesday, November 13, 2019

Title: 6W021: RA – Treatments V: Switching & Tapering RA Medications (2906–2911)

Session Type: ACR Abstract Session

Session Time: 11:00AM-12:30PM

Background/Purpose: Efficiently identifying eligible patients is an important component of a successful clinical trial. Billing codes from electronic health record (EHR) data are commonly used to first screen for potential patients, followed by labor-intensive chart review to identify the patients who meet the trial criteria. The objective of this study was to test whether a machine learning screening algorithm (ML-screen), incorporating ICD codes and data extracted from notes using natural language processing (NLP), could improve the efficiency of identifying eligible patients for an ongoing clinical trial.

Methods: We studied EHR data used for recruitment to a clinical study of rheumatoid arthritis (RA) and cardiovascular disease that recruited from a tertiary care center (TCC) and a community hospital (CH). The target population was patients with RA, age >35 years, about to initiate a tumor necrosis factor inhibitor, and not on a statin. Prior to this study, all patients with ≥1 RA ICD code (RA_ICD) and age >35 years were selected for chart review. The CH and TCC data sets, comprising 642 and 2387 patients, respectively, were both manually reviewed to provide gold standard labels. All notes were processed with NLP to obtain the number of mentions of the concepts of RA and inflammatory arthritis. Three groups of features were considered for the ML-screen (Table 1): (1) inclusion criteria features, e.g. RA_ICD; (2) exclusion criteria features, e.g. the number of electronic prescriptions for a statin; (3) the total number of ICD codes as a proxy for healthcare utilization. For the ML-screen, we considered features within a 2-year timeframe prior to the chart review as well as over all prior years. The ML-screen combined two ML methods, random forest (RF) and penalized logistic regression. The goal for the ML-screen was to reduce the number of patients requiring chart review without excluding potentially eligible patients. The ML-screen was compared to alternative approaches using RA_ICD ≥1, RA_ICD ≥2, and RA_ICD ≥1+exclusion criteria features. To test whether the ML-screen can be successfully ported to other institutions, we trained it at TCC and applied it at CH, and vice versa.
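The stated goal (fewer charts to review, no eligible patient screened out) can be illustrated with a minimal sketch. The abstract does not say how the RF and penalized logistic regression outputs were combined or how a cutoff was chosen, so the averaging rule, the function names (`combine_scores`, `pick_threshold`, `screen_metrics`), and the toy scores below are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence, Tuple


def combine_scores(rf_scores: Sequence[float], lr_scores: Sequence[float]) -> List[float]:
    """Average two models' predicted probabilities (an ASSUMED combination rule)."""
    return [(a + b) / 2 for a, b in zip(rf_scores, lr_scores)]


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   min_sensitivity: float = 1.0) -> float:
    """Return the highest cutoff that still keeps sensitivity >= min_sensitivity.

    Patients with score >= cutoff are sent to chart review.
    """
    n_eligible = sum(labels)
    if n_eligible == 0:
        raise ValueError("need at least one eligible patient in the labeled data")
    best = min(scores)  # the lowest cutoff flags everyone, so sensitivity is 1.0
    for cutoff in sorted(set(scores)):
        kept = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        if kept / n_eligible >= min_sensitivity:
            best = cutoff
        else:
            break  # sensitivity can only fall as the cutoff rises
    return best


def screen_metrics(scores: Sequence[float], labels: Sequence[int],
                   cutoff: float) -> Tuple[float, float]:
    """Return (sensitivity, fraction of charts removed from review)."""
    flagged = [s >= cutoff for s in scores]
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    sensitivity = true_pos / sum(labels)
    reduction = 1 - sum(flagged) / len(scores)
    return sensitivity, reduction


# Toy data (hypothetical): six patients, three eligible on chart review.
rf = [0.875, 0.25, 0.625, 0.125, 0.5, 0.0]
lr = [0.625, 0.5, 0.875, 0.125, 0.25, 0.25]
gold = [1, 0, 1, 0, 1, 0]

combined = combine_scores(rf, lr)
cutoff = pick_threshold(combined, gold)
sensitivity, reduction = screen_metrics(combined, gold, cutoff)
print(cutoff, sensitivity, round(reduction, 3))
```

In this toy example the cutoff keeps all three eligible patients while removing two of the six charts from review. In practice the cutoff would be chosen on the training institution's labeled data and then applied unchanged at the other institution.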

Results: The current method of reviewing all charts with RA_ICD ≥1 yielded 346 (14.5%) eligible patients out of 2387 at TCC, and 74 (16.0%) out of 642 at CH. Applying the ML-screen would result in reviewing 33% fewer patients at TCC and 44% fewer at CH, compared to RA_ICD ≥1, without screening out potentially eligible patients (Table 2). In contrast, RA_ICD ≥2 had high sensitivity, 0.93-0.98, but reduced the number of patients requiring chart review by only 2.7-11.3%. RA_ICD ≥1+exclusion yielded a larger reduction in patients for review, 63-65%, but excluded approximately 22-27% of eligible patients. The ML-screen had similar performance when trained on one institution and tested on the other (Table 3).
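To make the workload reduction concrete, the short calculation below converts the reported percentage reductions into approximate chart counts and, for TCC, the resulting yield of eligible patients per chart reviewed. It is a back-of-the-envelope sketch: the helper names are hypothetical, the counts are rounded, and it assumes (as the results state) that no eligible patient is screened out.

```python
def charts_after_screen(n_screened: int, reduction: float) -> int:
    """Approximate number of charts left to review after a fractional reduction."""
    return round(n_screened * (1 - reduction))


def yield_among_reviewed(n_eligible: int, n_reviewed: int) -> float:
    """Fraction of reviewed charts that turn out to be eligible."""
    return n_eligible / n_reviewed


# Reported figures: 2387 charts at TCC (346 eligible), 642 charts at CH.
tcc_charts = charts_after_screen(2387, 0.33)  # 33% fewer charts at TCC
ch_charts = charts_after_screen(642, 0.44)    # 44% fewer charts at CH

# Yield at TCC before and after the screen, assuming all 346 eligible are kept.
tcc_yield_before = yield_among_reviewed(346, 2387)
tcc_yield_after = yield_among_reviewed(346, tcc_charts)

print(tcc_charts, ch_charts)
print(f"{tcc_yield_before:.1%} -> {tcc_yield_after:.1%}")
```

Under these assumptions, roughly 1599 charts would need review at TCC and roughly 360 at CH, and the yield per reviewed chart at TCC would rise from about 14.5% to about 21.6%.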

Conclusion: The ML-screen incorporating EHR and NLP data can increase the efficiency of clinical trial recruitment by reducing the number of patients requiring chart review; importantly, this approach did not screen out eligible patients. Moreover, the ML-screen can be trained at one institution and applied at another for multi-center clinical trials.

Table 1. Features used in the ML-screen for clinical trial recruitment.

Table 2. Comparison of performance between a screen developed using machine learning and ICD-only screens.

Table 3. Comparison of performance of the ML-screen algorithm across institutions.

Disclosure: T. Cai, None; F. Cai, None; K. Dahal, None; C. Hong, None; K. Liao, None.

To cite this abstract in AMA style:

Cai T, Cai F, Dahal K, Hong C, Liao K. Improving the Efficiency of Clinical Trial Recruitment Using Electronic Health Record Data, Natural Language Processing, and Machine Learning [abstract]. Arthritis Rheumatol. 2019; 71 (suppl 10). https://acrabstracts.org/abstract/improving-the-efficiency-of-clinical-trial-recruitment-using-electronic-health-record-data-natural-language-processing-and-machine-learning/. Accessed .
