Natural Language Processing Tool for Extraction of Patient-Reported Outcomes from a National Multi-Electronic Health Records Registry

Marie Humbert-Droz¹, Zara Izadi², Gabriela Schmajuk², Milena Gianfrancesco², Jinoos Yazdany² and Suzanne Tamang³, ¹Stanford University, Stanford, ²University of California San Francisco, San Francisco, CA, ³Stanford Center for Population Health Sciences, Redwood City, CA

Meeting: ACR Convergence 2021

Keywords: informatics, Patient reported outcomes, quality of care, quality of life

Session Information

Date: Tuesday, November 9, 2021

Title: Abstracts: Measures & Measurement of Healthcare Quality (1893–1896)

Session Type: Abstract Session

Session Time: 10:45AM-11:00AM

Background/Purpose: Patient-reported outcomes (PROs) are increasingly used to track disease activity and facilitate shared decision making in patients with RA. Assessments of disease activity (DA) and functional status (FS) PROs during routine clinical care are recommended in national RA guidelines. However, many rheumatologists lack the health IT support needed to reconfigure their EHR systems to collect PROs as structured data. We developed and evaluated a natural language processing (NLP) pipeline for extracting DA and FS scores from clinical notes within the ACR’s Rheumatology Informatics System for Effectiveness (RISE) registry.

Methods: We examined de-identified notes and structured electronic health record (EHR) data from all patients in the RISE registry with a confirmed diagnosis of RA (2 ICD codes at least 30 days apart) from January 1, 2015, to December 30, 2018. The NLP tool was developed in a stepwise approach to extract scores corresponding to the Clinical Disease Activity Index (CDAI), Routine Assessment of Patient Index Data 3 (RAPID3), Multidimensional Health Assessment Questionnaire (MDHAQ), and HAQ (Figure 1). First, in a text pre-processing step, we harmonized the notes’ format. Next, the concepts of interest (PRO instruments and scores) were annotated. A post-processing step involved formatting and score resolution. The performance of the NLP pipeline was evaluated against a gold standard of human chart review of 100 PRO mentions within 48 randomly selected notes. We calculated inter-rater agreement between the NLP-extracted scores and structured scores where available. Agreement was calculated according to (1) “exact” matching, based on the numerical scores, and (2) for DA scores, “fuzzy” matching, based on score categories (remission, low, etc.).
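The three stages described above (pre-processing, annotation, post-processing) and the exact versus fuzzy comparison can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regular expression, the function names, the plausible-range table, and the category cutoffs are all assumptions made for illustration. The cutoffs shown are the commonly cited CDAI and RAPID3 thresholds; the abstract does not state which cutoffs were used.

```python
import re

# Hypothetical pattern: an instrument name, an optional "score" word and
# separator, then a number. The real pipeline's lexicon is not given.
PRO_PATTERN = re.compile(
    r"\b(?P<inst>CDAI|RAPID[\s-]?3|MD[\s-]?HAQ|HAQ)\b"
    r"(?:\s+score)?\s*(?:[:=]|is|of)?\s*(?P<score>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)

# Assumed plausible score ranges, used to discard implausible extractions.
# Instruments without an entry (here MDHAQ) are not range-checked.
VALID_RANGE = {"CDAI": (0, 76), "RAPID3": (0, 30), "HAQ": (0, 3)}

# Assumed category cutoffs (upper bound, label); anything above is "high".
DA_CUTOFFS = {
    "CDAI": [(2.8, "remission"), (10, "low"), (22, "moderate")],
    "RAPID3": [(3, "remission"), (6, "low"), (12, "moderate")],
}


def preprocess(note):
    """Pre-processing: harmonize whitespace so patterns see one format."""
    return re.sub(r"\s+", " ", note).strip()


def extract_pros(note):
    """Annotation plus post-processing: return (instrument, score) pairs."""
    results = []
    for match in PRO_PATTERN.finditer(preprocess(note)):
        # Normalize spelling variants such as "RAPID-3" or "md haq".
        inst = re.sub(r"[\s-]", "", match.group("inst")).upper()
        score = float(match.group("score"))
        bounds = VALID_RANGE.get(inst)
        if bounds is None or bounds[0] <= score <= bounds[1]:
            results.append((inst, score))
    return results


def da_category(instrument, score):
    """Map a disease activity score to its category for fuzzy matching."""
    for upper, label in DA_CUTOFFS[instrument]:
        if score <= upper:
            return label
    return "high"


def scores_agree(instrument, nlp_score, structured_score, fuzzy=False):
    """Exact match compares numbers; fuzzy match compares categories."""
    if fuzzy:
        return da_category(instrument, nlp_score) == da_category(
            instrument, structured_score
        )
    return nlp_score == structured_score
```

For example, a note fragment reading "CDAI: 12.5" paired with a structured CDAI of 14.0 would fail exact matching but pass fuzzy matching under these assumed cutoffs, since both scores fall in the moderate category.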

Results: Over 34 million notes from 854,628 patients across 158 practices and 24 EHR systems were processed through the NLP pipeline. The majority of practices (n=134) had structured data available for comparison. Overall, our system achieved good fidelity for PRO instrument and score extraction, with a sensitivity of 93.2%, specificity of 80.5%, and positive predictive value of 87.3%. DA measures (CDAI and RAPID3) showed substantial agreement between notes and structured data; FS measures (MDHAQ and HAQ) showed almost perfect agreement (Table 1).
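As a reminder of how these quantities are defined, the following Python sketch computes sensitivity, specificity, and positive predictive value from confusion-matrix counts, and an agreement statistic between two sets of labels. Two assumptions apply. First, the abstract does not name the agreement statistic; Cohen's kappa is used here because "substantial" and "almost perfect" match the Landis and Koch descriptors for kappa. Second, the function names are hypothetical, and any counts passed in are the caller's own; the abstract does not report the underlying counts.

```python
from collections import Counter


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard definitions from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true mentions found
        "specificity": tn / (tn + fp),  # share of non-mentions rejected
        "ppv": tp / (tp + fp),          # share of extractions that are right
    }


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Landis and Koch descriptors for the strength of agreement."""
    if kappa < 0:
        return "poor"
    for upper, label in [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Under fuzzy matching, the labels passed to `cohen_kappa` would be score categories (remission, low, and so on) rather than raw numbers.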

Conclusion: The developed NLP pipeline demonstrated good performance, was able to extract PROs from clinical notes of practices in the absence of structured data and can potentially facilitate reporting of quality and performance measures for outpatient rheumatology practices. Further studies are needed to evaluate the potential generalizability of the NLP pipeline to other types of PRO instruments, and to determine whether NLP performance varies by EHR, practice or note type.

Figure 1: NLP pipeline

Table 1: Inter-rater agreement scores between the NLP extractions and the structured data obtained from RISE

Disclosures: M. Humbert-Droz, None; Z. Izadi, None; G. Schmajuk, None; M. Gianfrancesco, None; J. Yazdany, Astra Zeneca, 2, 5, Pfizer, 2, 6, Gilead, 5, BMS Foundation, 5; S. Tamang, None.

To cite this abstract in AMA style:

Humbert-Droz M, Izadi Z, Schmajuk G, Gianfrancesco M, Yazdany J, Tamang S. Natural Language Processing Tool for Extraction of Patient-Reported Outcomes from a National Multi-Electronic Health Records Registry [abstract]. Arthritis Rheumatol. 2021; 73 (suppl 9). https://acrabstracts.org/abstract/natural-language-processing-tool-for-extraction-of-patient-reported-outcomes-from-a-national-multi-electronic-health-records-registry/. Accessed .
