Session Information
Date: Wednesday, November 16, 2016
Title: Health Services Research II
Session Type: ACR Concurrent Abstract Session
Session Time: 11:00AM-12:30PM
Background/Purpose: Retrieving rheumatoid arthritis (RA) disease activity measures recorded in an electronic medical record through natural language processing (NLP) would significantly aid RA management and epidemiologic research. The Veterans Affairs Rheumatoid Arthritis (VARA) registry routinely collects disease activity measures from participants, including the 28-joint disease activity score (DAS28), at each site and enters these data into a VARA database. Each site uses its own methods for documenting clinical notes and places these data into the VARA database by manual extraction (ManEx). The purpose of this work was to develop NLP code to automatically identify relevant notes and extract clinical measures of RA disease activity, and to determine the accuracy of the NLP in comparison to the ManEx system at three VARA sites.
Methods: All clinical notes for VARA enrollees at three VARA sites between January 1, 2015 and September 30, 2015 that contained at least one clinical component of the DAS28 (tender joint count (TJC), swollen joint count (SJC), or patient global assessment (PtGA)), identified either by ManEx in the VARA database or by NLP in a note, were evaluated. Any ESR within two weeks before or after the clinic visit was identified, and the value closest to the clinic visit was combined with TJC, SJC, and PtGA to calculate DAS28. For each event/note, the TJC, SJC, PtGA, and ESR were evaluated and classified as follows: correct by NLP and ManEx, correct by ManEx only, correct by NLP only, or missing data by both methods. During the same observation period, observations that allowed calculation of DAS28 (all four elements collected) were also identified. Any discrepancies between ManEx and NLP were resolved by investigator review of the clinic notes.
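The ESR selection and DAS28 calculation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and data layout are hypothetical, and the abstract does not state which DAS28 variant was computed, so the widely published DAS28-ESR formula is assumed (PtGA on a 0-100 mm scale, ESR in mm/h).

```python
import math
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple

# Window from the Methods: two weeks before or after the clinic visit.
ESR_WINDOW = timedelta(days=14)


def closest_esr(visit: date,
                esr_results: Sequence[Tuple[date, float]]) -> Optional[float]:
    """Return the ESR value drawn closest to the visit, or None if no
    result falls inside the window. Ties keep the first result listed
    (an assumption; the abstract does not describe tie-breaking)."""
    in_window = [(abs(drawn - visit), value)
                 for drawn, value in esr_results
                 if abs(drawn - visit) <= ESR_WINDOW]
    if not in_window:
        return None
    return min(in_window, key=lambda pair: pair[0])[1]


def das28_esr(tjc: int, sjc: int, ptga: float, esr: float) -> float:
    """Standard DAS28-ESR formula (assumed variant).
    tjc, sjc: counts over 28 joints; ptga: 0-100 mm; esr: mm/h, must be > 0."""
    if esr <= 0:
        raise ValueError("ESR must be positive to take its logarithm")
    return (0.56 * math.sqrt(tjc)
            + 0.28 * math.sqrt(sjc)
            + 0.70 * math.log(esr)
            + 0.014 * ptga)


# Usage with made-up values (not study data).
visit_date = date(2015, 3, 10)
labs = [(date(2015, 2, 20), 40.0),   # 18 days before: outside the window
        (date(2015, 3, 5), 22.0),    # 5 days before
        (date(2015, 3, 12), 30.0)]   # 2 days after: closest
esr_value = closest_esr(visit_date, labs)
score = das28_esr(tjc=4, sjc=1, ptga=50, esr=esr_value)
```

A visit with no ESR inside the window yields `None`, which corresponds to an observation that does not allow DAS28 calculation because not all four elements are available.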
Results: A total of 1273 notes were identified for 474 patients at the three VARA sites; the percent of DAS28 elements identified by the two methods is noted in the table. Reasons for ManEx and NLP not identifying clinical elements varied by the specific element but generally fell into the following categories. More than one reason for ManEx or NLP failure may have occurred for a given note. ManEx errors were notes not identified for data extraction (missed by the reviewer performing ManEx) (N=133, 10.4%) and data entry errors by the ManEx reviewer (N=170, 13.6%). Reasons for NLP failure were: wrong template selected for the rheumatology note (N=91, 7.1%), template modified during the clinic visit (N=6, 0.4%), prose instead of numeric values entered into the template (N=5, 0.4%), and no note in the VA corporate data warehouse (the electronic note source) (N=72, 5.7%).
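Several of the NLP failure reasons above relate to note templates. The sketch below illustrates why, under stated assumptions: it uses simple pattern matching against hypothetical template labels ("Tender joint count:", and so on). The abstract does not describe the actual VARA templates or the NLP system's internals, so this is not the authors' method.

```python
import re
from typing import Dict, Optional

# Hypothetical template labels; real site templates are not given in the abstract.
FIELD_PATTERNS = {
    "TJC": re.compile(r"tender joint count\s*[:=]\s*(\d{1,2})\b", re.IGNORECASE),
    "SJC": re.compile(r"swollen joint count\s*[:=]\s*(\d{1,2})\b", re.IGNORECASE),
    "PtGA": re.compile(r"patient global(?: assessment)?\s*[:=]\s*(\d{1,3})\b",
                       re.IGNORECASE),
}


def extract_measures(note_text: str) -> Dict[str, Optional[int]]:
    """Return the numeric value found for each field, or None when the
    label is absent or is not followed by a number."""
    found: Dict[str, Optional[int]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(note_text)
        found[name] = int(match.group(1)) if match else None
    return found


# Made-up note: the swollen joint count is written as prose ("two").
example_note = (
    "Tender joint count: 4\n"
    "Swollen joint count: two\n"
    "Patient global assessment: 50\n"
)
measures = extract_measures(example_note)
```

In this example the prose entry is not captured, so `measures["SJC"]` is `None` while the numeric fields are extracted. A note written with a different or modified template would miss in the same way, because the expected labels would not match.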
Conclusion: This NLP system can extract DAS28 from notes from three distinct VARA sites to aid in clinical care and research activities. Future efforts will emphasize the standardization of data collection to better support using NLP methods for more efficient and reliable collection of clinical outcomes in RA and the dissemination and evaluation of the methodology at other sites using similar electronic medical record systems.
To cite this abstract in AMA style:
Cannon GW, Mehortra S, South B, Mikuls TR, Reimold AM, Sauer BC. A Natural Language Processing System Can Capture Rheumatoid Arthritis Disease Activity Measures in US Veterans Across Multiple Sites [abstract]. Arthritis Rheumatol. 2016; 68 (suppl 10). https://acrabstracts.org/abstract/a-natural-language-processing-system-can-capture-rheumatoid-arthritis-disease-activity-measures-in-us-veterans-across-multiple-sites/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/a-natural-language-processing-system-can-capture-rheumatoid-arthritis-disease-activity-measures-in-us-veterans-across-multiple-sites/