Session Information
Session Type: Poster Session C
Session Time: 1:00PM-3:00PM
Background/Purpose: Algorithms incorporating diagnostic and procedural codes have recently been developed to identify rheumatoid arthritis-associated interstitial lung disease (RA-ILD) in administrative and electronic health record (EHR) data sets for research and clinical purposes. In a single-center EHR, we previously incorporated ILD-related terms from chest computed tomography (CT) reports to improve the positive predictive value (PPV) of such algorithms (Luedders et al., Arthritis Rheumatol [abstract] 2021; 73(suppl 10)). We aimed to externally validate this approach in real-world data collected from multiple centers.
Methods: We selected participants within the multicenter Veterans Affairs Rheumatoid Arthritis registry to undergo record review, using stratified subsampling to enrich the sample with RA-ILD. Record review was performed in a standardized fashion to determine ILD status (reference standard). Administrative algorithms incorporating varying levels of diagnostic and procedural codes collected from linked administrative data were applied to the cohort (Table 1). Chest CT reports were obtained from a national data warehouse, and ILD-related terms were identified in these reports using automated regular expressions (a natural language processing [NLP] technique). We subsequently added the requirement of an ILD-related term in the CT report to the administrative algorithms, excluding ILD-related terms with negative modifiers within 40 characters of the term. Terms were considered not to be present if a CT report was not available. Algorithm performance was assessed by calculating the PPV and sensitivity, accounting for the sampling process.
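The term-matching step described above can be illustrated with a minimal sketch. The abstract does not publish the term list, the negative modifiers, or whether the 40-character window applies before the term, after it, or both, so `ILD_TERMS`, `NEGATIONS`, and the two-sided window below are hypothetical choices made only for illustration, not the authors' implementation.

```python
import re

# Hypothetical lists: the abstract does not specify the actual terms or modifiers.
ILD_TERMS = ["interstitial lung disease", "pulmonary fibrosis", "honeycombing"]
NEGATIONS = ["no", "not", "without", "negative for"]

TERM_RE = re.compile("|".join(re.escape(t) for t in ILD_TERMS), re.IGNORECASE)
NEG_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(n) for n in NEGATIONS) + r")\b", re.IGNORECASE
)

WINDOW = 40  # characters, as stated in the Methods


def has_ild_term(report):
    """Return True if the report contains at least one non-negated ILD term."""
    if not report:
        # A missing CT report is treated as "term not present", per the Methods.
        return False
    neg_spans = [n.span() for n in NEG_RE.finditer(report)]
    for m in TERM_RE.finditer(report):
        # Assumption: a modifier counts if it lies within WINDOW characters
        # on either side of the matched term.
        negated = any(
            0 <= m.start() - n_end <= WINDOW or 0 <= n_start - m.end() <= WINDOW
            for n_start, n_end in neg_spans
        )
        if not negated:
            return True
    return False


def combined_algorithm(admin_positive, report):
    """Administrative algorithm plus the CT-term requirement."""
    return admin_positive and has_ild_term(report)
```

Under these assumptions, `has_ild_term("No evidence of interstitial lung disease.")` is negated and returns `False`, while a report mentioning honeycombing with no nearby modifier returns `True`.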
Results: We studied 536 RA patients (93% male, mean age in the 7th decade, 71% with available chest CT reports) from 12 centers, of whom 203 had RA-ILD by the reference standard. The PPV of administrative algorithms alone improved with increasing algorithm requirements, ranging from 53.8% (algorithm 1) to 81.6% (algorithm 3) (Figure 1). Requiring only the presence of at least 1 ILD-related term from NLP of chest CT reports (algorithm T) achieved a moderate sensitivity (75.2%) and PPV (63.8%). The addition of ILD-related terms improved the PPV of all administrative algorithms, with the greatest improvements occurring in algorithms that had fewer administrative data requirements (21.1% in algorithm 1 vs. 6.0% in algorithm 3). Combining administrative algorithms with stricter requirements and ILD-related terms from chest CT reports achieved the highest PPV (algorithm 4, 89.2%). Increases in PPV were accompanied by decreases in sensitivity of a similar magnitude (range -3.9% to -19.5%).
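Because the reviewed sample was enriched for RA-ILD, the PPV and sensitivity cannot be read directly from raw counts. The abstract states only that the estimates account for the sampling process. One common approach, assumed here purely for illustration, is to weight each reviewed patient by the inverse of their sampling probability. The `Patient` class and its `weight` field are hypothetical names, not part of the study.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    algorithm_positive: bool  # flagged by the algorithm under evaluation
    reference_ild: bool       # ILD by record review (reference standard)
    weight: float             # assumed inverse sampling probability


def weighted_ppv_sensitivity(patients):
    """Return (PPV, sensitivity) using sampling weights instead of raw counts."""
    true_pos = sum(
        p.weight for p in patients if p.algorithm_positive and p.reference_ild
    )
    flagged = sum(p.weight for p in patients if p.algorithm_positive)
    diseased = sum(p.weight for p in patients if p.reference_ild)
    ppv = true_pos / flagged if flagged else float("nan")
    sensitivity = true_pos / diseased if diseased else float("nan")
    return ppv, sensitivity
```

In this sketch, a patient drawn from a stratum sampled at 1 in 4 carries a weight of 4, so strata that were under-sampled contribute proportionally more to both denominators.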
Conclusion: The inclusion of ILD-related terms acquired from chest CT reports using NLP substantially improves the PPV of administrative-based RA-ILD algorithms, with accompanying decreases in sensitivity of a similar magnitude. These findings in real-world data collected from multiple centers externally validate prior work in a single-center EHR and support the application of these algorithms to identify RA-ILD patients for clinical and research purposes in various real-world data sources.
To cite this abstract in AMA style:
Luedders B, Roul P, Yang Y, Cope B, DeVries M, Campbell W, Hershberger D, Rojas J, Cannon G, Sauer B, Baker J, Curtis J, Mikuls T, England B. Natural Language Processing of Chest CT Reports as a Novel Method of Identifying RA-ILD in Real-World Data [abstract]. Arthritis Rheumatol. 2022; 74 (suppl 9). https://acrabstracts.org/abstract/natural-language-processing-of-chest-ct-reports-as-a-novel-method-of-identifying-ra-ild-in-real-world-data/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/natural-language-processing-of-chest-ct-reports-as-a-novel-method-of-identifying-ra-ild-in-real-world-data/