Session Information
Date: Sunday, November 8, 2015
Session Type: ACR Poster Session A
Session Time: 9:00AM-11:00AM
Background/Purpose: Current methods for identifying people with axial spondyloarthritis (AxSpA) in large datasets are inadequate because billing codes for most types of spondyloarthritis (SpA) do not indicate the presence or absence of axial involvement, and nomenclature for AxSpA is varied and evolving. This has substantially limited observational research on AxSpA and AxSpA subtypes. The objective of this study was to develop methods for identifying AxSpA in national Veterans Health Administration (VHA) datasets.
Methods: Algorithms for identifying veterans with AxSpA were designed to include combinations of SpA features and billing codes (Figure 1). Terms that represent SpA features were identified in clinical documents with natural language processing (NLP). Methods were developed to test and refine the algorithms. Data and computing resources included the Corporate Data Warehouse, Decision Support System, and the Veterans Affairs Informatics and Computing Infrastructure (VINCI).
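As an illustration of what "combinations of SpA features and billing codes" could look like in practice, the sketch below applies one rule to per-patient data. It is not one of the study's algorithms: Figure 1 is not reproduced in this text, so the rule itself, the `Patient` structure, the feature names, and the choice of example ICD-9-CM codes (720.0 for ankylosing spondylitis, 696.0 for psoriatic arthropathy) are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical inputs: these code and feature sets are illustrative only
# and are not taken from the study's Figure 1.
AXIAL_CODES = {"720.0"}        # ICD-9-CM: ankylosing spondylitis
OTHER_SPA_CODES = {"696.0"}    # ICD-9-CM: psoriatic arthropathy
SPA_FEATURES = {"sacroiliitis", "hla_b27_positive"}


@dataclass
class Patient:
    icd9_codes: set = field(default_factory=set)
    nlp_features: set = field(default_factory=set)  # features confirmed by NLP


def meets_example_algorithm(patient: Patient) -> bool:
    """Illustrative rule: an axial billing code on its own, or another SpA
    billing code together with at least one NLP-confirmed SpA feature."""
    if patient.icd9_codes & AXIAL_CODES:
        return True
    has_spa_code = bool(patient.icd9_codes & OTHER_SPA_CODES)
    has_feature = bool(patient.nlp_features & SPA_FEATURES)
    return has_spa_code and has_feature
```

A rule of this shape reflects the problem stated in the Background: a code that does not specify axial involvement is only counted when a documented feature supports it.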
Results: Terms representing SpA features were explored, identified, selected, extracted, and annotated for the development of NLP modules, using methods and technologies shown in Table 1, Step 1. Methods and software were also designed to build reference populations, identify veterans fulfilling the algorithms, and test the algorithms (Table 1, Steps 2-4). The accuracy of NLP modules exceeded the target accuracy of 90% (Table 2).
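The exploration, counting, and extraction of term variations (Table 1, Steps 1a, 1b, and 1d) can be sketched in a few lines. The study used Voogo, SQL, and Snapshot for these steps; the Python below is only a stand-in that shows the idea on in-memory strings. The function names, the treatment of `*` as "any run of letters", and the character window around each match are assumptions.

```python
import re
from collections import Counter


def root_to_regex(root: str) -> re.Pattern:
    """Convert a root word with '*' wildcards (e.g. 'spond*') to a regex.
    Assumption: '*' stands for any run of letters."""
    parts = [re.escape(part) for part in root.split("*")]
    return re.compile(r"\b" + r"[a-z]*".join(parts) + r"\b", re.IGNORECASE)


def count_variations(documents, root):
    """Count how often each variation matching the root word is mentioned."""
    pattern = root_to_regex(root)
    counts = Counter()
    for doc in documents:
        counts.update(m.group(0).lower() for m in pattern.finditer(doc))
    return counts


def extract_snippets(documents, variation, window=30):
    """Return the text surrounding each mention of one selected variation."""
    pattern = re.compile(r"\b" + re.escape(variation) + r"\b", re.IGNORECASE)
    snippets = []
    for doc in documents:
        for m in pattern.finditer(doc):
            start = max(0, m.start() - window)
            end = min(len(doc), m.end() + window)
            snippets.append(doc[start:end])
    return snippets


if __name__ == "__main__":
    docs = [
        "MRI shows spondylitis of the lumbar spine.",
        "No evidence of Spondylitis or spondylolisthesis.",
    ]
    print(count_variations(docs, "spond*"))
    print(extract_snippets(docs, "spondylitis", window=10))
```

The second sample document shows why the later annotation and NLP steps are needed: a snippet can contain the term while negating it.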
Conclusion: The methods for identifying terms representing SpA features in clinical documents are feasible, and SpA feature terms have been identified with high accuracy. Further work is required to apply, test, and refine the algorithms in reference populations with and without AxSpA.
Figure 1. Algorithms for identifying axial spondyloarthritis
Table 1. Methods for identifying axial spondyloarthritis

| # | Steps | Software |
|---|---|---|
| 1 | Identify terms in clinical documents that represent SpA features. For each term: | |
| 1a | Explore term variations (alternative wording, misspellings, descriptions, etc.) in randomly sampled clinical documents – Identify root words for each variation (includes word fragments with wild cards[*]) – Select root words that represent the intended term in ≥40% of reviewed documents | Voogo |
| 1b | Identify term variations in VA documents – Query root words in VA datasets to identify all term variations mentioned in all documents – Determine the number of times each variation was mentioned in all documents in the database | SQL |
| 1c | Select common and meaningful term variations – Exclude rarely used variations – Exclude variations that don’t represent the intended term (not meaningful) | Excel |
| 1d | Extract sections of text containing the selected term variation (snippets) from all documents | Snapshot |
| 1e | Annotate randomly selected snippets – Identify the parts of text necessary to determine if the extracted term represents the intended SpA term – Classify the snippet text according to whether or not it represents the intended term (yes/no/possible) – Develop & revise annotation guidelines – Train annotators until inter-rater agreement is >90% – Annotate 1500 snippets for NLP | Visual Tagging Tool (VTT), eHOST |
| 1f | Develop NLP module – Develop sets of rules (machine learning) that train NLP software to classify terms in the context of the surrounding text – Test and revise NLP modules with additional annotated snippets until accuracy is >90% for each term | Support Vector Machines (SVM), RED |
| 1g | Classify patients with discordant snippet classifications – Develop and apply rules for classifying patients with snippets assigned to different categories (yes & no) | SQL |
| 2 | Develop reference population of veterans with and without AxSpA | |
| 2a | Develop cohort of 2500 randomly selected veterans – Enrich cohort by selecting veterans with at least 2 rheumatology clinic encounters – Create tables with data relevant for determining AxSpA status for each veteran (rheumatology clinic notes, reports from articular radiographs, DMARD exposure, anti-CCP, RF, HLA-B27, etc.) – Import tables into Chart Reviewer software and set software parameters | SQL, ChartReview, eHOST |
| 2b | Classify veterans in rheumatology reference population – Develop classification guidelines – Determine inter-rater agreement between classifiers – Classify veterans in reference population as AxSpA or no AxSpA | SQL, ChartReview, eHOST |
| 3 | Identify veterans fulfilling algorithms | |
| | Sequentially apply NLP modules and coded ICD-9 data to: – Rheumatology reference population – General veteran population | SQL |
| 4 | Test & refine algorithm(s) | |
| 4a | Test & refine algorithm(s) in the rheumatology reference population – Calculate sensitivity, specificity, and accuracy of each algorithm – If algorithm accuracy is <85%, revise processes ± algorithms | SQL |
| 4b | Test and refine algorithm(s) in the general veteran population – Review charts of randomly selected veterans fulfilling algorithms & manually classify as AxSpA or no AxSpA – Calculate specificity of each algorithm using manual classification as reference – If algorithm specificity is <85%, revise processes ± algorithms | SQL |
| 5 | Alternative plan (if necessary) | |
| | If performance of all algorithms is suboptimal, develop a model that will statistically identify the most predictive combination(s) of terms | SQL |
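The performance check in Step 4a (sensitivity, specificity, and accuracy against a reference classification, with revision when accuracy falls below 85%) can be sketched as follows. The study lists SQL for this step; the Python version here is only an illustration, and the function names and the boolean encoding of AxSpA status are assumptions.

```python
def confusion_counts(predicted, reference):
    """Tally true/false positives and negatives.
    Both inputs are sequences of booleans (True = classified as AxSpA)."""
    tp = fp = tn = fn = 0
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            tp += 1
        elif pred and not ref:
            fp += 1
        elif not pred and ref:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(predicted, reference, threshold=0.85):
    """Compute sensitivity, specificity, and accuracy, and flag the
    algorithm for revision when accuracy is below the threshold."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, tn, fn = confusion_counts(predicted, reference)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "needs_revision": accuracy < threshold,
    }


if __name__ == "__main__":
    algorithm_output = [True, True, False, False, True]
    chart_review = [True, False, False, False, True]
    print(evaluate(algorithm_output, chart_review))
```

The default `threshold` of 0.85 mirrors the 85% cut-off in Table 1; it is exposed as a parameter so that the same function could be reused with a different target.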
Table 2. Identification of terms in clinical documents that represent SpA features

| Term | # Root words with true positive rate >40% (Step 1a) | # Term variations found in VA documents (Step 1b) | # Meaningful variations in ≥100 documents (Step 1c) | # Extracted snippets (Step 1d) | Annotator IRR [Kappa (%)] (Step 1e) | NLP Accuracy (%) (Step 1f) |
|---|---|---|---|---|---|---|
| Sacroiliitis | 16 | 905 | 506 | 326,436 | 98.1 | 91.1 |
| Spond* | 6 | 9,593 | 134 | 802,757 | 94.8 | 93.5 |
| HLA-B27+ | 1 | 299 | 3 | 774,140 | 93.3 | 97.2 |
| Back pain | 34 | 4,359 | 416 | 1,547,520 | 100 | NA* |
*NLP unnecessary since extraction methods yielded 97% true positive classification of randomly sampled snippets
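The annotator inter-rater reliability column in Table 2 is reported as a kappa statistic. The abstract does not say which variant was used; the sketch below assumes Cohen's kappa for two annotators who each assign one label (yes/no/possible) to every snippet, and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labelled independently at their own observed rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    annotator_1 = ["yes", "yes", "no", "no", "possible", "yes"]
    annotator_2 = ["yes", "no", "no", "no", "possible", "yes"]
    print(round(cohen_kappa(annotator_1, annotator_2) * 100, 1))
```

Multiplying the result by 100 gives a percentage on the same scale as the table's "Kappa (%)" column.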
To cite this abstract in AMA style:
Walsh J, Leng J, Breviu B, Clegg D, He T, Sauer B. Identification Methods for Axial Spondyloarthritis in American Veterans [abstract]. Arthritis Rheumatol. 2015; 67 (suppl 10). https://acrabstracts.org/abstract/identification-methods-for-axial-spondyloarthritis-in-american-veterans/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/identification-methods-for-axial-spondyloarthritis-in-american-veterans/