Session Information
Session Type: Poster Session A
Session Time: 10:30AM-12:30PM
Background/Purpose: Systemic sclerosis (SSc) is a rare systemic autoimmune rheumatic disease. International Classification of Diseases (ICD) code counts (for example, using ≥ 2 ICD-10 codes of M34 to identify SSc), phenotype risk score (PhRS), and linear combination of principal components (LPC) are methods used for phenotype identification in electronic health records (EHR). PhRS is a weighted aggregation of relevant clinical features based on ICD codes and has been found to have high performance in identifying systemic lupus erythematosus (PMID: 37096581). LPC leverages principal components for noise reduction and outperforms PhRS in identifying certain common illnesses, such as coronary artery disease and chronic kidney disease (PMID: 34302027). We aim to evaluate the performance of LPC, PhRS, and ICD code counts in identifying rheumatologist-diagnosed SSc from EHR.
Methods: We identified patients with potential SSc who had at least one ICD-10 code of M34 from 2/2020 to 9/2023. These records were manually reviewed to identify those who had rheumatologist-diagnosed SSc. We also randomly sampled controls without M34 codes in a 1:5 ratio, matched by the number of medical encounters. We compiled a list of ICD-9 and ICD-10 codes corresponding to the clinical features of SSc (Table 1). The SSc PhRS was calculated as the sum of these ICD codes weighted by the log inverse prevalence of each code in the entire EHR. To develop LPC scores, we first performed principal component analysis (PCA) on the above-mentioned ICD codes in the entire EHR dataset. The SSc LPC score was calculated for each individual as the sum of principal components (PCs) weighted by the corresponding eigenvalues. We used the Tracy-Widom test to select the number of significant eigenvalues to include when calculating the LPC score. We made two comparisons: PhRS versus LPC in distinguishing rheumatologist-diagnosed SSc from matched controls, and PhRS, LPC, and ICD code counts in distinguishing SSc from other conditions among patients with at least one M34 ICD-10 code. Performance was assessed using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
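The two scores described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary patient-by-code matrix standing in for the entire EHR, takes the number of retained components as an input rather than running the Tracy-Widom test, and orients each principal component so that its loadings sum to a non-negative value (the abstract does not say how component signs were handled). The function names `phrs` and `lpc` are hypothetical.

```python
import numpy as np

def phrs(X_ehr):
    """Phenotype risk score: sum of codes weighted by log inverse prevalence.

    X_ehr is an (n_patients, n_codes) 0/1 matrix assumed to cover the whole EHR.
    """
    X = np.asarray(X_ehr, dtype=float)
    prevalence = X.mean(axis=0)
    # Assumption: codes never observed get weight 0 to avoid log(1/0).
    weights = np.zeros_like(prevalence)
    seen = prevalence > 0
    weights[seen] = np.log(1.0 / prevalence[seen])
    return X @ weights

def lpc(X_ehr, n_components):
    """Linear combination of principal components, weighted by eigenvalues.

    n_components is supplied by the caller; the abstract selects it with the
    Tracy-Widom test, which this sketch does not implement.
    """
    X = np.asarray(X_ehr, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each code column
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[top], eigvecs[:, top]
    # Assumption: fix the arbitrary sign of each eigenvector so that its
    # loadings sum to a non-negative value.
    signs = np.sign(eigvecs.sum(axis=0))
    signs[signs == 0] = 1.0
    eigvecs = eigvecs * signs
    pcs = Xc @ eigvecs                           # per-patient PC values
    return pcs @ eigvals                         # eigenvalue-weighted sum

# Toy data only (4 patients, 2 codes); not study data.
toy = [[1, 0], [1, 1], [0, 0], [1, 0]]
print(phrs(toy))
print(lpc(toy, n_components=1))
```

Because the weights are learned from the whole matrix, rarer codes contribute more to the PhRS, while the LPC score emphasizes directions in which codes vary together.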
Results: We identified 653 patients with at least one ICD-10 code of M34. After manual chart review, 550 (84%) had rheumatologist-diagnosed SSc, 78 (12%) did not have SSc, and 25 (4%) had self-reported SSc or were undergoing evaluation for a suspected SSc diagnosis. LPC was superior to PhRS in identifying rheumatologist-diagnosed SSc from matched controls (AUROC 0.90 vs 0.87; AUPRC 0.85 vs 0.78; Figures 1A and 1B). LPC was superior to both ICD code counts and PhRS in identifying rheumatologist-diagnosed SSc from other conditions in patients with at least one ICD-10 code of M34 (AUROC 0.72 for LPC, 0.70 for ICD code counts and 0.70 for PhRS; AUPRC 0.93 for LPC, 0.92 for ICD code counts and 0.91 for PhRS; Figures 1C and 1D).
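For readers unfamiliar with the two metrics reported above, the following sketch shows one way to compute them from labels and scores. It is an illustration under stated assumptions, not the authors' evaluation code: AUROC is computed as the fraction of case-control pairs ranked correctly (ties count one half), and AUPRC is approximated by average precision, which is one common estimator; the abstract does not state which estimator was used. The function names and the toy data are hypothetical.

```python
def auroc(labels, scores):
    """Probability that a random case scores higher than a random control."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one case")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Treat all patients with the same score as one threshold.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap

# Toy data only; not study data.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(auroc(labels, scores), average_precision(labels, scores))
```

Unlike AUROC, the baseline value of average precision equals the proportion of cases in the sample, so values from cohorts with different case proportions are not directly comparable.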
Conclusion: LPC outperformed PhRS and ICD code counts in identifying rheumatologist-diagnosed SSc from EHR, although the differences among these methods were small overall. Our study serves as a proof of concept that combining clinical features with noise-reduction techniques is a promising approach to identifying systemic autoimmune rheumatic diseases from EHR.
To cite this abstract in AMA style:
Luo Y, Zhang G, Weng C, Bernstein E. Linear Combination of Principal Components Achieves Top Performance in Identifying Rheumatologist-Diagnosed Systemic Sclerosis from Electronic Health Records [abstract]. Arthritis Rheumatol. 2024; 76 (suppl 9). https://acrabstracts.org/abstract/linear-combination-of-principal-components-achieves-top-performance-in-identifying-rheumatologist-diagnosed-systemic-sclerosis-from-electronic-health-records/. Accessed .
ACR Meeting Abstracts - https://acrabstracts.org/abstract/linear-combination-of-principal-components-achieves-top-performance-in-identifying-rheumatologist-diagnosed-systemic-sclerosis-from-electronic-health-records/