Session Information
Session Type: Poster Session B
Session Time: 10:30AM-12:30PM
Background/Purpose: Large-scale, data-driven medical research is invaluable in answering questions about the epidemiology, genetics, therapeutics, and outcomes of rare diseases. Systemic sclerosis (SSc) is a rare autoimmune disease, yet the deadliest, with limited treatment options available to alter the disease course. Despite the availability of large administrative datasets, only a few studies of SSc have used them, and most define SSc cases based on a single diagnostic code. Optimal use of these datasets, however, requires accurate SSc case determination, which has yet to be studied. We aimed to evaluate algorithms to identify SSc cases in an administrative dataset and to describe the performance characteristics of these algorithms.
Methods: Patient records in a large, global multicenter electronic medical record dataset (TriNetX Research Network) were screened for SSc using the International Classification of Diseases, Ninth and Tenth Revisions (ICD-9 and ICD-10). Patient-identifiable data were available from only one center. A medical record review was performed on a simple random sample of this cohort using a standardized data abstraction form (GO) to confirm the SSc diagnosis. All outpatient and inpatient ICD codes and the dates they were assigned were extracted from the TriNetX Research Network. Using data for confirmed cases, the performance of different administrative algorithms was assessed by sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and receiver operating characteristic (ROC) curve analyses.
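As a minimal sketch of how the performance measures named above are derived, the following Python function computes sensitivity, specificity, PPV, and NPV from a 2x2 table that compares an algorithm's classification with the medical record review. The function name and the counts in the usage example are hypothetical and purely illustrative; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 table counts.

    tp: algorithm-positive, chart review confirms SSc
    fp: algorithm-positive, chart review does not confirm SSc
    tn: algorithm-negative, chart review does not confirm SSc
    fn: algorithm-negative, chart review confirms SSc
    """
    return {
        "sensitivity": tp / (tp + fn),  # confirmed cases the algorithm catches
        "specificity": tn / (tn + fp),  # non-cases the algorithm rejects
        "ppv": tp / (tp + fp),          # algorithm-positives that are true cases
        "npv": tn / (tn + fn),          # algorithm-negatives that are true non-cases
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = diagnostic_metrics(tp=90, fp=10, tn=40, fn=10)
print(example)
```

Note that PPV and NPV depend on the proportion of confirmed cases in the reviewed sample, so they may not transfer directly to a dataset with a different case mix.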
Results: To date, medical records for a random sample of 480 of the 1126 patients with at least one ICD-9/10 code for SSc have been reviewed. Overall, 342 of the 480 (71.2%) patients were confirmed to have SSc, predominantly limited cutaneous SSc (Table 1). Among the tested administrative algorithms, “≥2 outpatient ICD-9/10 codes at least 30 days apart without an alternative diagnosis code (algorithm 4)” showed the best performance, with a sensitivity of 95.6%, specificity of 70.3%, PPV of 89.8%, NPV of 86.6%, and area under the curve of 0.906 (Table 2). Although “≥1 inpatient ICD-9/10 code” had a high specificity (90.6%) and PPV (92.1%), its sensitivity (44.4%) and NPV (39.7%) were low. When “≥1 inpatient ICD-9/10 code” was added to algorithm 4, the performance of the algorithm did not change significantly (Figure). When patients with overlap syndromes were excluded, algorithm 4 still performed the best (sensitivity 96.8%, specificity 70.6%, PPV 85.9%, NPV 92.3%).
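The rule behind algorithm 4 could be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not list its code sets, so the SSc code prefixes (ICD-10 M34, ICD-9 710.1) and the alternative diagnosis set are assumptions, and "at least 30 days apart" is read here as the earliest and latest qualifying outpatient codes being 30 or more days apart.

```python
from datetime import date

# Assumption: SSc codes are ICD-10 M34.x and ICD-9 710.1.
# The abstract does not state which codes were used.
SSC_PREFIXES = ("M34", "710.1")


def is_ssc_code(code):
    """True if the code starts with one of the assumed SSc prefixes."""
    return code.upper().startswith(SSC_PREFIXES)


def meets_algorithm_4(encounters, alternative_codes):
    """Apply the '>=2 outpatient codes >=30 days apart, no alternative dx' rule.

    encounters: list of (date, setting, code) tuples, where setting is
                "outpatient" or "inpatient" (hypothetical record layout).
    alternative_codes: set of codes treated as alternative diagnoses
                (hypothetical; the abstract does not list them).
    """
    # Any alternative diagnosis code excludes the patient.
    if any(code in alternative_codes for _, _, code in encounters):
        return False
    # Collect the dates of outpatient SSc codes only.
    ssc_dates = sorted(
        d for d, setting, code in encounters
        if setting == "outpatient" and is_ssc_code(code)
    )
    if len(ssc_dates) < 2:
        return False
    # Earliest and latest outpatient codes must be at least 30 days apart.
    return (ssc_dates[-1] - ssc_dates[0]).days >= 30


# Hypothetical patient: two outpatient SSc codes 45 days apart.
patient = [
    (date(2024, 1, 1), "outpatient", "M34.1"),
    (date(2024, 2, 15), "outpatient", "M34.9"),
]
# Placeholder alternative diagnosis set, for illustration only.
print(meets_algorithm_4(patient, alternative_codes={"L94.0"}))
```

Keeping the alternative diagnosis set as a parameter makes it straightforward to rerun the rule with a different exclusion list when validating in another dataset.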
Conclusion: The administrative algorithm “≥2 outpatient ICD-9/10 codes at least 30 days apart without an alternative diagnosis code” reliably identifies SSc cases. This algorithm can be applied to large administrative datasets to conduct clinical and epidemiologic studies of SSc patients with higher validity. External validation of this algorithm in different datasets can further strengthen its utility.
Table 1. Characteristics of the patient cohort by confirmed SSc diagnosis by medical record review
Table 2. Performance of algorithms with various diagnostic codes for SSc case identification
Figure. Receiver operating characteristic curves of the algorithms against the medical record review
To cite this abstract in AMA style:
Ozen G, O'Rorke M, Romitti P, Domsic R. Refining Administrative Algorithms For Accurate Identification of Patients with Systemic Sclerosis In Trinetx Research Network [abstract]. Arthritis Rheumatol. 2025; 77 (suppl 9). https://acrabstracts.org/abstract/refining-administrative-algorithms-for-accurate-identification-of-patients-with-systemic-sclerosis-in-trinetx-research-network/. Accessed .