ACR Meeting Abstracts

Abstract Number: 2804

Identifying Lupus Patients in Electronic Health Records: Development and Validation of Machine Learning Algorithms and Application of Rule-Based Algorithms

April Jorge¹, Victor M. Castro², April Barnado³, Vivian Gainer², Chuan Hong⁴, Tianxi Cai², Robert Carroll⁵, Leslie Crofford³, Karen Costenbader⁶, Katherine P. Liao⁷, Elizabeth Karlson⁶ and Candace H. Feldman⁶

¹ Rheumatology, Allergy, and Immunology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
² Research Information Systems and Computing, Partners Healthcare, Boston, MA
³ Division of Rheumatology and Immunology, Vanderbilt University Medical Center, Nashville, TN
⁴ Harvard T.H. Chan School of Public Health, Boston, MA
⁵ Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
⁶ Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA
⁷ Brigham and Women's Hospital, Boston, MA

Meeting: 2018 ACR/ARHP Annual Meeting

Keywords: Bioinformatics, diagnosis, Epidemiologic methods and systemic lupus erythematosus (SLE)

Session Information

Date: Tuesday, October 23, 2018

Title: 5T086 ACR Abstract: Epidemiology & Pub Health III: SLE & SSc, Big Data & Large Cohorts (2802–2807)

Session Type: ACR Concurrent Abstract Session

Session Time: 2:30PM-4:00PM

Background/Purpose: To use electronic health records (EHR) to study SLE, phenotypic algorithms are needed to accurately identify patients with SLE. We aimed to generate an EHR algorithm for SLE using machine learning, which allows the data to inform algorithmic features, with the primary goal of optimizing the positive predictive value (PPV). We also aimed to compare the performance of this algorithm in our EHR with that of published rule-based algorithms (Barnado et al. Arthritis Care Res 2017), which pre-specify combinations of ICD-9 codes, medications, and laboratory tests.

 

Methods: We randomly selected 400 subjects with ≥1 SLE ICD-9 code (710.0) from a large, academic medical system EHR, and two rheumatologists identified gold standard cases of definite and probable SLE. Subjects meeting 1997 ACR or 2012 SLICC Classification Criteria for SLE were classified as definite SLE; subjects who met only some criteria (usually 3) but were considered likely to have SLE by the treating rheumatologist and the reviewers were classified as probable SLE. We divided subjects into a training set (N=200) and a validation set (N=200). We extracted codified concepts and, using natural language processing (NLP), narrative concepts from the training set, and generated algorithms using penalized logistic regression (LASSO) to classify subjects with definite or definite/probable SLE. Algorithms were applied to the validation set using the original case definition and validated externally at the institution where the rule-based algorithms were developed (N=175), using a more liberal definition of specialist-reported SLE diagnosis. We also applied published rule-based algorithms to our training set to assess portability.
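As a rough illustration of how penalized logistic regression (LASSO) lets the data select features, the following minimal sketch fits an L1-penalized logistic model by proximal gradient descent in plain Python. It is not the authors' implementation: the four binary features, their names, the toy cohort, and the penalty and step-size values are all hypothetical and were chosen only so that two features carry signal and two do not.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is left unpenalized; lam, lr and n_iter are illustrative.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Gradient step on the log-loss, then shrinkage from the L1 penalty.
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def build_toy_cohort():
    """Hypothetical toy data, NOT from the study.

    Features (all assumed names): anti_dsdna_test, complement_test,
    noise_a, noise_b. Only the first two are related to the label.
    """
    positives_per_10 = {(0, 0): 1, (1, 0): 6, (0, 1): 4, (1, 1): 9}
    X, y = [], []
    for (x0, x1), n_pos in positives_per_10.items():
        for x2 in (0, 1):
            for x3 in (0, 1):
                for k in range(10):
                    X.append([x0, x1, x2, x3])
                    y.append(1 if k < n_pos else 0)
    return X, y


feature_names = ["anti_dsdna_test", "complement_test", "noise_a", "noise_b"]
X, y = build_toy_cohort()
weights, intercept = fit_lasso_logistic(X, y)
for name, weight in zip(feature_names, weights):
    print(f"{name}: {weight:.3f}")
```

In this toy setting the penalty keeps the two uninformative features out of the model while retaining the two informative ones, which is the behavior that makes LASSO useful for choosing among many candidate EHR concepts.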

 

Results: In the combined training and validation cohorts (N=200 each), 29% had definite SLE and 41% had definite/probable SLE. Using machine learning methods, our codified data algorithm had a PPV of 90% for definite SLE at 97% specificity and 64% sensitivity (Table 1). For definite/probable SLE, the PPV was 92% at 97% specificity and 47% sensitivity. Models with NLP data performed similarly. In the external cohort validation, the codified definite/probable SLE algorithm had 95% PPV, 98% specificity, and 13% sensitivity. The PPVs of rule-based algorithms were <50% for definite SLE and ≤65% for definite/probable SLE in our EHR (Table 1).
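The reported PPVs can be related to sensitivity, specificity, and case prevalence through Bayes' rule. The short sketch below is a consistency check, not part of the study; it assumes that the prevalences reported for the combined cohorts (29% and 41%) also apply to the set on which performance was measured, and the function name is hypothetical.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values taken from the Results text for the codified machine learning algorithms.
ppv_definite = ppv_from_rates(sensitivity=0.64, specificity=0.97, prevalence=0.29)
ppv_definite_probable = ppv_from_rates(sensitivity=0.47, specificity=0.97, prevalence=0.41)

print(f"Definite SLE: PPV = {ppv_definite:.1%}")
print(f"Definite/probable SLE: PPV = {ppv_definite_probable:.1%}")
```

Under that assumption, both values round to the reported 90% and 92%. The same formula shows why a very high specificity matters when PPV is the primary goal: the false-positive term scales with (1 - specificity).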

 

Conclusion: Our final machine learning SLE phenotype algorithms performed well in our EHR. When externally validated in a cohort that did not require ACR/SLICC criteria to define cases, they retained high PPV but had lower sensitivity. Rule-based SLE phenotype algorithms did not perform as well in our EHR, likely because of these differences in case definitions and variations in clinical practice, medication use, laboratory tests, billing, and documentation across EHRs. Unique EHR characteristics, case definitions, and research goals must be considered when applying algorithms to identify SLE patients in EHRs.

 

Table 1: Algorithm performance characteristics

| Algorithm | Definite SLE*: Sensitivity (%) | Definite SLE*: Specificity (%) | Definite SLE*: PPV (%) | Definite/Probable SLE: Sensitivity (%) | Definite/Probable SLE: Specificity (%) | Definite/Probable SLE: PPV (%) |
|---|---|---|---|---|---|---|
| Machine learning codified algorithms** | 64 | 97 | 90 | 47 | 97 | 92 |
| Machine learning codified/natural language processing algorithms | 46 | 97 | 87 | 41 | 97 | 90 |
| Top-performing rule-based algorithm 1***: >3 ICD-9 codes for SLE, ANA ≥1:40, ever DMARD use, and ever steroid use | 58 | 72 | 47 | 53 | 75 | 63 |
| Top-performing rule-based algorithm 2***: >3 ICD-9 codes for SLE and ever antimalarial use | 86 | 60 | 46 | 84 | 69 | 65 |

* In the definite SLE algorithms, probable cases were considered non-SLE.

**Definite SLE algorithm includes the coded variables chronic renal failure, rheumatoid arthritis, sicca syndrome, SLE, unspecified connective tissue disease, anti-dsDNA laboratory test, complement laboratory test, and anti-TNF/biologic DMARDs (etanercept, adalimumab, infliximab, abatacept, tofacitinib, tocilizumab, certolizumab, golimumab, secukinumab, and ustekinumab). Definite/probable SLE algorithm includes the coded variables chronic renal failure, SLE, anti-dsDNA, complement, and antimalarial medication.

***Barnado, A. et al, Arthritis Care Res, 2017
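The two rule-based algorithms in Table 1 are simple conjunctions of EHR criteria, and their performance is measured against the chart-review gold standard. The sketch below shows one way this could be expressed; the record structure, field names, and the six toy patients are hypothetical and are not study data. The thresholds follow the rule definitions as written in the table.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary extracted from an EHR."""
    sle_icd9_count: int      # number of SLE ICD-9 (710.0) codes
    ana_titer: int           # reciprocal ANA titer (40 means 1:40); 0 if none
    ever_dmard: bool
    ever_steroid: bool
    ever_antimalarial: bool
    gold_standard_sle: bool  # chart-review label


def rule_1(p):
    """>3 SLE ICD-9 codes, ANA >= 1:40, ever DMARD use, and ever steroid use."""
    return (p.sle_icd9_count > 3 and p.ana_titer >= 40
            and p.ever_dmard and p.ever_steroid)


def rule_2(p):
    """>3 SLE ICD-9 codes and ever antimalarial use."""
    return p.sle_icd9_count > 3 and p.ever_antimalarial


def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def performance(records, rule):
    """Sensitivity, specificity and PPV of a rule against the gold standard."""
    tp = sum(1 for r in records if rule(r) and r.gold_standard_sle)
    fp = sum(1 for r in records if rule(r) and not r.gold_standard_sle)
    fn = sum(1 for r in records if not rule(r) and r.gold_standard_sle)
    tn = sum(1 for r in records if not rule(r) and not r.gold_standard_sle)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Toy records for illustration only.
records = [
    PatientRecord(5, 160, True, True, True, True),
    PatientRecord(4, 80, False, True, True, True),
    PatientRecord(2, 320, True, True, True, True),
    PatientRecord(6, 0, False, False, True, False),
    PatientRecord(1, 0, False, False, False, False),
    PatientRecord(4, 40, True, True, False, False),
]

for name, rule in (("rule 1", rule_1), ("rule 2", rule_2)):
    print(name, performance(records, rule))
```

Because each rule depends on how codes, medications, and laboratory results are recorded locally, the same function can give different performance at different institutions, which is the portability issue the abstract examines.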

 


Disclosure: A. Jorge, None; V. M. Castro, None; A. Barnado, None; V. Gainer, None; C. Hong, None; T. Cai, None; R. Carroll, None; L. Crofford, None; K. Costenbader, None; K. P. Liao, None; E. Karlson, None; C. H. Feldman, None.

To cite this abstract in AMA style:

Jorge A, Castro VM, Barnado A, Gainer V, Hong C, Cai T, Carroll R, Crofford L, Costenbader K, Liao KP, Karlson E, Feldman CH. Identifying Lupus Patients in Electronic Health Records: Development and Validation of Machine Learning Algorithms and Application of Rule-Based Algorithms [abstract]. Arthritis Rheumatol. 2018; 70 (suppl 9). https://acrabstracts.org/abstract/identifying-lupus-patients-in-electronic-health-records-development-and-validation-of-machine-learning-algorithms-and-application-of-rule-based-algorithms/. Accessed .