ACR Meeting Abstracts

Abstract Number: 2074

Automated Diagnosis Extraction from Electronic Medical Records with Machine Learning Classifiers

Tjardo Maarseveen¹, Thomas Huizinga¹, Marcel J.T. Reinders¹, Erik van den Akker¹ and Rachel Knevel¹, ¹Leiden University Medical Center, Leiden, Netherlands

Meeting: 2019 ACR/ARP Annual Meeting

Keywords: big data and data analysis, Bioinformatics, diagnosis, Electronic Health Record

Session Information

Date: Tuesday, November 12, 2019

Title: Epidemiology & Public Health Poster III: OA, Gout, & Other Diseases

Session Type: Poster Session (Tuesday)

Session Time: 9:00AM-11:00AM

Background/Purpose: The use of Electronic Medical Records (EMR) for research has led to increasing interest in Natural Language Processing (NLP) for text classification. NLP methods require little preparation because they can interpret the data automatically. Classification is often done with naïve word-matching, in which the researcher supplies a set of patterns to search for. An alternative is a Machine Learning (ML) classifier: rather than being given a set of patterns, an ML-model requires only the outcome labels and derives the patterns itself. The purpose of this study is to build a reliable ML classifier that identifies Rheumatoid Arthritis (RA) cases from the provided EMR entry.
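
The contrast between supplied and learned patterns can be illustrated with a minimal sketch, assuming a bag-of-words Naive Bayes model written from scratch. It is not the study's implementation; the tokenizer, the function names and the toy entries are hypothetical and invented for illustration only.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()

def train_naive_bayes(texts, labels):
    """Count words per class from labelled entries (labels: 1 = RA, 0 = not RA)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def log_odds_ra(text, model):
    """Log-odds that an entry is an RA case; positive values favour RA."""
    word_counts, class_counts, vocab = model
    score = math.log(class_counts[1] / class_counts[0])  # class prior
    totals = {c: sum(word_counts[c].values()) for c in (0, 1)}
    for token in tokenize(text):
        if token not in vocab:
            continue  # ignore words never seen during training
        # Laplace (add-one) smoothing keeps every probability above zero.
        p_ra = (word_counts[1][token] + 1) / (totals[1] + len(vocab))
        p_not = (word_counts[0][token] + 1) / (totals[0] + len(vocab))
        score += math.log(p_ra / p_not)
    return score

# Hypothetical toy entries (invented, not study data).
train_texts = [
    "symmetric polyarthritis anti-ccp positive start methotrexate",
    "erosive arthritis of the hands anti-ccp positive",
    "acute podagra raised uric acid start allopurinol",
    "knee osteoarthritis advise physiotherapy",
]
train_labels = [1, 1, 0, 0]
model = train_naive_bayes(train_texts, train_labels)
```

No RA-defining string is supplied anywhere in this sketch: the model is given only the labels, and words such as "anti-ccp" acquire positive weight because they co-occur with the RA label in the training entries.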

Methods: Data were acquired from the HiX-EMR database, which covers 2,771 patients who visited the rheumatology outpatient clinic of the Leiden University Medical Centre between 2007 and 2018 and contains a total of 38,216 entries. The first entry (if available) was selected per patient for annotation, resulting in 1,361 entries. The annotated sample was then randomly split into equally sized training and test sets. Both sets were preprocessed and then classified with the following methods: naïve word-matching, Naive Bayes (NB), Decision Tree, Gradient Boosting (GB), Neural Networks and Support Vector Machines (SVM); see Table 1 for more information. Default Scikit-learn implementations [2] were used to create the models, except for the word-matching model, whose classification is based on the presence of RA-defining strings such as ‘Reumatoide Artritis’.
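
The random equal split and the word-matching baseline described above might look roughly like the following sketch. It uses only the Python standard library; the pattern list, the function names, the seed and the example entries are assumptions made for illustration. The abstract names only ‘Reumatoide Artritis’ as an example of an RA-defining string and does not give the full list.

```python
import random
import re

# Hypothetical pattern list; only 'Reumatoide Artritis' is named in the abstract.
RA_PATTERNS = [r"reumato[iï]de\s+artritis"]

def word_match(entry, patterns=RA_PATTERNS):
    """Return 1 if any RA-defining pattern occurs in the entry, else 0."""
    return int(any(re.search(p, entry, flags=re.IGNORECASE) for p in patterns))

def split_half(items, seed=0):
    """Shuffle a copy of the items and split it into two (nearly) equal halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Invented (entry, label) pairs, not study data; 1 = RA, 0 = not RA.
annotated = [
    ("Conclusie: Reumatoide Artritis, start methotrexaat", 1),
    ("Jicht, start allopurinol", 0),
    ("Artrose van de knie", 0),
    ("Seropositieve reumatoïde artritis", 1),
]
train, test = split_half(annotated, seed=42)
predictions = [word_match(text) for text, _ in test]
```

A limitation of this kind of matching is that context is ignored: `word_match("Geen aanwijzingen voor reumatoide artritis")`, a phrase meaning "no evidence of rheumatoid arthritis", still returns 1.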

Finally, the performance of the models was evaluated with a receiver operating characteristic (ROC) curve analysis via the pROC package [3]. The DeLong method was used to compute the 95% confidence intervals (CI) and to test the difference in performance between the word-matching method and each ML-model.
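
As a rough illustration of this evaluation step, the sketch below computes an AUC and a DeLong-style 95% confidence interval in plain Python. The study itself used the pROC package in R, so this is a re-implementation under stated assumptions, not the authors' code: the function names and toy scores are hypothetical, the interval is a normal approximation clipped to [0, 1], and the paired comparison of two ROC curves (which additionally needs the covariance between the two AUC estimates) is not shown.

```python
import math
from statistics import variance

def _psi(pos_score, neg_score):
    """Pairwise kernel: 1 if the case outranks the control, 0.5 on ties, else 0."""
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0

def delong_auc_ci(labels, scores, z=1.96):
    """Return (auc, lower, upper) using DeLong's variance estimate.

    Needs at least two cases (label 1) and two controls (label 0).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    m, n = len(pos), len(neg)
    # Structural components: per-case and per-control placement values.
    v10 = [sum(_psi(p, q) for q in neg) / n for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / m for q in neg]
    auc = sum(v10) / m
    var_auc = variance(v10) / m + variance(v01) / n
    half_width = z * math.sqrt(var_auc)
    return auc, max(0.0, auc - half_width), min(1.0, auc + half_width)

# Invented toy scores, not study data.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc, lower, upper = delong_auc_ci(toy_labels, toy_scores)
```

With so few toy observations the interval is very wide; the narrow intervals reported in the abstract reflect a test set of several hundred entries.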

Results: The naïve word-matching approach resulted in a high area under the curve (AUC=0.76). Most ML-models scored higher: NB=0.83, SVM=0.91, Neural Networks=0.92 and GB=0.94. The Decision Tree showed the worst performance, with an AUC of only 0.51, which is close to chance level. Compared with the naïve word-matching ROC curve, every ML-model differed significantly: Decision Tree (p<2.2e-16; CI=0.49-0.56), NB (p=4.4e-3; CI=0.80-0.86), Neural Networks (p<2.2e-16; CI=0.90-0.94), GB (p<2.2e-16; CI=0.92-0.96) and SVM (p=4.0e-16; CI=0.89-0.93). Only the Decision Tree differed by performing worse than word-matching.

Conclusion: The Gradient Boosting, Neural Networks, SVM and Naive Bayes models all performed significantly better than the naïve word-matching algorithm, which establishes these ML-methods as an efficient approach for data extraction from EMR.


Disclosure: T. Maarseveen, None; T. Huizinga, Abblynx, 2, 5, 8, Abbott, 2, 5, 8, Biotest AG, 2, 5, 8, Boehringer Ingelheim, 2, 5, 8, Boeringher Ingelheim, 2, 5, 8, Bristol-Myers Squibb, 2, 5, 8, Crescendo Bioscience, 2, 5, 8, Eli Lilly, 2, 5, 8, Epirus, 2, 5, 8, Galapagos, 2, 5, 8, Janssen, 2, 5, 8, Merck, 2, 5, 8, Novartis, 2, 5, 8, Nycomed, 2, 5, 8, Pfizer, 2, 5, 8, Roche, 2, 5, 8, Sanofi, 2, 5, Sanofi-Aventis, 2, 5, 8, Takeda, 2, 5, 8, UCB, 2, 5, 8, Zydus, 2, 5, 8; M. Reinders, None; E. van den Akker, None; R. Knevel, None.

To cite this abstract in AMA style:

Maarseveen T, Huizinga T, Reinders M, van den Akker E, Knevel R. Automated Diagnosis Extraction from Electronic Medical Records with Machine Learning Classifiers [abstract]. Arthritis Rheumatol. 2019; 71 (suppl 10). https://acrabstracts.org/abstract/automated-diagnosis-extraction-from-electronic-medical-records-with-machine-learning-classifiers/. Accessed .

© Copyright 2025 American College of Rheumatology