ACR Meeting Abstracts

Abstract Number: 200

A Validated Text-mining Algorithm to Extract Rheumatoid Arthritis Medication Contained in Format-free Fields of Electronic Medical Records

Tjardo Maarseveen¹, Thomas Huizinga¹, Marcel J.T. Reinders¹, Erik van den Akker¹ and Rachel Knevel¹, ¹Leiden University Medical Center, Leiden, Netherlands

Meeting: 2019 ACR/ARP Annual Meeting

Keywords: big data and data analysis, Bioinformatics, drug treatment, Electronic Health Record

Session Information

Date: Sunday, November 10, 2019

Title: Epidemiology & Public Health Poster I: RA

Session Type: Poster Session (Sunday)

Session Time: 9:00AM-11:00AM

Background/Purpose: Rapidly expanding collections of Electronic Medical Records (EMRs) form a valuable resource for clinical research. Besides entries with a standardized format, EMRs often contain free-text fields intended for noting specifics of the treatment policy. These free-text fields contain essential information, but their unstructured nature makes them hard to parse because they contain typos and acronyms. As a result, data are often extracted from EMRs manually, or the format-free fields are excluded altogether. The purpose of this study is to develop and validate a text-mining approach to extract medication prescribed for rheumatoid arthritis (RA) from the format-free fields of an EMR.

Methods: The EMR dataset consisted of 45,012 entries from 2,771 patients who visited the rheumatology outpatient clinic of the Leiden University Medical Center between 2007 and 2018. We randomly selected 15% and 7.5% of the entries to create a training set and a test set, with 5,992 and 2,993 entries, respectively. The training set was used to design the algorithm, whereas the test set was used as an independent validation of the algorithm’s performance in identifying each of the DMARDs and biologicals routinely prescribed for treating RA.
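A random split of this kind can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `split_entries`, the fixed seed, and the use of integer entry IDs are assumptions made for the example.

```python
import random

def split_entries(entry_ids, train_frac=0.15, test_frac=0.075, seed=0):
    """Randomly draw disjoint training and test subsets of entry IDs."""
    ids = list(entry_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_frac)
    n_test = round(len(ids) * test_frac)
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]  # taken after the training slice, so no overlap
    return train, test

# Hypothetical entry IDs 0..999 stand in for the EMR entries.
train_ids, test_ids = split_entries(range(1000))
```

Slicing both subsets from one shuffled list guarantees that no entry ends up in both the training and the test set.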

Using methods derived from Natural Language Processing, we developed an algorithm that consecutively performs three tasks: (1) text pre-formatting, (2) acronym recognition, and (3) typo correction. Text pre-formatting consisted of several simple operations to deal with the most prevalent textual artifacts, including separating special characters and punctuation stuck to words. Ten independent clinicians compiled acronym lists for each of the routinely prescribed RA medications. Lastly, for typo correction, we employed the Damerau-Levenshtein (DL) distance, which measures the similarity between two words by counting the number of single-character operations (deletion, insertion, substitution, or transposition of two adjacent characters) required to transform one word into the other. Using the training set, we computed, for each drug, the DL distances between all words in the free-text fields of the EMRs and the drug name or its acronym. Using the annotations created in the training set, we then determined the DL distance that optimally distinguishes a typo of a drug name from a similar word with a different meaning.
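The three steps above, together with the cutoff selection, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the acronym list and drug list are hypothetical placeholders (the clinician-compiled lists are not given in the abstract), acronyms are matched exactly rather than by DL distance, the optimal string alignment variant of the DL distance is used, a single default cutoff of 2 (the reported median) replaces the drug-specific cutoffs, and `choose_cutoff` assumes the criterion is the number of correctly classified annotated words.

```python
import re

# Hypothetical lists for illustration only.
ACRONYMS = {"mtx": "methotrexate", "hcq": "hydroxychloroquine", "ssz": "sulfasalazine"}
DRUGS = ["methotrexate", "hydroxychloroquine", "sulfasalazine", "golimumab"]

def preformat(text):
    """Lowercase the text and split off punctuation stuck to words."""
    text = re.sub(r"([^\w\s])", r" \1 ", text.lower())
    return text.split()

def dl_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]

def extract_drugs(text, cutoff=2):
    """Return the set of drug names found via acronyms or near-matches."""
    found = set()
    for token in preformat(text):
        if token in ACRONYMS:
            found.add(ACRONYMS[token])
            continue
        for drug in DRUGS:
            if dl_distance(token, drug) <= cutoff:
                found.add(drug)
    return found

def choose_cutoff(labelled, max_cutoff=5):
    """Pick the cutoff that classifies the most (distance, is_typo) pairs correctly."""
    best_cutoff, best_correct = 0, -1
    for c in range(max_cutoff + 1):
        correct = sum((dist <= c) == is_typo for dist, is_typo in labelled)
        if correct > best_correct:
            best_cutoff, best_correct = c, correct
    return best_cutoff

result = extract_drugs("pt uses HCQ and metotrexate.")
```

Here `result` contains `hydroxychloroquine` (via the acronym) and `methotrexate` (via a DL distance of 1). In practice the cutoff would be chosen per drug from annotated training data, for example with `choose_cutoff`.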

Results: Fifteen medications for the treatment of RA were present in our sample (see figure 1). In total, medication was present in 1,789 of the 2,993 entries. The median DL cutoff for typos was 2, with a standard deviation of 0.96. Across all medications, the overall accuracy of our drug-identification algorithm was high (0.97), as were the individual test characteristics: sensitivity = 0.98, specificity = 0.95, PPV = 0.98, and NPV = 0.95. Performance was also high at the level of individual drugs: accuracy >= 0.99, sensitivity >= 0.89, specificity >= 0.99, NPV >= 0.99, and PPV >= 0.90 for all medications except golimumab.

Conclusion: We developed and validated an algorithm enabling a highly accurate automated extraction of RA medication from format-free fields of Electronic Medical Records.


Disclosure: T. Maarseveen, None; T. Huizinga, Ablynx, 2, 5, 8, Abbott, 2, 5, 8, Biotest AG, 2, 5, 8, Boehringer Ingelheim, 2, 5, 8, Bristol-Myers Squibb, 2, 5, 8, Crescendo Bioscience, 2, 5, 8, Eli Lilly, 2, 5, 8, Epirus, 2, 5, 8, Galapagos, 2, 5, 8, Janssen, 2, 5, 8, Merck, 2, 5, 8, Novartis, 2, 5, 8, Nycomed, 2, 5, 8, Pfizer, 2, 5, 8, Roche, 2, 5, 8, Sanofi, 2, 5, Sanofi-Aventis, 2, 5, 8, Takeda, 2, 5, 8, UCB, 2, 5, 8, Zydus, 2, 5, 8; M. Reinders, None; E. van den Akker, None; R. Knevel, None.

To cite this abstract in AMA style:

Maarseveen T, Huizinga T, Reinders M, van den Akker E, Knevel R. A Validated Text-mining Algorithm to Extract Rheumatoid Arthritis Medication Contained in Format-free Fields of Electronic Medical Records [abstract]. Arthritis Rheumatol. 2019; 71 (suppl 10). https://acrabstracts.org/abstract/a-validated-text-mining-algorithm-to-extract-rheumatoid-arthritis-medication-contained-in-format-free-fields-of-electronic-medical-records/. Accessed .