EHRs Can Be a Valuable Tool for Modeling Disease Activity in Multiple Sclerosis

Electronic data can be used to create "virtual cohorts" to rapidly implement studies of MS disease activity, comorbidities, pharmacogenomics, and presymptomatic disease.

Multiple sclerosis (MS) clinical research studies based on traditional patient cohorts are limited by their cost and size. This limitation has become especially acute given the significant number of recently approved disease-modifying therapies (DMTs) and drugs still in clinical development for MS, and the ensuing competition for resources.

Zongqi Xia, MD, PhD, Associate Neurologist, Brigham and Women's Hospital and Instructor in Neurology, Harvard Medical School, and colleagues gave a poster presentation at the American Academy of Neurology (AAN) 2013 Annual Meeting in the Multiple Sclerosis: Cost and Impact of MS Care session.

Electronic health records (EHRs) contain a wealth of data that can be harnessed to improve patient care. Xia’s objectives were to develop EHR-based algorithms that accurately classify MS patients and, once the methodology was validated, to use it to assemble EHR-derived virtual cohorts that complement traditional cohorts in clinical research.

A large body of information was available, described by Xia as a “data-mart,” containing the complete medical records of 22,610 patients with at least one MS-related ICD-9 code registered at Partners HealthCare in Boston. Of these, 595 patients were randomly selected for “training” purposes. By this, Xia meant that the data from these patients were used to develop the methodology for creating EHR-derived cohorts for clinical trials.

Two neurologists independently reviewed the records of these training patients and found 251 confirmed MS cases. Xia and his group used natural language processing to extract codified and narrative EHR data on symptoms, tests, neuroimaging (MRI) reports, and medications. They then performed 20-fold cross-validation using logistic regression on the training set to select informative variables for predicting MS. The resulting algorithm was applied to the data-mart to identify patients with a high probability of MS. Additional algorithms were developed in a similar fashion to model outcomes of disease activity, using a subset of patients enrolled in the Partners MS Center, and the proportion of variance explained (R²) was calculated for each model.
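The cross-validation step described above can be illustrated with a minimal sketch. It is not the pipeline used by Xia's group: the data are synthetic, the two features (one informative, one pure noise) are invented for illustration, and all function names are hypothetical. It shows only the general idea of fitting a logistic regression on part of a labeled training set, scoring it on held-out patients across 20 folds, and comparing candidate variables by their held-out accuracy.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, xi):
    # w[0] is the intercept; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def cross_validate(X, y, columns, k=20, seed=0):
    """Return the held-out accuracy of each of k folds, using only `columns`."""
    Xs = [[row[c] for c in columns] for row in X]
    idx = list(range(len(Xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = fit_logistic([Xs[i] for i in train], [y[i] for i in train])
        correct = sum(
            (predict_proba(w, Xs[i]) >= 0.5) == (y[i] == 1) for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return accuracies


def make_synthetic(n=120, seed=1):
    """Synthetic stand-in for a labeled training set (NOT real patient data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.4 else 0
        # Column 0 shifts with the label (informative); column 1 is noise.
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    for cols in ([0], [1], [0, 1]):
        acc = cross_validate(X, y, columns=cols)
        print(cols, round(sum(acc) / len(acc), 3))
```

In this toy setting, a variable that carries no signal yields held-out accuracy close to simply guessing the majority class, which is the kind of evidence a cross-validated selection procedure relies on when deciding which variables to keep.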

Xia’s team developed a robust EHR-based algorithm to classify MS. With specificity set at 95%, the algorithm achieved an area under the curve (AUC) of 0.958, sensitivity of 83%, positive predictive value of 92%, and negative predictive value of 89%. They identified 5,495 MS patients from the EHR, including 1,153 patients enrolled in the Partners MS Center. Using this patient subset, with available disease activity measures as the gold standard, they developed algorithms based on EHR variables to impute brain volume (R² = 0.43) and MS severity score (R² = 0.34). Both are clinically relevant parameters for evaluating multiple sclerosis.
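As a rough consistency check, the reported predictive values can be derived from the reported sensitivity and specificity. The sketch below assumes that the metrics refer to the 595-patient training set with 251 confirmed cases; the report does not state this explicitly, and the function names are hypothetical.

```python
def confusion_from_rates(n_total, n_cases, sensitivity, specificity):
    """Expected confusion-matrix cells implied by sensitivity and specificity."""
    n_controls = n_total - n_cases
    tp = sensitivity * n_cases        # cases correctly flagged as MS
    fn = n_cases - tp                 # cases missed
    tn = specificity * n_controls     # non-cases correctly excluded
    fp = n_controls - tn              # non-cases wrongly flagged
    return tp, fp, tn, fn


def predictive_values(tp, fp, tn, fn):
    """Positive and negative predictive value from confusion-matrix cells."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Assumption: metrics were computed on the training set (595 patients, 251 cases).
tp, fp, tn, fn = confusion_from_rates(595, 251, sensitivity=0.83, specificity=0.95)
ppv, npv = predictive_values(tp, fp, tn, fn)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under that assumption, the positive predictive value comes out at about 92% and the negative predictive value at about 88%, which is in line with the reported 92% and 89% once rounding of the published sensitivity is taken into account. Because predictive values depend on prevalence, they would differ in a population with a different proportion of true MS cases.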

Xia suggests that incorporating both structured and free-text EHR data allows for improved identification of MS patients. EHR-derived cohorts, when linked with biobanks of discarded biological material, provide novel resources to rapidly implement studies of MS disease activity, comorbidities, pharmacogenomics, and presymptomatic disease. If they can be replicated, the novel informatics methods developed by Xia’s group hold out the promise of efficient and cost-effective development of multi-center cohorts for translational and clinical research in MS.