Data Mining Uncovers Undetected Diabetes

March 15, 2016

When it comes to diabetes screening, big data may uncover more cases. A California team came up with a new algorithm that assesses risk.

A team of researchers at UCLA’s Semel Institute for Neuroscience and Human Behavior has developed an algorithm that mines the electronic health records of thousands of people to identify cases of undiagnosed diabetes more accurately and efficiently.

What they found was a surprise.

By examining the records in their entirety, they found significantly more cases and identified several previously unknown, and unexpected, risk factors.

The team, led by Ariana Anderson, PhD, assistant research professor and statistician, conducted a cross-sectional, retrospective study of 9,948 people from hospitals, clinics and doctors' offices in all 50 states. Although the records were stripped of personally identifiable information, they contained vital signs, prescription medications and reported ailments, categorized according to the International Classification of Diseases diagnostic codes (ICD9).

Using data from half of the patients, they refined an algorithm designed to predict the likelihood of an individual having diabetes. They did this by comparing the outcomes of three models. The first used commonly prescribed medications, reported diagnoses, and conventional diabetes risk factors such as age, sex, smoking status, BMI, and blood pressure. The second excluded just the medications, and the third used only the conventional risk factors. They then tested the new tool on data from the second half of the cohort. The study appeared in the February 16 issue of the Journal of Biomedical Informatics.
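The split-half design and the three nested sets of predictors can be sketched roughly as follows. This is an illustrative Python sketch only, not the study's code: the field names (`risk`, `diagnoses`, `meds`), the helper names and the toy record are all hypothetical, and the article does not say how the cohort was divided or what kind of model was fitted.

```python
import random

# Hypothetical field groups; the study's actual variables are not listed in the article.
FEATURE_SETS = {
    "full": ("risk", "diagnoses", "meds"),   # model 1: risk factors, diagnoses, medications
    "no_meds": ("risk", "diagnoses"),        # model 2: medications excluded
    "conventional": ("risk",),               # model 3: conventional risk factors only
}

def split_half(records, seed=0):
    """Shuffle a copy of the records and return (development half, held-out half)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def select_features(record, groups):
    """Flatten only the requested field groups of one record into a single dict."""
    features = {}
    for group in groups:
        features.update(record[group])
    return features

# Toy record, invented for illustration.
patient = {
    "risk": {"age": 54, "bmi": 31.2},
    "diagnoses": {"icd9_401": 1},
    "meds": {"statin": 1},
}
for name, groups in FEATURE_SETS.items():
    print(name, sorted(select_features(patient, groups)))
```

Each model would be fitted on the development half and scored on the held-out half, so that the comparison reflects performance on patients the algorithm has not seen. If the cohort of 9,948 were split evenly, each half would hold 4,974 records.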

The results showed the value of using the entire record, not just the usual predictors: the tool proved 2.5 percent better at identifying people with diabetes than the standard approach, and 14 percent better at identifying those who do not have it. This information can help identify people who should undergo testing.
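Being better at identifying people with diabetes and better at identifying those who do not have it reads most naturally as what statisticians call sensitivity and specificity. The article does not say exactly how the percentages were computed, so the Python sketch below only shows the standard definitions applied to made-up labels; the function name and the sample data are hypothetical.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for paired 0/1 labels.

    sensitivity = true positives / all actual positives
    specificity = true negatives / all actual negatives
    """
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    positives = sum(1 for a in actual if a == 1)
    negatives = len(actual) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / positives, tn / negatives

# Made-up labels: 1 = has diabetes, 0 = does not.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A screening tool with higher sensitivity misses fewer true cases, while higher specificity means fewer healthy people are sent for unnecessary testing.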

"With widespread implementation, these discoveries have the potential to dramatically decrease the number of undetected cases of Type 2 diabetes, prevent complications from the disease and save lives," said Anderson.

According to the Centers for Disease Control and Prevention, nearly 30 million people in the U.S. have diabetes, and about one quarter don’t know they have it. Another 86 million, more than one third of adults, have prediabetes.

The study uncovered another unexpected result. Several disorders and illnesses, including sexual and gender identity disorders, intestinal infections, and chlamydial sexually transmitted diseases, are also notable risk factors for developing diabetes. Sexual and gender identity disorders increased the risk for type 2 diabetes by 130 percent, about the same as high blood pressure. STDs such as chlamydia raised risk by 88 percent, and intestinal infections such as colitis and gastroenteritis raised risk by 83 percent.

People taking anxiety medications have one less thing to worry about: these and anti-seizure medications were associated with a lower risk of diabetes, as was being prone to migraines. The team has applied similar techniques to predict other diseases, including epilepsy and irritable bowel syndrome.

"The overall message is that ordinary record keeping that doctors do is a very, very rich source of information," said Mark Cohen, a Semel Institute professor in residence. "If you use a computerized approach to studying patterns in that data, you can greatly improve diagnosis and medical care."

The team estimated that the new method could potentially diagnose an additional 400,000 people.

It should be noted that the findings are not fine-grained enough to tell precisely which conditions are linked to diabetes, since they were based largely on ICD9 codes, which can each encompass a number of conditions.