A novel electronic health record (EHR)-based machine learning model accurately identified adults at high risk of developing type 2 diabetes up to a decade in advance, according to data presented at the 2026 Scientific Sessions of the American Diabetes Association (ADA) in New Orleans, Louisiana.
According to the US Centers for Disease Control and Prevention, more than 40 million people in the US have diabetes, and about 90% to 95% of them have type 2 diabetes. Risk factors for type 2 diabetes include having prediabetes, having overweight, being 45 years of age or older, and being physically active less than 3 times a week.
Key Findings
- EHR-based machine learning model predicted type 2 diabetes risk at 1, 3, and 10 years in 3,365,464 adults at Kaiser Permanente Northern California
- Training AUC 0.886 (95% CI, 0.883–0.888); validation AUC 0.883 (95% CI, 0.880–0.886)
- At a high-risk threshold of >1.2%, sensitivity was 74% and specificity was 82% over up to 10 years of follow-up
- Authors plan prospective clinical testing to evaluate impact on prevention program enrollment and diabetes incidence
Current diabetes prevention programs lack the capacity to serve the more than 60% of US adults who carry type 2 diabetes risk factors. Because the disease often progresses gradually and without symptoms, health systems struggle to direct limited prevention resources toward the patients most likely to benefit. Against this backdrop, the model represents a potential step toward more precise, scalable risk stratification using data already available in the clinical record.
"These findings represent a potential advancement over existing approaches for identifying individuals at risk of developing type 2 diabetes by enabling earlier, more precise detection and supporting a more targeted, proactive approach to prevention," Luis Rodriguez, PhD, MPH, RD, lead author of the study, said in a statement.¹ "Our model has the potential to create an opportunity for clinicians and health systems to focus prevention efforts on the high-risk individuals often missed by traditional screening who have the most to gain from prevention and treatment."
The retrospective cohort study analyzed 3,365,464 adults aged 18 to 70 receiving care at Kaiser Permanente Northern California between 2012 and 2024. Median patient age was 39 years, and 55% were female.
Researchers applied a hazard-based super learning approach, which combines multiple survival-analysis models, to generate individualized risk estimates at 1, 3, and 10 years. Input variables included routine clinical and demographic data collected at medical visits, such as age, weight, blood glucose levels, medical history, and medications, as well as publicly available social determinants of health data, such as access to healthy food and to walkable areas.
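The study's exact candidate models and weights are not described here, so the following is only a minimal sketch of the general idea, assuming a discrete-time (yearly) formulation. Several candidate models each predict a per-year hazard (the probability of developing diabetes in a given year, given none so far). A hazard-based ensemble takes a weighted, convex combination of those hazards, and a risk at a given horizon follows as 1 minus the product of (1 - hazard) over the years up to that horizon. The functions `ensemble_hazards` and `cumulative_risk`, the three candidate models, and all hazard values and weights are hypothetical.

```python
from typing import Dict, List, Sequence


def ensemble_hazards(candidates: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Convex (non-negative, sum-to-one) combination of per-interval hazards."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n_intervals = len(candidates[0])
    return [
        sum(w * model[t] for w, model in zip(weights, candidates))
        for t in range(n_intervals)
    ]


def cumulative_risk(hazards: Sequence[float], horizon: int) -> float:
    """Risk of the event within `horizon` intervals: 1 - prod(1 - h_t)."""
    if horizon > len(hazards):
        raise ValueError("not enough hazard intervals for this horizon")
    survival = 1.0
    for h in hazards[:horizon]:
        survival *= 1.0 - h  # probability of staying event-free through this interval
    return 1.0 - survival


# Hypothetical yearly hazards for one patient from three hypothetical candidate models
candidates = [
    [0.010] * 10,  # candidate model A
    [0.014] * 10,  # candidate model B
    [0.006] * 10,  # candidate model C
]
# Hypothetical weights; a super learner would estimate these by cross-validation
weights = [0.5, 0.3, 0.2]

combined = ensemble_hazards(candidates, weights)
risks: Dict[int, float] = {years: cumulative_risk(combined, years) for years in (1, 3, 10)}
for years, risk in risks.items():
    print(f"{years}-year risk: {risk:.2%}")
```

Working with hazards rather than directly with 1-, 3-, and 10-year risks keeps the three estimates mutually consistent: because each is built from the same yearly hazards, a patient's 10-year risk can never fall below their 3-year risk.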
During a median follow-up of 5.4 years, the observed type 2 diabetes incidence was 10.7 per 1,000 person-years.
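An incidence rate per 1,000 person-years divides the number of new diagnoses by the total time that all participants were followed, then scales the result by 1,000. The minimal sketch below shows that arithmetic on a hypothetical four-person cohort. It also converts the reported rate into an approximate 1-year risk, assuming a constant hazard (an assumption made here for illustration, not one stated by the study). The function names are hypothetical.

```python
import math
from typing import Sequence


def incidence_per_1000_py(follow_up_years: Sequence[float],
                          had_event: Sequence[bool]) -> float:
    """Events per 1,000 person-years. Each person contributes follow-up time
    until diagnosis or censoring (end of study, leaving the health plan, etc.)."""
    person_years = sum(follow_up_years)
    events = sum(1 for e in had_event if e)
    return 1000.0 * events / person_years


def rate_to_risk(rate_per_1000_py: float, years: float) -> float:
    """Cumulative risk over `years`, assuming a constant hazard (exponential model)."""
    return 1.0 - math.exp(-rate_per_1000_py / 1000.0 * years)


# Hypothetical four-person cohort: years followed and whether diabetes developed
rate = incidence_per_1000_py([5.0, 3.2, 10.0, 7.8], [False, True, False, False])
print(f"toy cohort: {rate:.1f} per 1,000 person-years")

# Reported rate of 10.7 per 1,000 person-years, under the constant-hazard assumption
print(f"implied 1-year risk: {rate_to_risk(10.7, 1):.2%}")
```

Real incidence usually changes with age and over time, so the constant-hazard conversion is only a rough approximation.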
The model achieved an area under the curve (AUC) of 0.886 (95% CI, 0.883 to 0.888) in the training set and 0.883 (95% CI, 0.880 to 0.886) in the validation set. Calibration at 1 year was near-ideal, with a mean predicted risk of 1.03% compared with an observed rate of 1.01%.
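The two metrics above measure different things. The AUC reflects discrimination: how often a person who develops diabetes receives a higher predicted risk than a person who does not. Calibration reflects whether predicted risks match observed rates. The minimal sketch below uses a simplified binary-outcome AUC that ignores censoring. The study's survival-based AUC presumably accounts for incomplete follow-up, but its exact definition is not given here. The sketch also shows the observed-to-expected ratio implied by the reported 1-year figures. The function `auc` and the toy scores and labels are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a randomly chosen case scores higher than a randomly
    chosen non-case (ties count as 0.5). O(n^2); for illustration only."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: predicted risks and outcomes (1 = developed type 2 diabetes)
scores = [0.002, 0.005, 0.010, 0.020, 0.030, 0.015]
labels = [0, 0, 1, 0, 1, 0]
print(f"AUC: {auc(scores, labels):.2f}")

# Observed-to-expected ratio from the reported 1-year figures (1.01% observed, 1.03% predicted)
reported_oe = 0.0101 / 0.0103
print(f"O/E ratio: {reported_oe:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. An observed-to-expected ratio close to 1 means the model neither systematically over- nor underestimates risk on average. For real data, a library routine such as scikit-learn's `roc_auc_score` computes the same binary AUC efficiently.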
At the high-risk threshold of >1.2% predicted risk, the model demonstrated a sensitivity of 74% and a specificity of 82% over up to 10 years of follow-up.
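Sensitivity is the share of people who went on to develop type 2 diabetes whom the model flagged as high risk. Specificity is the share of people who did not develop it whom the model left unflagged. The minimal sketch below applies the reported >1.2% cutoff to hypothetical predicted risks and outcomes. It assumes a simple binary outcome per person. The source does not say which risk horizon the threshold applies to or how censored patients were handled. The function name and data are hypothetical.

```python
from typing import Sequence, Tuple

HIGH_RISK_THRESHOLD = 0.012  # ">1.2%" as reported; risk must exceed it to be flagged


def sensitivity_specificity(risks: Sequence[float],
                            labels: Sequence[int],
                            threshold: float = HIGH_RISK_THRESHOLD) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when flagging risks above `threshold`."""
    tp = fn = tn = fp = 0
    for risk, developed_t2d in zip(risks, labels):
        flagged = risk > threshold  # strict inequality, matching ">1.2%"
        if developed_t2d:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted risks and outcomes (1 = developed type 2 diabetes)
risks = [0.005, 0.013, 0.020, 0.008, 0.011, 0.030, 0.001, 0.015]
labels = [0, 1, 1, 1, 0, 1, 0, 0]
sens, spec = sensitivity_specificity(risks, labels)
print(f"sensitivity: {sens:.0%}, specificity: {spec:.0%}")
```

Moving the threshold trades one metric against the other. A lower cutoff flags more people, raising sensitivity at the cost of specificity, and a higher cutoff does the reverse.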
These performance characteristics suggest the model can meaningfully discriminate between high- and lower-risk individuals using data clinicians already collect, without requiring additional testing or referrals. The combination of routinely captured EHR variables with socioeconomic context data allows risk estimation to account for structural factors influencing diabetes development beyond traditional clinical metrics.
Investigators described plans to test the model prospectively in a clinical setting to assess whether its deployment increases engagement in type 2 diabetes prevention programs and reduces incidence.
References
American Diabetes Association. Machine learning model accurately predicts long-term risk of type 2 diabetes. Published June 5, 2026. Accessed June 5, 2026. https://www.diabetes.org
Rodriguez LA. Machine-learning modeling for T2DM prediction in over 3 million adults [abstract 2321-P]. Presented at the American Diabetes Association (ADA) 2026 Scientific Sessions in New Orleans, LA, June 5-8, 2026.