June 5, 2026

Machine Learning Model Predicts Type 2 Diabetes Risk Up To 10 Years Out

Fact checked by: Ryan Livingston

Key Takeaways

  • A retrospective Kaiser Permanente Northern California cohort (2012–2024) included 3,365,464 adults (median age 39; 55% female) with type 2 diabetes incidence 10.7/1,000 person-years.
  • A hazard-based super learning ensemble produced individualized 1-, 3-, and 10-year risk estimates from demographics, vitals/labs, comorbidities, medications, and neighborhood-level social determinants.

An EHR-based machine learning model predicted type 2 diabetes risk up to 10 years in advance in more than 3 million adults, with high discrimination and near-ideal 1-year calibration.

A novel electronic health record (EHR)-based machine learning model accurately identified adults at high risk of developing type 2 diabetes up to a decade in advance, according to data presented at the 2026 Scientific Sessions of the American Diabetes Association (ADA) in New Orleans, Louisiana.

According to the US Centers for Disease Control and Prevention (CDC), more than 40 million people in the US have diabetes, and about 90% to 95% of them have type 2 diabetes.³ The risk of developing type 2 diabetes is higher in people who have prediabetes, have overweight, are 45 years of age or older, or are physically active less than 3 times a week.

Key Findings

  • EHR-based machine learning model predicted type 2 diabetes risk at 1, 3, and 10 years in 3,365,464 adults at Kaiser Permanente Northern California
  • Training AUC 0.886 (95% CI, 0.883–0.888); validation AUC 0.883 (95% CI, 0.88–0.886)
  • At a high-risk threshold of >1.2%, sensitivity was 74% and specificity was 82% over up to 10 years of follow-up
  • Authors plan prospective clinical testing to evaluate impact on prevention program enrollment and diabetes incidence

Current diabetes prevention programs lack the capacity to serve the more than 60% of US adults with type 2 diabetes risk factors. Because the disease often progresses gradually and without symptoms, health systems struggle to direct limited prevention resources toward the patients most likely to benefit. The present model may therefore be a step toward more precise, scalable risk stratification using data already available in the clinical record.

"These findings represent a potential advancement over existing approaches for identifying individuals at risk of developing type 2 diabetes by enabling earlier, more precise detection and supporting a more targeted, proactive approach to prevention," Luis Rodriguez, PhD, MPH, RD, lead author of the study, said in a statement.¹ "Our model has the potential to create an opportunity for clinicians and health systems to focus prevention efforts on the high-risk individuals often missed by traditional screening who have the most to gain from prevention and treatment."

The retrospective cohort study analyzed 3,365,464 adults aged 18 to 70 receiving care at Kaiser Permanente Northern California between 2012 and 2024. Median patient age was 39 years, and 55% were female.

Researchers applied a hazard-based super learning approach, which combines multiple survival-analysis models, to generate individualized risk estimates at 1, 3, and 10 years. Input variables included routine clinical and demographic data collected at medical visits, such as age, weight, blood glucose levels, medical history, and medications, as well as publicly available neighborhood-level social determinants data, such as access to healthy food and walkable areas.
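The presentation materials do not describe the model's internals. As a rough illustration of how a hazard-based ensemble can turn several models' outputs into 1-, 3-, and 10-year risks, the Python sketch below assumes a discrete-time setup: each hypothetical base learner predicts a yearly hazard (the chance of diagnosis in a given year among people still diabetes-free), the ensemble averages those hazards with fixed weights, and cumulative risk is one minus the product of the yearly survival probabilities. In a real super learner the weights would be chosen by cross-validation rather than fixed by hand; the function names and numbers here are invented for illustration.

```python
from __future__ import annotations

from typing import Sequence


def ensemble_interval_hazards(
    learner_hazards: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> list[float]:
    """Weighted average of per-interval hazards predicted by several base learners.

    learner_hazards[j][k] is (hypothetical) learner j's predicted probability of
    diagnosis in interval k, given the patient is still diabetes-free at its start.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n_intervals = len(learner_hazards[0])
    return [
        sum(w * hazards[k] for w, hazards in zip(weights, learner_hazards))
        for k in range(n_intervals)
    ]


def cumulative_risk(interval_hazards: Sequence[float], horizon: int) -> float:
    """Probability of diagnosis by the end of `horizon` intervals: 1 - prod(1 - h_k)."""
    survival = 1.0
    for h in interval_hazards[:horizon]:
        survival *= 1.0 - h  # chance of staying diabetes-free through this interval
    return 1.0 - survival


# Hypothetical example: two base learners, 10 yearly intervals, equal fixed weights.
learner_a = [0.01] * 10
learner_b = [0.02] * 10
combined = ensemble_interval_hazards([learner_a, learner_b], [0.5, 0.5])
risks = {years: cumulative_risk(combined, years) for years in (1, 3, 10)}
print(risks)
```

Working on the hazard scale means the 1-, 3-, and 10-year estimates come from the same underlying curve, so a patient's predicted risk can never fall as the horizon lengthens.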

During a median follow-up of 5.4 years, the observed type 2 diabetes incidence was 10.7 per 1,000 person-years.
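An incidence of 10.7 per 1,000 person-years means that, for every 1,000 years of combined follow-up time, about 10.7 new diagnoses were observed. A minimal Python sketch of that calculation, using hypothetical counts rather than study data:

```python
from __future__ import annotations


def incidence_per_1000_person_years(events: int, follow_up_years: list[float]) -> float:
    """Incidence rate = new cases / total person-time at risk, scaled to 1,000 person-years."""
    # Each patient contributes follow-up time until diagnosis or censoring.
    person_years = sum(follow_up_years)
    if person_years <= 0:
        raise ValueError("total follow-up time must be positive")
    return 1000 * events / person_years


# Hypothetical cohort: 2,000 people followed for 5 years each (10,000 person-years), 107 new cases.
rate = incidence_per_1000_person_years(107, [5.0] * 2000)
print(rate)
```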

The model achieved an area under the curve (AUC) of 0.886 (95% CI, 0.883 to 0.888) in the training data and 0.883 (95% CI, 0.88 to 0.886) in the validation data. Calibration at 1 year was near-ideal, with a mean predicted risk of 1.03% versus an observed rate of 1.01%.
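To make these two metrics concrete, the Python sketch below computes an AUC as the share of (case, non-case) pairs in which the person who developed diabetes received the higher predicted risk, and calibration-in-the-large as mean predicted risk next to the observed event rate. It is a simplified illustration on invented data: it treats outcomes as binary and does not handle censored follow-up, which a time-to-event evaluation would need to address, and the source does not say exactly how the study computed its AUC.

```python
from __future__ import annotations


def auc_pairwise(risks: list[float], outcomes: list[int]) -> float:
    """Share of (case, non-case) pairs where the case has the higher predicted risk; ties count 0.5."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(non_cases))


def mean_predicted_vs_observed(risks: list[float], outcomes: list[int]) -> tuple[float, float]:
    """Calibration-in-the-large: average predicted risk alongside the observed event rate."""
    return sum(risks) / len(risks), sum(outcomes) / len(outcomes)


# Hypothetical predictions and outcomes (1 = diagnosed during follow-up).
pred_risks = [0.9, 0.8, 0.3, 0.2]
outcomes = [1, 0, 1, 0]
print(auc_pairwise(pred_risks, outcomes))  # 0.75
print(mean_predicted_vs_observed(pred_risks, outcomes))
```

The pairwise loop is quadratic in cohort size and is meant only to show what the number measures; for millions of patients a rank-based formula or a library routine would be used instead.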

With high risk defined as a predicted risk greater than 1.2%, the model had a sensitivity of 74% and a specificity of 82% over up to 10 years of follow-up.
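Sensitivity and specificity depend on where the high-risk cutoff is drawn. The Python sketch below shows how a 1.2% cutoff (0.012 as a probability) splits patients into flagged and unflagged groups. The source does not state which prediction horizon the cutoff applies to, so the sketch simply treats it as a probability threshold on invented data with binary outcomes.

```python
from __future__ import annotations


def sensitivity_specificity(
    risks: list[float], outcomes: list[int], threshold: float = 0.012
) -> tuple[float, float]:
    """Flag patients whose predicted risk exceeds `threshold` (0.012 = 1.2%) as high risk."""
    tp = sum(1 for r, y in zip(risks, outcomes) if r > threshold and y == 1)
    fn = sum(1 for r, y in zip(risks, outcomes) if r <= threshold and y == 1)
    tn = sum(1 for r, y in zip(risks, outcomes) if r <= threshold and y == 0)
    fp = sum(1 for r, y in zip(risks, outcomes) if r > threshold and y == 0)
    sensitivity = tp / (tp + fn)  # share of people who developed diabetes who were flagged
    specificity = tn / (tn + fp)  # share of people who stayed diabetes-free who were not flagged
    return sensitivity, specificity


# Hypothetical predicted risks and outcomes (1 = diagnosed during follow-up).
sens, spec = sensitivity_specificity(
    [0.05, 0.02, 0.005, 0.015, 0.001, 0.03],
    [1, 1, 1, 0, 0, 0],
)
print(sens, spec)
```

Lowering the threshold flags more people, which raises sensitivity at the cost of specificity; the 1.2% cutoff reflects one point on that trade-off.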

These performance characteristics suggest the model can meaningfully discriminate between high- and lower-risk individuals using data clinicians already collect, without requiring additional testing or referrals. The combination of routinely captured EHR variables with socioeconomic context data allows risk estimation to account for structural factors influencing diabetes development beyond traditional clinical metrics.

Investigators described plans to test the model prospectively in a clinical setting to assess whether its deployment increases engagement in type 2 diabetes prevention programs and reduces incidence.

References
  1. American Diabetes Association. Machine learning model accurately predicts long-term risk of type 2 diabetes. Published June 5, 2026. Accessed June 5, 2026. https://www.diabetes.org
  2. Rodriguez LA. Machine-learning modeling for T2DM prediction in over 3 million adults [abstract 2321-P]. Presented at the American Diabetes Association (ADA) 2026 Scientific Sessions in New Orleans, LA, June 5-8, 2026.
  3. US Centers for Disease Control and Prevention. Type 2 Diabetes. May 15, 2024. Accessed June 5, 2026. https://www.cdc.gov/diabetes/about/about-type-2-diabetes.html
