Deep Learning Models May Standardize Primary Open-Angle Glaucoma Determination

The high diagnostic accuracy of automated deep learning models using photographs from the Ocular Hypertension Treatment Study (OHTS) suggests the potential to standardize and automate primary open-angle glaucoma (POAG) detection, according to a recent diagnostic study.

Moreover, the higher false-positive rate in early photographs of eyes that later developed POAG, compared with eyes that did not develop POAG, suggests that deep learning may have detected POAG earlier than the OHTS Endpoint Committee or reading centers.

“We believe integration of [deep learning] analyses of photographic images and other test results in clinical trials could reduce the cost and improve the consistency and accuracy of end point assessments, either by decreasing or replacing the personnel required to complete the task,” wrote study author Linda M. Zangwill, PhD, Hamilton Glaucoma Center, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California, San Diego.

The OHTS recruited 1636 participants with ocular hypertension (elevated intraocular pressure) from 22 sites, with a mean follow-up of 10.7 years. At study entry, all participants were required to have normal-appearing optic nerve heads (ONH) and visual fields within normal limits.

A total of 66,715 photographs from 3272 eyes were used to train and test a ResNet-50 model to detect POAG, as determined by optic disc changes (287 eyes, 3502 photographs) and/or visual field changes (198 eyes, 1300 visual fields).

The model's performance in distinguishing between healthy eyes and eyes with glaucoma was evaluated using sensitivity, specificity, precision, and area under the receiver operating characteristic curve (AUROC). To help assess clinical utility, sensitivity was also evaluated at 4 fixed levels of specificity (80%, 85%, 90%, and 95%).
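As a rough illustration of these metrics, the sketch below computes AUROC and sensitivity at a fixed specificity from model scores. It is not the study's code: the function names, the thresholding rule (an eye is flagged when its score is at or above the threshold), and the toy labels and scores are all assumptions made for the example.

```python
import math

def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_at_specificity(labels, scores, target_specificity):
    """Best sensitivity over all thresholds whose specificity meets the target."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0
    # Candidate thresholds: every observed score, plus +inf (nothing flagged).
    for t in sorted(set(scores)) + [math.inf]:
        specificity = sum(1 for n in neg if n < t) / len(neg)
        sensitivity = sum(1 for p in pos if p >= t) / len(pos)
        if specificity >= target_specificity:
            best = max(best, sensitivity)
    return best

# Hypothetical toy data (NOT from the study): 10 healthy eyes, 5 eyes with POAG.
labels = [0] * 10 + [1] * 5
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.55, 0.70,
          0.40, 0.50, 0.65, 0.85, 0.95]

print(f"AUROC: {auroc(labels, scores):.2f}")
for spec in (0.80, 0.85, 0.90, 0.95):
    sens = sensitivity_at_specificity(labels, scores, spec)
    print(f"sensitivity at {spec:.0%} specificity: {sens:.2f}")
```

Fixing specificity first and then reading off sensitivity mirrors how a screening threshold might be chosen in practice: the tolerable false-positive rate is set in advance, and the model is judged by how many true cases it still catches.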

Additionally, false-positive rates were evaluated to determine whether the deep learning model detected POAG before the committee's determination.

The training set included a total of 1147 participants (661 [57.6%] female; mean age, 57.2 years; 95% CI, 56.6 - 57.8), with 167 in the validation set (97 [58.1%] female; mean age, 57.1 years; 95% CI, 55.6 - 58.7) and 322 in the test set (173 [53.7%] female; mean age, 57.2 years; 95% CI, 56.1 - 58.2).

The deep learning model achieved its best diagnostic accuracy for POAG determined by optic disc changes (AUROC, 0.91; 95% CI, 0.88 - 0.94), followed by either optic disc or visual field (VF) changes (AUROC, 0.88; 95% CI, 0.82 - 0.92) and VF changes only (AUROC, 0.86; 95% CI, 0.76 - 0.93).

Further, false-positive rates at 90% specificity were higher in photographs of eyes with ocular hypertension that later developed POAG by disc or visual field changes (27.5% [56 of 204]) than in eyes that did not develop POAG (11.4% [50 of 440]) during follow-up.
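The percentages above follow directly from the reported counts. The short sketch below reproduces them and adds the ratio between the two rates; the variable names are illustrative, and the ratio is a derived figure that the study summary does not itself report.

```python
# Counts as reported in the article at the 90% specificity threshold.
later_poag_flagged, later_poag_total = 56, 204  # group that later developed POAG
no_poag_flagged, no_poag_total = 50, 440        # group that did not develop POAG

later_rate = later_poag_flagged / later_poag_total
no_rate = no_poag_flagged / no_poag_total
ratio = later_rate / no_rate  # derived here, not stated in the source

print(f"later developed POAG: {later_rate:.1%}")
print(f"did not develop POAG: {no_rate:.1%}")
print(f"ratio of the two rates: {ratio:.1f}")
```

On these counts, early photographs from eyes that later developed POAG were flagged roughly 2.4 times as often as photographs from eyes that never did, which is the pattern the authors read as possible earlier detection.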

Additionally, when the deep learning model developed on the optic disc end point was applied to 3 independent datasets, its diagnostic accuracy was lower, with AUROCs ranging from 0.74 (95% CI, 0.70 - 0.77) to 0.79 (95% CI, 0.78 - 0.81).

“Given the challenging and subjective nature of POAG determination, these results suggest a role for artificial intelligence in improving the accuracy and consistency of the process at lower cost,” Zangwill concluded.

The study, “Detecting Glaucoma in the Ocular Hypertension Study Using Deep Learning,” was published in JAMA Ophthalmology.