Retina specialists and an AI screening tool trained to measure geographic atrophy agreed in a majority of cases, potentially expediting clinical trial enrollment and increasing consistency in measuring the disease.
An AI screening tool trained to detect geographic atrophy (GA) effectively screened participants for clinical trial enrollment, agreeing with retina specialists in nearly three-fourths of cases, according to findings presented in a poster at the Association for Research in Vision and Ophthalmology (ARVO) 2024 Meeting.1
Of the 33 prospectively screened patients, 10 (30.3%) were excluded by the AI as being out of range, compared with 11 excluded by graders from the Wisconsin Reading Center, who oversaw the research. Overall, agreement was 73% between the AI screening tool and retina specialists and 67% between the AI reports and the Wisconsin Reading Center researchers. "Most disagreements were due to foveal involvement or missed small lesions," said senior researcher Amitha Domalpally, MD, PhD.
"The study highlights the importance of validation and implementation studies to bridge the gap between AI model creation and real-world application," Domalpally, research director of the Wisconsin Reading Center, told HCPLive over email. "There is a critical need for accurate measurement of GA, a key factor for screening and monitoring in clinical trials. Currently there are no clinical tools to measure GA."
For the prospective study, clinical trial eligibility was defined as a unifocal or multifocal GA area of 1.25 to 23 mm², along with the absence of neovascular age-related macular degeneration. After eligible patients were identified by clinical examination, their fundus autofluorescence (FAF) images were captured and uploaded to the AI platform for further validation. This was followed by human review, first by the retina specialists and then by the researchers in Wisconsin.
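The eligibility rule described above can be sketched as a simple check. This is a minimal illustration, not the study's software: the function name, the constant names, and the choice to treat both area bounds as inclusive are assumptions. The article gives only the 1.25 to 23 mm² range and the exclusion of neovascular age-related macular degeneration.

```python
# Hypothetical sketch of the eligibility rule; names and inclusive bounds are assumptions.
MIN_GA_AREA_MM2 = 1.25   # lower bound of GA area stated in the article
MAX_GA_AREA_MM2 = 23.0   # upper bound of GA area stated in the article


def is_eligible(ga_area_mm2, has_neovascular_amd):
    """Return True if the GA area is in range and neovascular AMD is absent."""
    in_range = MIN_GA_AREA_MM2 <= ga_area_mm2 <= MAX_GA_AREA_MM2
    return in_range and not has_neovascular_amd


# Made-up example patients (not study data).
small_lesion = is_eligible(0.9, has_neovascular_amd=False)   # area below range
typical_case = is_eligible(6.65, has_neovascular_amd=False)  # area in range
with_namd = is_eligible(9.79, has_neovascular_amd=True)      # excluded for nAMD
```

A patient whose measured area falls outside the range would count as a screen failure under this rule, which is the kind of "out of range" exclusion the AI reports flagged.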
"We performed this study to implement an AI model in a clinical trial and test its real-world performance," said Domalpally. In the ARVO poster, researchers concluded that "Al algorithms enhance accuracy of patient selection in clinical trials, expediting enrollment through real-time assessment and reducing screen failure rates."
The AI model was initially trained using FAF images from the Age-Related Eye Disease Study 2 (AREDS2). These data were annotated by researchers at the Wisconsin Reading Center to train the AI to find GA specifically. The model was trained using weak-labeled images, which included measurements only and no GA indicators, and strong-labeled images, which included GA segmentation masks. Initial findings for the AI were published in Ophthalmology Science in early 2024.2
In this initial AREDS2 dataset, 601 images were analyzed by humans and by the AI trained on weak-labeled and on strong-labeled images. The mean GA area was 6.65 mm², 6.83 mm², and 6.58 mm², respectively. The Dice coefficient for cross-validation was 0.885, meaning there was 88.5% similarity between the areas identified by the AI and by the researchers.
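For readers unfamiliar with the Dice coefficient, it is twice the overlap between two segmentations divided by their combined size, so 1.0 means identical regions and 0 means no overlap. The sketch below computes it for two tiny made-up binary masks. The function name and the toy masks are illustrative assumptions and do not come from the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks given as lists of rows."""
    flat_a = [bool(pixel) for row in mask_a for pixel in row]
    flat_b = [bool(pixel) for row in mask_b for pixel in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")
    overlap = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(flat_a) + sum(flat_b)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks (1 = pixel marked as GA); invented for illustration only.
human_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
ai_mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

# Human marks 4 pixels, AI marks 3, and 3 overlap: Dice = 2 * 3 / (4 + 3).
score = dice_coefficient(human_mask, ai_mask)
```

Because Dice compares where the lesion is drawn, not just how large it is, two measurements with similar areas can still score poorly if the outlines do not coincide.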
In addition to the AREDS2 data, 156 images were analyzed from GSK testing data from a clinical trial that ran between 2011 and 2016 (NCT01342926). This set was annotated by the Reading Center; in this sample, the human-measured GA area was 9.79 mm², the weak-labeled was 8.82 mm², and the strong-labeled was 9.55 mm². The mean difference between the human and the strong-labeled measurements was -0.24 mm². In this case, there was 91.8% agreement (Dice coefficient, 0.918).
In a similar retrospective analysis, separate from the research published in Ophthalmology Science, another dataset of 48 images from Moorfields Eye Hospital was examined. These images were reviewed and annotated by researchers at that institution, and the analysis found a 0.478 mm² mean difference in GA area between the researchers' measurements and those given by the AI (Dice coefficient, 0.895).
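The reported mean difference can be checked from the two area values given for this dataset. The short sketch below assumes the difference is taken as AI minus human, a sign convention inferred from the numbers, not one stated in the source; the variable names are illustrative.

```python
# Area values quoted in the article for the GSK test set, in mm^2.
human_area_mm2 = 9.79
strong_labeled_area_mm2 = 9.55

# Assumed sign convention: AI measurement minus human measurement.
# A negative value means the AI measured a smaller area than the graders.
mean_difference_mm2 = round(strong_labeled_area_mm2 - human_area_mm2, 2)
```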
"The model performed consistently across all datasets–clinical, clinical trial, retrospective, and prospective–showcasing the generalizability and robustness of the model," said Domalpally. "We had 2 important goals: one, to assess the AI tool’s effectiveness in measuring GA in 4 diverse datasets; and 2, to analyze clinicians use of AI-generated reports on GA area to decide clinical trial eligibility."
References