Most answers provided by ChatGPT-4 were consistently appropriate regarding vitreoretinal surgeries, including retinal detachments, macular holes, and epiretinal membranes.
Medical knowledge provided by the artificial intelligence-powered chatbot, ChatGPT-4, was consistently appropriate regarding common vitreoretinal surgeries for retinal detachment, macular hole, and epiretinal membrane, according to a new retrospective analysis.1
However, the investigative team, led by Ajay E. Kuriyan, MD, MS, a member of the Retina Service at Wills Eye Hospital, suggests ChatGPT and other natural language models are not a reliable source of factual information in their current form.
“Improving the credibility and readability of responses, especially in specialized fields, such as medicine, is a critical focus of research,” Kuriyan and colleagues wrote. “Patients, physicians, and laypersons should be advised of the limitations of these tools for eye- and health-related counseling.”
Artificial intelligence chatbots produce human-like responses to user prompts. Recent literature has suggested these large language models may provide comparable advice for an array of patient questions, including in ophthalmology.2 A cross-sectional analysis of human-written and AI-based responses to 200 eye care questions found AI chatbots generated appropriate answers that did not significantly differ from ophthalmologists’ answers in terms of incorrect information and the likelihood of harm.
However, their implementation in clinical practice requires caution, as chatbot use raises significant concerns about limited or outdated knowledge and the risk of hallucinations. Hallucinations are chatbot outputs that look convincingly correct but are factually inaccurate.3
In this analysis, the investigative team evaluated the appropriateness and readability of medical knowledge provided by ChatGPT-4 regarding common vitreoretinal surgeries for retinal detachment, macular hole, and epiretinal membrane.1 No human participants were part of the retrospective, cross-sectional analysis.
Kuriyan and colleagues generated lists of common questions regarding the definition, prevalence, visual impact, diagnostic methods, surgical and non-surgical treatment options, postoperative information, surgery-related complications, and visual prognosis of retinal detachment, macular hole, and epiretinal membrane. Then, each question was asked 3 times on the online ChatGPT-4 platform. Data for the study were recorded on April 25, 2023.
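For readers interested in reproducing this kind of repeat-query protocol, the sketch below illustrates how it could be automated. Note this is only an illustration: the study entered questions manually on the online ChatGPT-4 platform, and the model identifier and question list shown here are assumptions, not the study's materials.

```python
# Minimal sketch of a repeat-query protocol using OpenAI's Python client.
# The study itself used the ChatGPT-4 web interface manually; this code
# only illustrates asking each question 3 times and saving the replies.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical examples; the study's question lists covered definition,
# prevalence, visual impact, diagnosis, treatment, postoperative care,
# complications, and prognosis for each of the three conditions.
questions = [
    "What is a retinal detachment?",
    "What surgical treatment options exist for a macular hole?",
    "What is the visual prognosis after epiretinal membrane surgery?",
]

REPEATS = 3  # each question was asked 3 times to gauge consistency

responses = {}
for question in questions:
    responses[question] = []
    for _ in range(REPEATS):
        reply = client.chat.completions.create(
            model="gpt-4",  # assumed model identifier
            messages=[{"role": "user", "content": question}],
        )
        responses[question].append(reply.choices[0].message.content)
```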
Two independent retinal specialists graded the appropriateness of the responses. The two main outcome measures were the appropriateness of the responses and their readability, determined using Readable, an online readability tool.
Analyses showed the responses were consistently appropriate in 84.6% (n = 33 of 39), 92% (n = 23 of 25), and 91.7% (n = 22 of 24) of the questions related to retinal detachment, macular hole, and epiretinal membrane, respectively. Meanwhile, the answers were inappropriate at least once in 5.1% (n = 2 of 39), 8% (n = 2 of 25), and 8.3% (n = 2 of 24) of the respective questions.
Data showed the average Flesch-Kincaid Grade Level and Flesch Reading Ease Score were 14.1 ± 2.6 and 32.3 for retinal detachment, 14 ± 1.3 and 34.4 ± 7.7 for macular hole, and 14.8 ± 1.3 and 28.1 ± 7.5 for epiretinal membrane. Based on these scores, Kuriyan and colleagues suggest a college-level reading ability is needed to understand the material presented by the chatbot.
“These scores indicate that the answers are difficult or very difficult to read for the average layperson and college graduation would be required to understand the material,” they wrote.
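For context, both readability metrics are computed from average sentence length and syllables per word. The sketch below shows the standard Flesch formulas with a rough heuristic syllable counter; the study itself used the Readable tool, not this code.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average words per sentence
    spw = syllables / len(words)   # average syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level
```

A Reading Ease score near 30 and a grade level around 14 fall in the "difficult, college-level" band of the Flesch scales, consistent with the authors' interpretation above.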
References