Ophthalmologists and large language models (LLMs) may provide comparable advice for an array of patient questions, indicating the potential of artificial intelligence (AI) to improve the quality and efficiency of patient care in ophthalmology.1
In a cross-sectional analysis of human-written and AI-based responses to 200 eye care questions, an investigative team from Stanford University found that an AI chatbot generated appropriate answers that did not differ significantly from those of ophthalmologists in terms of incorrect information and the likelihood of harm.
“Ultimately, the undeniable reality is that LLMs have emerged and are accessible to the general public,” wrote the investigative team, led by Sophia Y. Wang, MD, MS, Stanford University. “We intend for this study to catalyze more extensive and nuanced dialogue and joint efforts surrounding the use of LLMs in ophthalmology among various health care stakeholders, including patients, clinicians, researchers, and policymakers.”
There is considerable excitement about the potential of LLMs for a variety of applications in both medicine and society, but their implementation in clinical practice may require caution. Patients often turn to the Internet for accessible health information and advice, but there are substantial concerns about chatbot use, including limited or outdated knowledge and the risk of hallucination: chatbot outputs that sound convincingly correct but are factually inaccurate.
A rapidly changing landscape of AI-driven healthcare amidst the potential for unintended consequences makes rigorous examination of the clinical effectiveness, safety, and ethical implications of AI-powered technologies essential. In the current analysis, Wang and colleagues evaluated how ChatGPT, a popular LLM chatbot, can be used to answer patient questions related to eye health and how the answers fared against those of board-certified ophthalmologists.
The Eye Care forum is an online forum where users can ask detailed questions and receive answers from practitioners affiliated with the American Academy of Ophthalmology. The first ophthalmologist response to each forum post was saved, resulting in a data set of 4747 question-answer pairs prior to exclusion. Posts were dated between 2007 and 2016; data were accessed in January 2023, and the analysis was performed between March and May 2023.
The final data set consisted of a random subset of 200 question-answer pairs meeting inclusion criteria, with a median length of 101.5 words for human responses and 129.0 words for chatbot responses. A masked panel of 8 board-certified ophthalmologists was randomly presented with either a human-written or AI-generated answer for each question. These reviewers were asked to decide whether an answer was generated by an ophthalmologist or by the ChatGPT chatbot.
Identification of a chatbot versus human answer was measured on a 4-point scale: likely or definitely AI versus likely or definitely human. Investigators also aimed to determine whether the answer contained incorrect information, the likelihood of harm caused by the answer, the severity of harm, and whether the answer was aligned with or in opposition to consensus in the medical community.
Upon analysis, the expert panel was able to distinguish between the chatbot and human answers, rating AI answers as probably or definitely written by AI more frequently than human answers (prevalence ratio [PR], 1.72; 95% CI, 1.52 - 1.93). However, the expert panel rated a high number of responses as definitely AI-written, including 320 (40.0%) of the human responses.
The mean accuracy for distinguishing between AI and human responses was 61.3%. Experts rated the chatbot and human answers similarly on whether they contained incorrect or inappropriate material (PR, 0.92; 95% CI, 0.77 - 1.10) and on their likelihood of harm (PR, 0.84; 95% CI, 0.67 - 1.07). Overall, chatbot answers were not rated as significantly more harmful than human answers (PR, 0.99; 95% CI, 0.80 - 1.22).
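For readers less familiar with prevalence ratios, the short Python sketch below illustrates one common way to read the intervals reported above: a PR of 1 means no difference between chatbot and human answers, so a 95% CI that spans 1 is generally read as no statistically significant difference. The figures are copied from this article; the names REPORTED, ci_excludes_null, and summarize are hypothetical and used only for illustration, and this is not the investigators' own statistical model.

```python
# Prevalence ratios (chatbot vs. human answers) with 95% CIs, as reported in the article.
# The dictionary keys are informal labels chosen for this sketch, not the study's terms.
REPORTED = {
    "rated as AI-written": (1.72, 1.52, 1.93),
    "incorrect or inappropriate material": (0.92, 0.77, 1.10),
    "likelihood of harm": (0.84, 0.67, 1.07),
    "harm (overall comparison)": (0.99, 0.80, 1.22),
}


def ci_excludes_null(lower, upper, null_value=1.0):
    """Return True when the interval lies entirely above or below the null value."""
    return lower > null_value or upper < null_value


def summarize(results):
    """Map each outcome label to True if its 95% CI excludes a PR of 1."""
    return {
        name: ci_excludes_null(lower, upper)
        for name, (_pr, lower, upper) in results.items()
    }


if __name__ == "__main__":
    for name, differs in summarize(REPORTED).items():
        verdict = "CI excludes 1" if differs else "CI includes 1"
        print(f"{name}: {verdict}")
```

Under this reading, only the first comparison (whether an answer was rated as AI-written) has an interval that excludes 1, which is consistent with the findings as described above.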
Regardless of whether healthcare providers accept and endorse AI-powered tools, Wang noted that patients are ultimately likely to use these chatbots for medical advice. As a result, it is crucial to assess the accuracy and accessibility of these systems from both a physician and a patient standpoint.
“While LLM-based systems are not designed to replace human ophthalmologists, there may be a future in which they augment ophthalmologists’ work and provide support for patient education under appropriate supervision,” investigators wrote.
Bernstein IA, Zhang Y, Govil D, et al. Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions. JAMA Netw Open. 2023;6(8):e2330320. doi:10.1001/jamanetworkopen.2023.30320