Artificial Intelligence Chatbot Appears to Improve on Ophthalmic Knowledge Assessment

Article

The AI chatbot ChatGPT-4 correctly answered 84% of multiple-choice practice questions from the OphthoQuestions trial, compared with the prior version's 58% in February 2023.

Rajeev H. Muni, MD, MSc | Credit: Unity Health Toronto


An updated version of the artificial intelligence (AI) chatbot, ChatGPT-4, correctly answered 84% of multiple-choice practice questions on OphthoQuestions, a common resource for preparing for ophthalmology board certification.1

The study, out of the University of Toronto, expanded previous research from the same investigative team, which showed the previous version of the chatbot correctly answered 46% of the multiple-choice questions in January 2023 and 58% in February 2023.2

“Performance of the updated version of this chatbot across all question categories on OphthoQuestions appeared to improve compared with the performance of the previous version,” wrote the investigative team, led by Rajeev H. Muni, MD, MSc, from the department of ophthalmology and vision sciences at the University of Toronto. “Results of this study also suggest that in most cases, the updated version of the chatbot generated accurate responses when options were given.”

Evidence has shown AI chatbots produce human-like responses to user prompts; these dynamic language models build on and improve existing conversational AI systems. Because the older version of ChatGPT correctly answered nearly half of the multiple-choice questions used to prepare for the American Board of Ophthalmology examination, this analysis reassessed accuracy with the updated chatbot. The team inputted into ChatGPT-4 (March 2023 release; OpenAI) the same practice questions for the Ophthalmic Knowledge Assessment Program (OKAP) and Written Qualifying Examination (WQE) from the free OphthoQuestions trial used in the previous study.

Muni and colleagues logged the proportion of trainees in ophthalmology using the OphthoQuestions trial who selected the same response as the ChatGPT chatbot. The primary outcome of the study was the number of multiple-choice questions the chatbot was able to answer correctly. Investigators performed data analysis using Microsoft Excel, with the chatbot generating answers to the board certification examination in March 2023.

The mean length of questions was 306.40 characters, and the mean length of chatbot responses was 473.83 characters. The chatbot answered 105 of 125 text-based multiple-choice questions (84%) correctly. It responded correctly to 100% of the questions in general medicine, retina and vitreous, and uveitis, but was weaker in clinical optics, answering 8 of 13 questions (62%) correctly.

On average, 71% (95% CI, 66 - 75) of ophthalmology trainees selected the same response to the multiple-choice questions as the chatbot. Investigators noted the chatbot provided explanations and additional insight to 123 of 125 questions (98%). When multiple-choice options were removed, the analysis showed the chatbot answered 49 of 78 stand-alone questions (63%) correctly.

The median length of multiple-choice questions the chatbot answered correctly was 217 characters, versus 246 characters for questions answered incorrectly. The median length of correct responses was 428 characters, and of incorrect responses, 465 characters.

Muni and colleagues noted limitations of the study: OphthoQuestions offers preparation material for the board certification examinations, but the chatbot may perform differently on official examinations. They additionally noted the chatbot produces unique responses per user, which could differ if the study were repeated.

“The previous study may have helped train the chatbot in this setting,” investigators wrote. “Results of the present study must be interpreted in the context of the study date as the chatbot’s knowledge corpus will likely continue to expand rapidly.”

References

  1. Mihalache A, Huang RS, Popovic MM, Muni RH. Performance of an Upgraded Artificial Intelligence Chatbot for Ophthalmic Knowledge Assessment. JAMA Ophthalmol. Published online July 13, 2023. doi:10.1001/jamaophthalmol.2023.2754
  2. Iapoce C. ChatGPT not sufficient resource in preparing for ophthalmology board certification. HCP Live. May 11, 2023. Accessed July 18, 2023. https://www.hcplive.com/view/chatgpt-not-sufficient-resource-ophthalmology-board-certification.