A new study suggests ChatGPT may not provide substantial assistance in preparing for board certification at this time.
The artificial intelligence (AI) chatbot ChatGPT answered approximately half of high-yield questions meant for preparation for ophthalmic board certification correctly, with varying performance across different subspecialties, according to new research.1
Investigators, led by Rajeev H. Muni, MD, MSc, Department of Ophthalmology and Vision Sciences, University of Toronto, suggested that while medical professionals should appreciate the advancements of AI in medicine, ChatGPT did not provide substantial assistance in preparing for board certification at the time of the study.
“Although the role of ChatGPT may increase in medical education and clinical practice over time, it is important to stress the importance of using such AI systems responsibly,” investigators wrote.1
Developed by OpenAI, ChatGPT sits at the center of a debate with potentially significant societal implications, and the AI system has already been applied in various scientific and medical contexts. Although training curricula using AI are being developed throughout medicine, Muni and colleagues stressed the importance of medical students recognizing the limitations of ChatGPT, which can output incorrect information.
The cross-sectional study used a consecutive sample of text-based multiple-choice questions for the Ophthalmic Knowledge Assessment Program (OKAP) and Written Qualifying Exam (WQE) from the free trial of the OphthoQuestions practice question bank for board certification preparation. Of 166 available questions, 125 (75%) were text-based and were analyzed by ChatGPT. The primary outcome of the analysis was the number of board certification examination practice questions that ChatGPT answered correctly.
Secondary outcomes included the proportion of questions for which the AI provided additional explanations, the mean length of questions and responses provided by the chatbot, performance in answering questions without multiple-choice options, and changes in performance over time. A secondary analysis repeated the study without providing multiple-choice options to ChatGPT, to analyze its performance in conversational contexts.
ChatGPT answered questions from OphthoQuestions from January 9 to 16, 2023, in the primary analysis and on February 17, 2023, in a secondary analysis. The analysis showed ChatGPT correctly answered 58 of 125 questions (46.4%) in January 2023. It performed best in the category of general medicine, answering 11 of 14 questions correctly (79%), and worst in the category of retina and vitreous, responding incorrectly to all questions.
Moreover, the analysis indicated ChatGPT provided additional explanations for 79 of 125 questions (63%). Investigators noted the proportion of questions for which ChatGPT provided explanations was similar between those answered correctly and incorrectly (difference, 5.8%; 95% CI, -11.0% to 22.0%; P = .51). Data showed the mean length of questions was similar between those answered correctly and incorrectly (difference, 21.4 characters; 95% CI, -54.3 to 94.3; P = .22), as was the mean length of responses (difference, -80.0 characters; 95% CI, -209.5 to 49.5; P = .22).
Muni and colleagues noted ChatGPT selected the same multiple-choice response as the most common answer provided by ophthalmology trainees on OphthoQuestions 44% of the time. When the analysis was repeated in February 2023, ChatGPT provided a correct response to 73 of 125 multiple-choice questions (58%), an improvement from January 2023. In addition, ChatGPT correctly responded to 42 of 78 stand-alone questions (54%), a performance similar to its answering of questions with multiple-choice options.
In a linked editorial, Neil M. Bressler, MD, Editor in Chief, JAMA Ophthalmology, indicated that from the perspective of a reader of peer-reviewed medical literature, having confidence in the accuracy of information generated by an AI chatbot may not yet be possible.
“Given the current potential for misinformation from chatbots regarding ophthalmology, authors should be cautioned regarding the use of such material without careful review of the text provided because authors agree to be accountable for the accuracy or integrity of any part of a submission to the medical literature when following authorship guidelines from the International Committee of Medical Journal Editors (ICMJE),” he wrote.2
Bressler noted that while questions remain regarding whether AI chatbots can be credited as authors, authors need to meet 4 criteria for authorship credit per ICMJE guidelines. The first 3 criteria are "substantial contributions to conception or design of the work," "drafting of the work or revising it critically for important intellectual content," and "final approval of the version to be published."
Although he suggested it could be argued that an AI chatbot can meet these 3 criteria, authorship credit additionally requires an "agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved." As a result, all journals in the JAMA network have revised their Instructions for Authors to indicate that chatbots cannot be authors.
“Per the JAMA Ophthalmology Instructions for Authors, all authors should meet all 4 criteria for authorship, and all who meet the 4 criteria should be identified as authors,” Bressler wrote.2 “Those who do not meet all 4 criteria, such as an AI chatbot that cannot be accountable or have confidence in the integrity of coauthor contributions, should be acknowledged but not listed as an author.”