News|Articles|March 1, 2024

Multimodal AI Chatbot Presents Mixed Results in Ophthalmic Imaging Analysis

The newest version of ChatGPT accurately responded to most multiple-choice questions on ophthalmic cases but performed better on non-image–based questions.

According to a new cross-sectional study, the newest version of the artificial intelligence chatbot ChatGPT-4 accurately responded to approximately two-thirds of image-based multiple-choice questions in a publicly available dataset of ophthalmic cases.¹

However, the large language model (LLM) responded correctly more often on questions that did not rely on ophthalmic image interpretation (82%) than on image-based questions (65%). Stratified by specialty, the chatbot performed best in retina cases and worst in neuro-ophthalmology cases.

“As multimodal LLMs become increasingly widespread, it remains imperative to continuously stress their appropriate use in medicine and highlight concerns surrounding confidentiality and bioethics,” wrote the investigative team led by Rajeev H. Muni, MD, MSc, department of ophthalmology, St Michael’s Hospital Unity Health Toronto.

Recent evidence points to the potentially transformative role of AI chatbots in medicine, particularly in ophthalmology, where they could ease the burden on healthcare professionals in tasks ranging from patient education to remote monitoring of eye diseases.² As with any new technology, however, regulatory compliance, privacy, and the integration of AI into healthcare systems need to be addressed before widespread adoption.

Prior investigations from Muni and colleagues found a previous version of ChatGPT-4, limited to text-based prompts, improved its performance at an impressive rate in medical and ophthalmic settings.³ As ophthalmology relies on the interpretation of multimodal imaging to confirm diagnostic accuracy, the team noted this new ability of the chatbot to interpret ophthalmic images could be critical for reaching that next stage.¹

“The new release of the chatbot holds great potential in enhancing the efficiency of ophthalmic image interpretation, which may reduce the workload on clinicians, mitigate variability in interpretations and errors, and ultimately, lead to improved patient outcomes,” they wrote.

The cross-sectional analysis used publicly available data from the OCTCases medical education platform based at the investigators’ center in Canada. Cases on the platform are organized into 6 categories: retina, neuro-ophthalmology, uveitis, glaucoma, ocular oncology, and pediatric ophthalmology. All multiple-choice questions across all available ophthalmic cases on the platform were included in the analysis.

Muni and colleagues created a new ChatGPT Plus account to ensure there was no previous conversation history with the LLM before study initiation. The account was granted multimodal capability by OpenAI, the chatbot’s parent organization, and all relevant cases and imaging were entered from October 16 to October 23, 2023. The primary end point was the chatbot’s accuracy in answering questions that required image recognition, measured as the proportion of correct responses.

Overall, the analysis consisted of 136 cases with 448 images on OCTCases. Across these cases, 429 questions were formatted as multiple choice (82%) and were included in the statistical analysis. Of the 136 cases, 125 were accompanied by optical coherence tomography (OCT) scans (92%) and 82 by fundus images (60%).

Upon analysis, Muni and colleagues found ChatGPT-4 answered 299 of the 429 multiple-choice questions correctly across all ophthalmic cases (70%). The LLM’s performance was best on questions related to retina (77%) and worst in the neuro-ophthalmology category (58%) (difference, 18% [95% CI, 7.5–29.4]; P < .001).

It exhibited intermediate performance on questions from other ocular specialties, including the ocular oncology (72%), pediatric ophthalmology (68%), uveitis (67%), and glaucoma (61%) categories.

Across 303 multiple-choice questions requiring image interpretation, ChatGPT-4 answered 196 questions correctly (65%). Among 126 non-image-based questions, the score was higher, with 103 correct answers (82%). Overall, the chatbot performed better on non-image-based questions (difference, 17% [95% CI, 7.8–25.1]; P < .001), and the gap was widest in the pediatric ophthalmology category (difference, 47% [95% CI, 8.5–69.0]; P = .02).
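
The headline percentages can be rechecked directly from the counts reported above. The following is a minimal Python sketch, not the investigators' analysis code: the function names are hypothetical, and it assumes a simple Wald (normal-approximation) interval for the difference between two proportions. The article does not state which interval method the study used, so the bounds printed here may differ slightly from the published 95% CI.

```python
import math

def proportion(correct, total):
    """Share of correct answers, as a fraction between 0 and 1."""
    return correct / total

def wald_diff_ci(c1, n1, c2, n2, z=1.96):
    """Difference p1 - p2 with an approximate 95% Wald interval.

    Assumption: normal approximation; the study may have used another method.
    """
    p1, p2 = c1 / n1, c2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Counts as reported in the article
overall = proportion(299, 429)    # all multiple-choice questions
image = proportion(196, 303)      # questions requiring image interpretation
non_image = proportion(103, 126)  # questions without image interpretation

# Non-image-based accuracy minus image-based accuracy
diff, low, high = wald_diff_ci(103, 126, 196, 303)

print(f"overall accuracy:   {overall:.1%}")
print(f"image-based:        {image:.1%}")
print(f"non-image-based:    {non_image:.1%}")
print(f"difference:         {diff:.1%} (approx. 95% CI {low:.1%} to {high:.1%})")
```

Rounded to whole percentages, the three proportions and the difference agree with the figures quoted in the article; only the interval bounds depend on the assumed method.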

Muni and colleagues indicated future analyses should focus on the chatbot’s ability to interpret different ophthalmic imaging modalities, to learn when it becomes as accurate as specific machine learning systems in ophthalmology.

“As the chatbot’s accuracy increases with time, it may develop the potential to inform clinical decision-making in ophthalmology via real-time analysis of ophthalmic cases,” Muni and colleagues wrote.

References

1. Mihalache A, Huang RS, Popovic MM, et al. Accuracy of an Artificial Intelligence Chatbot’s Interpretation of Clinical Ophthalmic Images. JAMA Ophthalmol. Published online February 29, 2024. doi:10.1001/jamaophthalmol.2024.0017
2. Tan TF, Thirunavukarasu AJ, Jin L, Lim J, Poh S, Teo ZL, Ang M, Chan RVP, Ong J, Turner A, Karlström J, Wong TY, Stern J, Ting DS. Artificial intelligence and digital health in global eye health: opportunities and challenges. Lancet Glob Health. 2023;11(9):e1432-e1443. doi:10.1016/S2214-109X(23)00323-6. PMID: 37591589
3. Iapoce C. Artificial Intelligence Chatbot appears to improve on Ophthalmic Knowledge Assessment. HCP Live. July 18, 2023. Accessed March 1, 2024. https://www.hcplive.com/view/artificial-intelligence-chatbot-appears-improve-ophthalmic-knowledge-assessment
