Editors Selection (IGR: 22-4) | International Glaucoma Review #2478


Artificial Intelligence: Ophthalmology-specific "chatbots" may contribute significantly to better glaucoma detection

Comment by Fei Li & Zefeng Yang on:

119903 Predicting Glaucoma Before Onset Using a Large Language Model Chatbot, Huang X; Raja H; Madadi Y et al., American Journal of Ophthalmology, 2024; 266: 289-299


Predicting glaucoma onset is critical for preventing vision loss, and artificial intelligence (AI) offers promising solutions for this challenge. Recent advances in AI, particularly large language models (LLMs) such as ChatGPT, raise the prospect of forecasting glaucoma development before clinical signs manifest. Evaluating how accurately these tools predict glaucoma onset from data such as fundus photographs and clinical information is an active area of research.

In this retrospective case-control study by Huang et al., researchers analyzed 1,504 participants (3,008 eyes) from the Ocular Hypertension Treatment Study (OHTS), using longitudinal visual field (VF) and optic disc photo data to define conversion to glaucoma. Tabular clinical parameters, including demographic, clinical, and ocular measurements, were converted into text prompts for input into ChatGPT. Through iterative prompt engineering, ChatGPT-4.0 achieved an accuracy of 75% (AUC = 0.67, sensitivity = 56%, specificity = 78%) in predicting glaucoma conversion, outperforming ChatGPT-3.5 (accuracy = 61%, AUC = 0.62). Sensitivity improved to 61% in cases where both VF loss and optic nerve damage were present. However, logistic regression matched the accuracy of ChatGPT-4.0 (75%) while achieving a higher AUC (0.73).
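To make the two technical steps in this summary concrete, the following minimal Python sketch shows (a) how a tabular record might be serialized into a text prompt and (b) how accuracy, sensitivity, and specificity follow from binary predictions. It is an illustration only: the prompt wording, the variable names, and the toy labels are assumptions and are not taken from Huang et al.

```python
from typing import Dict, Sequence


def row_to_prompt(row: Dict[str, object]) -> str:
    """Serialize one tabular record into a plain-text prompt.

    The wording is hypothetical; it is not the prompt used in the study.
    """
    facts = "; ".join(f"{name}: {value}" for name, value in row.items())
    return (
        "Baseline findings for one eye. "
        f"{facts}. "
        "Will this eye convert to glaucoma? Answer yes or no."
    )


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity (1 = converter)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical record; field names and values are made up for illustration.
    example_row = {"age": 60, "IOP (mmHg)": 26, "vertical cup-to-disc ratio": 0.5}
    print(row_to_prompt(example_row))

    # Toy labels, not study data: 2 converters and 6 non-converters.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, it can remain relatively high even when sensitivity is modest, provided non-converters make up most of the sample. This is why the sensitivity and AUC figures deserve as much attention as the headline accuracy.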


This initial exploration reported only moderate sensitivity (56%) for ChatGPT in predicting glaucoma onset, and it offers valuable insights for future research. First, it suggests that 15 clinical text variables from a single timepoint may not provide sufficient information for accurate prediction. Integrating image data, or longitudinal measurements of intraocular pressure (IOP), VF, and optic nerve status at multiple timepoints, may enhance LLMs’ ability to recognize features relevant to glaucoma onset, because such inputs better reflect real-world clinical decision-making. Second, the predefined structured input format, similar to that of a logistic regression model, meant that the predictive task did not leverage ChatGPT’s core capabilities for processing and integrating complex, less structured, or multimodal data. Finally, these results highlight that future endeavors should focus on developing ophthalmology-specific LLM-based systems to maximize predictive performance, rather than relying on generic, commercially available models such as ChatGPT. Bridging these clinical-technological gaps will be essential to transform LLMs from proof-of-concept tools into reliable adjuncts for ophthalmologists identifying high-risk patients.
