
Abstract #120468 Published in IGR 25-1

ChatGPT for Addressing Patient-centered Frequently Asked Questions in Glaucoma Clinical Practice

Wang H; Masselos K; Tong J; Connor HRM; Scully J; Zhang S; Rafla D; Posarelli M; Tan JCK; Agar A; Kalloniatis M; Phu J
Ophthalmology Glaucoma 2024.


PURPOSE: Large language models such as ChatGPT-3.5 are often used by the public to answer questions related to daily life, including health advice. This study evaluated the responses of ChatGPT-3.5 to patient-centered frequently asked questions (FAQs) relevant to glaucoma clinical practice.

DESIGN: Prospective cross-sectional survey.

PARTICIPANTS: Twelve expert graders.

METHODS: Twelve experts were drawn from a range of clinical, education, and research practices in optometry and ophthalmology. Over 200 patient-centric FAQs from authoritative professional society, hospital, and advocacy websites were distilled and filtered into 40 questions across 4 themes: definition and risk factors; diagnosis and testing; lifestyle and other accompanying conditions; and treatment and follow-up. The questions were individually input into ChatGPT-3.5 to generate responses, which were then graded by each of the 12 experts.

MAIN OUTCOME MEASURES: A 5-point Likert scale (1 = strongly disagree; 5 = strongly agree) was used to grade ChatGPT-3.5 responses across 4 domains: coherency, factuality, comprehensiveness, and safety.

RESULTS: Across all themes and domains, median scores were 4 ("agree"). Comprehensiveness had the lowest scores across domains (mean 3.7 ± 0.9), followed by factuality (mean 3.9 ± 0.9), then coherency and safety (mean 4.1 ± 0.8 for both). Examination of the 40 individual questions showed that 8 (20%), 17 (42.5%), 24 (60%), and 8 (20%) of the questions had average scores below 4 (i.e., below "agree") for the coherency, factuality, comprehensiveness, and safety domains, respectively. Free-text comments by the experts highlighted factual omissions and gaps in comprehensiveness (e.g., secondary glaucoma) and remarked on the vagueness of some responses (i.e., responses that did not account for individual patient circumstances).

CONCLUSIONS: ChatGPT-3.5 responses to glaucoma FAQs generally attracted "agree" ratings for coherency, factuality, comprehensiveness, and safety. However, areas of weakness were identified, precluding a recommendation for routine use to provide patients with tailored counseling in glaucoma, especially with respect to the development of glaucoma and its management.

FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

School of Optometry and Vision Science, University of New South Wales, Kensington, New South Wales, Australia; Centre for Eye Health, University of New South Wales, Kensington, New South Wales, Australia.


Classification:

15 Miscellaneous




