Performance Evaluation of Large Language Models With Retrieval-Augmented Generation in Cardiology Specialist Examinations in Japan

Hayama, Hiromasa and Tran, Tu Hao and Kirigaya, Jin and Katayama, Yosuke and Negishi, Tomoko and Ozawa, Koya and Negishi, Kazuaki (2025) Performance Evaluation of Large Language Models With Retrieval-Augmented Generation in Cardiology Specialist Examinations in Japan. Circulation Reports, 7 (8). pp. 692-694. ISSN 2434-0790

Full text not available from this repository.

Link to published document: https://doi.org/10.1253/circrep.CR-25-0094

Abstract

BACKGROUND: Large language models (LLMs) have shown potential in medical education, but their application to cardiology specialist examinations remains underexplored. We compared the performance of a retrieval-augmented generation LLM (RAG-LLM), 'CardioCanon', with that of general-purpose LLMs.

METHODS AND RESULTS: A total of 96 publicly available, text-based, open-source multiple-choice questions from the Japanese Cardiology Specialist Examination (1997-2022) were used. CardioCanon showed option-level accuracy similar to that of ChatGPT-4o and Gemini 2.0 Flash (81.0%, 76.0%, and 77.2%, respectively), but higher case-based accuracy than ChatGPT (57.3% vs. 29.2%, P<0.001).

CONCLUSIONS: RAG techniques can enhance AI-assisted examination performance by improving case-level reasoning and decision-making.
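The abstract reports two accuracy measures without defining them. The sketch below illustrates one plausible reading, offered as an assumption rather than the authors' stated method: option-level accuracy scores every answer option separately (selected or not selected), while case-based accuracy counts a question as correct only when the whole selected set matches the answer key. The function names, the five-option layout, and the toy answers are hypothetical. The last helper checks only arithmetic: if all 96 questions form the denominator (also an assumption), the reported case-based percentages correspond to 55 and 28 correct questions.

```python
from typing import List, Set


def option_level_accuracy(predicted: List[Set[str]],
                          correct: List[Set[str]],
                          options: str = "abcde") -> float:
    """Fraction of individual options judged correctly (selected vs. not selected).

    Assumed definition; the abstract does not spell it out.
    """
    hits = 0
    total = 0
    for pred, gold in zip(predicted, correct):
        for opt in options:
            # An option counts as a hit when the model and the key agree on it.
            hits += (opt in pred) == (opt in gold)
            total += 1
    return hits / total


def case_level_accuracy(predicted: List[Set[str]],
                        correct: List[Set[str]]) -> float:
    """Fraction of questions whose full answer set matches the key exactly."""
    matches = sum(pred == gold for pred, gold in zip(predicted, correct))
    return matches / len(correct)


def percent(count: int, total: int = 96) -> float:
    """Percentage rounded to one decimal, as reported in the abstract."""
    return round(100 * count / total, 1)


# Hypothetical toy data: four questions, five options each.
gold = [{"a", "c"}, {"b"}, {"d", "e"}, {"a"}]
pred = [{"a", "c"}, {"b", "c"}, {"d"}, {"a"}]

option_acc = option_level_accuracy(pred, gold)
case_acc = case_level_accuracy(pred, gold)

# Inferred counts (not stated in the abstract), assuming a denominator of 96.
cardiocanon_pct = percent(55)
chatgpt_pct = percent(28)
```

In the toy data, two questions each contain a single wrong option, so most options are still scored correctly while only half of the questions match the key in full. This is one way a model could show similar option-level accuracy but much lower case-based accuracy, as the abstract reports for ChatGPT.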

Item Type:	Article
Subjects:	R Medicine > R Medicine (General)
Depositing User:	Repository Administrator
Date Deposited:	03 Nov 2025 02:43
Last Modified:	03 Nov 2025 02:43
URI:	http://eprints.victorchang.edu.au/id/eprint/1743
