Hayama, Hiromasa and Tran, Tu Hao and Kirigaya, Jin and Katayama, Yosuke and Negishi, Tomoko and Ozawa, Koya and Negishi, Kazuaki (2025) Performance Evaluation of Large Language Models With Retrieval-Augmented Generation in Cardiology Specialist Examinations in Japan. Circulation Reports, 7 (8). pp.692-694. ISSN 2434-0790
Full text not available from this repository.

Abstract
BACKGROUND: Large language models (LLMs) have shown potential in medical education, but their application to cardiology specialist examinations remains underexplored. We compared the performance of a retrieval-augmented generation LLM (RAG-LLM), 'CardioCanon', against general-purpose LLMs.

METHODS AND RESULTS: A total of 96 publicly available, text-based, open-source multiple-choice questions from the Japanese Cardiology Specialist Examination (1997-2022) were used. CardioCanon showed option-level accuracy similar to that of ChatGPT-4o and Gemini 2.0 Flash (81.0%, 76.0%, and 77.2%, respectively), but higher case-based accuracy than ChatGPT-4o (57.3% vs. 29.2%, P<0.001).

CONCLUSIONS: RAG techniques can enhance AI-assisted examination performance by improving case-level reasoning and decision-making.
| Item Type: | Article |
|---|---|
| Subjects: | R Medicine > R Medicine (General) |
| Depositing User: | Repository Administrator |
| Date Deposited: | 03 Nov 2025 02:43 |
| Last Modified: | 03 Nov 2025 02:43 |
| URI: | http://eprints.victorchang.edu.au/id/eprint/1743 |
