Document detail
ID

oai:pubmedcentral.nih.gov:1027...

Topic
Short Report
Author
Giannos, Panagiotis
Language
en
Publisher

BMJ Publishing Group

Journal

BMJ Neurology Open

Year

2023

Listing date

9/5/2023

Keywords
education models examination ai medical specialty questions

Abstract

BACKGROUND: Large language models such as ChatGPT have demonstrated potential as innovative tools for medical education and practice, with studies showing their ability to perform at or near the passing threshold in general medical examinations and standardised admission tests.

However, no studies have assessed their performance in the UK medical education context, particularly at a specialty level, and specifically in the field of neurology and neuroscience.

METHODS: We evaluated the performance of ChatGPT in higher specialty training for neurology and neuroscience using 69 questions from the question pool of the Specialty Certificate Examination (SCE) Neurology Web Questions bank.

The dataset primarily focused on neurology (80%).

The questions spanned subtopics such as symptoms and signs, diagnosis, interpretation and management, with some questions addressing specific patient populations.

The performance of ChatGPT 3.5 Legacy, ChatGPT 3.5 Default and ChatGPT-4 models was evaluated and compared.
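The abstract does not describe the scoring step itself; the short Python sketch below illustrates one plausible way to score each model's answers against an answer key and compare its accuracy with the exam pass mark. The model names and the 58% threshold come from the abstract; the data structures, placeholder answers and exact scoring rule are illustrative assumptions, not the study's actual pipeline.

# Minimal sketch (hypothetical): score each model's answers against an
# answer key and compare its accuracy with the exam pass mark.
PASS_MARK = 0.58  # 2022 SCE Neurology passing threshold reported in the abstract

# answer_key[i] is the keyed option for question i; model_answers holds the
# option each model selected per question (placeholder data, not study data).
answer_key = ["B", "D", "A", "C"]
model_answers = {
    "ChatGPT 3.5 Legacy":  ["B", "A", "A", "D"],
    "ChatGPT 3.5 Default": ["B", "D", "A", "D"],
    "ChatGPT-4":           ["B", "D", "A", "C"],
}

def accuracy(selected, key):
    """Fraction of questions answered with the keyed option."""
    return sum(s == k for s, k in zip(selected, key)) / len(key)

for model, selected in model_answers.items():
    acc = accuracy(selected, answer_key)
    verdict = "pass" if acc >= PASS_MARK else "fail"
    print(f"{model}: {acc:.0%} ({verdict} at a {PASS_MARK:.0%} threshold)")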

RESULTS: ChatGPT 3.5 Legacy and ChatGPT 3.5 Default displayed overall accuracies of 42% and 57%, respectively, falling short of the passing threshold of 58% for the 2022 SCE neurology examination.

ChatGPT-4, on the other hand, achieved the highest accuracy of 64%, surpassing the passing threshold and outperforming its predecessors across disciplines and subtopics.
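For scale, on a 69-question set these percentages correspond to roughly 29, 39 and 44 correct answers for ChatGPT 3.5 Legacy, ChatGPT 3.5 Default and ChatGPT-4 respectively, with the 58% pass mark corresponding to roughly 40 correct answers (approximate counts derived here from the reported percentages, not figures stated in the abstract).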

CONCLUSIONS: The advancements in ChatGPT-4’s performance compared with its predecessors demonstrate the potential for artificial intelligence (AI) models in specialised medical education and practice.

However, our findings also highlight the need for ongoing development and collaboration between AI developers and medical experts to ensure the models’ relevance and reliability in the rapidly evolving field of medicine.

Giannos, Panagiotis, 2023, Evaluating the limits of AI in medical specialisation: ChatGPT’s performance on the UK Neurology Specialty Certificate Examination, BMJ Publishing Group
