Document details
Identifier

oai:arXiv.org:2402.18060

Subject
Computer Science - Computation and Language
Authors
Chen, Hanjie; Fang, Zhouxiang; Singla, Yash; Dredze, Mark
Category

Computer Science

Year

2024

Date indexed

03/07/2024

Keywords
LLMs, datasets, medical, questions, clinical

Abstract

LLMs have demonstrated impressive performance in answering medical questions, such as achieving passing scores on medical licensing examinations.

However, medical board exams and general clinical questions do not capture the complexity of realistic clinical cases.

Moreover, the lack of reference explanations means we cannot easily evaluate the reasoning of model decisions, a crucial component of supporting doctors in making complex medical decisions.

To address these challenges, we construct two new datasets: JAMA Clinical Challenge and Medbullets.

JAMA Clinical Challenge consists of questions based on challenging clinical cases, while Medbullets comprises simulated clinical questions.

Both datasets are structured as multiple-choice question-answering tasks, accompanied by expert-written explanations.

We evaluate seven LLMs on the two datasets using various prompts.
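As a rough illustration of the task format described above, a single multiple-choice item with its expert explanation, and one possible prompt built from it, might look as follows in Python. The field names, strings, and prompt wording are hypothetical assumptions for illustration, not the datasets' actual schema or the paper's prompts:

    # Hypothetical sketch of one multiple-choice record with an expert explanation.
    # All field names and strings are illustrative assumptions.
    record = {
        "question": "A 45-year-old man presents with ...",  # clinical case vignette
        "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
        "answer": "C",                                      # gold label
        "explanation": "Expert-written rationale ...",      # reference explanation
    }

    # One possible way to turn a record into a prompt that asks the model
    # to both answer and explain; the paper's actual prompts may differ.
    prompt = (
        record["question"]
        + "\n"
        + "\n".join(f"{key}. {text}" for key, text in record["options"].items())
        + "\nSelect the correct option and explain your reasoning."
    )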

Experiments demonstrate that our datasets are harder than previous benchmarks.

Human and automatic evaluations of model-generated explanations provide insights into the promise and deficiencies of LLMs for explainable medical QA.
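The abstract does not name the automatic metrics used. As one common choice for comparing a model-generated explanation against the expert-written reference, ROUGE-L could be computed as sketched below; the rouge-score package and the choice of metric are assumptions here, not the paper's stated setup:

    # Scoring a model explanation against the expert-written reference with
    # ROUGE-L, one common overlap metric; the paper's metrics may differ.
    # Requires: pip install rouge-score
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    reference = "Expert-written rationale ..."
    prediction = "Model-generated explanation ..."
    scores = scorer.score(reference, prediction)
    print(scores["rougeL"].fmeasure)  # F1 overlap between the two explanations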

Chen, Hanjie; Fang, Zhouxiang; Singla, Yash; Dredze, Mark (2024). Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions.

