Evaluating the performance of artificial intelligence software for lung nodule detection on chest radiographs in a retrospective real-world UK population

Maiter, Ahmed; Hocking, Katherine; Matthews, Suzanne; Taylor, Jonathan; Sharkey, Michael; Metherall, Peter; Alabed, Samer; Dwivedi, Krit; Shahin, Yousef; Anderson, Elizabeth; Holt, Sarah; Rowbotham, Charlotte; Kamil, Mohamed A; Hoggard, Nigel; Balasubramanian, Saba P; Swift, Andrew; Johns, Christopher S

Document details

ID

oai:pubmedcentral.nih.gov:1063...

Topic

Radiology and Imaging

Authors

Maiter, Ahmed; Hocking, Katherine; Matthews, Suzanne; Taylor, Jonathan; Sharkey, Michael; Metherall, Peter; Alabed, Samer; Dwivedi, Krit; Shahin, Yousef; Anderson, Elizabeth; Holt, Sarah; Rowbotham, Charlotte; Kamil, Mohamed A; Hoggard, Nigel; Balasubramanian, Saba P; Swift, Andrew; Johns, Christopher S

Language

Publisher

BMJ Publishing Group

Category

BMJ Open

Year

2023

Listing date

14 December 2023

Keywords

ppv, lung, demonstrated, compared, chest, radiographs, cancer

Abstract

OBJECTIVES: Early identification of lung cancer on chest radiographs improves patient outcomes.

Artificial intelligence (AI) tools may increase diagnostic accuracy and streamline this pathway.

This study evaluated the performance of commercially available AI-based software trained to identify cancerous lung nodules on chest radiographs.

DESIGN: This retrospective study included primary care chest radiographs acquired in a UK centre.

The software evaluated each radiograph independently and outputs were compared with two reference standards: (1) the radiologist report and (2) the diagnosis of cancer by multidisciplinary team decision.

Failure analysis was performed by interrogating the software marker locations on radiographs.

PARTICIPANTS: 5722 consecutive chest radiographs were included from 5592 patients (median age 59 years, 53.8% women, 1.6% prevalence of cancer).

RESULTS: Compared with radiologist reports for nodule detection, the software demonstrated sensitivity 54.5% (95% CI 44.2% to 64.4%), specificity 83.2% (82.2% to 84.1%), positive predictive value (PPV) 5.5% (4.6% to 6.6%) and negative predictive value (NPV) 99.0% (98.8% to 99.2%).

Compared with cancer diagnosis, the software demonstrated sensitivity 60.9% (50.1% to 70.9%), specificity 83.3% (82.3% to 84.2%), PPV 5.6% (4.8% to 6.6%) and NPV 99.2% (99.0% to 99.4%).
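The link between the reported sensitivity, specificity and predictive values can be checked with Bayes' theorem. The sketch below is illustrative only and is not the authors' analysis: it takes the rounded figures from the abstract (sensitivity 60.9%, specificity 83.3%, cancer prevalence 1.6%) as inputs, and the function name `predictive_values` is hypothetical. Because the inputs are rounded, the outputs are approximate.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    # Expected fraction of the whole population falling in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded values quoted in the abstract for the cancer-diagnosis reference standard.
ppv, npv = predictive_values(sensitivity=0.609, specificity=0.833, prevalence=0.016)

# Expected number of false positive flags for every true positive flag.
false_alarms_per_cancer = (1 - ppv) / ppv

print(f"PPV {ppv:.1%}, NPV {npv:.1%}")
print(f"About {false_alarms_per_cancer:.0f} false positives per true positive")
```

Under these assumptions the calculation gives a PPV of about 5.6% and an NPV of about 99.2%, in line with the reported values. At a prevalence this low, a specificity of roughly 83% means false positive flags greatly outnumber true positive ones, so the PPV stays small even with moderate sensitivity.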

Normal or variant anatomy was misidentified as an abnormality in 69.9% of the 943 false positive cases.

CONCLUSIONS: The software demonstrated considerable underperformance in this real-world patient cohort.

Failure analysis suggested a lack of generalisability in the training and testing datasets as a potential factor.

The low PPV carries the risk of over-investigation and limits the translation of the software to clinical practice.

Our findings highlight the importance of training and testing software in representative datasets, with broader implications for the implementation of AI tools in imaging.

Maiter, Ahmed; Hocking, Katherine; Matthews, Suzanne; Taylor, Jonathan; Sharkey, Michael; Metherall, Peter; Alabed, Samer; Dwivedi, Krit; Shahin, Yousef; Anderson, Elizabeth; Holt, Sarah; Rowbotham, Charlotte; Kamil, Mohamed A; Hoggard, Nigel; Balasubramanian, Saba P; Swift, Andrew; Johns, Christopher S. 2023. Evaluating the performance of artificial intelligence software for lung nodule detection on chest radiographs in a retrospective real-world UK population. BMJ Publishing Group.