Document detail
ID

doi:10.1186/s41747-024-00442-4...

Author
Kilintzis, Vassilis; Kalokyri, Varvara; Kondylakis, Haridimos; Joshi, Smriti; Nikiforaki, Katerina; Díaz, Oliver; Lekadir, Karim; Tsiknakis, Manolis; Marias, Kostas
Language
en
Publisher

Springer

Category

Medicine & Public Health

Year

2024

Listing date

4/10/2024

Keywords
artificial intelligence, breast ne... platform domain data clinical models dataset access public imaging cancer breast ai
Abstract

Background Developing trustworthy artificial intelligence (AI) models for clinical applications requires access to clinical and imaging data cohorts.

Reusing publicly available datasets has the potential to fill this gap.

Specifically in the domain of breast cancer, a large archive of publicly accessible medical images along with the corresponding clinical data is available at The Cancer Imaging Archive (TCIA).

However, existing datasets cannot be directly used as they are heterogeneous and cannot be effectively filtered for selecting specific image types required to develop AI models.

This work focuses on the development of a homogenized dataset in the domain of breast cancer including clinical and imaging data.

Methods Five datasets were acquired from the TCIA and were harmonized.

For the clinical data harmonization, a common data model was developed and a repeatable, documented “extract-transform-load” process was defined and executed for their homogenization.
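
As an illustration of the kind of extract-transform-load step described above, here is a minimal Python sketch. It is not the authors' implementation: the common data model fields (`subject_id`, `age_years`, `er_positive`), the source names, and the column mappings are all hypothetical and chosen only to show how heterogeneous records might be mapped onto one schema.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical common data model; the field names are illustrative only.
@dataclass
class Subject:
    subject_id: str
    source: str
    age_years: Optional[float]
    er_positive: Optional[bool]


# Hypothetical per-source mappings: source column -> common field.
FIELD_MAPS = {
    "source_a": {"PatientID": "subject_id", "Age": "age_years", "ER": "er_positive"},
    "source_b": {"id": "subject_id", "age_at_dx_days": "age_years", "er_status": "er_positive"},
}

# Assumption: source_b reports age in days, source_a in years.
AGE_DIVISOR = {"source_a": 1.0, "source_b": 365.25}


def to_bool(value) -> Optional[bool]:
    """Normalize the different encodings of a positive/negative flag."""
    text = str(value).strip().lower()
    if text in {"1", "pos", "positive", "yes", "true"}:
        return True
    if text in {"0", "neg", "negative", "no", "false"}:
        return False
    return None  # unknown or missing stays explicit


def transform(source: str, record: dict) -> Subject:
    """Map one raw record from a named source onto the common model."""
    mapping = FIELD_MAPS[source]
    common = {target: record.get(column) for column, target in mapping.items()}
    age = common["age_years"]
    if age is not None:
        age = float(age) / AGE_DIVISOR[source]
    return Subject(
        subject_id=f"{source}:{common['subject_id']}",
        source=source,
        age_years=age,
        er_positive=to_bool(common["er_positive"]),
    )


raw = [
    ("source_a", {"PatientID": "A1", "Age": "61", "ER": "Positive"}),
    ("source_b", {"id": "B7", "age_at_dx_days": "18262.5", "er_status": "neg"}),
]
harmonized = [transform(source, record) for source, record in raw]
```

Keeping the mappings in a table separate from the transformation code is one way to make such a process repeatable and documented: adding a source means adding a mapping entry rather than new logic.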

Further, Digital Imaging and Communications in Medicine (DICOM) information was extracted from magnetic resonance imaging (MRI) data and made accessible and searchable.

Results The resulting harmonized dataset includes information about 2,035 subjects with breast cancer.

Further, a platform named RV-Cherry-Picker enables search over both the clinical and diagnostic imaging datasets, providing unified access, facilitating the downloading of all study imaging that corresponds to specific series characteristics (e.g., dynamic contrast-enhanced series), and reducing the burden of acquiring the appropriate set of images for the respective AI model scenario.
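
To make the idea of selecting series by their characteristics concrete, the following Python sketch filters an in-memory index of extracted DICOM attributes. It does not reflect RV-Cherry-Picker's actual query interface, which the abstract does not describe. The dictionary keys are standard DICOM attribute keywords, but the index entries are made up, and matching dynamic contrast-enhanced series by substrings of `SeriesDescription` is a simplifying assumption.

```python
from collections import defaultdict


def select_series(index, modality, description_terms):
    """Return {StudyInstanceUID: [SeriesInstanceUID, ...]} for matching series.

    A series matches when its Modality equals `modality` and its
    SeriesDescription contains any of `description_terms` (case-insensitive).
    """
    terms = [term.lower() for term in description_terms]
    selected = defaultdict(list)
    for entry in index:
        description = entry.get("SeriesDescription", "").lower()
        if entry.get("Modality") == modality and any(t in description for t in terms):
            selected[entry["StudyInstanceUID"]].append(entry["SeriesInstanceUID"])
    return dict(selected)


# Made-up index entries standing in for metadata extracted from DICOM headers.
index = [
    {"StudyInstanceUID": "study-1", "SeriesInstanceUID": "series-1a",
     "Modality": "MR", "SeriesDescription": "Ax DYN 3D T1 post"},
    {"StudyInstanceUID": "study-1", "SeriesInstanceUID": "series-1b",
     "Modality": "MR", "SeriesDescription": "T2 FSE"},
    {"StudyInstanceUID": "study-2", "SeriesInstanceUID": "series-2a",
     "Modality": "MG", "SeriesDescription": "dynamic"},
    {"StudyInstanceUID": "study-3", "SeriesInstanceUID": "series-3a",
     "Modality": "MR", "SeriesDescription": "DCE sagittal"},
]

# Assumption: "dyn" or "dce" in the description marks a DCE series.
dce = select_series(index, "MR", ["dyn", "dce"])
```

Free-text series descriptions vary across scanners and sites, so a real selection would likely combine several attributes rather than rely on description substrings alone.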

Conclusions RV-Cherry-Picker provides access to the largest publicly available, homogenized imaging/clinical dataset for breast cancer, on which AI models can be developed.

Relevance statement We present a solution for creating merged public datasets that support AI model development, using the breast cancer domain and magnetic resonance images as an example.

Key points • The proposed platform allows unified access to the largest, homogenized public imaging dataset for breast cancer.

• A methodology for the semantically enriched homogenization of public clinical data is presented.

• The platform is able to make a detailed selection of breast MRI data for the development of AI models.

Kilintzis, Vassilis; Kalokyri, Varvara; Kondylakis, Haridimos; Joshi, Smriti; Nikiforaki, Katerina; Díaz, Oliver; Lekadir, Karim; Tsiknakis, Manolis; Marias, Kostas, 2024, Public data homogenization for AI model development in breast cancer, Springer
