Computer Science - Computer Vision... Computer Science - Artificial Inte... Computer Science - Machine Learnin... Electrical Engineering and Systems...
Campanella, Gabriele; Kwan, Ricky; Fluder, Eugene; Zeng, Jennifer; Stock, Aryeh; Veremis, Brandon; Polydorides, Alexandros D.; Hedvat, Cyrus; Schoenfeld, Adam; Vanderbilt, Chad; Kovatch, Patricia; Cordon-Cardo, Carlos; Fuchs, Thomas J.

Computer Science



foundation cancer images datasets pre-training learning self-supervised performance pathology computer science


Recent breakthroughs in self-supervised learning have enabled the use of large unlabeled datasets to train visual foundation models that can generalize to a variety of downstream tasks.

While this training paradigm is well suited for the medical domain where annotations are scarce, large-scale pre-training in the medical domain, and in particular pathology, has not been extensively studied.

Previous work in self-supervised learning in pathology has leveraged smaller datasets for both pre-training and evaluating downstream performance.

The aim of this project is to train the largest academic foundation model and benchmark the most prominent self-supervised learning algorithms by pre-training and evaluating downstream performance on large clinical pathology datasets.

We collected the largest pathology dataset to date, consisting of over 3 billion images from over 423 thousand microscopy slides.

We compared pre-training of vision transformer (ViT) models using the masked autoencoder (MAE) and DINO algorithms.

We evaluated performance on six clinically relevant tasks from three anatomic sites and two institutions: breast cancer detection, inflammatory bowel disease detection, breast cancer estrogen receptor prediction, lung adenocarcinoma EGFR mutation prediction, and lung cancer immunotherapy response prediction.

Our results demonstrate that pre-training on pathology data is beneficial for downstream performance compared to pre-training on natural images.

Additionally, the DINO algorithm achieved better generalization performance across all tasks tested.

The presented results signify a phase change in computational pathology research, paving the way into a new era of more performant models based on large-scale, parallel pre-training at the billion-image scale.

Campanella, Gabriele; Kwan, Ricky; Fluder, Eugene; Zeng, Jennifer; Stock, Aryeh; Veremis, Brandon; Polydorides, Alexandros D.; Hedvat, Cyrus; Schoenfeld, Adam; Vanderbilt, Chad; Kovatch, Patricia; Cordon-Cardo, Carlos; Fuchs, Thomas J. (2023). Computational Pathology at Health System Scale -- Self-Supervised Foundation Models from Three Billion Images.




