Leveraging Video Vision Transformer for Alzheimer's Disease Diagnosis from 3D Brain MRI

Document detail

ID

oai:arXiv.org:2501.15733

Topic

Electrical Engineering and Systems... Computer Science - Artificial Inte... Computer Science - Computer Vision...

Author

Akan, Taymaz; Alp, Sait; Bhuiyan, Md. Shenuarin; Disbrow, Elizabeth A.; Conrad, Steven A.; Vanchiere, John A.; Kevil, Christopher G.; Bhuiyan, Mohammad A. N.

Year

2025

Listing date

1/29/2025

Keywords

3d deep computer mri learning accuracy vit-bilstm science disease alzheimer cnn-bilstm models ad video diagnosis

Abstract

Alzheimer's disease (AD) is a neurodegenerative disorder affecting millions worldwide, necessitating early and accurate diagnosis for optimal patient management.

In recent years, advancements in deep learning have shown remarkable potential in medical image analysis.

Methods: In this study, we present "ViTranZheimer," an AD diagnosis approach that uses video vision transformers to analyze 3D brain MRI data.

By treating each 3D MRI volume as a video, with successive slices playing the role of frames, we exploit the dependencies between slices (the analogue of temporal dependencies in video) to capture intricate structural relationships.
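To make the volume-as-video idea concrete, the following is a minimal sketch of one common way video transformers tokenize their input: cutting the volume into non-overlapping "tubelets" that span a few consecutive slices and flattening each into a token. The abstract does not describe the tokenization actually used, so the function name extract_tubelets, the tubelet shape (2, 2, 2), and the toy volume are all assumptions made for illustration.

```python
from itertools import product


def extract_tubelets(volume, tubelet_shape):
    """Split a 3D volume (nested lists indexed [slice][row][col]) into
    non-overlapping tubelets and flatten each one into a token vector.

    Hypothetical helper for illustration; not taken from the paper.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    t, h, w = tubelet_shape
    if depth % t or height % h or width % w:
        raise ValueError("volume shape must be divisible by tubelet shape")

    tokens = []
    # Walk over the top-left-front corner of every tubelet.
    for d0, r0, c0 in product(range(0, depth, t),
                              range(0, height, h),
                              range(0, width, w)):
        token = [volume[d][r][c]
                 for d in range(d0, d0 + t)      # consecutive slices ("frames")
                 for r in range(r0, r0 + h)
                 for c in range(c0, c0 + w)]
        tokens.append(token)
    return tokens


# Toy volume (assumed sizes): 4 slices of 4x4 voxels.
# Each voxel value encodes its own position as 100*slice + 10*row + col.
toy_volume = [[[100 * d + 10 * r + c for c in range(4)]
               for r in range(4)]
              for d in range(4)]

tokens = extract_tubelets(toy_volume, (2, 2, 2))
print(len(tokens), len(tokens[0]))  # 8 tokens, each with 8 voxel values
```

Because every token covers more than one slice, relationships across neighbouring slices are already present inside a token before any attention is applied.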

The video vision transformer's self-attention mechanisms enable the model to learn long-range dependencies and identify subtle patterns that may indicate AD progression.
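The way self-attention relates distant tokens can be illustrated with a single-head, scaled dot-product attention written in plain Python. This is a simplified sketch, not the model's implementation: it assumes identity query, key, and value projections (a real transformer learns these), and the three two-dimensional tokens are invented for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: queries, keys, and values are the tokens
    themselves (identity projections).
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token, regardless of position.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(w * value[i] for w, value in zip(row, tokens))
                        for i in range(dim)])
    return outputs, weights


# Toy sequence: the first and last tokens are identical, the middle differs.
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(toy_tokens)
print(weights[0])
```

In this toy sequence the first token attends to the last token exactly as strongly as to itself, even though they are not adjacent: the weights depend on content similarity rather than distance, which is what lets the mechanism pick up long-range dependencies.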

Our proposed deep learning framework seeks to enhance the accuracy and sensitivity of AD diagnosis, empowering clinicians with a tool for early detection and intervention.

We validate the performance of the video vision transformer using the ADNI dataset and conduct comparative analyses with other relevant models.

Results: The proposed ViTranZheimer model is compared with two hybrid models, CNN-BiLSTM and ViT-BiLSTM.

CNN-BiLSTM combines a convolutional neural network (CNN) with a bidirectional long short-term memory network (BiLSTM), while ViT-BiLSTM combines a vision transformer (ViT) with a BiLSTM.

The ViTranZheimer, CNN-BiLSTM, and ViT-BiLSTM models achieved accuracies of 98.6%, 96.479%, and 97.465%, respectively.

ViTranZheimer thus achieved the highest accuracy of the three models, at 98.6%.
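For a quick sense of the margins, the short sketch below computes the gap in percentage points between the best model and each of the others, using only the three accuracies reported above. The variable names are illustrative, and no statistical significance is implied by the differences.

```python
# Accuracies (in percent) as reported in the abstract.
reported_accuracy = {
    "ViTranZheimer": 98.6,
    "CNN-BiLSTM": 96.479,
    "ViT-BiLSTM": 97.465,
}

# Model with the highest reported accuracy.
best = max(reported_accuracy, key=reported_accuracy.get)

# Gap to the best model, in percentage points, rounded to 3 decimals.
gaps = {
    name: round(reported_accuracy[best] - acc, 3)
    for name, acc in reported_accuracy.items()
    if name != best
}

for name, gap in gaps.items():
    print(f"{best} exceeds {name} by {gap} percentage points")
```

On these figures, ViTranZheimer leads CNN-BiLSTM by 2.121 percentage points and ViT-BiLSTM by 1.135 percentage points.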

Conclusion: This research advances the understanding of applying deep learning techniques in neuroimaging and Alzheimer's disease research, paving the way for earlier and less invasive clinical diagnosis.

Akan, Taymaz; Alp, Sait; Bhuiyan, Md. Shenuarin; Disbrow, Elizabeth A.; Conrad, Steven A.; Vanchiere, John A.; Kevil, Christopher G.; Bhuiyan, Mohammad A. N. (2025). Leveraging Video Vision Transformer for Alzheimer's Disease Diagnosis from 3D Brain MRI.