2024-07-24-nilsson24a.md

title: Regularizing and Interpreting Vision Transformer by Patch Selection on Echocardiography Data
abstract: >-
  This work introduces a novel approach to model regularization and explanation in \Glspl{vit}, particularly beneficial for small-scale but high-dimensional data regimes, such as in healthcare. We introduce stochastic embedded feature selection in the context of echocardiography video analysis, specifically focusing on the EchoNet-Dynamic dataset for the prediction of \gls{lvef}. Our proposed method, termed \Glspl{gvit}, augments \Glspl{vvit}, a performant transformer architecture for videos, with \Glspl{cae}, a common dataset-level feature selection technique, to enhance \gls{vvit}’s generalization and interpretability. The key contribution lies in the incorporation of stochastic token selection individually for each video frame during training. Such token selection regularizes the training of \gls{vvit}, improves its interpretability, and is achieved by differentiable sampling of categoricals using the Gumbel-Softmax distribution. Our experiments on EchoNet-Dynamic demonstrate a consistent and notable regularization effect. The \gls{gvit} model outperforms both a random selection baseline and standard \gls{vvit}. The \gls{gvit} is also compared against recent works on EchoNet-Dynamic, where it exhibits state-of-the-art performance among end-to-end learned methods. Finally, we explore model explainability by visualizing selected patches, providing insights into how the \gls{gvit} utilizes regions known to be crucial when humans predict \gls{lvef}. The proposed approach therefore extends beyond regularization, offering enhanced interpretability for \gls{vit}s.
year: 2024
volume: 248
publisher: PMLR
series: Proceedings of Machine Learning Research
software:
layout: inproceedings
issn: 2640-3498
id: nilsson24a
month: 0
tex_title: Regularizing and Interpreting Vision Transformer by Patch Selection on Echocardiography Data
firstpage: 155
lastpage: 168
page: 155-168
order: 155
cycles: false
bibtex_author: Nilsson, Alfred and Azizpour, Hossein
author:
- given: Alfred
  family: Nilsson
- given: Hossein
  family: Azizpour
date: 2024-07-24
address:
container-title: Proceedings of the fifth Conference on Health, Inference, and Learning
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 7
  - 24
pdf:
extras:
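
The abstract says token selection is "achieved by differentiable sampling of categoricals using the Gumbel-Softmax distribution", applied individually to each video frame. The sketch below illustrates that general technique only; it is not the authors' implementation. The function names `gumbel_softmax` and `select_patches`, the one-categorical-per-selected-slot layout, and the toy inputs are all assumptions made for illustration, and it uses plain Python lists instead of a tensor library so that it runs without dependencies.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed (soft) one-hot sample over len(logits) categories."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
        # Clamp u so that both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def select_patches(frame_tokens, selector_logits, temperature, rng):
    """Build k softly selected tokens for a single frame (hypothetical helper).

    frame_tokens:    n patch embeddings, each a list of d floats.
    selector_logits: k lists of n logits, one categorical per selected slot.
    """
    dim = len(frame_tokens[0])
    selected = []
    for logits in selector_logits:
        weights = gumbel_softmax(logits, temperature, rng)
        # Weighted average of the patch embeddings; as the temperature
        # approaches zero this tends toward picking a single patch.
        token = [
            sum(w * tok[j] for w, tok in zip(weights, frame_tokens))
            for j in range(dim)
        ]
        selected.append(token)
    return selected


# Toy frame: 4 patches with 2-dimensional embeddings, 2 selection slots.
rng = random.Random(0)
frame = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
uniform_logits = [[0.0] * 4, [0.0] * 4]
picked = select_patches(frame, uniform_logits, temperature=0.5, rng=rng)
```

In a trainable model the selector logits would be learned parameters and the weighted sum would be computed by an autodiff framework, so gradients reach the logits through the softmax. Lower temperatures give selections closer to a hard choice of one patch, usually at the cost of noisier gradients.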