- Team Name: Seq2Cure
- Team Members: Arnav Samal (NIT Rourkela) & Ranya (IGDTUW Delhi)
- Challenge: Capsule Vision 2024 - Multi-Class Abnormality Classification for Video Capsule Endoscopy
- Results: Rank: 5th, Balanced Accuracy: 0.8634, Mean AUC-ROC: 0.9908
The Capsule Vision 2024 Challenge aims to develop AI-based models for multi-class abnormality classification in video capsule endoscopy (VCE) frames. Automating this step is intended to reduce gastroenterologists' inspection time without compromising diagnostic precision. The dataset spans 10 class labels, and teams are evaluated on metrics such as balanced accuracy and mean AUC-ROC.
For more details, visit the challenge website.
capsule-vision-2024/
├── /data
│ ├── training_data_collection.py # Script for collecting training data
│ └── test_data_collection.txt # Information on test data collection
├── /logs
│ └── /Model1, /Model2, ... # Subdirectories named after different models used (store logs)
├── /misc
│ ├── /baseline # Contains baseline codebase
│ ├── /info # Additional information about the project
│ ├── rough.ipynb # A rough Jupyter notebook with preliminary work
│ └── test.py # Test script for experimentation
├── /models # Empty, to store downloaded models (.pth files)
├── /reports
│ └── /Model1, /Model2, ... # Subdirectories for reports, corresponding to models in the logs
├── /src
│ ├── /notebooks # Jupyter notebooks in PDF format
│ ├── /sample # Sample code provided by the organizers
│ ├── data_setup.py # Script for dataset setup
│ └── ... # Other source code files for the project
├── /submission
│ ├── submission_report.pdf # Submission Report for the Challenge
│ ├── metrics_report.json # Report detailing the model's evaluation metrics
│ ├── validation_excel.xlsx # Validation results in Excel format
│ └── Seq2Cure.xlsx # Final submission in Excel format
├── .gitignore # Model Files (*.pth) to be ignored
├── LICENSE # License file for the repository
├── README.md # Project overview (this file)
└── requirements.txt # List of dependencies
- /data: Contains scripts for collecting and processing training and test datasets.
- /logs: Stores logs from model training, organized by the name of the model.
- /misc: Includes additional resources such as baseline codebase, papers, and exploratory notebooks.
- /models: An empty folder where pre-trained models should be downloaded (to be linked externally).
- /reports: Contains reports for each model, tracking performance and experiments.
- /src: Includes all source code for the project, including data setup scripts and sample code provided by the challenge organizers.
- /submission: The final output files for submission, including performance metrics and the Excel files required by the challenge organizers.
The dataset comprises over 50,000 frames from three public sources and one private dataset, labeled across 10 abnormality classes: Angioectasia, Bleeding, Erosion, Erythema, Foreign Body, Lymphangiectasia, Normal, Polyp, Ulcer, and Worms.
The project implements a multi-model ensemble approach to video capsule endoscopy frame classification, built from three main components. For more specific details, see the submission report.
The ensemble incorporates established CNN and transformer architectures:
Traditional CNN Models:
- EfficientNet
- ResNet
- MobileNetV3
- RegNet
- DenseNet
- InceptionV4
- ResNeXt
- WideResNet
- MNASNet
- SEResNet50
- ConvNeXt
Transformer-based Models:
- Vision Transformer (ViT)
- Swin Transformer
- DeiT (Data-efficient Image Transformers)
- BEiT (Bidirectional Encoder Representation from Image Transformers)
- CaiT (Class-Attention in Image Transformers)
- Twins-SVT (Spatially Separable Vision Transformer)
- EfficientFormer
Training Parameters (see the code sketch after this list):
- Epochs: 20
- Batch Size: 32
- Computing Cores: 4
- Optimizer: AdamW (Learning Rate: 1e-4, Weight Decay: 0.05)
- Model Selection Criterion: (Balanced Accuracy + Mean AUC Score) / 2
- Early Stopping Patience: 5 epochs
- Convergence Tolerance: 1e-4
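A minimal sketch of these settings, assuming hypothetical model, train_one_epoch, and evaluate helpers (the last returning balanced accuracy and mean AUC on the validation split):

```python
import torch

# Minimal sketch of the training configuration above; `model`,
# `train_one_epoch`, and `evaluate` are assumed placeholders.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

best_score, wait, PATIENCE, TOL = 0.0, 0, 5, 1e-4
for epoch in range(20):
    train_one_epoch(model, optimizer)        # one pass over the training data
    bal_acc, mean_auc = evaluate(model)      # validation metrics
    score = (bal_acc + mean_auc) / 2         # model selection criterion
    if score > best_score + TOL:             # improvement beyond the tolerance
        best_score, wait = score, 0
        torch.save(model.state_dict(), "models/best.pth")
    else:
        wait += 1
        if wait >= PATIENCE:                 # early stopping after 5 stale epochs
            break
```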
Class Imbalance Mitigation (sketched below):
- Weighted Random Sampling based on class frequencies
- Focal Loss implementation
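Both techniques are sketched here under stated assumptions: labels is a tensor of training-set class indices, train_dataset is the training split, and the focal-loss gamma is an assumed default rather than a value confirmed by the report.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, WeightedRandomSampler

# Weighted random sampling: each frame is drawn with probability inversely
# proportional to its class frequency (`labels`, `train_dataset` are assumed).
class_counts = torch.bincount(labels)
sample_weights = (1.0 / class_counts.float())[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels))
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)

# Standard focal loss (Lin et al., 2017); gamma=2.0 is an assumed default.
def focal_loss(logits, targets, gamma=2.0):
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                      # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()
```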
Data Augmentation Pipeline (a torchvision sketch follows the list):
- Spatial Transformations:
  - Resize to (224, 224)
  - Random Horizontal Flip (p=0.5)
  - Random Vertical Flip (p=0.3)
  - Random Rotation (±15°)
  - Random Affine (translation: 10%, scale: 0.9-1.1)
  - Random Perspective (distortion scale: 0.2, p=0.5)
- Intensity Transformations:
  - Color Jitter (brightness: 0.2, contrast: 0.2, saturation: 0.2, hue: 0.1)
  - Normalization (mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
  - Random Erasing (p=0.2, scale: 0.02-0.33, ratio: 0.3-3.3)
  - Gaussian Blur (kernel size: 3, p=0.3)
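Expressed as a torchvision composition, the pipeline above might look like the following; the parameter values mirror the list, while the exact composition order (tensor-level operations after ToTensor) is an assumption:

```python
from torchvision import transforms

# The augmentation pipeline above as a torchvision composition; parameter
# values mirror the list, the ordering of tensor-level ops is assumed.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.3),
    transforms.RandomRotation(degrees=15),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.RandomPerspective(distortion_scale=0.2, p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=3)], p=0.3),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    transforms.RandomErasing(p=0.2, scale=(0.02, 0.33), ratio=(0.3, 3.3)),
])
```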
Ensemble Strategy (illustrated in code after this list):
- Individual model predictions are generated for each input frame
- Softmax probabilities from each model are averaged
- Final classification is determined by the highest average probability across classes
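A minimal sketch of this averaging scheme, assuming models is a list of the fine-tuned networks already switched to eval mode:

```python
import torch

# Softmax-probability averaging across the ensemble; `models` is an assumed
# list of fine-tuned networks already in eval mode.
@torch.no_grad()
def ensemble_predict(models, batch):
    probs = torch.stack([m(batch).softmax(dim=1) for m in models])  # (M, B, C)
    avg = probs.mean(dim=0)                                         # average over the M models
    return avg.argmax(dim=1), avg                                   # class ids + probabilities
```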
The implementation uses transfer learning with complete model fine-tuning, adapting pre-trained weights to the specific requirements of capsule endoscopy frame classification.
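For illustration, one way to instantiate such a backbone with a 10-class head for full fine-tuning uses timm; the library choice and model name here are assumptions, not confirmed details of the submission:

```python
import timm

# Hypothetical example: a pre-trained backbone re-headed for the 10 classes
# and left fully trainable; the model name is illustrative, not the authors'.
model = timm.create_model("swin_base_patch4_window7_224",
                          pretrained=True, num_classes=10)
for p in model.parameters():
    p.requires_grad = True   # complete fine-tuning: no frozen layers
```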
The ensemble model's performance was evaluated on the validation dataset, achieving the following key metrics:
- Balanced Accuracy: 0.8634
- Mean AUC-ROC: 0.9908
Performance across different abnormalities:
- Highest performing classes:
  - Worms (F1: 0.9927, AUC: 0.9999)
  - Normal (F1: 0.9828, AUC: 0.9960)
  - Ulcer (F1: 0.9570, AUC: 0.9979)
- Areas for improvement:
  - Erythema (F1: 0.6643, AUC: 0.9892)
  - Polyp (F1: 0.7539, AUC: 0.9858)
Overall model performance:
- Macro-averaged Precision: 0.8666
- Macro-averaged Recall: 0.8634
- Macro-averaged F1-score: 0.8645
- Mean Specificity: 0.9900
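For reference, metrics of this kind can be computed with scikit-learn as sketched below; y_true and y_prob are placeholder names for the validation labels and the averaged ensemble probability matrix:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score, roc_auc_score

# y_true: (N,) validation labels; y_prob: (N, 10) ensemble probabilities.
y_pred = np.argmax(y_prob, axis=1)
bal_acc = balanced_accuracy_score(y_true, y_pred)
mean_auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
macro_f1 = f1_score(y_true, y_pred, average="macro")
```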
For detailed performance metrics including class-wise precision, recall, specificity, and AUC-ROC scores, refer to metrics_report.json.
To set up the environment and install the necessary dependencies, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/capsule-vision-2024.git
  cd capsule-vision-2024
  ```

- Create a virtual environment:

  ```bash
  pip install virtualenv
  virtualenv venv
  # On Windows
  venv\Scripts\activate
  # On macOS/Linux
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Download the data:
  - Run training_data_collection.py to fetch the training data.
  - See test_data_collection.txt for instructions on downloading the test data.
  - The same dataset was used for both the training and inference phases; it is publicly available on Kaggle via the following links: training dataset and inference dataset.
- Download the models:
  - Download the model weights (.pth files) from this link and place them in /models.
  - When downloading, select the PyTorch format, Version 1, Updated Variation.
This repository is licensed under the MIT License, permitting its use for research and educational purposes.