# the-simpsons-characters-recognition-challenge-iii-hk63560892

Created by GitHub Classroom.
This project takes on the challenge of classifying images from "The Simpsons" by employing an ensemble of state-of-the-art neural network architectures. The ensemble includes EfficientNet B4, Vision Transformer (ViT), FocalNet, and ConvNeXt V2, combining their strengths to improve overall prediction accuracy.
## Table of Contents

- Getting Started
- Dataset
- Models
- Ensemble Strategy
- Training
- Inference
- Results
- Usage
- Contributing
- License
- Authors
- Acknowledgments
## Getting Started

Before running the project, ensure you have the following installed:
- Python (version 3.6 or higher recommended)
- PyTorch
- Transformers library
- torchvision
- PIL (Pillow)
Clone the repository and install the required dependencies:

```bash
git clone https://github.com/your-github-username/simpsons-characters-classification.git
cd simpsons-characters-classification
pip install -r requirements.txt
```
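After installing, a quick sanity check like the one below (illustrative, not part of the project scripts) confirms that the core libraries import correctly:

```python
# Quick environment check: verifies that the required libraries are importable
# and reports their versions. Illustrative only; no specific versions are enforced.
import PIL
import torch
import torchvision
import transformers

print("PyTorch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("transformers:", transformers.__version__)
print("Pillow:", PIL.__version__)
print("CUDA available:", torch.cuda.is_available())
```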
## Dataset

This project uses the "Machine Learning 2023 NYCU Classification" dataset available on Kaggle, which contains images of characters from "The Simpsons". You need to download and prepare the dataset before training the models.
Download the dataset here: Simpsons Characters Dataset.
After downloading, extract the dataset into a directory that your scripts will access for training and evaluation purposes.
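As a sketch of how the extracted data could be loaded, the snippet below assumes an `ImageFolder`-style layout (one sub-directory per character class) under a hypothetical `data/train` path; adjust it to match the actual extraction directory:

```python
# Dataset loading sketch. The directory layout and the "data/train" path are
# assumptions for illustration; point ImageFolder at wherever you extracted the data.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=train_transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

print("Classes found:", len(train_set.classes))
```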
## Models

We use an ensemble of the following models to leverage their combined predictive power (a loading sketch follows the list):
- EfficientNet B4: A convolutional network from the EfficientNet family, which scales depth, width, and input resolution in a balanced way to reach high accuracy efficiently.
- Vision Transformer (ViT): Applies the transformer self-attention mechanism to image patches for classification.
- FocalNet: A vision backbone that replaces self-attention with focal modulation to aggregate local and global context.
- ConvNeXt V2: An updated ConvNeXt architecture that adds Global Response Normalization and is co-designed with masked-autoencoder pretraining.
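One way the four backbones could be instantiated is sketched below. The checkpoint names are public Hugging Face and torchvision checkpoints chosen for illustration, and the class count is an assumption; the project may use different checkpoints.

```python
# Model instantiation sketch. Checkpoint names and num_classes are illustrative
# assumptions, not necessarily what this project uses.
import torch.nn as nn
from torchvision.models import EfficientNet_B4_Weights, efficientnet_b4
from transformers import AutoModelForImageClassification

num_classes = 50  # assumption: number of Simpsons character classes

# EfficientNet B4 from torchvision, with its classification head resized.
effnet = efficientnet_b4(weights=EfficientNet_B4_Weights.IMAGENET1K_V1)
effnet.classifier[1] = nn.Linear(effnet.classifier[1].in_features, num_classes)

# ViT, FocalNet, and ConvNeXt V2 from the Hugging Face hub, heads resized via num_labels.
vit = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224", num_labels=num_classes, ignore_mismatched_sizes=True)
focalnet = AutoModelForImageClassification.from_pretrained(
    "microsoft/focalnet-base", num_labels=num_classes, ignore_mismatched_sizes=True)
convnextv2 = AutoModelForImageClassification.from_pretrained(
    "facebook/convnextv2-base-22k-224", num_labels=num_classes, ignore_mismatched_sizes=True)
```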
## Ensemble Strategy

The ensemble approach takes the output probabilities from each of the four models and combines them to make a final prediction. This combination is designed to capitalize on the diverse strengths of each individual model, providing a more robust and accurate classification.
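A minimal sketch of this idea, assuming an equal-weight average of per-model softmax probabilities (weighted averaging or majority voting are common alternatives):

```python
# Equal-weight soft-voting sketch: average the softmax outputs of all models
# and pick the class with the highest mean probability.
import torch
import torch.nn.functional as F

def ensemble_predict(logits_list):
    """logits_list: one (batch, num_classes) logits tensor per model."""
    probs = torch.stack([F.softmax(logits, dim=-1) for logits in logits_list])
    mean_probs = probs.mean(dim=0)          # (batch, num_classes)
    return mean_probs.argmax(dim=-1)        # predicted class indices
```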
## Training

Each model is fine-tuned on the Simpsons dataset using transfer learning techniques. Data augmentation and preprocessing steps are applied to enrich the training data and improve model generalization.
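The exact augmentations are project-specific; a typical pipeline for this kind of fine-tuning might look like the following (the particular transforms and the ImageNet normalization statistics are assumptions):

```python
# Example augmentation/preprocessing pipeline for fine-tuning; the specific
# transforms and ImageNet mean/std values are illustrative choices.
from torchvision import transforms

train_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```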
## Inference

For inference, the image is processed through each model to obtain their predictions. These predictions are then combined using an ensemble strategy to yield the final class label for the image.
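A self-contained single-image sketch of this flow is shown below. It assumes every model follows the Hugging Face interface (an output object with a `.logits` attribute); a plain torchvision model would return the logits tensor directly and need a small adjustment.

```python
# Single-image ensemble inference sketch. The processor checkpoint and the
# assumption that each model returns an output with .logits are illustrative.
import torch
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

@torch.no_grad()
def classify(image_path, models, class_names):
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt")["pixel_values"]
    logits_list = [model(pixel_values=pixel_values).logits for model in models]
    mean_probs = torch.stack(
        [torch.softmax(logits, dim=-1) for logits in logits_list]).mean(dim=0)
    return class_names[mean_probs.argmax(dim=-1).item()]
```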
## Results

The ensemble model's performance should be documented here, including accuracy, precision, recall, and F1 scores, along with any other relevant metrics or visualizations of performance.
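Once validation predictions are collected, a report along those lines could be generated as sketched below (scikit-learn is an additional dependency assumed here, not listed in the prerequisites):

```python
# Metrics reporting sketch; assumes y_true and y_pred are lists of class indices
# collected on a held-out validation set, and that scikit-learn is installed.
from sklearn.metrics import accuracy_score, classification_report

def report_metrics(y_true, y_pred, class_names):
    print("Accuracy:", accuracy_score(y_true, y_pred))
    print(classification_report(y_true, y_pred, target_names=class_names))
```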
## Usage

Instructions for setting up the environment, training the models, and performing inference are included. Scripts and command-line arguments are provided for easy execution of the project's main functions.
## Contributing

Contributions to this project are welcome. Please fork the repository, make your changes, and submit a pull request for review.
## Acknowledgments

- Kaggle for providing the Simpsons Characters Dataset.
- Hugging Face for the transformers library and the pre-trained models.