Skip to content

hk63560892/Image-Recognition-Project-with-Transfer-Learning

Repository files navigation

the-simpsons-characters-recognition-challenge-iii-hk63560892 created by GitHub Classroom

The Simpsons Characters Classification Ensemble

Introduction

This project takes on the challenge of classifying images from "The Simpsons" by employing an ensemble of state-of-the-art neural network architectures. The ensemble includes EfficientNet B4, Vision Transformer (ViT), Focal Net, and ConvNeXt V2, combining their strengths to improve the overall prediction accuracy.

Table of Contents

Getting Started

Prerequisites

Before running the project, ensure you have the following installed:

  • Python (version 3.6 or higher recommended)
  • PyTorch
  • Transformers library
  • torchvision
  • PIL (Pillow)

Installation

Clone the repository and install the required dependencies: git clone https://github.com/your-github-username/simpsons-characters-classification.git cd simpsons-characters-classification pip install -r requirements.txt

Dataset

This project uses the "Machine Learning 2023 NYCU Classification" dataset available on Kaggle, which contains images of characters from "The Simpsons". You need to download and prepare the dataset before training the models.

Download the dataset here: Simpsons Characters Dataset.

After downloading, extract the dataset into a directory that your scripts will access for training and evaluation purposes.

Models Used

We use an ensemble of the following models to leverage their combined predictive power:

  • EfficientNet B4: An efficient and scalable neural network that achieves state-of-the-art accuracy on image classification tasks.
  • Vision Transformer (ViT): Applies the transformer self-attention mechanism to image patches for classification.
  • Focal Net: Focuses on learning hard-to-classify examples to improve overall model performance.
  • ConvNeXt V2: A recent iteration of ConvNeXt, designed to be more efficient and performant.

Ensemble Strategy

The ensemble approach takes the output probabilities from each of the four models and combines them to make a final prediction. This combination is designed to capitalize on the diverse strengths of each individual model, providing a more robust and accurate classification.

Training

Each model is fine-tuned on the Simpsons dataset using transfer learning techniques. Data augmentation and preprocessing steps are applied to enrich the training data and improve model generalization.

Inference

For inference, the image is processed through each model to obtain their predictions. These predictions are then combined using an ensemble strategy to yield the final class label for the image.

Results

The ensemble model's performance should be documented here, including accuracy, precision, recall, and F1 scores, along with any other relevant metrics or visualizations of performance.

Usage

Instructions for setting up the environment, training the models, and performing inference are included. Scripts and command-line arguments are provided for easy execution of the project's main functions.

Contributing

Contributions to this project are welcome. Please fork the repository, make your changes, and submit a pull request for review.

Acknowledgments

  • Kaggle for providing the Simpsons Characters Dataset.
  • Hugging Face for the transformers library and the pre-trained models.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published