Fine-tuning code and pre-trained models
Explore the official paper »
Statistical pattern recognition, nowadays often known as "machine learning",
is a key element of modern computer science. Its goal is to find, learn, and recognize patterns in complex data,
for example in images, speech, biological pathways, or the internet.
- This repo is a gist-style implementation of the Vision Transformer, which was introduced in the paper: An Image is Worth 16x16 Words
- This repository uses the PyTorch implementation available here
- The PyTorch repository provides pre-trained weights
The code is a straightforward rewrite of the VisionTransformer class with minor modifications
and simplifications, so the class is easier to run and modify in future work that patches and embeds images for classification.
Commonly used resources that I find helpful are listed in the acknowledgements.
The implementation is built using Python 3.7.9 and pip 20.0.
The Vision Transformer is an image classifier: it takes in an image and outputs the class and sub-class prediction. However,
it does this without any convolutional layers; instead it uses attention layers, which are already widely used in NLP. An attention mechanism is an attempt to implement, in deep neural networks, the same behavior of selectively concentrating on a few relevant things
while ignoring others. In computer vision, however, convolutional neural networks (CNNs) are still the norm, and self-attention has only slowly begun to enter the main body of research.
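The attention operation described above can be sketched in a few lines. This is a minimal NumPy illustration of scaled dot-product attention, not the repo's actual PyTorch code; the function name and the random toy inputs are my own.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (n_tokens, d) arrays. Each output token is a weighted
    # average of the values, with weights given by query-key similarity --
    # this is the "selective concentration" mechanism.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # (n_tokens, n_tokens) similarities
    weights = softmax(scores, axis=-1)  # rows sum to 1 over the keys
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(q, k, v)
```

In the full model this runs per head (`n_heads` of them), with q, k, v produced by learned linear projections of the token sequence.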
The network is trained in three steps, where the image is first turned into a sequence of 1D tokens so the transformer architecture can be applied:
- Fine-tune the global features pre-trained on ImageNet and flatten the patches into 1D vectors.
- Run mask inference to obtain the cropped images and fine-tune the local features. Here, the weights of the global features are fixed.
- Concatenate the global and local feature outputs and fine-tune the fusion features while freezing the weights of the other features.
- The position embedding allows the network to determine what part of the image a specific patch came from.
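The tokenization step the list above relies on can be sketched as follows: split the image into fixed-size patches, flatten each patch into a 1D vector, and add a per-patch position embedding. This is a NumPy illustration only; the helper name and the random embedding initialization are my own, not the repo's API.

```python
import numpy as np

def patchify(img, patch_size=16):
    # img: (C, H, W) array -> (n_patches, C * patch_size * patch_size) tokens,
    # scanning patches left-to-right, top-to-bottom.
    c, h, w = img.shape
    assert h % patch_size == 0 and w % patch_size == 0
    patches = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patches.append(img[:, i:i + patch_size, j:j + patch_size].reshape(-1))
    return np.stack(patches)

img = np.zeros((3, 384, 384))          # default input size from the config
tokens = patchify(img)                  # (576, 768): 24x24 patches of 768 values

# Position embedding: one (learned) vector per patch, added to its token,
# so the transformer knows where in the image each patch came from.
pos_embed = np.random.default_rng(0).standard_normal(tokens.shape) * 0.02
tokens = tokens + pos_embed
```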
Install the dependencies before running the compute.py file:
- pip
$ pip install -r requirements.txt
First, build & download the model using the command:
python run_model.py
You can change the attributes & parameters below; the default image size is 384x384:
custom_config = {
"img_size": 384,
"in_chans": 3,
"patch_size": 16,
"embed_dim": 768,
"depth": 12,
"n_heads": 12,
"qkv_bias": True,
"mlp_ratio": 4,
}
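As a sanity check on these settings, the sequence length the transformer sees follows directly from `img_size` and `patch_size`. A small sketch (the +1 assumes the standard ViT classification token, which this repo's class may or may not prepend):

```python
img_size, patch_size, in_chans, embed_dim = 384, 16, 3, 768

n_patches = (img_size // patch_size) ** 2  # 24 * 24 = 576 patches
patch_dim = in_chans * patch_size ** 2     # 3 * 256 = 768 values per flattened patch
seq_len = 1 + n_patches                    # +1 for the [CLS] token -> 577 tokens

print(n_patches, patch_dim, seq_len)       # 576 768 577
```

Note that changing `img_size` or `patch_size` changes the token count, so pre-trained position embeddings would need to be interpolated to match.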
To run the classification function and predict probability output:
python compute.py -image <image destination, usually the base dir>   (short form: -i)
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/FeaturePatch-VisionTransformation)
- Commit your Changes (git commit -m 'Add some updates')
- Push to the Branch (git push origin feature/FeaturePatch-VisionTransformation)
- Open a Pull Request