In this project I am going to detect and recognize the Iranian cars license plate. [ANPR(automatic number plate recognition) application]:
the overall procedure consists of:
- plate detection
- plate tracking[if input is video]
- plate number recognition(OCR)
As every ML project, we should consider data and model
-
Data
(as supervised, we need both data and label):
1.1) for detection:
input: car image label: bounding boxes Note 1: There are many labeling tools (CVAT, LabelImg, VoTT), and new emerging online tool: roboflow[either for annotating a raw dataset or converting the existing annotations to appropriate one for the model] Note 2: Yolo annotation for bounding boxes: (xcenter, ycenter, width, height) Note 3: yolo predict the offset for bounding boxes Note 4: yolov8 is anchor free Note 5: it is an Object Detection Problem
1.2) for recognition:
input: a rectangle image consisting of sequence of numbers and alphabel Note: we can assume the orientation of text line is horizontal,otherwise an alignment is needed as preprocessing label: text(sequence of characters) Note: it is a sequence to sequence modeling problem(but as there is no free dataset for it(to free annotation for characters at plate level), we consider it as a classification problem)
-
Model:
2-1) Detection:
traditional : HOG, contour detection deep learning based: two stages: RCNN, FastRCNN one stage : YOLO series, SSD, retinanet, Hugging face' YOLOS(based on transformer) 1) YOLO v7 2) also I will try the newly released YOLOv8( January 2023) as it outperformes the previous models (also still is in developing stage) Note: my Yolov8 is trained on a different dataset: as in Jupyter Notebook for yolov8 api_key="wx01Phrfycn12jSgeVb8" Note: my Yolo7 is trained on the same dataset as the youtube clip
2-2) Tracking:
traditional: kalman filter , particle filter, deep learning based: SORT deepSORT
2-3) Recognition:
traditional: template based deep learning based: . using already existing packages for OCR(supporting persian/arabic languages) like easyOCR, tesseract, PP-OCR , . writing a model from scratch for this perpose: most developed models in this area are either based on CTC, attention, transformer or the composition of these architectures.
- using google colab (free GPU)
- using local machine's GPU:
1) create a new venv[in windows]: py -m venv yolov7env
2) activate : yolov7env\Scripts\activate
3) clone yolov7 repo : git clone https://github.com/WongKinYiu/yolov7.git
4) change dir : cd yolov7
Note: since I want to train model on GPU i should change the torch installation(to gpu)-> modify the requirements.txt
5) open requirements.txt and delete these two files:
#torch>=1.7.0,!=1.12.0
#torchvision>=0.8.1,!=0.13.0
6) create another requirement_gpu.txt file and add the following lines:
-i https://download.pytorch.org/whl/cu113
torch==1.11.0+cu113
torchvision==0.12.0+cu113
7) install all requirements by:
pip install -r requirements.txt
pip install -r requirements_gpu.txt
pip install requiremet_extra.txt [for tracking and ocr]
8) write python on cmd and check the availability of cuda:
python
import torch
torch.cuda.is_available()
if yes congratulation :) --> quit() python
9) run the command line for training model
-
Note: training should be done on GPU due to large number of parameters to be tune.
-
YOLO originally trained on COCO dataset consists of 80 classes[in case of one class object detection the mAP should be increased]
you can train on your custom dataset based on the instruction in their GitHub page
- clone the repository and run train.py on your own dataset, with your data.yaml configuarion file data.yaml is the explanation of train and val directories, number of classes and name of class
- your dataset should be customized to data format of YOLO (image, labels) pair of image and text file with same name. label coordinate should be (xcenter, ycenter, w, h)
- Roboflow can modified your dataset to YOLO format you need to upload pairs of images and labels, and choose the architeture(YOLOv7 pytorch) and then roboflow will do every thing for you inside your workstation in roboflow(which is with limitation for free account), you can also do whatever augmentation you want you can also split data to train/val/test split thereafter, you can download the modified dataset to your local machine or use it on google colab
- the zip file consist of one data.yaml file that should be used during model training
it was the dataset part
-
Train on custom dataset
- train.py also need a pretrained yolov7 model to finetune on it. you can download it from their github page now run the command line and weight for model to train [Note: you may encounter an error that all tensors are not on the same device--> there is a modification on loss.py file that you should do, just search it on issue of their github page]
- after finishing model training some checkpoint of the model is saved in run/train/exp/weighs directory.
- chooose the best.pt as your model for inferencing.
- you can download it to your local machine to run the model on cpu
- inference either can be done on cpu or gpu
- for inference you can run command line detect.py based on the saved model and image data you want to do detection[this case bounding boxes coordinates can be saved in a txt file]
- if you get txt file(NMS is already applied, you need to scale) or your can use your saved model in a python file, then you can do further processes (after applying model you should do NMS yourself and scale)
- the bounding box needs to be scaled to the original image sizes , since the results are normalized
- you can change the coordinates from (xc,yx,w,h)-> (xmin, ymin), (xmax,ymax)->draw rectangle on image or crop image for further analysis in OCR stage
Note:
since the output BB, maybe not alligned horizontally, you need to do it manually (detection lines->longest line->its orientation->rotate the image)
there are also some ractification network for doing it in OCR papers
Now the cropped plate number will be fed to OCR
-
publicly available OCR (they are not good for persian, specially for alligned text)
-
based on youtube :
-
synthesize a dataset for every persian characters and numbers, create a classification model, train the model in this dataset and use the trained model for test
-
crop the rectangle bounding box(cropped BB) in different propopotions
[(40, 120), (100, 200), (180, 280), (270, 360), (350, 400), (400, 460), (460, 530), (530, 600)] 12|پلاک : 10ب 234. (8 total character)
-
consider every piece as a 28*28 input to the classification network and predict and concate the predictions for all character
-
-
our own trained OCR
- based on CTC
- based on transformer
- based on attention
-
what if i want to code in tensorflow?
my answer:as the github repo for yolo is in pytorch, it is better to train the model with command line in torch, save the model(.pt) and export it to .onnx and then convert it to tensorflow model then i can use onnx, onnxruntime to do inference on new images.
-
what if aspect ratio of rectangle is too high or large scale images[it is not the case for license plates] and small scale objects?
my answer:if using YOLOv7, try to integrate sahi algorithm[vision-library-for-performing-sliced-inference-on-large-images-small-objects:https://github.com/obss/sahi] to yolov7 yolov8 already solve this problem and no need to sahi
-
challenges in tracking(fast moving car):
i think deepSORT is for realtime and already solve it
-
is there any improvement in yolo?
my answer:i searched and found by integrating transformer architecture on yolo backbone, there would be improvement
-
challenge of limited dataset size:
my answer: yolo models already use some augmentations like mosaeic,.. we did some augmentation while using roboflow, and we can use more of those available options can use GAN to generate images? i dont know if it is applicable for this problem
- on mobile device: convert model by tflite and use android studio for android coding and swift for iOS coding
- on server : convert to tf.serving
- onwindows: you can use c# and Microsoft.ML
- on openVINO
- on jetnano
- an API with django (python web framework API)
- kerize your application