- Garbage segmentation for Varanasi City with Drones using Computer Vision techniques.
- The dataset comprises 1000+ drone images of Varanasi city, annotated manually with Label Studio. The images were split into smaller patches with patchify, and a U-Net architecture was used for binary semantic segmentation of garbage patches.
- The final reported model achieved a binary cross-entropy loss of 0.1512 and an IoU score of 0.739.
- Semantic segmentation relies on deep learning techniques, which require training the model on a large amount of data. Using a drone (UAV), we therefore took over 1000 pictures of Varanasi, India, from various angles. To train the model to recognize garbage, we manually selected the photographs in which garbage was clearly visible, which left more than 300 suitable photographs. Garbage was manually annotated in all the images using Label Studio, and the annotations were exported in JSON Min format. We then converted the JSON files to JPG files to obtain the mask images.
- Patch-based models are a type of machine learning model that divides an image into smaller sub-images or "patches" and then trains the model on these smaller images. This approach is particularly useful in cases where images are too large to process as a whole or when different regions of the image require different levels of analysis. In patch-based models, each patch is typically fed through a feature extraction algorithm that extracts relevant features from the image. These features are then used to train the model to recognize specific patterns or features in the image data. The model can then be used to make predictions on new images by dividing them into patches and applying the trained model to each patch.
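
The splitting and stitching steps described above can be sketched with plain NumPy. This is a minimal illustration, not the project's code: the project used patchify, whereas this stand-in uses `reshape` and `transpose`, and the patch size of 256 and the image dimensions are assumptions chosen for the example.

```python
import numpy as np

def split_into_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping square patches.

    H and W must be multiples of patch_size (crop or pad beforehand otherwise).
    Returns an array of shape (rows, cols, patch_size, patch_size, C).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Split each spatial axis into (grid index, offset inside the patch).
    grid = image.reshape(rows, patch_size, cols, patch_size, c)
    # Bring the two grid axes to the front.
    return grid.transpose(0, 2, 1, 3, 4)

def merge_patches(patches):
    """Inverse of split_into_patches: stitch the patch grid back together."""
    rows, cols, p, _, c = patches.shape
    return patches.transpose(0, 2, 1, 3, 4).reshape(rows * p, cols * p, c)

# Hypothetical 512x768 RGB drone image (random data, for illustration only).
image = np.random.randint(0, 256, size=(512, 768, 3), dtype=np.uint8)
patches = split_into_patches(image, patch_size=256)  # grid of 2 x 3 patches
restored = merge_patches(patches)                    # same pixels as `image`
```

At prediction time the same pair of functions could be used to tile a new image, run the model on each patch, and stitch the per-patch masks back into a full-size mask.
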
- The U-Net architecture consists of an encoder and a decoder network. The encoder network is a series of convolutional and pooling layers that extract hierarchical features from the input image. The decoder network is a series of upsampling and convolutional layers that reconstruct the output image from the feature maps generated by the encoder.
- The key feature of the U-Net architecture is its skip connections, which connect corresponding layers of the encoder and decoder networks. These skip connections allow the decoder network to leverage both low-level and high-level features from the encoder network, enabling accurate segmentation of objects of different sizes and shapes.
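
To make the data flow of a skip connection concrete, the following shape-level sketch traces one encoder/decoder level in NumPy. It is only an illustration under simplifying assumptions: the convolutions of a real U-Net are omitted, nearest-neighbour upsampling stands in for whatever upsampling the project used, and the channel count and resolution are made up.

```python
import numpy as np

def max_pool_2x2(x):
    """Downsample a (C, H, W) feature map by taking the max of each 2x2 block."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Hypothetical encoder feature map: 16 channels at 64x64 resolution.
encoder_features = np.random.rand(16, 64, 64)

# Encoder step: pooling halves the spatial resolution.
bottleneck = max_pool_2x2(encoder_features)   # shape (16, 32, 32)

# Decoder step: upsampling restores the spatial resolution.
decoded = upsample_2x(bottleneck)             # shape (16, 64, 64)

# Skip connection: concatenate the encoder features with the decoder
# features along the channel axis, so later layers see both.
merged = np.concatenate([encoder_features, decoded], axis=0)  # shape (32, 64, 64)
```

The concatenated tensor carries the fine detail from the encoder alongside the coarser, upsampled features, which is what lets the decoder recover sharp object boundaries.
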
- The U-Net architecture also includes several modifications, such as the use of "padding" in convolutional layers to preserve image resolution and the incorporation of batch normalization to speed up training.
- The U-Net used in this model performs binary segmentation: garbage pixels are labeled 1 (foreground) and everything else is labeled 0 (background).
- In this Colab notebook, we loaded the image patches and examined the labels. We discovered that the annotated images contained multiple label values, which was not desirable and had caused our previous training attempts to fail. To overcome this issue, we transformed the labels into binary labels representing background and foreground (0 and 1, respectively). We then used a data loader to load the data for training the model.
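
One way to collapse multi-valued masks into binary labels is a simple threshold, sketched below. This assumes the background is encoded as 0 and every other value should count as garbage; the function name, the `threshold` parameter, and the sample mask are hypothetical, and a higher threshold may suit masks that carry compression noise.

```python
import numpy as np

def to_binary_mask(mask, threshold=0):
    """Map every pixel above `threshold` to 1 (garbage) and the rest to 0."""
    return (mask > threshold).astype(np.uint8)

# Hypothetical mask patch containing several distinct label values.
mask = np.array([[0,   0,   3],
                 [0, 255, 128],
                 [0,   0,   0]], dtype=np.uint8)

binary = to_binary_mask(mask)
print(np.unique(mask), "->", np.unique(binary))
```
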
- The network was trained on input images and their corresponding segmentation masks using the Adam optimizer. Inputs and labels were subjected to the same transformation to keep them aligned. The model was trained for 50 epochs with a batch size of 16. Binary cross-entropy was used as the loss function, and the IoU score was used as the metric for evaluating the model's performance.
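
For reference, the loss and the metric named above can be written out in a few lines of NumPy. This is a sketch of the standard definitions rather than the project's implementation: the 0.5 decision threshold for IoU, the `eps` smoothing term, and the toy arrays are assumptions.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip probabilities so that log(0) never occurs.
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1 - eps)
    loss = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(loss.mean())

def iou_score(y_true, y_prob, threshold=0.5, eps=1e-7):
    """Intersection over union of the foreground after thresholding predictions."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= threshold
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return float((intersection + eps) / (union + eps))

# Toy example: four pixels, two of which are garbage.
y_true = np.array([[1, 1, 0, 0]], dtype=np.float32)
y_prob = np.array([[0.9, 0.4, 0.2, 0.6]], dtype=np.float32)

bce = binary_cross_entropy(y_true, y_prob)
iou = iou_score(y_true, y_prob)   # 1 overlapping pixel / 3 pixels in the union
```
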
- The model performs well and predicts garbage in images. The next task is to add post-processing and improve the model's IoU score. This can be done by:
- Model Architecture: Experimenting with different model architectures or variations of existing architectures. This could involve increasing the depth or width of the network, introducing skip connections, or utilizing pre-trained models for transfer learning.
- Data Augmentation: Applying data augmentation techniques to increase the diversity of the training data. This can include random rotations, scaling, flipping, or adding noise to the input images and corresponding labels. Augmenting the data can help the model generalize better and improve its performance.
- Training Data Quality: Ensuring the quality and accuracy of the training data. Inaccurate or mislabeled annotations can negatively impact the model's performance. Review and validate the training data to minimize any errors or inconsistencies.
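
The data augmentation suggestion above only works for segmentation if each geometric transform is applied identically to an image and its mask. A minimal sketch, assuming square patches and using only 90-degree rotations and horizontal flips (the function name and the toy data are hypothetical):

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply the same random rotation and flip to an image and its mask."""
    k = int(rng.integers(0, 4))              # number of 90-degree rotations
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    if rng.random() < 0.5:                   # horizontal flip half of the time
        image = image[:, ::-1]
        mask = mask[:, ::-1]
    # Copy so the results are contiguous arrays rather than views.
    return image.copy(), mask.copy()

rng = np.random.default_rng(seed=0)
image = np.random.rand(64, 64, 3)
# Toy mask derived from the image, so alignment can be checked afterwards.
mask = (image[..., 0] > 0.5).astype(np.uint8)

aug_image, aug_mask = augment_pair(image, mask, rng)
```

Photometric changes such as added noise would be applied to the image only, since the mask must keep its exact 0/1 values.
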
- Environment: This model will help detect garbage hotspots, enabling effective and targeted cleanliness efforts.
- Health: Garbage hotspots are breeding grounds for various disease-causing organisms, so locating them makes it easier to eliminate them.
- Resource Saving: Accurate identification of garbage-filled areas would help minimize wasted resources, and a data-driven model would help study patterns of garbage deposition over an area or identify areas with high waste generation.