
Winter of Code Final Work Product

Proposed Objectives

  • Transformer Model Construction
  • Voice Recognition
  • CLI Tool
  • Keyword Extraction

Modified Objectives

  • Voice Recognition on the frontend in JavaScript
  • CLI Tool with the Transformer Model
  • Keyword Extraction with spaCy
  • DistilBERT trained on the Amazon Food Reviews dataset
  • Connecting Flask to Streamlit for lighter API inference

Objectives Completed

  • Voice Recognition on frontend in js

    Used the Web Speech API available in the Chrome browser to create a speech recognition script that transcribes audio to text.

    Pull Requests:

  • Connecting flask to streamlit for lighter api inference

    Integrated Streamlit with Flask using Python's requests library, so that Streamlit serves as the frontend and the Flask web API acts as a backend microservice.

    Pull Requests:

  • Research on dataset and model selection

    For the dataset, my primary candidates were the Stanford Sentiment Treebank and the IMDB dataset. After suggestions from my peers and further research, I chose the Amazon Food Reviews dataset, since it is large and has a good amount of variance. Since the model was required to be lightweight, I chose the DistilBERT transformer model from the Hugging Face Transformers library.

  • Training transformer model

    Training was done in PyTorch. The relevant hyperparameters and further details are in the training script.

    Training Script: https://colab.research.google.com/drive/1ejVSWQng9chJRoqWprfxTnZnsKAcRFT-?usp=sharing

  • Key Word Extraction

    Dependency parsing and POS tagging with the spaCy library have been implemented and integrated into Streamlit.

    Pull Requests:

  • Models for paraphrasing and summarization

    Used T5 for paraphrasing and an extractive technique for summarization. The summarizer splits a paragraph into sentences, scores each sentence from the token scores of the words it contains, and keeps the top 20 percent of sentences by score.

    Pull Requests:

  • Built a CLI Tool for processing csv files

    Reused the same model inference code to create a Python CLI application that can process large CSV files and output CSV files with annotated labels. This can be useful for dataset generation and for processing large batches of data.

    Pull Requests:
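
The extractive summarization step described above can be illustrated with a minimal sketch. It is not the project's actual code: it assumes the "token score" is the word's frequency in the paragraph, uses a naive regex sentence splitter, and the function name `summarize` and its `ratio` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def summarize(text: str, ratio: float = 0.2) -> str:
    """Return the top `ratio` fraction of sentences, kept in original order."""
    # Naive split on '.', '!' or '?' followed by whitespace (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""

    # Assumed token score: how often the lowercased word occurs in the text.
    token_scores = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> int:
        # Sentence score = sum of the token scores of its words.
        return sum(token_scores[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Keep at least one sentence, otherwise the top `ratio` share.
    keep = max(1, math.ceil(len(sentences) * ratio))
    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:keep])
    return " ".join(sentences[i] for i in chosen)
```

Summing token scores favours longer sentences. Dividing by sentence length or ignoring stop words are common variations, but the write-up does not say whether the project uses either.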

Objectives in Progress

  • Deployment to Android

    A Flutter UI is to be made that matches the aesthetics of the web UI and offers the same functionality.

Developer - Winter of Code 2020

Alisetti Sai Vamsi

DSC - IEM : Text Sentiment Analysis

Overview

Contributions

  • Voice Recognition on frontend in js

    Used the Web Speech API available in the Chrome browser to create a speech recognition script that transcribes audio to text.

    Pull Requests:

  • Connecting flask to streamlit for lighter api inference

    Integrated Streamlit with Flask using Python's requests library, so that Streamlit serves as the frontend and the Flask web API acts as a backend microservice.

    Pull Requests:

  • Research on dataset and model selection

    For the dataset, my primary candidates were the Stanford Sentiment Treebank and the IMDB dataset. After suggestions from my peers and further research, I chose the Amazon Food Reviews dataset, since it is large and has a good amount of variance. Since the model was required to be lightweight, I chose the DistilBERT transformer model from the Hugging Face Transformers library.

  • Training transformer model

    Training was done in PyTorch. The relevant hyperparameters and further details are in the training script.

    Training Script: https://colab.research.google.com/drive/1ejVSWQng9chJRoqWprfxTnZnsKAcRFT-?usp=sharing

  • Key Word Extraction

    Dependency parsing and POS tagging with the spaCy library have been implemented and integrated into Streamlit.

    Pull Requests:

  • Models for paraphrasing and summarization

    Used T5 for paraphrasing and an extractive technique for summarization. The summarizer splits a paragraph into sentences, scores each sentence from the token scores of the words it contains, and keeps the top 20 percent of sentences by score.

    Pull Requests:

  • Built a CLI Tool for processing csv files

    Reused the same model inference code to create a Python CLI application that can process large CSV files and output CSV files with annotated labels. This can be useful for dataset generation and for processing large batches of data.

    Pull Requests:

  • Flutter Application

    A minimalistic Flutter UI has been built and integrated with the Flask backend.
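
To illustrate the CSV-processing CLI tool listed among the contributions above, here is a minimal sketch of how such a tool might stream a large file and append a label column. Everything in it is an assumption rather than the project's real interface: the `classify` function is a keyword-based placeholder standing in for the transformer model, and the `input`, `output` and `--column` arguments are hypothetical names.

```python
import argparse
import csv
import re
import sys


def classify(text: str) -> str:
    """Hypothetical placeholder for the model inference call."""
    positive_words = {"good", "great", "love", "excellent"}
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "positive" if words & positive_words else "negative"


def annotate_csv(in_path: str, out_path: str, text_column: str) -> int:
    """Copy in_path to out_path with an extra 'label' column; return row count."""
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        if reader.fieldnames is None or text_column not in reader.fieldnames:
            raise ValueError(f"column {text_column!r} not found in {in_path}")
        writer = csv.DictWriter(dst, fieldnames=[*reader.fieldnames, "label"])
        writer.writeheader()
        count = 0
        # Rows are read and written one at a time, so the whole file is
        # never held in memory.
        for row in reader:
            row["label"] = classify(row[text_column])
            writer.writerow(row)
            count += 1
    return count


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(
        description="Annotate a CSV file with sentiment labels."
    )
    parser.add_argument("input", help="path of the CSV file to read")
    parser.add_argument("output", help="path of the annotated CSV file to write")
    parser.add_argument("--column", default="text", help="name of the text column")
    args = parser.parse_args(argv)
    rows = annotate_csv(args.input, args.output, args.column)
    print(f"labelled {rows} rows", file=sys.stderr)
    return 0
```

Calling `main(["reviews.csv", "labelled.csv", "--column", "Text"])` would label every row of a hypothetical `reviews.csv`. A script version would typically end with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard.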

New Features

Some of the new features were:

  1. CLI Tool

  2. Flutter Application

  3. Voice Recognition

  4. Transformer models for inference

Future Scope

There are many directions this project could take. These are some that I consider plausible and worthwhile:

  1. Deploying the Flask server to Google Cloud or AWS for better resources.

  2. Publishing the Flutter application and enhancing its UI.

  3. Improving the CLI tool further and publishing it to PyPI so it can be installed with pip.

  4. Improving the current web application by using a robust frontend framework like React or Angular.

Overall Experience

The program was very engaging and left enough time for practical implementation. It helped me learn new things and kept me out of my comfort zone. It also grew my network, and I made some really nice friends. I especially want to thank the mentors for putting up with us. A huge shout-out to them, and a special shout-out to Farhan for putting up with me.