- Name: Alisetti Sai Vamsi
- Organisation: DSC-IEM
- Project: TextSentimentAnalysis
- Transformer Model Construction
- Voice Recognition
- CLI Tool
- Keyword Extraction
- Voice recognition on the frontend in JavaScript
- CLI tool with the Transformer model
- Keyword extraction with spaCy
- DistilBERT trained on the Amazon food reviews dataset
- Connecting Flask to Streamlit for lighter API inference
- Used the speech recognition Web API available in the Chrome browser to write a script that converts audio input to text.
Pull Requests:
- Speech to text #17: khanfarhan10/TextSentimentAnalysis#17
- Integrated Streamlit with Flask using Python's requests library, so that Streamlit serves as the frontend and the Flask web API acts as a backend microservice.
Pull Requests:
- Streamlit to flask connection #16: khanfarhan10/TextSentimentAnalysis#16
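A minimal sketch of the frontend side of such a connection, assuming a hypothetical Flask route `/predict` that accepts JSON of the form `{"text": ...}` and answers with a JSON object containing a `label` field. The route, the payload shape, and the names `fetch_sentiment`, `FakeResponse`, and `fake_post` are illustrative and not taken from the project. The HTTP call is passed in as `post` so the function can be tried without a running server.

```python
# Hypothetical backend address and route; the project's real endpoint may differ.
BACKEND_URL = "http://localhost:5000/predict"


def fetch_sentiment(text, post, url=BACKEND_URL, timeout=10):
    """Send `text` to the backend and return the predicted label.

    `post` is any callable with the calling convention of
    `requests.post(url, json=..., timeout=...)`.
    """
    response = post(url, json={"text": text}, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of hiding them
    return response.json()["label"]


class FakeResponse:
    """Stand-in for a real HTTP response, used only to try the function offline."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass  # the fake backend never fails

    def json(self):
        return self._payload


def fake_post(url, json=None, timeout=None):
    # Pretend backend: any text containing "good" is labelled positive.
    label = "positive" if "good" in json["text"].lower() else "negative"
    return FakeResponse({"label": label})


print(fetch_sentiment("This snack is really good", fake_post))
```

In a Streamlit script, one would pass `requests.post` instead of `fake_post` and display the returned label with something like `st.write(...)`.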
- For the dataset, my primary candidates were the Stanford Sentiment Treebank and the IMDB dataset. However, on the suggestion of my peers and after further research, I chose the Amazon food reviews dataset, since it is large and has a good amount of variance. Since the model was required to be lightweight, I chose the DistilBERT transformer model, using the Hugging Face Transformers library.
- Training was done in PyTorch. The relevant hyperparameters and further details can be found in the training script.
Training Script: https://colab.research.google.com/drive/1ejVSWQng9chJRoqWprfxTnZnsKAcRFT-?usp=sharing
- Dependency parsing and POS tagging have been implemented with the spaCy library and integrated into Streamlit.
Pull Requests:
- Inference api #27: khanfarhan10/TextSentimentAnalysis#27
- Used T5 for paraphrasing and an extractive technique for summarization. The summarizer splits a paragraph into sentences, scores each sentence from the token scores of the words it contains, and picks the top 20 percent of sentences by score.
Pull Requests:
- Model Utility #32: khanfarhan10/TextSentimentAnalysis#32
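The extractive scoring described above can be sketched as follows. This is not the project's implementation: it assumes the token score of a word is simply how often that word occurs in the paragraph, and it uses a naive regular-expression sentence splitter. The function name `summarize` and the `ratio` parameter are illustrative.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")


def summarize(paragraph, ratio=0.2):
    """Return the highest-scoring `ratio` of sentences, kept in original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    if not sentences:
        return ""

    # Token score (assumption): frequency of the lowercased word in the paragraph.
    token_score = Counter(WORD.findall(paragraph.lower()))

    def sentence_score(sentence):
        # Sentence score: sum of the token scores of its words.
        return sum(token_score[w] for w in WORD.findall(sentence.lower()))

    keep = max(1, math.ceil(len(sentences) * ratio))  # always keep at least one
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_score(sentences[i]),
                    reverse=True)
    chosen = sorted(ranked[:keep])  # restore document order
    return " ".join(sentences[i] for i in chosen)


text = ("Cats sleep. Cats eat fish and cats chase mice. "
        "Dogs bark. Birds sing. Fish swim.")
print(summarize(text))
```

Summing raw frequencies favours long sentences; dividing by sentence length is a common variation when that bias is unwanted.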
- Reused the same model inference code to create a CLI application in Python that can process huge CSV files and output CSV files with annotated labels. This can be useful for dataset generation and for processing large chunks of data.
Pull Requests:
- Cli #30: khanfarhan10/TextSentimentAnalysis#30
- Small changes to the CLI #31: khanfarhan10/TextSentimentAnalysis#31
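A minimal sketch of what such a CSV annotation tool could look like, using only the standard library. The function `predict_label` is a hypothetical keyword rule standing in for the transformer model's inference call, and the argument names and the `label` output column are assumptions rather than the project's actual interface.

```python
import argparse
import csv


def predict_label(text):
    """Placeholder for the model inference call (hypothetical keyword rule)."""
    return "positive" if "good" in text.lower() else "negative"


def annotate_csv(in_path, out_path, text_column="text", predict=predict_label):
    """Copy `in_path` to `out_path`, adding a `label` column; return the row count."""
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["label"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        count = 0
        # Rows are handled one at a time, so the whole file is never held in memory.
        for row in reader:
            row["label"] = predict(row[text_column])
            writer.writerow(row)
            count += 1
    return count


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Annotate a CSV file with sentiment labels.")
    parser.add_argument("input", help="path of the CSV file to read")
    parser.add_argument("output", help="path of the annotated CSV file to write")
    parser.add_argument("--text-column", default="text",
                        help="name of the column that holds the text")
    args = parser.parse_args(argv)
    rows = annotate_csv(args.input, args.output, args.text_column)
    print(f"Annotated {rows} rows")
```

Calling `main()` under an `if __name__ == "__main__":` guard turns this into a script, for example `python annotate.py reviews.csv labelled.csv --text-column text`, where the file names are placeholders.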
- A Flutter UI is still to be made that matches the aesthetics of the web UI and offers the same functionality.
Overview
- A minimalistic Flutter UI has been made and integrated with the Flask backend.
Some of the new features were:
- CLI Tool
- Flutter Application
- Voice Recognition
- Transformer models for inference
There are many avenues this project could take. These are some that I consider plausible and worthwhile:
- Deploying the Flask server on Google Cloud or AWS for better resources.
- Publishing the Flutter application and enhancing its UI.
- Improving the CLI tool further and packaging it so that it can be installed with pip.
- Improving the current web application by using a robust frontend framework like React or Angular.
The overall program was very intriguing and gave enough time for practical implementation. It also helped me learn new things and kept me out of my comfort zone. It grew my network as well, and I made some really nice friends. I especially want to thank the mentors for putting up with us. A huge shout-out to them, and a special shout-out to Farhan for putting up with me.