Deployment of Machine Learning Models Trained on Automatically Annotated Hindi and English Datasets
Code authors: Aarav Babu, Aditi Singh, Atul Krishnan
The popularity of e-commerce has grown manyfold since its inception. With the rise in traffic on such websites, there has been a growing dependency on the views and opinions of the products available: companies and customers rely heavily on these reviews for sales, market demand, constructive criticism, and to check the quality of the products being bought. In this project, we look at ways to avoid the manual labor involved in labeling datasets by using automatic annotation frameworks, and we publish a dataset of Hindi reviews to support work on Sentiment Analysis in Indian Languages (SAIL).
We have trained a highly accurate model on automatically annotated data to segregate these reviews into Suggestions and Complaints. The custom ML model was then deployed to classify reviews, both in bulk and one at a time. The deployment runs on Flask. The project is free for research use; if you find it useful or use it in your research, please acknowledge our Git repository.
- Getting Started with deployment
- About the Datasets, Automatic Annotation tools and Models
- Future Scope
- Contact us
1. To see the working model of our deployment, first download the zip file under /Deployment and extract it to a folder of your choice.
2. Open a command prompt in that folder.
3. First execute the command:
   pip install virtualenv
4. Create a new virtual environment using the command:
   virtualenv abc
   Alternate command:
   python -m virtualenv abc
5. To enter the virtual environment, use the command:
   abc\Scripts\activate
   where "abc" is the name of the virtual environment.
6. Once in the virtualenv, execute the command:
   pip install -r requirements.txt
   to download all the dependencies.
7. Then run the following commands (replace app.py with your app's name if it differs; a minimal sketch of what such an app.py might contain follows this list):
   set FLASK_APP=app.py
   set FLASK_ENV=development
   flask run
8. Copy and paste the URL shown in the terminal output into any web browser.
9. Depending on the type of file you downloaded (Mass/Single), enter either a product-reviews URL with its page number (e.g. https://www.amazon.in/Vivo-Midnight-Additional-Exchange-Offers/product-reviews/B09Q5Z5M9D/ref=cm_cr_arp_d_paging_btm_next_2?ie=UTF8&reviewerType=all_reviews&pageNumber=2) or a single review (e.g. "The phone is fast enough for daily use. Software needs much more improvement seems like some settings for UI is missing. Search inside settings won't show any results even if they exists").
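As referenced in step 7, here is a minimal sketch of what a single-review app.py along these lines might look like. This is not our exact deployment code: the model filename (model.pkl), the form field name (review), and the 0/1 label convention are illustrative assumptions.

```python
# app.py -- minimal single-review classifier service (an illustrative sketch,
# not the exact code shipped under /Deployment).
import pickle

from flask import Flask, request

app = Flask(__name__)

# Assumes a scikit-learn pipeline (vectorizer + classifier) pickled as
# model.pkl; the filename and label convention are assumptions.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    review = request.form["review"]              # form field named "review" (assumed)
    prediction = model.predict([review])[0]      # pipeline accepts raw text
    label = "Suggestion" if prediction == 1 else "Complaint"
    return {"review": review, "label": label}    # Flask serializes dicts to JSON
```

Once flask run is up, you could exercise such an endpoint with, for example: curl -X POST -d "review=Battery drains too fast" http://127.0.0.1:5000/predict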
How to make your own .pkl file:
- Go to the /Deployment folder and open the deployingmodel.ipynb file.
- Replace the dataset with your own dataset.
- Drop or select whichever columns you want.
- Use a classifier and estimator of your choice.
- Run the full notebook.
- Download the .pkl file from the outputs and use it in your own deployment. (A condensed sketch of this workflow is shown below.)
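As a rough guide, the notebook's workflow condenses to something like the following; the dataset filename, the column names (review, label), and the TF-IDF + logistic regression pairing are placeholders, not the notebook's exact choices:

```python
# Sketch of the train-and-pickle workflow behind deployingmodel.ipynb
# (file name, column names, and estimator are placeholders).
import pickle

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

df = pd.read_csv("your_dataset.csv")   # swap in your own dataset
X, y = df["review"], df["label"]       # keep/drop columns as you need

# Bundle the vectorizer with the classifier so the .pkl accepts raw text
# at predict time.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```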
The datasets are provided under the /Datasets folder. All the reviews we used were scraped by us with the Python scripts provided under the /Web Scraping folder. We have also created a one-of-a-kind balanced Hindi/Hinglish product-review dataset, likewise web scraped by us (code provided). It consists of 4331 reviews, of which 2284 are suggestions and 2047 are complaints.
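Purely to illustrate the approach (the real scripts are under /Web Scraping), a review scraper typically looks something like the sketch below; the User-Agent header and the data-hook selector are assumptions that must be checked against the live page structure:

```python
# Illustrative review scraper (selectors are assumptions; see /Web Scraping
# for the scripts we actually used).
import requests
from bs4 import BeautifulSoup

URL = ("https://www.amazon.in/Vivo-Midnight-Additional-Exchange-Offers/"
       "product-reviews/B09Q5Z5M9D/?reviewerType=all_reviews&pageNumber=2")
HEADERS = {"User-Agent": "Mozilla/5.0"}  # many sites block the default UA

response = requests.get(URL, headers=HEADERS, timeout=30)
soup = BeautifulSoup(response.text, "html.parser")

# Amazon review bodies have historically carried this data hook;
# verify against the current page before relying on it.
reviews = [span.get_text(strip=True)
           for span in soup.select("span[data-hook='review-body']")]
print(len(reviews), "reviews scraped")
```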
Under the /AutomaticAnnotation folder we provide the code for the three automatic annotation frameworks we used to label the datasets: VADER, TextBlob, and Flair. Our analysis across the three tools shows that Flair was the most accurate framework, while TextBlob proved to be the fastest; VADER delivered good accuracy across datasets without much variance.
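To give a flavour of how these frameworks score text, the sketch below labels one review with all three; mapping the scores to Suggestion/Complaint by thresholding at 0 would be an illustrative choice, not a rule fixed by the libraries:

```python
# Scoring one review with the three annotation frameworks (thresholding is
# illustrative; see /AutomaticAnnotation for the code we actually used).
from flair.data import Sentence
from flair.models import TextClassifier
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

review = "The phone is fast enough for daily use."

# VADER: rule-based, returns a compound score in [-1, 1].
vader_score = SentimentIntensityAnalyzer().polarity_scores(review)["compound"]

# TextBlob: pattern-based polarity in [-1, 1]; the fastest of the three.
textblob_score = TextBlob(review).sentiment.polarity

# Flair: pretrained neural sentiment classifier; slowest, but the most
# accurate in our comparison.
sentence = Sentence(review)
TextClassifier.load("en-sentiment").predict(sentence)
flair_label = sentence.labels[0].value  # "POSITIVE" or "NEGATIVE"

print(vader_score, textblob_score, flair_label)
```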
The automatically annotated datasets were then tried on various standard models to see how accurately predictions could be made. The accuracies obtained were close to those achieved on a manually labeled dataset, which is an encouraging result. Bagging combined with XGBoost gave the best accuracy, so that is the model we eventually deployed.
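The deployed combination can be reproduced along the lines of the sketch below; the dataset name, hyperparameters, and TF-IDF features are illustrative rather than our tuned values, and scikit-learn versions before 1.2 spell the estimator= argument as base_estimator=:

```python
# Sketch of the Bagging + XGBoost combination we eventually deployed
# (file name and hyperparameters are illustrative, not our tuned values).
import pandas as pd
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

df = pd.read_csv("annotated_reviews.csv")  # an automatically annotated dataset (assumed name)
X, y = df["review"], df["label"]           # labels must be numeric 0/1 for XGBoost

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = make_pipeline(
    TfidfVectorizer(),
    BaggingClassifier(estimator=XGBClassifier(), n_estimators=10),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```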
Opinion mining and sentiment analysis are among the most active research fields right now. In our study, we have shown the advantages of using automatic annotation frameworks to ease the labeling of datasets, published an authentic Hindi/Hinglish review dataset that will support further studies in SAIL, and set the stage for new work on automatic annotation in Indian languages while helping to build trust for it in English. We hope you like our project and that it helps you work further in this field.
Atul Krishnan: [email protected]
Github: AtulKrishnan
Aditi Singh: [email protected]
Github: aditi-singh2
Aarav Babu: [email protected]
Github: aarav-babu