Overview:
Review Miner: Trending Feature Analyzer is a tool that helps users identify the trending features of a product in each recent year. It includes the trending features of five product categories over the past ten years. Users can simply select the desired category and view the results summarized in a table. The analyzer also supports user-supplied data through command-line arguments. To use the software with a custom category, a JSON file containing reviews and their time information is required.
Usage:
--Regular usage:
- Run ui.py
- Choose a category of products from the drop-down menu. A table with the trending features of the selected category over time will be displayed.
--Extended usage:
If users would like to use ReviewMiner with customized categories of products, a dataset with time and reviews in JSON format is needed.
- Run read.py with command: ./read.py [-category] [--path to reviews JSON folder]
(Note: path is an optional argument. If the JSON file is already in the default folder, i.e. AmazonReviews, the path is not necessary.)
- Run processes.py with command: ./processes.py [-category]
- Run features.py with command: ./features.py [-category]
- Run ui.py with command: ./ui.py [-category]
Test case:
1. Download the package and unzip laptops.zip in the package folder
2. ./read.py laptops .
3. ./processes.py laptops
4. ./features.py laptops
5. ./ui.py laptops
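The argument handling inside the scripts is not shown here, so the following is only a minimal sketch of how a script such as read.py might accept a category and an optional path, matching the positional form used in the test case. The function name parse_args and the constant DEFAULT_REVIEW_DIR are hypothetical; only the default folder name AmazonReviews comes from the note above.

```python
import argparse

# Default folder named in the usage note; the constant name is hypothetical.
DEFAULT_REVIEW_DIR = "AmazonReviews"


def parse_args(argv=None):
    """Parse a category and an optional path to the reviews JSON folder."""
    parser = argparse.ArgumentParser(
        description="Read reviews for one product category."
    )
    parser.add_argument("category", help="product category, e.g. laptops")
    parser.add_argument(
        "path",
        nargs="?",  # optional: fall back to the default folder when omitted
        default=DEFAULT_REVIEW_DIR,
        help="folder containing the reviews JSON file",
    )
    return parser.parse_args(argv)


# Mirrors step 2 of the test case: ./read.py laptops .
args = parse_args(["laptops", "."])
print(args.category, args.path)
```

Passing the argument list explicitly, as in the last lines, keeps the sketch easy to try out without touching the real command line.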
Implementation:
- We collected Amazon reviews for six categories of products in JSON format. We use only five of them for training and leave one category as a test case for the extended usage mentioned above. The five categories are tablet, camera, TV, mobilephone, and video surveillance. The laptop category is used as the test case for extended usage.
- For each category, we parse the JSON and separate the reviews by year. This step corresponds to the code in read.py, which also supports reading and parsing files from a user-specified path and category.
- Taking the user-specified category as input, processes.py performs LDA (latent Dirichlet allocation) on each year’s review data for the selected category. We use the Gensim library in this step.
- In features.py, we define our filter_words list, which is used to filter out frequent words that carry no meaningful information in reviews. This list serves a similar purpose to a stop-word list, but stop-word removal is done before performing LDA, whereas filter_words is applied to the resulting features. The list was created after examining our dataset. For general datasets and analyses, words that express personal opinion may be important and are not removed when performing topic modelling. However, for the purpose of analyzing the trending features of products from reviews, words such as “love, like, great, good, bad” may not indicate any feature of the product despite showing the users’ strong personal preference.
- After filtering the resulting features, features.py writes the output to a CSV file for the GUI to use.
- In ui.py, we implement a basic GUI that lets users choose the category they want to analyze from a drop-down menu. Once a category is selected, a table showing the trending features of the selected category with the corresponding time is displayed.
- If the user chooses the extended mode by providing a path to the JSON and a new category, our GUI handles that by adding the new category to the bottom of the option menu, so the user can access the newly included result after running read.py, processes.py, and features.py.
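The step that separates reviews by year can be sketched as follows. This is an illustration and not the code in read.py: it assumes one JSON object per line, and the field names reviewText and unixReviewTime are assumptions about the input file that may need to be changed. The function name reviews_by_year is hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def reviews_by_year(lines, text_key="reviewText", time_key="unixReviewTime"):
    """Group review texts by the year of their timestamp.

    `lines` is any iterable of strings, each holding one JSON object.
    The default key names are assumptions about the dataset layout.
    """
    by_year = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        text = record.get(text_key)
        stamp = record.get(time_key)
        if not text or stamp is None:
            continue  # skip records without a review or a time
        year = datetime.fromtimestamp(stamp, tz=timezone.utc).year
        by_year[year].append(text)
    return dict(by_year)


# Two made-up records used only to exercise the function.
sample = [
    '{"reviewText": "Battery lasts all day", "unixReviewTime": 1400000000}',
    '{"reviewText": "Sharp screen", "unixReviewTime": 1450000000}',
]
for year, texts in sorted(reviews_by_year(sample).items()):
    print(year, len(texts))
```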
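The filtering and CSV output steps can be sketched as follows. The five filter words are the examples quoted above; the actual filter_words list in features.py is longer and is not reproduced here. The function names and the two-column CSV layout (year, features) are assumptions made for illustration.

```python
import csv

# Only the example words quoted in the text; the real list is not shown here.
FILTER_WORDS = {"love", "like", "great", "good", "bad"}


def filter_features(words, filter_words=FILTER_WORDS):
    """Drop opinion words that do not name a product feature."""
    return [w for w in words if w.lower() not in filter_words]


def write_features_csv(features_by_year, path):
    """Write one row per year: the year and its space-separated features.

    The column layout is an assumption, not the format used by ui.py.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["year", "features"])
        for year in sorted(features_by_year):
            kept = filter_features(features_by_year[year])
            writer.writerow([year, " ".join(kept)])


# Made-up LDA output for two years.
example = {
    2015: ["screen", "great", "battery"],
    2014: ["love", "keyboard", "good"],
}
print(filter_features(example[2015]))
```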
Group member contribution:
panqiut2: read.py, processes.py, features.py, video presentation
mengdig2: processes.py, ui.py, video presentation, documentation