https://github.com/Alleria1809/dsci560_app.git
Crawl restaurant information and attributes from Yelp using Selenium.
Perform exploratory data analysis on the collected data, e.g. encoding, statistical analysis, plotting, and other visualizations.
Train several models to predict the risk levels.
Read data from the LA open dataset and the crawled Yelp data. Use the RLTK package to process the two datasets, applying blocking and entity linking to combine them.
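The blocking and entity linking idea can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the RLTK API, and the record fields (`id`, `name`, `zip`), the toy records, the zip-code blocking key, and the 0.7 threshold are all hypothetical choices rather than the project's actual settings.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical toy records; field names and values are assumptions.
la_records = [
    {"id": "la1", "name": "Joe's Pizza", "zip": "90007"},
    {"id": "la2", "name": "Sushi Gen", "zip": "90012"},
    {"id": "la3", "name": "Taco Zone", "zip": "90026"},
]
yelp_records = [
    {"id": "y1", "name": "Joes Pizza", "zip": "90007"},
    {"id": "y2", "name": "Sushi-Gen", "zip": "90012"},
    {"id": "y3", "name": "Burger Barn", "zip": "90026"},
]


def normalize(name):
    """Lowercase the name and keep only letters, digits, and spaces."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()


def name_similarity(a, b):
    """String similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.7):
    """Link each left record to its best same-zip match in right, if any."""
    # Blocking: index the right-hand records by zip code so that only
    # records sharing a zip code are ever compared.
    blocks = defaultdict(list)
    for rec in right:
        blocks[rec["zip"]].append(rec)

    links = []
    for rec in left:
        candidates = blocks.get(rec["zip"], [])
        if not candidates:
            continue
        # Entity linking: keep the most similar candidate above the threshold.
        best = max(candidates, key=lambda c: name_similarity(rec["name"], c["name"]))
        score = name_similarity(rec["name"], best["name"])
        if score >= threshold:
            links.append((rec["id"], best["id"], round(score, 2)))
    return links


matches = link_records(la_records, yelp_records)
for la_id, yelp_id, score in matches:
    print(la_id, yelp_id, score)
```

Blocking keeps the number of pairwise comparisons small; the similarity threshold then decides which candidate pairs are treated as the same restaurant.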
Run PCA to reduce dimensionality, then run KMeans to cluster the data. Use t-SNE to generate 2-D visualizations. Apply LDA topic modeling to extract keywords from the restaurant comments in each cluster.
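A minimal sketch of the PCA, KMeans, and t-SNE steps with scikit-learn follows. The synthetic feature matrix, the number of components, the number of clusters, and the perplexity are assumptions for illustration only, not the project's actual data or settings; the LDA step is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic stand-in for the restaurant feature matrix (assumption):
# three well-separated groups of 20 points each in 10 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 10, [8.0] * 10, [-8.0] * 10])
features = np.vstack([c + rng.normal(scale=1.0, size=(20, 10)) for c in centers])

# Step 1: reduce the dimensionality with PCA.
reduced = PCA(n_components=3, random_state=0).fit_transform(features)

# Step 2: cluster the reduced data with KMeans.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

# Step 3: project to 2-D with t-SNE, for plotting only.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(reduced)

print(reduced.shape, embedding.shape, sorted(set(labels.tolist())))
```

The `embedding` array can be passed to a scatter plot colored by `labels` to inspect how well the clusters separate.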
Use the TensorFlow framework to build neural network models for multiclass classification.
Generate tag sets for each restaurant and compute Jaccard similarities between them. Implement recommendation algorithms for both recommendation functions: recommending from input features and recommending from an input restaurant name.
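The two recommendation functions can be sketched as follows. The restaurant names, the tag sets, and the function names `recommend_by_features` and `recommend_by_name` are hypothetical; this is one possible shape for the approach, not necessarily how the app implements it.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of tags."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical tag sets for illustration only.
restaurant_tags = {
    "Joe's Pizza": {"pizza", "italian", "casual", "delivery"},
    "Luigi's": {"pizza", "italian", "wine", "romantic"},
    "Sushi Gen": {"sushi", "japanese", "seafood"},
    "Taco Zone": {"mexican", "tacos", "casual", "late-night"},
}


def recommend_by_features(features, tags=restaurant_tags, top_k=2):
    """Rank restaurants by Jaccard similarity to the requested features."""
    scored = [(name, jaccard(features, tag_set)) for name, tag_set in tags.items()]
    # Highest similarity first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


def recommend_by_name(name, tags=restaurant_tags, top_k=2):
    """Rank the other restaurants by similarity to the named restaurant."""
    others = {n: t for n, t in tags.items() if n != name}
    return recommend_by_features(tags[name], others, top_k)


print(recommend_by_name("Joe's Pizza"))
print(recommend_by_features({"sushi", "seafood"}, top_k=1))
```

Recommending by name reuses the feature-based function: the named restaurant's own tag set serves as the query, and the restaurant itself is excluded from the candidates.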
https://drive.google.com/file/d/1i-z4BUMXxMZFXgBARAiYcsB-Vs2owMNM/view?usp=sharing
Please refer to Final_Presentation.pdf.