My Data_Linkage and Classification Project
- Data Linkage on the stocks of Google and Amazon based on their name, description and price
- Using libraries such as `fuzzywuzzy` and `textdistance` for the data linkage
- Using the idea of blocking to make the linkage more efficient and more accurate
- Comparing the accuracy of three classification algorithms: decision tree, k-NN (n = 5) and k-NN (n = 10)
- Feature engineering and selection
  - Interaction term pairs and clustering labels
  - Principal Component Analysis
  - Naively choosing the first four features
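
The blocking idea from the data linkage part can be sketched roughly as follows. This is only an illustration, not the project's actual code: the blocking key (first word of the name), the 0.6 threshold, the helper names and the sample records are all assumptions, and the standard-library `difflib.SequenceMatcher` stands in for a `fuzzywuzzy` or `textdistance` score so the snippet runs without extra installs.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(name):
    """Hypothetical blocking key: the lower-cased first word of the name."""
    words = name.lower().split()
    return words[0] if words else ""


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a fuzzywuzzy/textdistance score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(left, right, threshold=0.6):
    """Compare only records that share a block key and keep the best match."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[block_key(rec["name"])].append(rec)

    links = []
    for rec in left:
        candidates = blocks.get(block_key(rec["name"]), [])
        if not candidates:
            continue  # no record in the same block, so nothing to compare
        best = max(candidates, key=lambda c: similarity(rec["name"], c["name"]))
        if similarity(rec["name"], best["name"]) >= threshold:
            links.append((rec["id"], best["id"]))
    return links


# Made-up records for illustration, not taken from the project data.
amazon = [
    {"id": "a1", "name": "Adobe Photoshop Elements 5.0"},
    {"id": "a2", "name": "Microsoft Office 2007 Standard"},
]
google = [
    {"id": "g1", "name": "adobe photoshop elements 5"},
    {"id": "g2", "name": "microsoft office standard 2007"},
    {"id": "g3", "name": "norton antivirus 2007"},
]
links = link_records(amazon, google)
print(links)
```

Because each record is only compared against the records in its own block, the number of string comparisons drops well below the full cross product. A key that is too strict can also hide true matches, so the choice of key is a trade-off.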
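
The classifier comparison could look something like the sketch below, assuming scikit-learn is used and that "n" refers to `n_neighbors`. The synthetic dataset, the 70/30 split and the random seeds are placeholders chosen for the example, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data as a placeholder for the project's real dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nn (n = 5)": KNeighborsClassifier(n_neighbors=5),
    "k-nn (n = 10)": KNeighborsClassifier(n_neighbors=10),
}

# Fit each model on the same split and record its test accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

Using one shared train/test split keeps the three accuracy values comparable. k-NN is distance based, so on real data with mixed feature scales it usually helps to standardise the features first.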
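
The three feature approaches might be set up along these lines. This is a sketch assuming scikit-learn and NumPy. The random feature matrix, the three clusters and the four principal components are assumptions made for the example (four was picked only to mirror the "first four features" baseline).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))  # placeholder feature matrix, not project data

# Scale first so no single feature dominates the clustering or the PCA.
X_scaled = StandardScaler().fit_transform(X)

# 1) Interaction term pairs (x_i * x_j) plus a cluster label as an extra column.
interactions = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_inter = interactions.fit_transform(X_scaled)  # 6 originals + 15 pairwise products
cluster_label = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
X_engineered = np.column_stack([X_inter, cluster_label])

# 2) Principal Component Analysis down to four components (assumed number).
X_pca = PCA(n_components=4).fit_transform(X_scaled)

# 3) Naive baseline: keep the first four original features.
X_first4 = X[:, :4]

print(X_engineered.shape, X_pca.shape, X_first4.shape)
```

Each of the three matrices can then be passed to the same classifiers, so any accuracy difference comes from the feature set and not from the model.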