This repository consists of the items below:
- Raw data file - "project1-commitsRefactoring.xlsx" - Contains the base raw data with duplicates removed (entire rows that occurred more than once)
- data_train.csv - Train data extracted from Raw data file
- data_test.csv - Test data extracted from Raw data file
- dataone_train.csv - Train data extracted from data_train.csv, containing only rows with a single refactoring label
- datamulti_train.csv - Train data extracted from data_train.csv, containing only rows with multiple refactoring labels
- x_data.csv, y_data.csv, pred_data.csv - Files used to analyse the false-positive (FP) and false-negative (FN) cases and understand where the predictions go wrong
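
The FP/FN analysis mentioned in the last item can be sketched as follows. This is a minimal illustration, not the repository's actual analysis code: the function name `fp_fn_indices` and the example labels are hypothetical, and it assumes the true labels (as in y_data.csv) and predicted labels (as in pred_data.csv) have already been loaded into two equal-length sequences.

```python
from typing import Hashable, List, Sequence, Tuple


def fp_fn_indices(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Tuple[List[int], List[int]]:
    """Return row indices of false positives and false negatives for one label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # False positive: predicted as `positive`, but the true label is something else.
    fp = [i for i, (t, p) in enumerate(pairs) if p == positive and t != positive]
    # False negative: true label is `positive`, but something else was predicted.
    fn = [i for i, (t, p) in enumerate(pairs) if t == positive and p != positive]
    return fp, fn


# Illustrative labels only; these are not taken from the repository's data.
y_true = ["Extract Method", "Rename Method", "Extract Method", "Move Class"]
y_pred = ["Extract Method", "Extract Method", "Rename Method", "Move Class"]
fp, fn = fp_fn_indices(y_true, y_pred, positive="Extract Method")
```

The returned indices can then be used to look up the matching rows of the input features (as in x_data.csv) and inspect what the misclassified commits have in common.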
Note: This repository contains several experimental scripts. For Phase-3 of the project, refer to prod.py and test.py.
Steps for Execution:
- Run test.py. This executes prod.py, which contains the implementation. Currently the code runs on dataone_train.csv (single-class data).
- To run the multi-class case, comment out line #105 and uncomment line #108 so that datamulti_train.csv (multi-class data) is used instead.
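
As an alternative to editing line numbers by hand, the choice of training file could be exposed as a command-line option. The sketch below is hypothetical and is not how prod.py or test.py currently work: the `--mode` flag, the `DATASETS` mapping, and the `parse_dataset` helper are assumptions for illustration; only the two CSV file names come from this repository.

```python
import argparse
from typing import List, Optional

# File names come from the list above; the flag name and mapping are hypothetical.
DATASETS = {
    "single": "dataone_train.csv",
    "multi": "datamulti_train.csv",
}


def parse_dataset(argv: Optional[List[str]] = None) -> str:
    """Map a --mode argument to the training file that should be loaded."""
    parser = argparse.ArgumentParser(description="Select the training file.")
    parser.add_argument(
        "--mode",
        choices=sorted(DATASETS),
        default="single",  # mirrors the current default of single-class data
        help="which training set to use",
    )
    args = parser.parse_args(argv)
    return DATASETS[args.mode]
```

If something like this were wired into test.py, running it with `--mode multi` would select datamulti_train.csv, and running it without arguments would keep the current single-class behaviour.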