Project made in collaboration with Ben Chen https://github.com/benbenbang
- kkbox.py (as `main`)
- README.md
- tools
  - dataIO.py
  - helperOutput.py
  - wrangling.py
- requirement.txt
- README.md
- Python Version: `python3`
- Run `python3` or `jupyter notebook` from the `py_script` folder, or set the default directory to it
- To make `main` easier to use:
  - for csv files: place them in `../data`
    - Concatenated csv files
      - kkbox.csv
      - kkbox_test.csv
    - Raw csv files
      - train.csv
      - test.csv
      - songs.csv
      - members.csv
      - song_extra_info.csv
  - for pickle files: place them in `../data/pickle`
    - train_sparse.pickle
    - test_sparse.pickle
    - target.pickle
  - for xgboost model weights: place them in `../data/models`
    - xgbt_nn.model (nn is a number, e.g. 01, 02, ...)
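The layout above can be checked before running `main`. The following is a minimal sketch, not part of the repository: `EXPECTED_FILES` and `missing_files` are hypothetical names, and it assumes the pickle files sit in a `pickle` subfolder of the data folder.

```python
from pathlib import Path

# Expected layout, copied from the list above. Paths are relative to the data
# folder. Assumption: pickle files live in a "pickle" subfolder of that folder.
EXPECTED_FILES = {
    "merged csv": ["kkbox.csv", "kkbox_test.csv"],
    "raw csv": [
        "train.csv",
        "test.csv",
        "songs.csv",
        "members.csv",
        "song_extra_info.csv",
    ],
    "pickle": [
        "pickle/train_sparse.pickle",
        "pickle/test_sparse.pickle",
        "pickle/target.pickle",
    ],
}


def missing_files(data_dir):
    """Return {group: [relative paths not found under data_dir]}.

    Hypothetical helper for illustration; groups with nothing missing are omitted.
    """
    root = Path(data_dir)
    report = {}
    for group, names in EXPECTED_FILES.items():
        absent = [name for name in names if not (root / name).is_file()]
        if absent:
            report[group] = absent
    return report


if __name__ == "__main__":
    # Assumes the script is started from the py_script folder.
    for group, names in missing_files("../data").items():
        print(f"{group}: missing {', '.join(names)}")
```

If only the "merged csv" group is reported missing while the raw csv files are present, the merged files can presumably be rebuilt with `importAndMergeCSV`.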
- At a minimum, you need `kkbox.csv` and `kkbox_test.csv`. If one or both are missing, use `importAndMergeCSV('train')` and `importAndMergeCSV('test')` to get them as `pd.DataFrame` objects. Save a copy afterwards so the next load is faster.
- Useful functions in `tools`
  - In `dataIO`:
    - `importAndMergeCSV(type_)`
    - `importAndMergeHDF5(type_)`
    - `loadPickle(path)`
    - `savePickle(data, path)`
  - In `wrangling`:
    - `loadAndPreprocess(csv_path=None, to_train_mode=False, to_test_mode=False)`
    - `getFreqOfTarget(df)`
  - In `helperOutput`:
    - `outputHelper(model, X_test_sparse=None, load_test_set=False)`
    - `loadModelHelper(version)`
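The advice to save a copy after the first load can be sketched as a small load-or-build cache. This is an illustration under assumptions, not the repository's code: `save_pickle`, `load_pickle`, and `load_or_build` are hypothetical stand-ins written with the standard library, and the real `savePickle(data, path)` and `loadPickle(path)` in `dataIO` may behave differently.

```python
import pickle
from pathlib import Path


def save_pickle(data, path):
    """Write any picklable object to path, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Read back an object previously written by save_pickle."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def load_or_build(path, build):
    """Return the object cached at path, or call build() once and cache it.

    build is any zero-argument callable that produces the data; it is only
    called when no file exists at path.
    """
    if Path(path).is_file():
        return load_pickle(path)
    data = build()
    save_pickle(data, path)
    return data
```

With the project's tools imported, a call might look like `load_or_build(cache_path, lambda: importAndMergeCSV('train'))`, where `cache_path` is a location of your choosing. The slow merge then runs only on the first call, and later calls read the cached copy.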