A binary classification task that automatically distinguishes high-quality from low-quality descriptions in cultural heritage records.

Requirements:
- Python 3
- Pandas

converter.py: creates vector data in .tsv format

FastText folder:
- prepare4fastTextClassifier.py:
  - converts the input .csv file to FastText format (e.g. __label__Good).
  - splits the input dataset into folds (10 by default; set with the --fold option).
- fasttextClassifier.py:
  - classifies the descriptions (*.csv.fbclass file) and writes a file (.eval.gz) with the classification report.
- evaluate.py:
  - evaluates the classification results from the *.eval.gz file.
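
The conversion and fold-splitting steps can be illustrated with a minimal sketch. It is not the code in prepare4fastTextClassifier.py: the column names `description` and `quality`, the label `Bad`, the round-robin split, and both helper functions are assumptions made for illustration, and it uses only the standard library rather than Pandas. The only detail taken from fastText itself is the `__label__` prefix.

```python
import csv
import io


def to_fasttext_line(label, text):
    """Format one record as a fastText supervised-training line."""
    # Collapse whitespace so each record stays on a single line.
    clean = " ".join(text.split())
    return f"__label__{label} {clean}"


def split_into_folds(lines, n_folds=10):
    """Distribute lines round-robin into n_folds lists (assumed strategy)."""
    folds = [[] for _ in range(n_folds)]
    for i, line in enumerate(lines):
        folds[i % n_folds].append(line)
    return folds


# Hypothetical input; the real .csv layout may differ.
sample_csv = io.StringIO(
    "description,quality\n"
    "Oil painting of a harbour at dusk,Good\n"
    "painting,Bad\n"
    "Bronze coin minted in Rome,Good\n"
)
rows = list(csv.DictReader(sample_csv))
lines = [to_fasttext_line(r["quality"], r["description"]) for r in rows]
folds = split_into_folds(lines, n_folds=3)

for line in lines:
    print(line)
```

Each fold can then be held out in turn as the test set while the remaining folds form the training set.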

LibSVM folder:
- K-fold.py:
  - runs K-fold cross-validation on the .tsv file created by converter.py.
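
For readers unfamiliar with K-fold cross-validation, the index bookkeeping can be sketched as follows. This is a generic illustration, not the contents of K-fold.py: the function name `k_fold_indices` and the use of contiguous, unshuffled folds are assumptions.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        # Everything outside the held-out fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Every sample appears in exactly one test fold, so averaging the per-fold scores uses each record for evaluation once.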

learning_curve folder:
- estimator.py:
  - computes the F1 score using different splits.
- learning_curve.py:
  - plots the learning curve.
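
The idea behind a learning curve, F1 measured on a fixed test set while the training set grows, can be sketched as below. This is an illustration under stated assumptions, not the logic of estimator.py: the positive label `Good`, the training fractions, the toy data, and the majority-label baseline standing in for the real classifier are all hypothetical. Plotting is omitted.

```python
from collections import Counter


def f1_score(y_true, y_pred, positive="Good"):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_baseline(train, test_texts):
    """Stand-in classifier: always predicts the most frequent training label."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return [label for _ in test_texts]


def learning_curve_points(train_and_predict, train, test, fractions=(0.25, 0.5, 1.0)):
    """Return (train_size, f1) pairs for growing prefixes of the training set."""
    y_true = [lbl for _, lbl in test]
    test_texts = [text for text, _ in test]
    points = []
    for frac in fractions:
        n = max(1, int(len(train) * frac))
        y_pred = train_and_predict(train[:n], test_texts)
        points.append((n, f1_score(y_true, y_pred)))
    return points


# Toy data, invented for illustration only.
train = [("t1", "Bad"), ("t2", "Bad"), ("t3", "Bad"), ("t4", "Good"),
         ("t5", "Good"), ("t6", "Good"), ("t7", "Good"), ("t8", "Good")]
test = [("x", "Good"), ("y", "Bad")]
points = learning_curve_points(majority_baseline, train, test)
print(points)
```

The resulting (training size, F1) pairs are what a plotting script would put on the x and y axes.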