Classification of textual descriptions in cultural heritage records

A binary classification task that automatically separates high-quality from low-quality descriptions in cultural heritage records.

Resources

converter.py: creates vector data in .tsv format
FastText folder:
- prepare4fastTextClassifier.py:
  - converts the input .csv file to fastText (FT) format, where each line carries a label prefix (e.g. __label__Good).
  - splits the input dataset into folds (10 by default; change with the --fold option).
- fasttextClassifier.py:
  - classifies the descriptions (*.csv.fbclass file) and returns a file (.eval.gz) with the classification report.
- evaluate.py:
  - evaluates the classification results stored in the *.eval.gz file
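
As a rough illustration of the two preparation steps above (label prefixing and fold splitting), here is a minimal sketch. It is not the code in prepare4fastTextClassifier.py: the column names `label` and `description`, the `Bad` label, the sample rows, and the helper names `to_fasttext_line` and `split_into_folds` are all assumptions made for the example. Only the `__label__` prefix comes from the fastText input format itself.

```python
import csv
import io


def to_fasttext_line(label: str, text: str) -> str:
    """Format one example as '__label__<label> <text>' on a single line."""
    # fastText reads one example per line, so collapse newlines and extra spaces.
    clean = " ".join(text.split())
    return f"__label__{label} {clean}"


def split_into_folds(items, k=10):
    """Distribute items round-robin into k folds (no shuffling, for brevity)."""
    folds = [[] for _ in range(k)]
    for i, item in enumerate(items):
        folds[i % k].append(item)
    return folds


# Hypothetical CSV layout: a "label" column and a "description" column.
SAMPLE_CSV = """label,description
Good,"Oil on canvas portrait of a seated woman, signed and dated 1872."
Bad,painting
Good,"Bronze statuette of a horse, Roman, 2nd century AD."
Bad,object
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
lines = [to_fasttext_line(r["label"], r["description"]) for r in rows]
folds = split_into_folds(lines, k=2)

for line in lines:
    print(line)
```

In practice the rows would usually be shuffled (or stratified by label) before splitting, so that each fold has a similar class balance.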
LibSVM folder:
- K-fold.py:
  - runs K-fold cross-validation on the .tsv file created by converter.py
learning_curve folder:
- estimator.py:
  - computes the F1 score using different splits
- learning_curve.py:
  - plots the learning curve
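
For readers unfamiliar with the metric, the F1 score mentioned above is the harmonic mean of precision and recall. The sketch below computes it for a binary task in plain Python. It is an illustration only, not the code in estimator.py: the function name `f1_score`, the choice of `Good` as the positive class, the `Bad` label, and the toy label lists are assumptions.

```python
def f1_score(gold, predicted, positive="Good"):
    """Return the F1 score of the positive class for two equal-length label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for the example.
gold = ["Good", "Good", "Bad", "Bad", "Good"]
pred = ["Good", "Bad", "Bad", "Good", "Good"]
score = f1_score(gold, pred)
print(f"F1 = {score:.3f}")
```

A learning curve then amounts to repeating this measurement on models trained with progressively larger portions of the data and plotting the score against the training size.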
