ISMIR-2017-Discogs: Dataset, code for analysis and results

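This repository contains a pre-processed dataset of release metadata from Discogs and the code for data pre-processing and analysis.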

Please cite this paper if you use our dataset or code.

See examples of the analyses that can be done using metadata from Discogs.

Pre-processed dataset of release metadata from Discogs

Code for data pre-processing and analysis

This is the code we used to create our release dataset and to run the example studies presented in the ISMIR 2017 paper.

Dependencies

Run pip install -r requirements.txt to install the required dependencies.

Configuration

  • config.py: basic configuration script containing global variables (such as filenames) used by the other scripts.
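As a rough illustration of what such a configuration module might look like, here is a minimal sketch. Every variable name and path below is a hypothetical placeholder, not the actual content of config.py; check the file itself for the real settings.

```python
import os

# NOTE: all names and values here are hypothetical placeholders,
# not the repository's actual configuration.
DATA_DIR = "data"

# Files produced or consumed at each stage of the pipeline.
RELEASES_XML = os.path.join(DATA_DIR, "releases.xml.gz")  # original XML dump archive
RELEASES_JSON = os.path.join(DATA_DIR, "releases.json")   # intermediate JSON dump
RELEASES_HDF = os.path.join(DATA_DIR, "releases.h5")      # final pandas DataFrame
```

Other scripts would then import these names instead of hard-coding paths, so a file location only has to be changed in one place.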

Dataset creation and analysis

  • preprocess_releases_xml_to_json.py: downloads the original XML dump archive and converts a subset of its metadata fields to a JSON dump.
  • preprocess_releases_json_to_hdf_pandas.py: further simplifies the metadata by removing and recoding some fields, and outputs an HDF file containing a pandas DataFrame.
  • analyze.py: a collection of useful functions for analysis of the dataset.
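To give an idea of the XML-to-JSON step described above, the following is a minimal sketch using only the Python standard library. It is not the code from preprocess_releases_xml_to_json.py: the function names, the chosen subset of fields, the year recoding, and the sample record are all assumptions made for illustration, and the download step is left out.

```python
import io
import json
import xml.etree.ElementTree as ET


def recode_year(released):
    """Hypothetical recoding: reduce a date string such as '1999-03-00' to an int year."""
    if released and released[:4].isdigit():
        return int(released[:4])
    return None


def release_to_dict(elem):
    """Keep a small, assumed subset of fields from one <release> element."""
    return {
        "id": elem.get("id"),
        "title": elem.findtext("title"),
        "country": elem.findtext("country"),
        "year": recode_year(elem.findtext("released")),
        "genres": [g.text for g in elem.findall("genres/genre")],
        "styles": [s.text for s in elem.findall("styles/style")],
    }


def xml_to_json_lines(xml_file, json_file):
    """Stream <release> elements and write one JSON object per line."""
    count = 0
    for _, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == "release":
            json_file.write(json.dumps(release_to_dict(elem)) + "\n")
            count += 1
            elem.clear()  # drop the element's children to limit memory use
    return count


# Made-up sample record that mimics the layout of a releases dump.
SAMPLE = b"""<releases>
<release id="42" status="Accepted">
  <title>Example Release</title>
  <genres><genre>Electronic</genre></genres>
  <styles><style>House</style><style>Techno</style></styles>
  <country>Spain</country>
  <released>2001-05-00</released>
</release>
</releases>"""

out = io.StringIO()
n_releases = xml_to_json_lines(io.BytesIO(SAMPLE), out)
records = [json.loads(line) for line in out.getvalue().splitlines()]
```

Streaming with iterparse avoids loading the whole dump into memory, which matters for a file of this size. Writing one JSON object per line also lets a later step read the records incrementally before building a DataFrame.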