DBpedia is building public data infrastructure for a large, multilingual, semantic knowledge graph! It publishes a ton of data sets and, as with most data sets, they could use some cleaning.
To help DBpedia, we're going to audit one of their data sets for cleanliness.
Specifically, we'll take in the autos.csv file, output the already clean data into autos-valid.csv, and output the data that needs fixing into FIXME-autos.csv.
You'll need to install:
autos.csv - Auto data for processing
audit_production_start_year.py - Takes in autos.csv and outputs already valid data into autos-valid.csv; the data that needs to be cleaned goes into FIXME-autos.csv
process_file(input_file, output_good, output_bad) - Takes in the auto data and separates it into valid and FIXME files.
autos-valid.csv - Valid auto data
FIXME-autos.csv - Data that needs fixing
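
The split described above can be sketched roughly as follows. This is only an illustration, not the actual contents of audit_production_start_year.py: the column name productionStartYear, the accepted year range (1886 to 2014), and the rule that a value is valid when it begins with a four-digit year in that range are all assumptions made for the example.

```python
import csv
import re

# Assumptions for this sketch (not confirmed by the project):
YEAR_FIELD = "productionStartYear"   # hypothetical column name
MIN_YEAR, MAX_YEAR = 1886, 2014      # hypothetical valid range


def is_valid_year(value):
    """Return True if value starts with a four-digit year in the assumed range."""
    match = re.match(r"\d{4}", value or "", flags=re.ASCII)
    if match is None:
        return False
    return MIN_YEAR <= int(match.group()) <= MAX_YEAR


def process_file(input_file, output_good, output_bad):
    """Copy rows with a valid year to output_good and all others to output_bad."""
    good, bad = [], []
    with open(input_file, newline="", encoding="utf-8") as f_in:
        reader = csv.DictReader(f_in)
        header = reader.fieldnames or []
        for row in reader:
            target = good if is_valid_year(row.get(YEAR_FIELD)) else bad
            target.append(row)

    # Write both outputs with the same header as the input file.
    for path, rows in ((output_good, good), (output_bad, bad)):
        with open(path, "w", newline="", encoding="utf-8") as f_out:
            writer = csv.DictWriter(f_out, fieldnames=header, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)

    return len(good), len(bad)
```

Under these assumptions, calling process_file("autos.csv", "autos-valid.csv", "FIXME-autos.csv") would write the two output files and return the number of rows placed in each.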