This repository contains both the datasets we have collected and the scripts used to collect them.
Main folders: these hold the raw source data, a cleaned-up version of it, and a folder for each distinct output.
- source_data: raw data sources from government and other state agency websites
- cleaned_data: source files tidied into cleaner formats for easier comparison
- analysis: a set of folders with workspace environments for each specific output
analysis/0001-use-of-force
: One directory per use case, first one given as an example.
analysis/0001-use-of-force/README.md
: Description of this analysis, where it's used, author info, etc.
analysis/0001-use-of-force/Makefile
: A makefile for generating this analysis (make)
analysis/0001-use-of-force/force-mappings.csv
: Mappings to make source metadata more descriptive and easier to read
analysis/0001-use-of-force/use-of-force.py
: Script to create the outputs in this directory
analysis/0001-use-of-force/**/*
: Outputs generated by script
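
To illustrate how a mappings file such as force-mappings.csv could be applied, here is a minimal sketch using only the Python standard library. The column names (code, description) and the sample rows are hypothetical placeholders, not the contents of the real file, and this is not necessarily how use-of-force.py does it.

```python
import csv
import io

# Hypothetical mapping file contents; the real force-mappings.csv may use
# different column names and values.
MAPPINGS_CSV = """code,description
T1,Example tactic one
T2,Example tactic two
"""


def load_mappings(handle):
    """Read a two-column CSV into a dict of raw code -> readable label."""
    reader = csv.DictReader(handle)
    return {row["code"]: row["description"] for row in reader}


def describe(codes, mappings):
    """Replace each raw code with its label, leaving unknown codes unchanged."""
    return [mappings.get(code, code) for code in codes]


mappings = load_mappings(io.StringIO(MAPPINGS_CSV))
readable = describe(["T1", "T2", "XX"], mappings)
print(readable)
```

Leaving unknown codes unchanged, rather than raising an error, keeps the output usable when the source adds a value the mapping file does not yet cover. A stricter script might prefer to fail loudly instead.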
Utility folders
.github
: actions to test the pipelines
bibliography
: BibTeX files
pipelines
: populates source directory, cleans data (run make pipelines to run them all)
If you have relevant datasets then we would like to include them here. We expect datasets to:
- Be automated where possible, with a script in the scripts directory
- Come with Great Expectations test suites
- Be well documented with README files
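
The checklist above could be checked mechanically before opening a ticket. The sketch below is one hypothetical way to do that. The function name missing_items, the example paths, and the assumption that a test suite is a single file are all illustrative and not part of this repository.

```python
from pathlib import Path
import tempfile


def missing_items(dataset_dir, script_path, suite_path):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if not (Path(dataset_dir) / "README.md").is_file():
        problems.append("no README.md in dataset directory")
    if not Path(script_path).is_file():
        problems.append("no collection script")
    if not Path(suite_path).is_file():
        problems.append("no expectation suite")
    return problems


# Demonstrate against a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    dataset = root / "source_data" / "example"
    dataset.mkdir(parents=True)
    (dataset / "README.md").write_text("Example dataset\n")
    scripts = root / "scripts"
    scripts.mkdir()
    (scripts / "example.py").write_text("# collection script\n")
    # The suite file is deliberately not created, so one problem is reported.
    result = missing_items(dataset, scripts / "example.py", root / "example.json")

print(result)
```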
Feel free to open a ticket or email [email protected] with any questions.
Tests are provided using Great Expectations. You will need a recent version of Python installed to use this. The rest of the dependencies can then be installed with:
- Run python3 -m venv venv && source venv/bin/activate to create a virtual environment
- Install the dependencies with pip3 install -r requirements.txt
- Run great_expectations init to create any missing directories
To create a test suite for your new dataset, run great_expectations suite new
To edit a test suite, run great_expectations suite edit police-population.warning
To run the tests and show the results, run great_expectations docs build