
Evaluating the GCIS project's Provenance





GCIS Provenance Evaluator

Creating a robust process for evaluating the GCIS project.


Primary Objective

Communication of the provenance provided for the primary publications in GCIS.
Primary Publications: nca3, Impacts of Climate Change on Human Health, Indicators, and others.

Secondary Objective

Informing decisions on where to direct data management work by identifying weak or broken provenance that can be improved



Perl v5.14 or higher

apt-get -y update && apt-get -y install perlbrew
perlbrew init
echo "source ~/perl5/perlbrew/etc/bashrc" >>~/.bashrc
source ~/.bashrc
perlbrew install perl-5.20.3 # Takes about 25 minutes!
perlbrew install-cpanm
perlbrew install-patchperl
perlbrew switch perl-5.20.3

CPAN Modules

These modules are required; install them via cpanm:

cpanm Getopt::Long Pod::Usage YAML::XS Data::Dumper Clone::PP Time::HiRes Path::Class JSON::XS Mojo::UserAgent

If you run into an error that mentions a missing module, install it the same way: cpanm Module::Name.
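
To see in one pass which of the required modules are still missing, a small shell check can try to load each one. This is only a sketch and not part of the GCIS repos: the function name `missing_modules` is an assumption introduced here, and it relies solely on `perl -MModule -e1` exiting non-zero when a module cannot be loaded.

```shell
#!/usr/bin/env bash
# Print every module from the argument list that the current perl cannot load.
# missing_modules is a hypothetical helper name, not part of the GCIS repos.
missing_modules() {
  local mod
  for mod in "$@"; do
    # -M loads the module and -e1 runs a trivial program,
    # so a non-zero exit means the module could not be loaded.
    if ! perl "-M$mod" -e1 >/dev/null 2>&1; then
      printf '%s\n' "$mod"
    fi
  done
}

# Usage: list whatever is still missing from the required set.
# missing_modules Getopt::Long Pod::Usage YAML::XS Data::Dumper Clone::PP \
#   Time::HiRes Path::Class JSON::XS Mojo::UserAgent
```

An empty result means every listed module loaded under the perl that is currently active, which matters if you switch versions with perlbrew.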

GCIS Repos Setup

Required repos:

GCIS Scripts :
GCIS Perl Client :
GCIS Provenance Evaluator :

Clone these and add their lib directories to your local PERL5LIB

Initial Setup:

mkdir ~/repos                    # only if you have no existing 'repos' directory
cd ~/repos
ls                               # check 'gcis-scripts' and 'gcis-pl-client' exist
git clone
git clone
git clone
ls                               # should see all three now
echo 'export PERL5LIB="$PERL5LIB:$HOME/repos/gcis-pl-client/lib:$HOME/repos/gcis-scripts/lib"' >>~/.bashrc
. ~/.bashrc
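
After reloading the shell configuration, it can help to confirm that every PERL5LIB entry points at a real directory, since a typo here only surfaces later as a missing-module error. The following is a minimal sketch; `check_perl5lib` is a hypothetical helper name, not something shipped with these repos.

```shell
#!/usr/bin/env bash
# Report PERL5LIB entries that are not existing directories.
# check_perl5lib is a hypothetical helper name used only for this sketch.
check_perl5lib() {
  local entry status=0
  local -a entries
  # Split the colon-separated list into an array.
  IFS=':' read -r -a entries <<< "${PERL5LIB:-}"
  for entry in "${entries[@]}"; do
    # Skip empty fields, such as the one produced by a leading ':'.
    if [ -z "$entry" ]; then
      continue
    fi
    if [ ! -d "$entry" ]; then
      printf 'missing: %s\n' "$entry"
      status=1
    fi
  done
  return "$status"
}

# Usage:
# check_perl5lib && echo "all PERL5LIB entries exist"
```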

Refresh the repos if it's been several months!

cd ~/repos/gcis-scripts
git pull
cd ~/repos/gcis-pl-client
git pull
cd ~/repos/gcis-provenance-evaluator
git pull

See the script's documentation & examples:

cd ~/repos/gcis-provenance-evaluator
perldoc ./

Score Format

To Document

Component Format

To Document

Project Usage

  1. Establish Scores & Configuration
    1. Establish a scoring metric for each GCIS resource
      1. See default example, format
    2. Establish a scoring metric for each GCIS connection
      1. See default example, format
    3. Establish the components for each GCIS resource
      1. See default example, format
  2. Generate the score tree
    1. Decide how many levels deep you want to analyse your resource.
    2. Select the GCIS instance to run against, or load the pertinent database dump into a local instance (the default is production).
    3. Run the command to generate the tree: See next section.
    • Name the trees in an informative way and store them somewhere safe!
  3. Run analysis on the Score Tree
    1. See each evaluation folder.

Generating Scores

Note 1: This script is long-running on larger chapters! To be safe, run it in a screen session on a long-lived server to prevent interruptions. For our purposes at USGCRP, I suggest running it on data-review.

Note 2: This process is likely to generate a multitude of output trees we will want to keep track of. I strongly encourage a strict naming convention: "REPORT_CHAPTER_COMPONENT_SCORING_metrics.yaml".

  • So, if you were to run this on the Executive Summary of the NCA3 with the default scores:
    • "nca3_ch1_defaultscoring_metrics.yaml"
  • Or to use your custom scoring file "super_strict_scores.yaml" on the CSSR Chapter Temperature Changes in the United States Figure 6.1:
    • "cssr_ch6_fig1_superstrict_metrics.yaml"
    • Commit that scoring file!
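
The naming convention above can be kept consistent by building the file name from its parts instead of typing it each time. This is a sketch under one assumption: `tree_file_name` is a hypothetical helper, and it leaves the COMPONENT part out when it is empty, as in the NCA3 example.

```shell
#!/usr/bin/env bash
# Build "REPORT_CHAPTER_COMPONENT_SCORING_metrics.yaml" from its parts.
# tree_file_name is a hypothetical helper; an empty COMPONENT is left out.
tree_file_name() {
  local report=$1 chapter=$2 component=$3 scoring=$4
  local name="${report}_${chapter}"
  if [ -n "$component" ]; then
    name="${name}_${component}"
  fi
  printf '%s_%s_metrics.yaml\n' "$name" "$scoring"
}

# Usage:
# tree_file_name nca3 ch1 "" defaultscoring    # prints nca3_ch1_defaultscoring_metrics.yaml
# tree_file_name cssr ch6 fig1 superstrict     # prints cssr_ch6_fig1_superstrict_metrics.yaml
```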

If you want to use the default scores and configuration, the process is pretty straightforward:

screen -DRS "metrics screen"         # creates or reconnects to the screen named "metrics screen"
cd ~/repos/gcis-provenance-evaluator
./ \
  --resource /report/nca3/chapter/executive-summary \
  --tree_file ./nca3_ch1_defaultscoring_metrics.yaml

Running with all the custom options:

screen -DRS "metrics screen"
cd ~/repos/gcis-provenance-evaluator
# WARNING: increasing --depth can potentially make the run _exponentially_ longer!
./ \
  --resource report/usgcrp-climate-human-health-assessment-2016/chapter/extreme-events \
  --tree_file ./hhs2016_ch4_newscoring_newcomponents_metrics.yaml \
  --url \
  --depth 2 \
  --connection_score /tmp/new_scores.yml \
  --internal_score /tmp/new_inner_scores.yml \
  --components /tmp/comps.yml
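
Because a run can take a long time, it may be worth confirming that the custom score and component files are readable before starting it. The check below is a minimal sketch; `require_files` is a hypothetical helper name, and the paths in the usage comment are the ones from the example above.

```shell
#!/usr/bin/env bash
# Return non-zero if any of the given paths is not a readable regular file.
# require_files is a hypothetical helper name used only for this sketch.
require_files() {
  local f status=0
  for f in "$@"; do
    if [ ! -f "$f" ] || [ ! -r "$f" ]; then
      printf 'cannot read: %s\n' "$f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
# require_files /tmp/new_scores.yml /tmp/new_inner_scores.yml /tmp/comps.yml \
#   && echo "inputs look fine, start the run"
```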

Creating JSON for sunburst

Put the generated YAML file in the root directory. Make sure the CPAN modules listed above are installed, then run:

./ --tree_file nca4_ch22_butterfly_1219.yaml --d3_file nca4_ch22_butterfly_1219.json
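
In the example, the JSON file keeps the tree's base name and only swaps the extension. When converting several trees, deriving the output name avoids mismatched pairs. This is a sketch only: `d3_file_for` is a hypothetical helper, and it assumes tree files end in `.yaml`.

```shell
#!/usr/bin/env bash
# Derive the sunburst JSON file name from a score tree file name.
# d3_file_for is a hypothetical helper; it assumes a ".yaml" suffix.
d3_file_for() {
  local tree=$1
  # Strip a trailing ".yaml" (if present) and append ".json".
  printf '%s.json\n' "${tree%.yaml}"
}

# Usage: show the output name that would pair with each tree in this directory.
# for tree in ./*_metrics.yaml; do
#   echo "$tree -> $(d3_file_for "$tree")"
# done
```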