If you are cloning the pipeline from a private repository on a private GitLab instance, you first need to configure the SCM file for that GitLab instance with your credentials. See Nextflow's documentation for Private server configuration for more information.
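As a rough sketch of what such an SCM file can look like, the snippet below writes an example configuration to a scratch file named scm.example. The provider name mygitlab, the server URL, and all credentials are hypothetical placeholders, not values from this project. Nextflow reads the file at $HOME/.nextflow/scm by default, so after reviewing the example you would copy its contents there.

```shell
# Write an example SCM configuration to a scratch file for review.
# "mygitlab", the server URL, and all credentials are hypothetical placeholders.
cat > scm.example <<'EOF'
providers {
    mygitlab {
        server = 'https://gitlab.example.com'
        platform = 'gitlab'
        user = 'your-username'
        password = 'your-password'
        token = 'your-personal-access-token'
    }
}
EOF

# The file holds credentials, so restrict it to the current user.
chmod 600 scm.example
```

With a provider defined this way, the provider name can be passed to nextflow run with the -hub option (for example -hub mygitlab) so that Nextflow knows which server to pull from.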
The typical command for running the pipeline is as follows:
nextflow run sdsc-ordes/nds-lucid-graphdb-loader
When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. On subsequent runs, it will always use the cached version if available, even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, regularly update the cached version:
nextflow pull sdsc-ordes/nds-lucid-graphdb-loader
It is good practice to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software is used for each run. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since.
First, go to the nds-lucid-graphdb-loader tags page and find the latest pipeline version, numeric only (e.g. 1.3.1). Then specify this when running the pipeline with -r (one hyphen), e.g. -r 1.3.1. Of course, you can switch to another version by changing the number after the -r flag.
This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future.
The pipeline uses the following arguments:
--input_dir - Path to the directory containing input RDF files.
--graphdb_dir - Path to the GraphDB import directory. This is set with graphdb.workbench.importDirectory in the GraphDB configuration and must be available on the GraphDB host filesystem.
--graphdb_url - URL of the GraphDB instance to be used.
--graphdb_repo - Name of the GraphDB repository.
--backup_dir - Path to the directory where the RDF files will be backed up after successful loading.
--log_dir - Path to the directory where log files are generated.
--max_backups - Number of backup files to retain (n last runs).
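To show how these arguments fit together, here is a minimal sketch that assembles a full invocation and prints it for review before anything is launched. Every path, the URL, the repository name, and the backup count are hypothetical placeholders. The version 1.3.1 is the example tag used earlier in this document, not necessarily a current release.

```shell
# Assemble the arguments as positional parameters (POSIX-compatible).
# All values below are hypothetical placeholders; adjust them to your setup.
set -- run sdsc-ordes/nds-lucid-graphdb-loader \
    -r 1.3.1 \
    --input_dir /data/rdf/incoming \
    --graphdb_dir /opt/graphdb/import \
    --graphdb_url http://localhost:7200 \
    --graphdb_repo my-repo \
    --backup_dir /data/rdf/backup \
    --log_dir /data/rdf/logs \
    --max_backups 5

# Print the assembled command so it can be checked first.
preview="nextflow $*"
echo "$preview"

# Once the values are correct, launch the pipeline:
# nextflow "$@"
```

Printing the command first is only a convenience for checking the values. Running the nextflow command directly with the same arguments is equivalent.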
In addition, the pipeline uses Nextflow secrets for the GraphDB repository credentials:
GRAPHDB_USER - Username for the GraphDB repository.
GRAPHDB_PASS - Password for the GraphDB repository.
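The two secrets can be registered with the nextflow secrets subcommand before the first run. The sketch below assumes Nextflow is installed and on the PATH. The credential values are hypothetical placeholders to replace with your own.

```shell
# Placeholder credentials (hypothetical); replace with your own values.
graphdb_user="my-user"
graphdb_pass="my-password"

if command -v nextflow >/dev/null 2>&1; then
    # Store both credentials in the local Nextflow secrets store.
    nextflow secrets set GRAPHDB_USER "$graphdb_user"
    nextflow secrets set GRAPHDB_PASS "$graphdb_pass"
    # List the names of the stored secrets to confirm both are present.
    nextflow secrets list
else
    echo "nextflow not found on PATH; install it before setting secrets" >&2
fi
```

Values typed directly on the command line may end up in your shell history, so you may prefer to read them from a prompt or a protected file instead.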