Looper v1.5.0 release (#388)
* add geofetch tutorial

* Add pephub support for sample-level pipeline interface.

* Fixed tests naming

* added project level for pephub support

* simplify Project constructor implementation

* logger string fix

Co-authored-by: Vince <vince.reuter@gmail.com>

* fix try block

Co-authored-by: Vince <vince.reuter@gmail.com>

* Update looper/utils.py

Co-authored-by: Vince <vince.reuter@gmail.com>

* Fixed #341 #342

* Deprecated write_skipped_sample_scripts. Scripts will now output sequentially (unless toggled). #173

* Remove redundant pooling behavior for skipped samples. #173

* First working version of #344

* divvy reintegration #343

* divvy reintegration #343

* Cleaned up looper with divvy

* removed sample and project pipelines from cli

* fixed old .looper.config specification

* added new config docs

* Remove --toggle key. Add logic during fetch samples for toggle. Remove checking for toggle off during sample submission (redundant). #263

* Remove toggle key constants. Apply formatting. #263

* Remove toggle key property. Simplified logic for fetching samples. #263

* fix pephub failing tests

* cleaned up more to help pass pytests

* swap distutils.dir_util for shutil for Python 3.12

* added divvy entry point inside looper

* black format

* another black reformat

* added docs

* clean up based on feedback

* clean up based on feedback

* removed redundancy

* black fmt

* added divvy inspect

* black fmt

* added sub cmd, docker args

* added line break for inspect output

* added divvy docs #343

* added divvy docs #343

* divvy docs fix

* mkdocs fix

* Fixed mkdocs error

* Update requirements-doc.txt

* updated reqs-doc

* merge mistake fix

* added divvy imgs

* added new looper init

* added to changelog, fix divvy imgs

* divvy readme img fix

* fixed initialization of generic piface

* fixed initialization of generic piface

* added tests

* fixed main setup

* Update how_to_define_looper_config.md

* Update __init__.py

* Update test_other.py

* Ise

* added changelog and minor naming changes

* remove old logging function

* dev version bump

* fix typo in html_report and upgraded pandas requirements for pephubclient

* fixed requirements

* fixed docs requirements

* added versioneer to doc requirements

* added Cython to doc requirements

* added readthedocs config

* added looper to requirements docs

* allow for using pipestat.summarize, align with pipestat 0.4.0

* clean up code, update usage doc

* update doc requirements pephubclient

* downgrade docs to 3.10

* adjust get_status to use proper sample_name if pipestat configured #326

* adjust conductor to retrieve pipestat manager variables with pipestat 0.4.0 refactoring.

* Allows skipping some tests if run offline. Closes #370

* work on using test_args instead of subprocesses

* Finish switching applicable tests away from subprocess

* Lint and update doc string to test_args_expansion

* Change return type.

* lint

* add test for var_templates #357, and clean up tests

* attempt simple check to see if provided pipelines are callable #195

* minor adjustments, polished docstring

* update changelog

* lint

* update version to 1.5.0

* update changelog

* update reqs and changelog to use pipestat v0.5.0

* Refactoring for looper config

* added looper config file argument

* code fix

* Added comment about deprecating for old looper specification

* fixed looper init error

* change logo for docs build tab icon

* fix favicon

* update docs and changelog for 1.5.0 release

---------

Co-authored-by: nsheff <nsheff@users.noreply.github.com>
Co-authored-by: Khoroshevskyi <sasha99250@gmail.com>
Co-authored-by: Vince Reuter <vince.reuter@gmail.com>
Co-authored-by: ayobi <17304717+ayobi@users.noreply.github.com>
5 people authored Aug 9, 2023
1 parent 9942106 commit 6ac3b12
Showing 90 changed files with 7,011 additions and 442 deletions.
19 changes: 19 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,19 @@
# Read the Docs configuration file for MkDocs projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.10"

mkdocs:
configuration: mkdocs.yml

# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: requirements/requirements-doc.txt
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -2,5 +2,7 @@ include requirements/*
include README.md
include logo_looper.svg
include looper/jinja_templates/*
include looper/default_config/*
include looper/default_config/divvy_templates/*
include looper/jinja_templates_old/*
include looper/schemas/*
10 changes: 10 additions & 0 deletions divvy_templates/localhost_bulker_template.sub
@@ -0,0 +1,10 @@
#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

eval "$(bulker activate -e {BULKER_CRATE})"

{
{CODE}
} | tee {LOGFILE} -i
8 changes: 8 additions & 0 deletions divvy_templates/localhost_docker_template.sub
@@ -0,0 +1,8 @@
#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

{
docker run --rm -it {DOCKER_ARGS} {DOCKER_IMAGE} {CODE}
} | tee {LOGFILE} --ignore-interrupts
9 changes: 9 additions & 0 deletions divvy_templates/localhost_singularity_template.sub
@@ -0,0 +1,9 @@
#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

{
singularity instance.start {SINGULARITY_ARGS} {SINGULARITY_IMAGE} {JOBNAME}_image
singularity exec instance://{JOBNAME}_image {CODE}
} | tee {LOGFILE} --ignore-interrupts
8 changes: 8 additions & 0 deletions divvy_templates/localhost_template.sub
@@ -0,0 +1,8 @@
#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

{
{CODE}
} | tee {LOGFILE}
4 changes: 4 additions & 0 deletions divvy_templates/lsf_template.sub
@@ -0,0 +1,4 @@
#!/bin/bash

bsub -n{CORES} -W {TIME} -R \"rusage[mem={MEM}]\" -o {LOGFILE} {CODE}

1 change: 1 addition & 0 deletions divvy_templates/sge_template.sub
@@ -0,0 +1 @@
This has not been implemented, but you can add whatever cluster submission system you need here; just use the slurm_template as an example.
17 changes: 17 additions & 0 deletions divvy_templates/slurm_singularity_template.sub
@@ -0,0 +1,17 @@
#!/bin/bash
#SBATCH --job-name='{JOBNAME}'
#SBATCH --output='{LOGFILE}'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='{PARTITION}'
#SBATCH -m block
#SBATCH --ntasks=1

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

singularity instance.start {SINGULARITY_ARGS} {SINGULARITY_IMAGE} {JOBNAME}_image
srun singularity exec instance://{JOBNAME}_image {CODE}

singularity instance.stop {JOBNAME}_image
14 changes: 14 additions & 0 deletions divvy_templates/slurm_template.sub
@@ -0,0 +1,14 @@
#!/bin/bash
#SBATCH --job-name='{JOBNAME}'
#SBATCH --output='{LOGFILE}'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='{PARTITION}'
#SBATCH -m block
#SBATCH --ntasks=1

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

{CODE}
66 changes: 66 additions & 0 deletions docs/README_divvy.md
@@ -0,0 +1,66 @@
![Logo](img/divvy_logo.svg)

## What is `divvy`?

`Divvy` allows you to populate job submission scripts by integrating job-specific settings with separately configured computing environment settings. Divvy *makes software portable*, so users can easily switch among computing resources (laptop, cluster, cloud).

![Merge](img/divvy-merge.svg)
## What makes `divvy` better?

![NoDivvy](img/nodivvy.svg)

Each tool requires a particular compute resource setup. For example, one pipeline requires SLURM, another requires AWS, and yet another runs directly on your laptop. This makes it difficult to move a tool to a different environment. And for tools that can run in multiple environments, each one must be configured separately.

<hr>


Instead, `divvy`-compatible tools can run on any computing resource. **Users configure their computing environment once, and all divvy-compatible tools will use this same configuration.**

![Connect](img/divvy-connect.svg)

Divvy reads a standard configuration file describing available compute resources and then uses a simple template system to write custom job submission scripts. Computing resources are organized as *compute packages*, which users select and populate with values to build scripts for compute jobs.

<br clear="all"/>

Use the default compute packages or [configure your own](configuration.md). See what's available:

```{console}
divvy list
```

```{console}
Divvy config: divvy_config.yaml
docker
default
singularity_slurm
singularity
local
slurm
```


Divvy takes variables from a file or the command line and merges them with environment settings to create a specific job script. Write a submission script from the command line:

```{console}
divvy write --package slurm \
--settings myjob.yaml \
--compute sample=sample1 \
--outfile submit_script.txt
```

### Python interface

You can also use `divvy` via its Python interface, or use it to make your own Python tools divvy-compatible:

```{python}
import divvy
dcc = divvy.ComputingConfiguration()
dcc.activate_package("slurm")
# write out a submission script
dcc.write_script("test_script.sub",
{"code": "bowtie2 input.bam output.bam"})
```

For more details, check out the [tutorial](tutorial).
18 changes: 18 additions & 0 deletions docs/adapters_divvy.md
@@ -0,0 +1,18 @@
# Adapters make template variables flexible

Starting with `divvy v0.5.0`, the configuration file can include an `adapters` section, which provides a set of variable mappings that `divvy` uses to populate the submission templates.

This makes the connection between `divvy` and client software more flexible and more elegant: the source of the data does not need to follow any particular naming scheme, and any mapping can be adapted to work with any `divvy` template.

## Example

```yaml
adapters:
CODE: namespace.command
LOGFILE: namespace1.log_file
JOBNAME: user_settings.program.job_name
CORES: processors_number
...
```

As you can see in the example `adapters` section above, each adapter is a key-value pair that maps a `divvy` template variable to a target value. Target values can use namespaces (nested mappings).
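To illustrate the idea, here is a minimal sketch of how such dotted-path mappings could be resolved against nested data. This is not divvy's actual implementation; `resolve_adapters` and the sample data are hypothetical.

```python
# Hypothetical sketch of resolving an adapters mapping; not divvy's real code.
def resolve_adapters(adapters, data):
    """Map each template variable to the value at its dotted path in nested data."""
    resolved = {}
    for template_var, dotted_path in adapters.items():
        node = data
        for key in dotted_path.split("."):
            node = node[key]  # descend one namespace level per dotted segment
        resolved[template_var] = node
    return resolved

adapters = {"CODE": "namespace.command", "CORES": "processors_number"}
data = {"namespace": {"command": "bowtie2 -x idx reads.fq"}, "processors_number": 8}
print(resolve_adapters(adapters, data))
```

Because resolution is driven entirely by the mapping, the client data can be shaped however the client likes, which is the flexibility the `adapters` section provides.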
21 changes: 19 additions & 2 deletions docs/changelog.md
@@ -2,6 +2,24 @@

This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.

## [1.5.0] --

### Added

- ability to use PEPs from PEPhub without downloading project [#341](https://github.com/pepkit/looper/issues/341)
- ability to specify pipeline interfaces inside looper config [Looper Config](https://looper.databio.org/en/dev/how_to_define_looper_config/)
- divvy re-integrated in looper
- `divvy inspect -p package`
- Looper will now check that the command path provided in the pipeline interface is callable before submitting.


### Changed
- initialization of generic pipeline interface available using subcommand `init-piface`
- `looper report` will now use pipestat to generate browsable HTML reports if pipestat is configured.
- looper now works with pipestat v0.5.0.
- Removed `--toggle-key` functionality.
- Allow the user to input a single integer value for `--sel-incl` or `--sel-excl`

## [1.4.3] -- 2023-08-01

### Fixed
@@ -14,12 +32,11 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

## [1.4.1] -- 2023-06-22

### Fixed
- Upgraded Eido version to 0.2.0 or higher.

## [1.4.0] -- 2023-04-24

### Added

- preliminary support for [pipestat](http://pipestat.databio.org).
- ability to skip samples using `-k` or `--skip` [#367](https://github.com/pepkit/looper/pull/367)
- ability to input a range into `limit` and `skip`[#367](https://github.com/pepkit/looper/pull/367)
72 changes: 72 additions & 0 deletions docs/configuration_divvy.md
@@ -0,0 +1,72 @@
# The divvy configuration file

At the heart of `divvy` is the *divvy configuration file*, or `DIVCFG` for short. This is a `yaml` file that specifies a user's available *compute packages*. Each compute package represents a computing resource; for example, by default we have a package called `local` that populates templates to simply run jobs in the local console, and another package called `slurm` with a generic template to submit jobs to a SLURM cluster resource manager. Users can customize compute packages as much as needed.

## Configuration file priority lookup

When `divvy` starts, it checks a few places for the `DIVCFG` file. First, the user may specify a `DIVCFG` file when invoking `divvy`, either from the command line or from within Python. If the file is not provided, `divvy` will next look for a file path in the `$DIVCFG` environment variable. If it cannot find one there, it will load a default configuration file with a few basic compute packages. We recommend setting the `$DIVCFG` environment variable as the most convenient approach.
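That lookup order can be sketched as follows. This is a simplified illustration, not divvy's internals; the function name and default path are hypothetical.

```python
import os

# Simplified sketch of the DIVCFG lookup priority; names are illustrative.
def locate_divcfg(explicit_path=None):
    """Return a DIVCFG path: explicit argument, then $DIVCFG, then defaults."""
    if explicit_path:               # 1. file given on the CLI or in Python
        return explicit_path
    env_path = os.environ.get("DIVCFG")
    if env_path:                    # 2. path from the $DIVCFG environment variable
        return env_path
    return "default_config.yaml"    # 3. fall back to bundled default packages
```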

## Customizing your configuration file

The easiest way to customize your computing configuration is to edit the default configuration file. To get a fresh copy of the default configuration, use `divvy init custom_divvy_config.yaml`. This will create a config file for you, along with a folder containing all the default templates.

Here is an example `divvy` configuration file:

```{console}
compute_packages:
default:
submission_template: templates/local_template.sub
submission_command: sh
local:
submission_template: templates/local_template.sub
submission_command: sh
develop_package:
submission_template: templates/slurm_template.sub
submission_command: sbatch
partition: develop
big:
submission_template: templates/slurm_template.sub
submission_command: sbatch
partition: bigmem
```

The sub-sections below `compute_packages` each define a *compute package* that can be activated. `Divvy` uses these compute packages to determine how to submit your jobs. If you don't specify a package to activate, `divvy` uses the package named `default`. You can make your default whatever you like. You can activate any other compute package __on the fly__ by calling the `activate_package` function from python, or using the `--package` command-line option.

You can make as many compute packages as you wish, and name them whatever you wish. You can also add whatever attributes you like to the compute package. There are only two required attributes: each compute package must specify the `submission_command` and `submission_template` attributes.
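As a quick illustration of that rule, a compute package could be checked for the two required attributes like this (a hedged sketch; `validate_package` is not part of divvy's API):

```python
# Illustrative check for the two required compute-package attributes.
REQUIRED = {"submission_template", "submission_command"}

def validate_package(package):
    """Raise ValueError if a compute package lacks a required attribute."""
    missing = REQUIRED - set(package)
    if missing:
        raise ValueError(f"compute package missing: {sorted(missing)}")
    return True

validate_package({"submission_template": "templates/slurm_template.sub",
                  "submission_command": "sbatch",
                  "partition": "develop"})  # extra attributes are fine
```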

### The `submission_command` attribute

The `submission_command` attribute is the string your cluster resource manager uses to submit a job. For example, in our compute package named `develop_package`, we've set `submission_command` to `sbatch`. We are telling divvy that submitting this job should be done with: `sbatch submission_script.txt`.

### The `submission_template` attribute

Each compute package specifies a path to a template file (`submission_template`). The template file provides a skeleton that `divvy` will populate with job-specific attributes. These paths can be relative or absolute; relative paths are considered *relative to the DIVCFG file*. Let's explore what template files look like next.

## Template files

Each compute package must point to a template file with the `submission_template` attribute. These template files are typically stored relative to the `divvy` configuration file. Template files are taken by `divvy`, populated with job-specific information, and then run as scripts. Here's an example of a generic SLURM template file:

```{bash}
#!/bin/bash
#SBATCH --job-name='{JOBNAME}'
#SBATCH --output='{LOGFILE}'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='{PARTITION}'
#SBATCH -m block
#SBATCH --ntasks=1
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
srun {CODE}
```

Template files use variables (*e.g.* `{VARIABLE}`), which will be populated independently for each job. If you want to make your own templates, you should check out the default templates (in the [submit_templates](https://github.com/pepkit/divcfg/tree/master/templates) folder). Many users will not need to tweak the template files, but if you need to, you can also create your own templates, giving `divvy` ultimate flexibility to work with any compute infrastructure in any environment. To create a custom template, just follow the examples. Then, point to your custom template in the `submission_template` attribute of a compute package in your `DIVCFG` config file.
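A rough sketch of the populate step (illustrative only; divvy's real implementation differs) shows how `{VARIABLE}` placeholders become job-specific values:

```python
# Illustrative sketch of filling {VARIABLE} placeholders in a template.
def populate_template(template_text, variables):
    """Replace each {NAME} placeholder with its job-specific value."""
    for name, value in variables.items():
        template_text = template_text.replace("{" + name + "}", str(value))
    return template_text

template = "#SBATCH --job-name='{JOBNAME}'\nsrun {CODE}\n"
script = populate_template(template, {"JOBNAME": "align_s1",
                                      "CODE": "bowtie2 input.bam output.bam"})
print(script)
```

The populated text is then written out as a submission script and handed to the package's `submission_command`.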



## Resources

You may notice that the compute config file does not specify resources to request (like memory, CPUs, or time). Yet, these are required in order to submit a job to a cluster. **Resources are not handled by the divcfg file** because they are not tied to a particular computing environment; instead, they vary by pipeline and sample. As such, these items should be provided elsewhere.
76 changes: 76 additions & 0 deletions docs/containers_divvy.md
@@ -0,0 +1,76 @@

# Configuring containers with divvy

The divvy template framework is a natural way to run commands in a container, for example, using `docker` or `singularity`. All we need to do is 1) design a template that will run the job in the container, instead of natively; and 2) create a new compute package that will use that template.

## A template for container runs

If you start up divvy without giving it a DIVCFG file, it will come with a few default compute packages that include templates for containers. You can also find these in [the divcfg repository](http://github.com/pepkit/divcfg), which includes these scenarios:

- singularity on SLURM
- singularity on localhost
- docker on localhost
- others

If you need a different system, looking at those examples should get you started toward making your own. To take a quick example, using singularity on SLURM combines the basic SLURM script template with these lines to execute the run in a container:

```
singularity instance.start {SINGULARITY_ARGS} {SINGULARITY_IMAGE} {JOBNAME}_image
srun singularity exec instance://{JOBNAME}_image {CODE}
singularity instance.stop {JOBNAME}_image
```

This particular template uses variables provided by different sources: `{JOBNAME}`, `{CODE}`, `{SINGULARITY_ARGS}`, and `{SINGULARITY_IMAGE}`. These arguments can be defined in different places. For example, the `{SINGULARITY_IMAGE}` variable should point to a singularity image that may vary by pipeline, so it makes most sense to define this variable individually for each pipeline. Thus, any pipeline that provides a container should probably include a `singularity_image` attribute pointing to the appropriate container image.

Of course, you will also need to make sure that you have access to the `singularity` command from the compute nodes; on some clusters, you may need to add a `module load singularity` (or some variation) to enable it.

The `{SINGULARITY_ARGS}` variable comes right after the `instance.start` command, and can be used to pass any command-line arguments to singularity. We use these, for example, to bind host disk paths into the container. **It is critical that you explicitly bind any file systems with data necessary for the pipeline so the running container can see those files**. The [singularity documentation](https://singularity.lbl.gov/docs-mount#specifying-bind-paths) explains this, and you can find other arguments detailed there. Because this setting describes something about the computing environment (rather than an individual pipeline or sample), it makes most sense to put it in the `DIVCFG` file for a particular compute package. The next section includes examples of how to use `singularity_args`.

If you're using [looper](http://looper.databio.org), the `{JOBNAME}` and `{CODE}` variables will be provided automatically by looper.

## Adding compute packages for container templates

To add a package for these templates to a `DIVCFG` file, we just add a new section. There are a few examples in this repository. A singularity example we use at UVA looks like this:

```
singularity_slurm:
submission_template: templates/slurm_singularity_template.sub
submission_command: sbatch
singularity_args: --bind /sfs/lustre:/sfs/lustre,/nm/t1:/nm/t1
singularity_local:
submission_template: templates/localhost_singularity_template.sub
submission_command: sh
singularity_args: --bind /ext:/ext
```

These singularity compute packages look just like the typical ones, but they change the `submission_template` to point to the new containerized templates described in the previous section, and they add the `singularity_args` variable, which is what will populate the `{SINGULARITY_ARGS}` variable in the template. Here we've used these to bind (mount) particular file systems the container will need. You can use these to pass along any environment-specific settings to your singularity container.

With this setup, if you want to run a singularity container, just specify `--compute singularity_slurm` or `--compute singularity_local` and it will use the appropriate template.

For another example, take a look at the basic `localhost_container.yaml` DIVCFG file, which describes a possible setup for running docker on a local computer:

```
compute:
default:
submission_template: templates/localhost_template.sub
submission_command: sh
singularity:
submission_template: templates/localhost_singularity_template.sub
submission_command: sh
singularity_args: --bind /ext:/ext
docker:
submission_template: templates/localhost_docker_template.sub
submission_command: sh
docker_args: |
--user=$(id -u) \
--env="DISPLAY" \
--volume ${HOME}:${HOME} \
--volume="/etc/group:/etc/group:ro" \
--volume="/etc/passwd:/etc/passwd:ro" \
--volume="/etc/shadow:/etc/shadow:ro" \
--volume="/etc/sudoers.d:/etc/sudoers.d:ro" \
--volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
--workdir="`pwd`" \
```

Notice the `--volume` arguments, which mount disk volumes from the host into the container. This should work out of the box for most docker users.
6 changes: 6 additions & 0 deletions docs/default_packages_divvy.md
@@ -0,0 +1,6 @@
# Default divvy compute packages

Divvy comes with a built-in default configuration that provides a few packages and templates. You can configure your own with `divvy init` and then add whatever you like. The defaults provided can be found at these links:

- [list of available default packages](https://github.com/pepkit/divvy/blob/master/divvy/submit_templates/default_compute_settings.yaml)
- [default templates](https://github.com/pepkit/divvy/tree/master/divvy/submit_templates)