Merge pull request #29 from galaxyproject/prep_for_release
Prep for release
nuwang authored Jun 15, 2022
2 parents b2a53ee + af0bb3f commit 2b86066
Showing 4 changed files with 153 additions and 15 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.rst
@@ -1,3 +1,14 @@
1.2.0 - Jun 15, 2022. (sha 872d200f3bfeb7356ba76bb1ee14134a50608d92)
--------------------------------------------------------------------

* vortex package and cli renamed to tpv for consistency.
* All matching entity regexes are applied, not just the first. Order of application is in the order of definition.
* When a particular entity type is matched, its definitions are cached, so that future lookups are O(1).
* Support for job resubmission handling, with integration tests for Galaxy.
* Allow destinations to be treated as regular entities, with support for rules and expressions.
* Support for global and local context variables that can be referenced in expressions.
* Improved support for complex job param types like dicts and lists, which are now recursively evaluated.

1.1.0 - Mar 25, 2022. (sha 0e65d9a6a16bbbfd463031677067e1af9f4dac64)
--------------------------------------------------------------------

45 changes: 32 additions & 13 deletions docs/topics/concepts.rst
@@ -5,10 +5,10 @@ Concepts and Organisation
Object types
============

Conceptually, Vortex consists of the following types of objects.
Conceptually, TPV consists of the following types of objects.

1. Entities - An entity is anything that will be considered for scheduling
by vortex. Entities include Tools, Users, Groups, Rules and Destinations.
by TPV. Entities include Tools, Users, Groups, Rules and Destinations.
All entities have some common properties (id, cores, mem, env, params,
scheduling tags).

@@ -76,7 +76,7 @@ User > Role > Tool.

3. Evaluate
-----------
This operation evaluates any python expressions in the vortex config. It is divided into two steps, evaluate_early()
This operation evaluates any python expressions in the TPV config. It is divided into two steps, evaluate_early()
and evaluate_late(). The former runs before the combine step and evaluates expressions for cores, mem and gpus.
This ensures that at the time of combining entities, these values are concrete and can be compared. After the combine()
step, the evaluate_late() function evaluates all remaining variables, ensuring that they have the latest possible
@@ -102,7 +102,7 @@ candidate destinations.
Job Dispatch Process
====================

When a typical job is dispatched, vortex follows the process below.
When a typical job is dispatched, TPV follows the process below.

.. image:: ../images/job-dispatch-process.svg

@@ -112,17 +112,18 @@ When a typical job is dispatched, vortex follows the process below.
3. combine() - Combines entity requirements to create a merged entity. Uses lower of gpu, cores and mem requirements
4. evaluate_late() - Evaluates remaining expressions as late as possible
5. match() - Matches the combined entity requirements with a suitable destination
6. rank() - The matching destinations are ranked and the best match chosen
6. rank() - The matching destinations are ranked
7. choose - The ranked destinations are evaluated, with the first non-failing match chosen (no rule failures)
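
The later steps of the process above can be sketched in Python as follows. This is only an illustrative sketch, not TPV's actual implementation: the `Destination` class, its fields, the capacity-based matching and the "smallest destination first" ranking are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    """Hypothetical stand-in for a TPV destination."""
    id: str
    max_cores: int
    max_mem: int
    rule_fails: bool = False  # pretend a destination rule would fail the job


def combine(entities):
    # The text above states that combine() uses the lower of the requirements.
    return {key: min(e[key] for e in entities) for key in ("cores", "mem")}


def match(entity, destinations):
    # Keep only destinations able to satisfy the combined requirements.
    return [d for d in destinations
            if d.max_cores >= entity["cores"] and d.max_mem >= entity["mem"]]


def rank(candidates):
    # Toy ranking (an assumption): prefer the smallest destination that fits.
    return sorted(candidates, key=lambda d: d.max_cores)


def choose(ranked):
    # Walk the ranked list and take the first destination with no rule failure.
    for destination in ranked:
        if not destination.rule_fails:
            return destination
    raise RuntimeError("no usable destination")


tool = {"cores": 4, "mem": 16}
user = {"cores": 2, "mem": 8}
destinations = [
    Destination("small", max_cores=1, max_mem=4),
    Destination("medium", max_cores=4, max_mem=16, rule_fails=True),
    Destination("large", max_cores=16, max_mem=64),
]

combined = combine([tool, user])
chosen = choose(rank(match(combined, destinations)))
print(combined, chosen.id)
```

Here `medium` ranks first but is skipped because its rule fails, so the job falls through to `large`. This is the behaviour that separating rank() from choose is meant to allow.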


Expressions
===========

Most vortex properties can be expressed as python expressions. The rule of thumb is that all string expressions
Most TPV properties can be expressed as python expressions. The rule of thumb is that all string expressions
are evaluated as python f-strings, and all integer or boolean expressions are evaluated as python code blocks.
For example, cpu, cores and mem are evaluated as python code blocks, as they evaluate to integer/float values.
However, env and params are evaluated as f-strings, as they result in string values. This is to improve the readability
and syntactic simplicity of vortex config files.
and syntactic simplicity of TPV config files.
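
The difference between the two evaluation styles can be illustrated with plain Python. This is a minimal sketch of the idea only; the helper names `eval_code` and `eval_fstring` are hypothetical and do not correspond to TPV functions, and `eval` should never be used like this on untrusted input.

```python
def eval_code(expression, context):
    # Numeric/boolean fields (e.g. cores, mem): evaluate as a Python expression.
    return eval(str(expression), {}, dict(context))


def eval_fstring(template, context):
    # String fields (e.g. env, params): treat the text as the body of an f-string.
    return eval("f" + repr(template), {}, dict(context))


context = {"cores": 2}
context["mem"] = eval_code("cores * 4", context)
spec = eval_fstring("--ntasks={cores} --mem={mem*1024}", context)
print(context["mem"], spec)
```

Because string fields are f-strings, only the parts inside braces are evaluated. A native specification can therefore be written almost exactly as it would appear on the command line.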

At the point of evaluating these functions, there is an evaluation context, which is a default set of variables
that are available to that expression. The following default variables are available to all expressions:
@@ -140,13 +141,19 @@ Default evaluation context
+----------+-----------------------------------------------------------------------------+
| job | the Galaxy job object |
+----------+-----------------------------------------------------------------------------+
| mapper | the vortex mapper object, which can be used to access parsed vortex configs |
| mapper | the TPV mapper object, which can be used to access parsed TPV configs |
+----------+-----------------------------------------------------------------------------+
| entity | the vortex entity being currently evaluated. Can be a combined entity. |
| entity | the TPV entity being currently evaluated. Can be a combined entity. |
+----------+-----------------------------------------------------------------------------+
| self | an alias for the current vortex entity. |
| self | an alias for the current TPV entity. |
+----------+-----------------------------------------------------------------------------+

Custom evaluation contexts
---------------------------
These are user-defined context values that can be defined globally, or locally at the level of each
entity. Any defined context value is available as a regular variable at the time the entity is evaluated.


Special evaluation contexts
---------------------------
In addition to the defaults above, additional context variables are available at different steps.
@@ -159,13 +166,13 @@ expressions can be based on gpu values. mem expressions can refer to both cores
refer to evaluated env expressions.

*rank functions* - these can refer to all prior expressions, and are additionally passed a `candidate_destinations`
array, which is a list of matching vortex destinations.
array, which is a list of matching TPV destinations.


Scheduling
==========

Vortex offers several mechanisms for controlling scheduling, all of which are optional.
TPV offers several mechanisms for controlling scheduling, all of which are optional.
In its simplest form, no scheduling constraints would be defined at all, in which case
the entity would schedule on the first available entity. Admins can use additional

@@ -216,7 +223,19 @@ can execute that tool. Of course, the destination must also be marked as not rej

Scheduling by rules
-------------------

Rules can be used to conditionally modify any entity requirement. Rules can be given an ID,
which can subsequently be used by an inheriting entity to override the rule. If no ID is
specified, a unique ID is generated, and the rule can no longer be overridden. Rules
are typically evaluated through an `if` clause, which specifies the logical condition under
which the rule matches. If the rule matches, cores, memory, scheduling tags etc. can be
specified to override inherited values. The special clause `fail` can be used to immediately
fail the job with an error message. The `execute` clause can be used to execute an arbitrary
code block on rule match.
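
The way rules are applied can be sketched roughly as follows. This is an illustration under stated assumptions rather than TPV's real code: `JobFailed` and `apply_rules` are hypothetical names, only `cores` and `mem` overrides are handled, and the `execute` clause is omitted.

```python
class JobFailed(Exception):
    """Raised when a rule's `fail` clause fires (hypothetical name)."""


def apply_rules(entity, rules, context):
    result = dict(entity)
    for rule in rules:
        scope = {**context, **result}
        # The `if` clause decides whether the rule matches at all.
        if not eval(rule["if"], {}, scope):
            continue
        # `fail` aborts the job immediately with a formatted message.
        if "fail" in rule:
            raise JobFailed(eval("f" + repr(rule["fail"]), {}, scope))
        # Otherwise, any requirement named in the rule overrides the inherited value.
        for key in ("cores", "mem"):
            if key in rule:
                result[key] = eval(str(rule[key]), {}, scope)
    return result


rules = [
    {"id": "large_input", "if": "input_size > 10", "cores": 8},
    {"id": "too_big", "if": "input_size >= 100",
     "fail": "Input of {input_size} GB is too large"},
]
entity = {"cores": 2, "mem": 4}

small = apply_rules(entity, rules, {"input_size": 5})
large = apply_rules(entity, rules, {"input_size": 15})
try:
    apply_rules(entity, rules, {"input_size": 150})
    failure = None
except JobFailed as exc:
    failure = str(exc)
print(small, large, failure)
```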

Scheduling by custom ranking functions
--------------------------------------
The default rank function sorts destinations by scoring how well the tags match the job's requirements.
As this may often be too simplistic, the rank function can be overridden by specifying a custom
rank clause. The rank clause can contain an arbitrary code block, which can do the desired sorting,
for example by determining destination load by querying the job manager, influx statistics etc.
The final statement in the rank clause must be the list of sorted destinations.
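
As a rough illustration of what a custom rank clause might compute, the sketch below sorts candidate destinations by load. The `cpu_usage` figures are made up and stand in for whatever a job manager or statistics query would return; the `rank` helper and destination ids are likewise hypothetical.

```python
from types import SimpleNamespace

# Stand-ins for the matching destinations, already in default-rank order.
candidate_destinations = [
    SimpleNamespace(id="cluster_a"),
    SimpleNamespace(id="cluster_b"),
    SimpleNamespace(id="cluster_c"),
]

# Made-up load figures; a real rank clause would query these at dispatch time.
cpu_usage = {"cluster_a": 0.9, "cluster_b": 0.2, "cluster_c": 0.5}


def rank(candidates, usage):
    # Least-loaded first; destinations without data sort last.
    return sorted(candidates, key=lambda d: usage.get(d.id, float("inf")))


ranked = rank(candidate_destinations, cpu_usage)
print([d.id for d in ranked])
```

Python's `sorted` is stable, so destinations with equal load keep the relative order they were given in. If the input is already in default-rank order, that order acts as the tie-breaker.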
110 changes: 109 additions & 1 deletion docs/topics/tpv_by_example.rst
@@ -57,7 +57,7 @@ Inheritance provides a mechanism for an entity to inherit properties from anothe
gpus: 1
The `global` section is used to define global vortex properties. The `default_inherits` property defines a "base class"
The `global` section is used to define global TPV properties. The `default_inherits` property defines a "base class"
for all tools to inherit from.

In this example, if the `bwa` tool is executed, it will match the `default` tool, as there are no other matches,
@@ -327,3 +327,111 @@ in this example, the candidate destinations are first sorted by the best matchin
default ranking function), and then sorted by CPU usage per destination, obtained from the influxdb query.

Note that the final statement in the rank function must be the list of sorted destinations.

Custom contexts
---------------
In addition to the automatically provided context variables (see :doc:`concepts`), TPV allows you to define arbitrary
custom variables, which are then available whenever an expression is evaluated. Contexts can be defined either globally
or at the level of each entity, with entity level context variables overriding global ones.

.. code-block:: yaml
   :linenos:

   global:
     default_inherits: default
     context:
       ABSOLUTE_FILE_SIZE_LIMIT: 100
       large_file_size: 10
       _a_protected_var: "some value"

   tools:
     default:
       context:
         additional_spec: --my-custom-param
       cores: 2
       mem: 4
       params:
         nativeSpecification: "--nodes=1 --ntasks={cores} --ntasks-per-node={cores} --mem={mem*1024} {additional_spec}"
       rules:
         - if: input_size >= ABSOLUTE_FILE_SIZE_LIMIT
           fail: "Job input: {input_size} exceeds absolute limit of: {ABSOLUTE_FILE_SIZE_LIMIT}"
         - if: input_size > large_file_size
           cores: 10

     https://toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.1.0+galaxy7:
       context:
         large_file_size: 20
         additional_spec: --overridden-param
       mem: cores * 4
       gpus: 1

In this example, three global context variables are defined, which are made available to all entities.
Variable names follow Python conventions, where all uppercase variables indicate constants that cannot be overridden.
Lower case indicates a public variable that can be overridden and changed, even across multiple TPV config files.
An underscore indicates a protected variable that can be overridden within the same file, but not across files.

Additionally, the tool defaults section defines a context variable named `additional_spec`, which is only
available to inheriting tools.

If we were to dispatch a job, say bwa, with an input_size of 15, the large file rule in the defaults section would
kick in, and the number of cores would be set to 10. If we were to dispatch a hisat2 job with the same input size
however, the large_file_size rule would not kick in, as it has been overridden to 20. The main takeaway from this
example is that variables are bound late, and therefore, rules and params can be crafted to allow inheriting
tools to conveniently override values, even across files. While this capability can be powerful, it needs to be
treated with the same care as any global variable in a programming language.
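
The late binding described above can be mimicked in a few lines of Python. This is a simplified sketch, not TPV's resolution logic: the `resolve_cores` helper is hypothetical, contexts are merged with plain dict unpacking, and the rules about constants and protected variables are not enforced.

```python
GLOBAL_CONTEXT = {"ABSOLUTE_FILE_SIZE_LIMIT": 100, "large_file_size": 10}
DEFAULT_TOOL = {"context": {"additional_spec": "--my-custom-param"}, "cores": 2}
HISAT2 = {"context": {"large_file_size": 20,
                      "additional_spec": "--overridden-param"}}


def resolve_cores(tool, input_size):
    # Later dicts win: global < tool defaults < the specific tool.
    context = {
        **GLOBAL_CONTEXT,
        **DEFAULT_TOOL["context"],
        **tool.get("context", {}),
        "input_size": input_size,
    }
    cores = DEFAULT_TOOL["cores"]
    # The inherited rule is only evaluated now, against the merged context.
    if eval("input_size > large_file_size", {}, context):
        cores = 10
    return cores


bwa_cores = resolve_cores({}, 15)         # uses the global large_file_size of 10
hisat2_cores = resolve_cores(HISAT2, 15)  # uses the overridden value of 20
print(bwa_cores, hisat2_cores)
```

The rule text is identical for both tools. Only the context it is evaluated against differs, which is why the inheriting tool can change the outcome without redefining the rule.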

Multiple matches
----------------
If multiple regular expressions match, the matches are applied in order of appearance. Therefore, the convention is
to specify more general rule matches first, and more specific matches later. This matching also applies across
multiple TPV config files, again based on order of appearance.

.. code-block:: yaml
   :linenos:

   tools:
     default:
       cores: 2
       mem: 4
       params:
         nativeSpecification: "--nodes=1 --ntasks={cores} --ntasks-per-node={cores} --mem={mem*1024}"
     https://toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/*:
       mem: cores * 4
       gpus: 1
     https://toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.1.0+galaxy7:
       env:
         MY_ADDITIONAL_FLAG: "test"

In this example, dispatching a hisat2 job would result in a mem value of 8, with 1 gpu. However, dispatching
the specific version of `2.1.0+galaxy7` would result in the additional env variable, with mem remaining at 8.
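
The "apply every match, in order of definition" behaviour can be sketched as below. The tool ids and patterns are shortened, hypothetical stand-ins for the toolshed URLs above, `mem` is written as the already-evaluated value of `cores * 4`, and the use of `re.fullmatch` is an assumption about matching semantics rather than a description of TPV internals.

```python
import re

# Insertion order of the dict mirrors order of definition in the config.
TOOLS = {
    "default": {"cores": 2, "mem": 4},
    r"hisat2/.*": {"mem": 8, "gpus": 1},  # mem is cores * 4, pre-evaluated
    r"hisat2/2\.1\.0": {"env": {"MY_ADDITIONAL_FLAG": "test"}},
}


def resolve(tool_id):
    # Start from the inherited defaults, then layer on every matching entry.
    merged = dict(TOOLS["default"])
    for pattern, props in TOOLS.items():
        if pattern != "default" and re.fullmatch(pattern, tool_id):
            merged.update(props)
    return merged


generic = resolve("hisat2/2.0.0")
specific = resolve("hisat2/2.1.0")
other = resolve("bwa")
print(generic, specific, other)
```

Since later matches are layered over earlier ones, putting the most specific pattern last lets it have the final say on any property it defines.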

Job Resubmission
----------------
TPV has explicit support for job resubmissions, so that advanced control over job resubmission is possible.

.. code-block:: yaml
   :linenos:

   tools:
     default:
       cores: 2
       mem: 4 * int(job.destination_params.get('SCALING_FACTOR', 1)) if job.destination_params else 1
       params:
         SCALING_FACTOR: "{2 * int(job.destination_params.get('SCALING_FACTOR', 2)) if job.destination_params else 2}"
       resubmit:
         with_more_mem_on_failure:
           condition: memory_limit_reached and attempt <= 3
           destination: tpv_dispatcher

In this example, we have defined a resubmission handler that resubmits the job if the memory limit is reached.
Note that the resubmit section looks exactly the same as Galaxy's, except that it follows a dictionary structure
instead of being a list. Refer to the Galaxy job configuration docs for more information on resubmit handlers. One
twist in this example is that we automatically increase the amount of memory provided to the job on each resubmission.
This is done through SCALING_FACTOR, a custom param chosen for this example, whose value is doubled on each
resubmission. Since each resubmission's destination is TPV, the param is re-evaluated on each resubmission and scaled
accordingly. The memory allocation is derived from the scaling factor, and therefore scales with it.
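
To see how the two expressions interact across attempts, they can be evaluated by hand in Python. This is a sketch under assumptions: the `job` object is a bare stand-in, the `dispatch` helper is hypothetical, and it is assumed that a resubmitted job carries the params computed for its previous attempt in `job.destination_params`.

```python
from types import SimpleNamespace

# The two expressions, copied from the config above.
MEM_EXPR = ("4 * int(job.destination_params.get('SCALING_FACTOR', 1)) "
            "if job.destination_params else 1")
FACTOR_TEMPLATE = ("{2 * int(job.destination_params.get('SCALING_FACTOR', 2)) "
                   "if job.destination_params else 2}")


def dispatch(job):
    scope = {"job": job}
    mem = eval(MEM_EXPR, {}, scope)                              # code block
    factor = eval('f"""' + FACTOR_TEMPLATE + '"""', {}, scope)   # f-string
    return mem, {"SCALING_FACTOR": factor}


job = SimpleNamespace(destination_params=None)  # first submission: no params yet
history = []
for attempt in range(1, 4):
    mem, params = dispatch(job)
    history.append((attempt, mem, params["SCALING_FACTOR"]))
    job.destination_params = params  # assumed to carry over to the resubmission

print(history)
```

Note that a Python conditional expression binds more loosely than `*`. As written, the mem expression therefore takes the `else 1` branch on the first attempt and only applies the `4 *` multiplier once destination params exist.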
2 changes: 1 addition & 1 deletion tpv/__init__.py
@@ -1,7 +1,7 @@
"""Total Perspective Vortex library setup."""

# Current version of the library
__version__ = "1.1.0"
__version__ = "1.2.0"


def get_version():
