MSKCC meeting July 19 2017
Note: These notes are only taken in the context of EnTK.
- Todos: Patrick to provide us with a working example script along with smaller scale data and instructions
The MSKCC use case discussed is a single iterating Pipeline, P, whose second stage is an Ensemble of iterating Pipelines, p:
P = [Preprocessing, (p,p,...) , MBAR Analysis]
- Number of iterations of P = N (user parameter)
The Preprocessing stage consists of a single task. It processes data provided by the user (in the first iteration) or produced by the previous iteration, and partitions the total resources across the set of (compound, phase, lambda state) values that need to be studied.
The iterating pipelines in the second stage, p, consist of 2 stages:
p = [MD simulation, Compute weights]
Details about the number of tasks, stages and pipelines in this stage:
- Number of tasks in MD simulation stage = 1 (O(10) seconds on GPU)
- Number of tasks in Compute weights stage = Number of non-equilibrium protocol operations (user parameter) (O(10) seconds on GPU, ~3-4 x MD simulation time)
- Number of pipelines p in the second stage of P = Number of compounds being studied x Number of phases per compound x Number of lambda states per phase
- Number of compounds = O(100) (user parameter)
- Number of phases per compound = 2
- Number of lambda states per phase = 2
- Number of iterations of each pipeline p = n (user parameter, O(100))
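The counts above can be combined into a rough sizing calculation. The following is a minimal sketch, not part of EnTK: the class and function names are made up for illustration, and the value used for the number of non-equilibrium protocol operations is an assumption, since the notes only say it is a user parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadParams:
    """Sizing parameters from the notes; all field names are illustrative."""
    n_compounds: int           # user parameter, O(100)
    n_protocol_ops: int        # non-equilibrium protocol operations (user parameter)
    n_iterations: int          # n, iterations of each pipeline p, O(100)
    n_phases: int = 2          # phases per compound
    n_lambda_states: int = 2   # lambda states per phase


def pipelines_in_second_stage(w: WorkloadParams) -> int:
    """Pipelines p in the second stage of P."""
    return w.n_compounds * w.n_phases * w.n_lambda_states


def tasks_per_iteration_of_p(w: WorkloadParams) -> int:
    """One MD simulation task plus one Compute weights task per protocol operation."""
    return 1 + w.n_protocol_ops


def tasks_in_second_stage(w: WorkloadParams) -> int:
    """Tasks executed by all pipelines p during one iteration of P."""
    return pipelines_in_second_stage(w) * w.n_iterations * tasks_per_iteration_of_p(w)


if __name__ == "__main__":
    # n_protocol_ops=10 is an assumed value, chosen only to make the example concrete.
    w = WorkloadParams(n_compounds=100, n_protocol_ops=10, n_iterations=100)
    print(pipelines_in_second_stage(w))   # 400
    print(tasks_per_iteration_of_p(w))    # 11
    print(tasks_in_second_stage(w))       # 440000
```

With 100 compounds this gives 400 pipelines p; the task totals depend on the assumed number of protocol operations.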
Now for the interesting part!
The pipeline p is a directed graph with a cycle, for which we know the starting node. The graph is as follows:
    ------      -------------------
--->| MD | ---> | Compute Weights |
|   ------      -------------------
|________________________|
      (n iterations)
Since we know the starting node as well as the number of iterations n, we can form a DAG out of it by unrolling the cycle:
------      -------------------
| MD | ---> | Compute Weights |
------      -------------------
                     |
                     |        ------      -------------------
                     -------> | MD | ---> | Compute Weights |
                              ------      -------------------
                                                   |
                                                   |    ------      -------------------
                                                   ---> | MD | ---> | Compute Weights |
                                                        ------      -------------------
                                                                             |
                                                                             |
                                                                      (up to n times)
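The unrolling step can be sketched in a few lines of plain Python. This is only an illustration of the idea, not EnTK code: the function names `unroll` and `chain_edges` and the (iteration, stage name) node representation are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Node = Tuple[int, str]  # (iteration index, stage name)


def unroll(cycle: Sequence[str], n: int) -> List[Node]:
    """Repeat the stages of a cycle n times, tagging each copy with its iteration."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [(i, stage) for i in range(n) for stage in cycle]


def chain_edges(nodes: Sequence[Node]) -> List[Tuple[Node, Node]]:
    """Connect each node to its successor, giving a linear (acyclic) chain."""
    return list(zip(nodes, nodes[1:]))


if __name__ == "__main__":
    nodes = unroll(["MD", "Compute Weights"], 3)
    for src, dst in chain_edges(nodes):
        print(src, "->", dst)
```

Because every node carries its own iteration index, no node appears twice and the back edge of the cycle becomes a forward edge from Compute Weights in iteration i to MD in iteration i + 1.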
With a few extensions to the current EnTK, this use case can be supported (more to discuss in person). With the EnTK approach, the second stage will have a growing number of pipelines (but a constant workload!). Let's say we start off with 100 compounds.
- Number of pipelines at the beginning = 100 x 2 x 2 = 400
- Number of pipelines that would have run at the end = n x 100 x 2 x 2 = 400n
However, the number of stages executing at any time (per pipeline p) is always 2 (in the unrolled diagram above, time increases downward)!
Each of the pipelines p from the second stage provides n weights to the MBAR Analysis. With the 400n weights, MBAR studies the global progress and generates data for the Preprocessing stage of the next iteration.
In the simple case, MBAR is run over all pipelines of the second stage. In the future, we might decide to run MBAR on a subset of the pipelines of the second stage (e.g., MBAR over the first 50 pipelines, with the remaining pipelines canceled).
                                ------      -------------------
                        -------->| MD | ---> | Compute Weights |
                        |       ------      -------------------
                        |                            |                                        ---------
                        |                            |        ------      -------------------          |
                        |                            -------> | MD | ---> | Compute Weights |          |
                        |                                     ------      -------------------          |
                        |                                                          |                   |
                        |                                                      (n times)               |
                        |                                                                              |
                        |                                                                              |
    ----------------    |       ------      -------------------                                        |
--->| Data         | ---------> | MD | ---> | Compute Weights |                                        |        ------------
|   | Preprocessing|    |       ------      -------------------                                        |        | MBAR     | ---
|   ----------------    |                            |                                        ---------|------->| Analysis |   |
|                       |                            |        ------      -------------------          |        ------------   |
|                       |                            -------> | MD | ---> | Compute Weights |          |                       |
|                       |                                     ------      -------------------          |                       |
|                       |                                                          |                   |                       |
|                       |         .                                            (n times)               |                       |
|                       |         .                                                                    |                       |
|                       |         .                                                                    |                       |
|                       |         .                                                                    |                       |
|                       |         .                                                                    |                       |
|                       |       ------      -------------------                                        |                       |
|                       -------->| MD | ---> | Compute Weights |                                        |                       |
|                               ------      -------------------                                        |                       |
|                                                    |                                        ---------                        |
|                                                    |        ------      -------------------                                  |
|                                                    -------> | MD | ---> | Compute Weights |                                  |
|                                                             ------      -------------------                                  |
|                                                                                  |                                           |
|                                                                              (n times)                                       |
--------------------------------------------------------------------------------------------------------------------------------
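The overall control flow of P can also be written down as a short, sequential Python sketch. It is only meant to show the nesting of the two loops (N iterations of P, n iterations of each p) and where a subset choice for MBAR would go. None of this is EnTK API: `run_P`, `run_pipeline_p`, the `subset` argument, and the four callables are hypothetical placeholders, and the sketch models neither concurrency nor cancellation of the remaining pipelines.

```python
from typing import Any, Callable, List, Optional


def run_pipeline_p(state: Any, n: int,
                   md: Callable[[Any], Any],
                   compute_weights: Callable[[Any], Any]) -> List[Any]:
    """One pipeline p: n iterations of [MD simulation, Compute weights]."""
    weights = []
    for _ in range(n):
        state = md(state)                       # MD simulation stage
        weights.append(compute_weights(state))  # Compute weights stage
    return weights                              # n weights per pipeline


def run_P(user_data: Any, N: int, n: int,
          preprocess: Callable[[Any], List[Any]],
          md: Callable[[Any], Any],
          compute_weights: Callable[[Any], Any],
          mbar: Callable[[List[List[Any]]], Any],
          subset: Optional[int] = None) -> Any:
    """N iterations of P = [Preprocessing, (p, p, ...), MBAR Analysis]."""
    data = user_data
    for _ in range(N):
        # Preprocessing yields one starting state per (compound, phase, lambda state).
        states = preprocess(data)
        # Simple case: use every pipeline. Otherwise keep only the first `subset`.
        selected = states if subset is None else states[:subset]
        all_weights = [run_pipeline_p(s, n, md, compute_weights) for s in selected]
        # MBAR output becomes the Preprocessing input of the next iteration.
        data = mbar(all_weights)
    return data
```

In a real run the pipelines p would execute concurrently, and "the first 50 pipelines" would presumably mean the first 50 to finish rather than the first 50 by index as in this sketch.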