
MSKCC meeting July 19 2017

Vivekanandan (Vivek) Balasubramanian edited this page Nov 27, 2018 · 2 revisions


Note: These notes cover only the aspects relevant to EnTK.

Meeting notes

  • To-do: Patrick to provide us with a working example script, along with smaller-scale data and instructions.

The MSKCC use case discussed is a single iterating Pipeline, P, whose second stage is an ensemble of iterating Pipelines, p:

P = [Preprocessing, (p,p,...) , MBAR Analysis]

  • Number of iterations of P = N (user parameter)
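The outer control flow of P can be sketched in plain Python. This is only a sketch under stated assumptions: `preprocess`, `run_ensemble` and `mbar_analysis` are hypothetical stubs standing in for the three stages, and none of it is EnTK API. The point is that the MBAR output of one iteration feeds Preprocessing in the next, N times.

```python
# Hypothetical stubs for the three stages of P; this is not the EnTK API.
def preprocess(iteration, previous_result):
    """Stand-in for the single Preprocessing task."""
    return {"iteration": iteration, "input": previous_result}

def run_ensemble(plan):
    """Stand-in for the second stage (the ensemble of pipelines p).
    Returns placeholder weights regardless of the plan."""
    return [f"weights-from-p{i}" for i in range(3)]

def mbar_analysis(weights):
    """Stand-in for the MBAR Analysis stage."""
    return {"num_weights": len(weights)}

def run_outer_pipeline(num_iterations, initial_data):
    """Run P = [Preprocessing, (p, p, ...), MBAR Analysis] num_iterations times.

    The MBAR result of each iteration becomes the Preprocessing input of the next.
    """
    trace = []
    result = initial_data
    for it in range(num_iterations):
        plan = preprocess(it, result)
        trace.append(("Preprocessing", it))
        weights = run_ensemble(plan)
        trace.append(("Ensemble", it))
        result = mbar_analysis(weights)
        trace.append(("MBAR Analysis", it))
    return result, trace

# N = 2 outer iterations, starting from user-provided data.
result, trace = run_outer_pipeline(2, initial_data="user-data")
```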

First stage: Preprocessing

This stage consists of a single task that processes data, provided by the user in the first iteration or produced by the previous iteration, to partition the total resources across the set of (compound, phase, lambda state) combinations that need to be studied.
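As an illustration of what "partitioning resources" could look like, here is a minimal sketch assuming the simplest possible policy: an even split of a hypothetical GPU count across every (compound, phase, lambda state) combination. The real Preprocessing task may allocate very differently; the function name, compound names and phase labels below are all made up.

```python
from itertools import product

def partition_resources(compounds, phases, lambda_states, total_gpus):
    """Evenly split total_gpus across all (compound, phase, lambda state) combinations.

    Even splitting is an assumption for illustration only. Leftover GPUs go one
    each to the first combinations; if total_gpus < number of combinations,
    some combinations get 0.
    """
    combos = list(product(compounds, phases, lambda_states))
    if not combos:
        raise ValueError("need at least one compound, phase and lambda state")
    base, extra = divmod(total_gpus, len(combos))
    return {combo: base + (1 if i < extra else 0) for i, combo in enumerate(combos)}

alloc = partition_resources(
    compounds=["cmpd-A", "cmpd-B"],   # hypothetical compound names
    phases=["complex", "solvent"],    # hypothetical labels for the 2 phases
    lambda_states=[0, 1],             # hypothetical labels for the 2 lambda states
    total_gpus=10,                    # hypothetical resource budget
)
```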

Second stage: Ensemble of iterating pipelines

Each iterating pipeline p in the second stage consists of 2 stages:

p = [MD simulation, Compute weights]

Details about the number of tasks, stages and pipelines in this stage:

  • Number of tasks in MD simulation stage = 1 (O(10) seconds on GPU)
  • Number of tasks in Compute weights stage = Number of non-equilibrium protocol operations (user parameter); O(10) seconds on GPU, ~3-4x the MD simulation time
  • Number of pipelines p in the second stage of P = Number of compounds being studied x Number of phases per compound x Number of lambda states per phase
    • Number of compounds = O(100) (user parameter)
    • Number of phases per compound = 2
    • Number of lambda states per phase = 2
  • Number of iterations of each pipeline p = n (user parameter, O(100))
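The figures above multiply out as follows. `count_second_stage` is a hypothetical helper, and the value used for the number of non-equilibrium protocol operations is an arbitrary placeholder, since that is a user parameter.

```python
def count_second_stage(num_compounds, num_protocol_ops, n,
                       phases_per_compound=2, lambda_states_per_phase=2):
    """Return (number of pipelines p, total tasks over all n iterations of every p)."""
    num_pipelines = num_compounds * phases_per_compound * lambda_states_per_phase
    # Each iteration of p runs 1 MD task plus one Compute weights task
    # per non-equilibrium protocol operation.
    tasks_per_iteration = 1 + num_protocol_ops
    total_tasks = num_pipelines * n * tasks_per_iteration
    return num_pipelines, total_tasks

# 100 compounds and n = 100 as in the notes; 4 protocol operations is a placeholder.
print(count_second_stage(num_compounds=100, num_protocol_ops=4, n=100))  # (400, 200000)
```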

Now for the interesting part!

The pipeline p is a directed graph with a cycle, and we know which node it starts from. The graph looks like this:

    ------      -------------------
--->| MD | ---> | Compute Weights |
|   ------      -------------------
|_____|
(n iterations)
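To see why p cannot be handed to a DAG-based scheduler as-is, the cyclic form can be fed to Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+), which rejects graphs containing cycles. Node names are illustrative; the self-loop on MD mirrors the loop arrow in the diagram.

```python
from graphlib import TopologicalSorter, CycleError

# Cyclic form of p as node -> set of predecessors (illustrative names).
cyclic_p = {
    "MD": {"MD"},               # self-loop: the next MD depends on the previous MD
    "Compute Weights": {"MD"},
}

def is_dag(graph):
    """Return True if graph (node -> set of predecessors) has no cycle."""
    try:
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

print(is_dag(cyclic_p))  # False
```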

Since we know both the starting node and the number of iterations n, we can turn it into a DAG by unrolling the cycle:

    ------      -------------------
    | MD | ---> | Compute Weights |
    ------      -------------------
      |
      |         ------      -------------------
       -------> | MD | ---> | Compute Weights |
                ------      -------------------
                  |
                  |     ------      -------------------
                   ---> | MD | ---> | Compute Weights |
                        ------      -------------------
                          |
                          |
                          (up to n times)
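The unrolling itself is mechanical. A sketch using `graphlib` and illustrative tuple labels rather than EnTK objects: MD of iteration i depends on MD of iteration i-1, and Compute Weights of iteration i depends on MD of iteration i, matching the arrows above.

```python
from graphlib import TopologicalSorter

def unroll_p(n):
    """Unroll the MD self-loop of pipeline p into a DAG with n iterations.

    Returns node -> set of predecessors; labels like ("MD", 0) are illustrative.
    """
    graph = {}
    for i in range(n):
        # MD of iteration i waits for MD of iteration i-1 (nothing for i == 0).
        graph[("MD", i)] = {("MD", i - 1)} if i > 0 else set()
        # Compute Weights of iteration i waits only for MD of iteration i.
        graph[("Compute Weights", i)] = {("MD", i)}
    return graph

dag = unroll_p(3)
# static_order() would raise CycleError if unrolling had left a cycle.
order = list(TopologicalSorter(dag).static_order())
```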

With a few extensions to the current EnTK, this use case can be supported (more to discuss in person). With the EnTK approach, the second stage will have a growing number of pipelines (but a constant workload!). Say we start off with 100 compounds:

  • Number of pipelines at the beginning = 100 x 2 x 2 = 400
  • Number of pipelines that would have run at the end = n x 100 x 2 x 2 = 400n

But the number of stages executing at any time (per pipeline p) is always 2 (in the diagram above, time increases along the -y direction, i.e., downward)!
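A rough way to check these pipeline counts, assuming (as in the unrolled diagram) that each short [MD, Compute Weights] pipeline spawns its successor once its MD stage finishes. `simulate_spawning` is hypothetical and only models MD completions, not timing: it counts the pipelines created and the largest number of chains active at once, which stays at the initial count (the constant workload).

```python
from collections import deque

def simulate_spawning(num_chains, n):
    """Return (pipelines_created, max_chains_in_flight) for the second stage.

    Assumption for illustration: finishing the MD stage of iteration k spawns
    a new [MD, Compute Weights] pipeline for iteration k + 1, until a chain
    has n iterations. Compute Weights stages and timing are not modeled.
    """
    ready = deque((chain, 0) for chain in range(num_chains))
    created = num_chains              # pipelines that exist at the start
    max_in_flight = len(ready)
    while ready:
        chain, k = ready.popleft()    # MD of iteration k of this chain finishes
        if k + 1 < n:
            ready.append((chain, k + 1))  # spawn the next pipeline in the chain
            created += 1
        max_in_flight = max(max_in_flight, len(ready))
    return created, max_in_flight

print(simulate_spawning(100 * 2 * 2, n=5))  # (2000, 400)
```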

Third stage: MBAR Analysis

Each pipeline p from the second stage provides n weights to the MBAR Analysis. Using these 400n weights, MBAR assesses the global progress and generates the data for the Preprocessing stage of the next iteration.

In the simple case, MBAR is run over all pipelines of the second stage. In the future, we might decide to run MBAR on a subset of them (e.g., MBAR over the first 50 pipelines, with the remaining pipelines canceled).
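Gathering the weights for MBAR, including the possible future variant that uses only a subset, could look roughly like this. `gather_weights` is a hypothetical helper; results are assumed to arrive as (pipeline id, list of n weights) pairs in the order the pipelines report back, and MBAR itself is not implemented here.

```python
def gather_weights(results, n, subset_size=None):
    """Collect weights for MBAR Analysis from second-stage pipelines.

    results: list of (pipeline_id, weights) pairs, each with n weights.
    subset_size=None uses every pipeline (the simple case); otherwise only the
    first subset_size pipelines are used and the rest are returned as
    pipelines to cancel.
    """
    if subset_size is None:
        used, to_cancel = results, []
    else:
        used = results[:subset_size]
        to_cancel = [pid for pid, _ in results[subset_size:]]
    weights = []
    for pid, w in used:
        if len(w) != n:
            raise ValueError(f"pipeline {pid} returned {len(w)} weights, expected {n}")
        weights.extend(w)
    return weights, to_cancel

# 400 pipelines, each providing n = 3 placeholder weights.
results = [(pid, [0.0] * 3) for pid in range(400)]
all_w, _ = gather_weights(results, n=3)
subset_w, cancel = gather_weights(results, n=3, subset_size=50)
print(len(all_w), len(subset_w), len(cancel))  # 1200 150 350
```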

Full picture of the workflow

                                 ------      -------------------
                        -------->| MD | ---> | Compute Weights |
                        |        ------      -------------------
                        |          |                                            ---------
                        |          |         ------      -------------------             |
                        |           -------> | MD | ---> | Compute Weights |             |
                        |                    ------      -------------------             |
                        |                      |                                         |
                        |                      (n times)                                 |
                        |                                                                |
                        |                                                                |
    ----------------    |        ------      -------------------                         |
--->|     Data     |  ---------> | MD | ---> | Compute Weights |                         |        ------------
|   | Preprocessing|    |        ------      -------------------                         |        |   MBAR   | ---
|   ----------------    |          |                                            ---------|------->| Analysis |    |
|                       |          |         ------      -------------------             |        ------------    |
|                       |           -------> | MD | ---> | Compute Weights |             |                        |
|                       |                    ------      -------------------             |                        |
|                       |                      |                                         |                        |
|                       |          .           (n times)                                 |                        |
|                       |          .                                                     |                        |
|                       |          .                                                     |                        |
|                       |          .                                                     |                        |
|                       |          .                                                     |                        |
|                       |        ------      -------------------                         |                        |
|                       -------->| MD | ---> | Compute Weights |                         |                        |
|                                ------      -------------------                         |                        |
|                                  |                                            ---------                         |
|                                  |         ------      -------------------                                      |
|                                   -------> | MD | ---> | Compute Weights |                                      |
|                                            ------      -------------------                                      |
|                                              |                                                                  |
|                                              (n times)                                                          |
-------------------------------------------------------------------------------------------------------------------
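Putting it together, the whole workflow, including the feedback from MBAR Analysis to the next Preprocessing, becomes a DAG once both loops are unrolled over a known number of iterations. The sketch below builds that graph with illustrative tuple labels and small, arbitrary parameter values, and uses `graphlib` to confirm that a valid execution order exists; it says nothing about how EnTK would actually represent it.

```python
from graphlib import TopologicalSorter

def build_workflow(num_pipelines, n, num_outer_iterations):
    """Build the fully unrolled workflow as node -> set of predecessors.

    Each outer iteration t of P has one Preprocessing node, num_pipelines
    chains of n (MD, Compute Weights) pairs, and one MBAR node that depends
    on every Compute Weights node of that iteration. The MBAR -> Preprocessing
    feedback edge is unrolled across outer iterations.
    """
    g = {}
    for t in range(num_outer_iterations):
        pre = ("Preprocessing", t)
        g[pre] = {("MBAR", t - 1)} if t > 0 else set()
        cw_nodes = set()
        for p in range(num_pipelines):
            for i in range(n):
                md = ("MD", t, p, i)
                # First MD of a chain waits for Preprocessing; later MDs wait for the previous MD.
                g[md] = {pre} if i == 0 else {("MD", t, p, i - 1)}
                cw = ("Compute Weights", t, p, i)
                g[cw] = {md}
                cw_nodes.add(cw)
        g[("MBAR", t)] = cw_nodes
    return g

workflow = build_workflow(num_pipelines=4, n=3, num_outer_iterations=2)
order = list(TopologicalSorter(workflow).static_order())
```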