Parent Child Communication Strategies for Nested Flux Instances
When creating hierarchies of nested Flux instances, it can be advantageous for the nested instances (children) to communicate with their enclosing instance (parent). This page documents the general strategies Flux provides for communicating between parents and children, and includes examples of parent-child communication in both isolated code snippets and a full, real-world example.
In general, parent-child communication can occur in two ways: parent-initiated (push) and child-initiated (pull). Specifically, Flux has full support for pull-communication and limited support for push-communication. Flux supports pull-communication for all four types of messages listed in RFC 3: events, requests, responses, and keepalives. Children can send requests to and receive responses from parent-provided services, and they can subscribe to events and keepalives on parent overlays. Flux supports limited push-communication via the guest KVS namespace outlined in RFC 16.
There are two key points to keep in mind while communicating between children and parents. First, parent instances currently cannot send requests to or receive responses from child-registered services and cannot subscribe to events or keepalives on child overlays (i.e., push-communication). Ongoing work on dynamic service registration will enable children to register services on their parent's overlay (more information). Second, child instances must be co-located with parent instances (i.e., run on the same node) in order to connect to the parent instance's overlay using the local connector. To connect to remote parent instances (i.e., running on different nodes), child instances can use the ssh connector. Ongoing work on a tcp connector will enable instances to connect to other instances on remote nodes without using ssh (more information).
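For illustration only, the sketch below shows how these two connectors appear in practice; the socket paths and hostname are hypothetical placeholders, not values from the real codes.

/* Hypothetical URIs for illustration: a co-located parent can be reached
 * via the local connector, a remote parent via the ssh connector. */
flux_t *co_located_h = flux_open ("local:///tmp/flux-1234/local", 0);
flux_t *remote_h = flux_open ("ssh://remote-node/tmp/flux-1234/local", 0);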
This figure depicts the communication overlays of three nested Flux instances. Overlapping circles represent Flux broker ranks that are co-located (i.e., running on the same nodes). In this example, all three ranks of nested instance #1 (orange) and #2 (green) can connect to the root instance's (blue) overlay via the local connector. If any of the ranks in nested instance #1 need to communicate with nested instance #2, they must either use the ssh connector or forward their messages through the parent instance.
We now present several examples of parent-child communication in Flux, including both isolated examples and a full, real-world example.
We must first create a handle to the parent overlay using the URI stored in our local instance's parent-uri attribute:
flux_t *parent_h = flux_open(flux_attr_get(h, "parent-uri", NULL), 0);
In order to receive RPC responses and use continuations on the parent handle, we must set the handle's reactor to the same one used in our program (see flux-core issue #1156 for more information):
flux_set_reactor(parent_h, flux_get_reactor(h));
The only difference between sending an RPC within an instance and sending an RPC to a parent instance is the handle used. In this example, the parent's Flux handle is used, which sends the RPC over the parent's overlay. Note: the parent can respond to a request from a child in the same way it would reply to any other request (i.e., the child's handle is not required).
flux_future_t *f = flux_rpc (parent_h, "<service>.<method>", NULL, FLUX_NODEID_ANY, 0);
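Pull-communication is not limited to RPCs: as noted above, a child can also subscribe to events and keepalives on the parent's overlay through the parent handle. The sketch below (not from the original page; the "hb" heartbeat topic is used purely as an illustrative example) subscribes on the parent handle and synchronously receives one matching event:

/* Subscribe to an event topic on the parent's overlay, then block
 * until one matching event message arrives. */
const char *topic;
if (flux_event_subscribe (parent_h, "hb") < 0)
    perror ("flux_event_subscribe");
flux_msg_t *msg = flux_recv (parent_h, FLUX_MATCH_EVENT, 0);
if (msg && flux_event_decode (msg, &topic, NULL) == 0)
    printf ("received event: %s\n", topic);
flux_msg_destroy (msg);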
See RFC 16 for more information on how to accomplish limited push-communication using the guest KVS namespace.
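As a rough illustration (not taken from this page or RFC 16 verbatim), a parent might push data to a child by committing a key that the child then looks up. The namespace name and key below are hypothetical placeholders, and the sketch assumes the current flux-core KVS API; older releases used different KVS calls.

/* Parent side: commit a hypothetical key into the child job's guest
 * KVS namespace ("job-guest-ns" is a placeholder name). */
flux_kvs_txn_t *txn = flux_kvs_txn_create ();
flux_kvs_txn_put (txn, 0, "offload.work", "{\"cmd\": \"run\"}");
flux_future_t *cf = flux_kvs_commit (h, "job-guest-ns", 0, txn);
if (flux_future_get (cf, NULL) < 0)
    flux_log (h, LOG_ERR, "kvs commit failed");
flux_future_destroy (cf);
flux_kvs_txn_destroy (txn);

/* Child side: look up the same key through its parent handle. */
const char *value;
flux_future_t *lf = flux_kvs_lookup (parent_h, "job-guest-ns", 0, "offload.work");
if (flux_kvs_lookup_get (lf, &value) == 0)
    flux_log (h, LOG_INFO, "received: %s", value);
flux_future_destroy (lf);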
In this real-world example, we outline an offload module and an initial_program.py that together can distribute jobs through a hierarchy of nested Flux instances using only pull-communication.
The design consists of four major components: hierarchy configuration, hierarchy launch, user job submission, and job distribution.
Before we can create a hierarchy and distribute jobs across it, we must first describe the parameters of the hierarchy. In this section, we present an example hierarchy configuration, written in JSON, that makes two specific assumptions. First, it assumes that the hierarchy is defined entirely a priori. Second, it assumes that the number of children is constant across all instances at the same level in the hierarchy (i.e., the branching factor can only vary across levels, not within a level). It is important to note that these are not limitations of Flux, but simplifications made purely for this example. Flux supports both dynamic hierarchies and arbitrary numbers of children at any instance.
At the top-level of the JSON, there are two keys: "num_levels" and "levels". "num_levels" defines the depth of the hierarchy (including the root instance) and "levels" defines the number of children under each instance (according to the instance's depth) and the resources given to those children.
{
    "levels": [
        {"cores_per_child": 36, "num_children": 8},
        {"cores_per_child": 1, "num_children": 36}
    ],
    "num_levels": 3
}
In this example configuration JSON, there are 288 cores available and three levels in the hierarchy. The first level consists of a single (implicit) root instance managing all 288 cores. The second level consists of 8 instances, each managing 36 cores. The third and final level consists of 288 (8 * 36) instances, each managing 1 core.
The hierarchy launch is handled entirely by initial_program.py. The initial program starts by reading a hierarchy configuration (described in the previous section) from the filesystem. Based on the hierarchy configuration, initial_program.py submits child instances for launch to the local sched module (via a job.submit request on the local Flux handle). Each of the children is launched with initial_program.py as its initial program, allowing the hierarchy creation to continue recursively.
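initial_program.py makes this request from Python; purely as a sketch, the equivalent request in C might look like the following, where the payload is a hypothetical placeholder rather than the jobspec format the real code uses.

/* Sketch only: launch a child instance by sending a job.submit request
 * on the local handle. The payload below is a hypothetical placeholder. */
const char *jobspec = "{\"nnodes\": 1, \"ntasks\": 1}";
flux_future_t *f = flux_rpc (h, "job.submit", jobspec, FLUX_NODEID_ANY, 0);
if (flux_future_get (f, NULL) < 0)
    flux_log (h, LOG_ERR, "%s: job.submit", __FUNCTION__);
flux_future_destroy (f);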
Users submit all of their jobs to the offload module running in the root Flux instance. The offload module queues the jobs until it receives requests for work from its children (new_job callback).
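For illustration, a user-side submission to the offload module might look like the sketch below. The method name offload.new_job is an inference from the new_job callback named above (this page does not confirm it), and the jobspec string is a placeholder.

/* Hypothetical user-side submission to the root instance's offload module. */
const char *job_spec_str = "{}";   /* placeholder jobspec */
flux_t *h = flux_open (NULL, 0);   /* connect to the enclosing instance */
flux_future_t *f = flux_rpc (h, "offload.new_job", job_spec_str,
                             FLUX_NODEID_ANY, 0);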
In order to distribute jobs via pull-communication, each parent must register a service that child instances can send requests to. To this end, the offload module registers a service method called offload.need_job that its children use to request jobs.
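The registration itself is not shown on this page; a minimal sketch using flux-core's message handler tables (assuming the current flux_msg_handler_addvec interface and a hypothetical callback name) might look like:

/* Sketch: register a request handler for offload.need_job inside the
 * offload module (callback body omitted). */
static void need_job_cb (flux_t *h, flux_msg_handler_t *mh,
                         const flux_msg_t *msg, void *arg);

static const struct flux_msg_handler_spec htab[] = {
    { FLUX_MSGTYPE_REQUEST, "offload.need_job", need_job_cb, 0 },
    FLUX_MSGHANDLER_TABLE_END,
};

flux_msg_handler_t **handlers;
if (flux_msg_handler_addvec (h, htab, ctx, &handlers) < 0)
    flux_log (h, LOG_ERR, "%s: flux_msg_handler_addvec", __FUNCTION__);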
Before a child can request jobs from its parent, it must first connect to the parent instance's overlay. Once a child has launched, the child's offload module connects to the parent instance's overlay, creating a Flux handle:
ctx->parent_h = flux_open(flux_attr_get(ctx->h, "parent-uri", NULL), 0);
flux_set_reactor(ctx->parent_h, flux_get_reactor(ctx->h));
To request jobs from its parent, a child sends a request to "offload.need_job" on its parent's overlay (using the newly created Flux handle):
flux_future_t *f = flux_rpc (ctx->parent_h, "offload.need_job", NULL, FLUX_NODEID_ANY, 0);
if (flux_future_then (f, child_recv_work_cb, ctx) < 0) {
flux_log (ctx->h, LOG_ERR, "%s: flux_future_then", __FUNCTION__);
return -1;
}
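The response is delivered to the child_recv_work_cb continuation. Its full body lives in the offload module source; the sketch below (with a hypothetical context type) shows the general shape of extracting the jobspec from the response:

/* Sketch of the continuation: retrieve the jobspec string from the
 * parent's response (struct offload_ctx is a hypothetical type). */
static void child_recv_work_cb (flux_future_t *f, void *arg)
{
    struct offload_ctx *ctx = arg;
    const char *job_spec_str;
    if (flux_rpc_get (f, &job_spec_str) < 0) {
        flux_log (ctx->h, LOG_ERR, "%s: flux_rpc_get", __FUNCTION__);
        goto done;
    }
    /* ... queue or submit job_spec_str, as described below ... */
done:
    flux_future_destroy (f);
}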
Upon receiving a request for work from its child, the parent responds to the message with a jobspec. Note how the parent is agnostic to the fact that the request came from a child; it responds as it would to any request.
(link to code snippet, in context)
if (flux_respond (h, msg, 0, job_spec_str) < 0) {
    flux_log (h, LOG_ERR, "%s: flux_respond", __FUNCTION__);
}
Upon receiving a response from its parent, the child offload module's next action is dictated by the instance's position within the nested Flux hierarchy.
- If the instance has children that it can forward the job to, the module queues the job. Note that the message payload is duplicated since the message will be destroyed at the return of the callback.
(link to code snippet, in context)
flux_log (ctx->h, LOG_DEBUG, "%s: internal node received new work, enqueueing ", __FUNCTION__); zlist_append (ctx->job_queue, strdup (job_spec_str)); flux_log (ctx->h, LOG_DEBUG, "%s: %zu jobs are now in the job_queue", __FUNCTION__, zlist_size (ctx->job_queue));
- If the instance is a leaf in the nested instance hierarchy and thus has no children, the offload module submits the job to the local sched module for execution.
(link to code snippet, in context)
flux_log (ctx->h, LOG_DEBUG, "%s: leaf node received new work, submitting ", __FUNCTION__); submit_job (ctx->h, job_spec_str);
At this point, the job, which was originally submitted to the root instance's offload module, has now traversed its way down the hierarchy to a leaf instance, where it is ultimately executed.