Parent Child Communication Strategies for Nested Flux Instances
When creating hierarchies of nested Flux instances, it can be advantageous for the nested instances (children) to communicate with their enclosing instance (parent). This page documents the general strategies Flux provides for communicating between parents and children, and includes examples of parent-child communication in both isolated code snippets and a full, real-world example.
In general, parent-child communication can occur in two ways: parent-initiated (push) and child-initiated (pull). Specifically, Flux has full support for pull-communication and limited support for push-communication. Flux supports pull-communication for all four types of messages listed in RFC 3: events, requests, responses, and keepalives. Children can send requests to and receive responses from parent-provided services, and they can subscribe to events and keepalives on parent overlays. Flux supports limited push-communication via the guest KVS namespace outlined in RFC 16.
There are two key points to keep in mind while communicating between children and parents. First, parent instances currently cannot send requests to or receive responses from child-registered services and cannot subscribe to events or keepalives on child overlays (i.e., push-communication). Ongoing work on dynamic service registration will enable children to register services on their parent's overlay (more information). Second, child instances must be co-located with parent instances (i.e., run on the same node) in order to connect to the parent instance's overlay using the local connector. To connect to remote parent instances (i.e., running on different nodes), child instances can use the ssh connector. Ongoing work on a tcp connector will enable instances to connect to other instances on remote nodes without using ssh (more information).
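For illustration only, the sketch below shows how these two connectors appear in practice; the socket paths and hostname are hypothetical placeholders, not values from the real codes.

/* Hypothetical URIs for illustration: a co-located parent can be reached
 * via the local connector, a remote parent via the ssh connector. */
flux_t *co_located_h = flux_open ("local:///tmp/flux-1234/local", 0);
flux_t *remote_h = flux_open ("ssh://remote-node/tmp/flux-1234/local", 0);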
This figure depicts the communication overlays of three nested Flux instances. Overlapping circles represent Flux broker ranks that are co-located (i.e., running on the same nodes). In this example, all three ranks of nested instance #1 (orange) and #2 (green) can connect to the root instance's (blue) overlay via the local connector. If any of the ranks in nested instance #1 need to communicate with nested instance #2, they must either use the ssh connector or forward their messages through the parent instance.
We now present several examples of parent-child communication in Flux, including both isolated examples and a full, real-world example.
We must first create a handle to the parent overlay using the URI stored in our local instance's parent-uri attribute:
flux_t *parent_h = flux_open(flux_attr_get(h, "parent-uri", NULL), 0);
In order to receive RPC responses and use continuations on the parent handle, we must set the handle's reactor to the same one used in our program (see flux-core issue #1156 for more information):
flux_set_reactor(parent_h, flux_get_reactor(h));
The only difference between sending an RPC within an instance and sending an RPC to a parent instance is the handle used. In this example, the parent's Flux handle is used, which sends the RPC over the parent's overlay. Note: the parent can respond to a request from a child in the same way it would reply to any other request (i.e., the child's handle is not required).
flux_future_t *f = flux_rpc (parent_h, "<service>.<method>", NULL, FLUX_NODEID_ANY, 0);
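Pull-communication is not limited to RPCs: as noted above, a child can also subscribe to events and keepalives on the parent's overlay through the parent handle. The sketch below (not from the original page; the "hb" heartbeat topic is used purely as an illustrative example) subscribes on the parent handle and synchronously receives one matching event:

/* Subscribe to an event topic on the parent's overlay, then block
 * until one matching event message arrives. */
const char *topic;
if (flux_event_subscribe (parent_h, "hb") < 0)
    perror ("flux_event_subscribe");
flux_msg_t *msg = flux_recv (parent_h, FLUX_MATCH_EVENT, 0);
if (msg && flux_event_decode (msg, &topic, NULL) == 0)
    printf ("received event: %s\n", topic);
flux_msg_destroy (msg);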
See RFC 16 for more information on how to accomplish limited push-communication using the guest KVS namespace.
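As a rough illustration (not taken from this page or RFC 16 verbatim), a parent might push data to a child by committing a key that the child then looks up. The namespace name and key below are hypothetical placeholders, and the sketch assumes the current flux-core KVS API; older releases used different KVS calls.

/* Parent side: commit a hypothetical key into the child job's guest
 * KVS namespace ("job-guest-ns" is a placeholder name). */
flux_kvs_txn_t *txn = flux_kvs_txn_create ();
flux_kvs_txn_put (txn, 0, "offload.work", "{\"cmd\": \"run\"}");
flux_future_t *cf = flux_kvs_commit (h, "job-guest-ns", 0, txn);
if (flux_future_get (cf, NULL) < 0)
    flux_log (h, LOG_ERR, "kvs commit failed");
flux_future_destroy (cf);
flux_kvs_txn_destroy (txn);

/* Child side: look up the same key through its parent handle. */
const char *value;
flux_future_t *lf = flux_kvs_lookup (parent_h, "job-guest-ns", 0, "offload.work");
if (flux_kvs_lookup_get (lf, &value) == 0)
    flux_log (h, LOG_INFO, "received: %s", value);
flux_future_destroy (lf);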
In this real-world example, we outline an offload module and an initial_program.py that together can distribute jobs through a hierarchy of nested Flux instances using only pull-communication.
The design consists of four major components: hierarchy configuration, hierarchy launch, user job submission, and job distribution.
Before we can create a hierarchy and distribute jobs across it, we must first describe the parameters of the hierarchy. In this section, we present an example hierarchy configuration, written in JSON, that makes two specific assumptions. First, it assumes that the hierarchy is defined entirely a priori. Second, it assumes that the number of children is constant across all instances at the same level in the hierarchy (i.e., the branching factor can only vary across levels, not within a level). It is important to note that these are not limitations of Flux, but simplifications made purely for this example. Flux supports both dynamic hierarchies and arbitrary numbers of children at any instance.
At the top-level of the JSON, there are two keys: "num_levels" and "levels". "num_levels" defines the depth of the hierarchy (including the root instance) and "levels" defines the number of children under each instance (according to the instance's depth) and the resources given to those children.
{
    "levels": [
        {"cores_per_child": 36, "num_children": 8},
        {"cores_per_child": 1, "num_children": 36}
    ],
    "num_levels": 3
}
In this example configuration JSON, there are 288 cores available and three levels in the hierarchy. The first level consists of a single (implicit) root instance managing all 288 cores. The second level consists of 8 instances, each managing 36 cores. The third and final level consists of 288 (8 * 36) instances, each managing 1 core.
The hierarchy launch is handled entirely by initial_program.py. The initial program starts by reading a hierarchy configuration (described in the previous section) from the filesystem. Based on the hierarchy configuration, initial_program.py submits child instances for launch to the local sched module (via a job.submit request on the local Flux handle). Each of the children is launched with initial_program.py as its initial program, allowing the hierarchy creation to continue recursively.
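initial_program.py makes this request from Python; purely as a sketch, the equivalent request in C might look like the following, where the payload is a hypothetical placeholder rather than the jobspec format the real code uses.

/* Sketch only: launch a child instance by sending a job.submit request
 * on the local handle. The payload below is a hypothetical placeholder. */
const char *jobspec = "{\"nnodes\": 1, \"ntasks\": 1}";
flux_future_t *f = flux_rpc (h, "job.submit", jobspec, FLUX_NODEID_ANY, 0);
if (flux_future_get (f, NULL) < 0)
    flux_log (h, LOG_ERR, "%s: job.submit", __FUNCTION__);
flux_future_destroy (f);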
Users submit all of their jobs to the offload module running in the root Flux instance. The offload module queues the jobs until it receives requests for work from its children (new_job callback).
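For illustration, a user-side submission to the offload module might look like the sketch below. The method name offload.new_job is an inference from the new_job callback named above (this page does not confirm it), and the jobspec string is a placeholder.

/* Hypothetical user-side submission to the root instance's offload module. */
const char *job_spec_str = "{}";   /* placeholder jobspec */
flux_t *h = flux_open (NULL, 0);   /* connect to the enclosing instance */
flux_future_t *f = flux_rpc (h, "offload.new_job", job_spec_str,
                             FLUX_NODEID_ANY, 0);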
In order to distribute jobs via pull-communication, each parent must register a service that child instances can send requests to. To this end, the offload module registers a service method called offload.need_job that its children use to request jobs.
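The registration itself is not shown on this page; a minimal sketch using flux-core's message handler tables (assuming the current flux_msg_handler_addvec interface and a hypothetical callback name) might look like:

/* Sketch: register a request handler for offload.need_job inside the
 * offload module (callback body omitted). */
static void need_job_cb (flux_t *h, flux_msg_handler_t *mh,
                         const flux_msg_t *msg, void *arg);

static const struct flux_msg_handler_spec htab[] = {
    { FLUX_MSGTYPE_REQUEST, "offload.need_job", need_job_cb, 0 },
    FLUX_MSGHANDLER_TABLE_END,
};

flux_msg_handler_t **handlers;
if (flux_msg_handler_addvec (h, htab, ctx, &handlers) < 0)
    flux_log (h, LOG_ERR, "%s: flux_msg_handler_addvec", __FUNCTION__);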
Before a child can request jobs from its parent, it must first connect to the parent instance's overlay. Once a child has launched, the child's offload module connects to the parent instance's overlay, creating a Flux handle:
ctx->parent_h = flux_open(flux_attr_get(ctx->h, "parent-uri", NULL), 0);
flux_set_reactor(ctx->parent_h, flux_get_reactor(ctx->h));
To request jobs from its parent, a child sends a request to "offload.need_job" on its parent's overlay (using the newly created Flux handle):
flux_future_t *f = flux_rpc (ctx->parent_h, "offload.need_job", NULL, FLUX_NODEID_ANY, 0);
if (flux_future_then (f, child_recv_work_cb, ctx) < 0) {
flux_log (ctx->h, LOG_ERR, "%s: flux_future_then", __FUNCTION__);
return -1;
}
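The response is delivered to the child_recv_work_cb continuation. Its full body lives in the offload module source; the sketch below (with a hypothetical context type) shows the general shape of extracting the jobspec from the response:

/* Sketch of the continuation: retrieve the jobspec string from the
 * parent's response (struct offload_ctx is a hypothetical type). */
static void child_recv_work_cb (flux_future_t *f, void *arg)
{
    struct offload_ctx *ctx = arg;
    const char *job_spec_str;
    if (flux_rpc_get (f, &job_spec_str) < 0) {
        flux_log (ctx->h, LOG_ERR, "%s: flux_rpc_get", __FUNCTION__);
        goto done;
    }
    /* ... queue or submit job_spec_str, as described below ... */
done:
    flux_future_destroy (f);
}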
Upon receiving a request for work from its child, the parent responds to the message with a jobspec. Note how the parent is agnostic to the fact that the request came from a child; it responds as it would to any request.
(link to code snippet, in context)
if (flux_respond (h, msg, 0, job_spec_str) < 0) {
    flux_log (h, LOG_ERR, "%s: flux_respond", __FUNCTION__);
}
Upon receiving a response from its parent, the child offload module's next action is dictated by the instance's position within the nested Flux hierarchy.
- If the instance has children that it can forward the job to, the module queues the job. Note that the message payload is duplicated since the message will be destroyed at the return of the callback.
(link to code snippet, in context)
flux_log (ctx->h, LOG_DEBUG, "%s: internal node received new work, enqueueing ", __FUNCTION__); zlist_append (ctx->job_queue, strdup (job_spec_str)); flux_log (ctx->h, LOG_DEBUG, "%s: %zu jobs are now in the job_queue", __FUNCTION__, zlist_size (ctx->job_queue));
- If the instance is a leaf in the nested instance hierarchy and thus has no children, the offload module submits the job to the local sched module for execution.
(link to code snippet, in context)
flux_log (ctx->h, LOG_DEBUG, "%s: leaf node received new work, submitting ", __FUNCTION__); submit_job (ctx->h, job_spec_str);
At this point, the job, which was originally submitted to the root instance's offload module, has now traversed its way down the hierarchy to a leaf instance, where it is ultimately executed.