Integration
E2SAR uses a combination of UDP packets and gRPC+TLS to communicate with the load balancer control plane and dataplane.
All interactions are captured in the figure above.
The process of using the load balancer begins with (1) reserving a load balancer using out-of-band information such as the admin token and the EJFAT URL of the Control Plane agent. This can be done by one of the receivers, or by the Workflow Management System (WMS) or its proxy.
Once the load balancer is reserved and the details of the reservation have been communicated out of band (shown with dotted lines) to the worker nodes, they can register themselves with the control plane (2). Alternatively, the registration can be performed by the WMS or its proxy, assuming the details of the workers are known to it.
At this point (after all workers have been registered and the details of the load balancer communicated to the sender(s) - shown with dotted lines) data can begin traversing the dataplane in segments (3a and 3b).
The sender must send periodic sync messages (4) about once a second. Similarly, the receiving nodes, or the WMS on their behalf, must periodically use the SendState gRPC call (5) to update queue occupancy information in the control plane.
After the workflow completes, the workers can be deregistered (6), either by themselves or via the WMS, and the load balancer can then be freed (7).
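The ordering of the control plane calls described above can be sketched as a small state tracker. This is only an illustration, not part of E2SAR: the class `LbLifecycle`, its method names and the `Phase` enum are hypothetical, and the rule that all workers must be deregistered before freeing is a simplifying assumption of the sketch.

```python
from enum import Enum, auto


class Phase(Enum):
    FREE = auto()      # no load balancer reserved
    RESERVED = auto()  # step 1 done, no workers registered
    ACTIVE = auto()    # at least one worker registered (step 2)


class LbLifecycle:
    """Hypothetical helper tracking which control plane calls make sense."""

    def __init__(self):
        self.phase = Phase.FREE
        self.workers = set()

    def reserve(self):  # step 1
        if self.phase is not Phase.FREE:
            raise RuntimeError("load balancer already reserved")
        self.phase = Phase.RESERVED

    def register(self, worker):  # step 2
        if self.phase is Phase.FREE:
            raise RuntimeError("reserve the load balancer first")
        self.workers.add(worker)
        self.phase = Phase.ACTIVE

    def send_state(self, worker):  # step 5
        if worker not in self.workers:
            raise RuntimeError(f"{worker} is not registered")

    def deregister(self, worker):  # step 6
        if worker not in self.workers:
            raise RuntimeError(f"{worker} is not registered")
        self.workers.remove(worker)
        if not self.workers:
            self.phase = Phase.RESERVED

    def free(self):  # step 7
        # Assumption of this sketch: workers are deregistered before freeing.
        if self.phase is Phase.FREE:
            raise RuntimeError("no load balancer is reserved")
        if self.workers:
            raise RuntimeError("deregister all workers before freeing")
        self.phase = Phase.FREE
```

A WMS acting on behalf of the workers would drive the same sequence: `reserve()`, one `register()` per worker, repeated `send_state()` calls while data flows, then `deregister()` and `free()`.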
Different control plane commands use different portions of the EJFAT URI. This section explains which portions of the URI are relevant at each stage, following the step numbers in the diagram.
Step 1 (ReserveLoadBalancer): ejfat[s]://<admin token>@<cp name or IPv4 or IPv6 address>:<cp port>/
Step 2 (Register worker node): ejfat[s]://<instance or admin token>@<cp name or IPv4 or IPv6 address>:<cp port>/lb/<lbid>
- instance token and lbid come from Step 1
Steps 3a, 3b and 4 (Sender sending data and sync packets): ejfat[s]://<instance or admin token>@<cp name or IPv4 or IPv6 address>:<cp port>[/lb/<lbid>]?sync=<sync IP address>:<sync UDP port>&data=<data IPv4>[&data=<data IPv6>]
- Note that the instance token, the CP name or address, and the lb id aren't actually used, but it is convenient to pass all the information as a single EJFAT URI
- Note that up to two dataplane addresses of the loadbalancer (where to send the data) can be specified - an IPv4 and an IPv6 address
Step 5 (SendState from worker node): ejfat[s]://<session or admin token>@<cp name or IPv4 or IPv6 address>:<cp port>/lb/<lbid>?sessionid=<session id>&[sync=<sync IP address>:<sync UDP port>&data=<data IPv4>[&data=<data IPv6>]]
- session id and session token come from the return of Step 2
- sync and data addresses are not relevant
Step 6 (Deregister worker): ejfat[s]://<session or admin token>@<cp name or IPv4 or IPv6 address>:<cp port>/lb/<lbid>?sessionid=<session id>&[sync=<sync IP address>:<sync UDP port>&data=<data IPv4>[&data=<data IPv6>]]
- session id and session token come from Step 2
- sync and data addresses are not relevant
Steps X and Y (GetLoadBalancer and LoadBalancerStatus): ejfat[s]://<instance or admin token>@<cp name or IPv4 or IPv6 address>:<cp port>/lb/<lbid>
- instance token and lbid come from Step 1
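To make the URI layouts above concrete, here is a minimal sketch that splits an EJFAT URI into its parts using only the Python standard library. It is not the E2SAR `EjfatURI` class: the names `ParsedEjfatUri` and `parse_ejfat_uri` are hypothetical, the hosts, ports and tokens in the usage note are made up, and treating the `ejfats` scheme as "use TLS" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class ParsedEjfatUri:
    use_tls: bool                 # assumed: "ejfats" scheme means TLS
    token: Optional[str]          # admin, instance or session token
    cp_host: Optional[str]        # control plane name or address
    cp_port: Optional[int]
    lb_id: Optional[str] = None
    session_id: Optional[str] = None
    sync_addr: Optional[str] = None
    data_addrs: List[str] = field(default_factory=list)


def parse_ejfat_uri(uri: str) -> ParsedEjfatUri:
    parts = urlsplit(uri)
    if parts.scheme not in ("ejfat", "ejfats"):
        raise ValueError(f"not an EJFAT URI: {uri!r}")

    # The path is either "/" (step 1) or "/lb/<lbid>" (later steps).
    segments = [s for s in parts.path.split("/") if s]
    lb_id = segments[1] if len(segments) == 2 and segments[0] == "lb" else None

    # Repeated keys such as data=...&data=... come back as lists.
    query = parse_qs(parts.query)
    data_addrs = query.get("data", [])
    if len(data_addrs) > 2:
        raise ValueError("at most two data addresses (one IPv4, one IPv6)")

    return ParsedEjfatUri(
        use_tls=parts.scheme == "ejfats",
        token=parts.username,
        cp_host=parts.hostname,
        cp_port=parts.port,
        lb_id=lb_id,
        session_id=query.get("sessionid", [None])[0],
        sync_addr=query.get("sync", [None])[0],
        data_addrs=data_addrs,
    )
```

For example, `parse_ejfat_uri("ejfats://tok123@cp.example.org:18347/lb/36?sync=192.0.2.1:19522&data=192.0.2.2")` yields a token, a control plane host and port, an lb id, one sync address and one data address, while a step 1 URI such as `ejfat://admintok@192.0.2.10:18347/` leaves the lb id and query fields empty.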
Note that the EjfatURI object in the code can maintain multiple tokens internally and knows which token to use in which situation when using gRPC. The Segmenter and Reassembler classes also know how to ask the EjfatURI object for the information they need. When loading a new EjfatURI object from a URI string (or environment variable), the constructor takes a flag telling the object how to interpret the passed-in token (admin, instance or session).
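The idea of one object holding several tokens and picking the right one per call can be sketched as follows. This is not how `EjfatURI` is implemented: `TokenStore`, `TokenType` and the operation names are hypothetical, and the fallback to the admin token simply mirrors the "<... or admin token>" wording of the URI forms listed earlier.

```python
from enum import Enum
from typing import Dict


class TokenType(Enum):
    ADMIN = "admin"
    INSTANCE = "instance"
    SESSION = "session"


# Preferred token per operation, following the URI forms for each step.
PREFERRED = {
    "reserve": TokenType.ADMIN,        # step 1
    "register": TokenType.INSTANCE,    # step 2
    "get_status": TokenType.INSTANCE,  # steps X and Y
    "send_state": TokenType.SESSION,   # step 5
    "deregister": TokenType.SESSION,   # step 6
}


class TokenStore:
    """Hypothetical holder for the tokens collected during a workflow."""

    def __init__(self):
        self._tokens: Dict[TokenType, str] = {}

    def set(self, kind: TokenType, token: str) -> None:
        self._tokens[kind] = token

    def token_for(self, operation: str) -> str:
        # Try the preferred token first, then fall back to the admin token.
        for kind in (PREFERRED[operation], TokenType.ADMIN):
            if kind in self._tokens:
                return self._tokens[kind]
        raise LookupError(f"no usable token for {operation!r}")
```

The `kind` argument of `set` plays the role of the constructor flag mentioned above: it records how a token read from a URI string should be interpreted.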
EJFAT Control plane has a few timing rules and best practices for implementation:
- Sync messages should be sent about once per second by the sender (they can be sent more often). The control plane uses a ring buffer to calculate the rate slope. E2SAR also includes a low-pass filter for this purpose (you can set the sync period and the number of sync periods over which the reported rate is computed).
- After a worker is registered, sendState must be sent within 10 seconds. In general, a period of 100 ms is recommended for doing sendState.
- When reserving a load balancer with reserveLB, if the passed-in time is 0 (i.e. the UNIX epoch), the reservation never expires.
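The timing rules above can be captured in a few constants and a small pacing helper. This is a sketch only: `Periodic` and `reservation_until` are hypothetical names, not E2SAR APIs, and the helper assumes the reservation time is passed as an absolute UNIX timestamp with 0 meaning "never expires".

```python
import time

SYNC_PERIOD_S = 1.0             # sync messages about once per second
STATE_PERIOD_S = 0.1            # recommended sendState period (100 ms)
FIRST_STATE_DEADLINE_S = 10.0   # first sendState due after registration


class Periodic:
    """Reports when a fixed-period action is due, using an injectable clock."""

    def __init__(self, period_s, clock=time.monotonic):
        self.period_s = period_s
        self.clock = clock
        self.next_due = clock()  # the first call to due() fires immediately

    def due(self):
        now = self.clock()
        if now < self.next_due:
            return False
        # Schedule from the current time so a stall does not cause a burst.
        self.next_due = now + self.period_s
        return True


def reservation_until(duration_s, now=None):
    """UNIX time to pass when reserving; None maps to 0 (never expires)."""
    if duration_s is None:
        return 0
    current = time.time() if now is None else now
    return int(current + duration_s)
```

A worker loop would create `Periodic(STATE_PERIOD_S)` right after registering and call its own sendState wrapper whenever `due()` returns true, which keeps it well inside the 10 second deadline; a sender would do the same with `Periodic(SYNC_PERIOD_S)` for sync messages.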
Additional details are contained in this document