Each node starts a TCP server that listens on the port specified in `new dt({port: 9999});`.
Any number of nodes can connect to a server.
A node connects to another node to exchange data using the primary client.
When the primary client is disconnected the connection routine starts:
- find the node with the fewest connection failures
- find the node that has the lowest latency and no more connection failures than the node found in step 1
- connect to the node found in step 2 with the primary client and store the `client_id` from the first message received, to be sent with all future messages
- sends `open` with its node_id to be tested as `is_self`, validates the response or disconnects
- sends `connected_nodes` to be stored in `fragment_list`
- begins sending `ping` messages, accepting `pong` messages and maintaining the 20 latest round trip times
- forwards, sends and receives normal messages per the message type logic
- expires `fragment_list` nodes older than 24 hours
- clears expired message ids
- initiates tests on distant nodes
- initiates test on nodes
- handles reconnects based on latency and `fragment_list` data
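The first two steps of the connection routine can be sketched as a pure function. This is a minimal illustration, not the library's code: `pickPrimary` and the record fields `connectionFailures` and `latencyAvg` are hypothetical names chosen for the example.

```javascript
// Hypothetical node records; the field names are assumptions for illustration.
function pickPrimary(nodes) {
  if (nodes.length === 0) return null;
  // Step 1: the fewest connection failures among all known nodes.
  const minFailures = Math.min(...nodes.map((n) => n.connectionFailures));
  // Step 2: among nodes with no more failures than that, take the lowest latency.
  const candidates = nodes.filter((n) => n.connectionFailures <= minFailures);
  return candidates.reduce((best, n) => (n.latencyAvg < best.latencyAvg ? n : best));
}

const nodes = [
  { ip: '10.0.0.1', port: 9999, connectionFailures: 2, latencyAvg: 12 },
  { ip: '10.0.0.2', port: 9999, connectionFailures: 0, latencyAvg: 40 },
  { ip: '10.0.0.3', port: 9999, connectionFailures: 0, latencyAvg: 25 },
];
const primary = pickPrimary(nodes);
console.log(primary.ip); // prints "10.0.0.3"
```

The fastest node (10.0.0.1) is skipped because it has more connection failures than the most reliable nodes.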
`distant_node` messages are exchanged between connected nodes. Distant nodes are kept in a list that is separate from a node's own list of nodes. Nodes are moved from the distant node list to the node list after a successful `test_node()` finishes with them as the subject.
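A rough sketch of the two lists and the promotion step, assuming each list is a plain array of records keyed by `node_id`. The function `promoteAfterTest` and the `state` shape are hypothetical; the real `test_node()` is not reproduced here, only its pass or fail result.

```javascript
// Move a node from the distant list to the node list once a test succeeded.
// `testPassed` stands in for the outcome of test_node(); this is an assumption.
function promoteAfterTest(state, nodeId, testPassed) {
  if (!testPassed) return false;
  const idx = state.distantNodes.findIndex((n) => n.node_id === nodeId);
  if (idx === -1) return false; // not a distant node, nothing to move
  const [node] = state.distantNodes.splice(idx, 1);
  state.nodes.push(node);
  return true;
}

const state = {
  nodes: [],
  distantNodes: [{ node_id: 'n1', ip: '10.0.0.7', port: 9999 }],
};
const failed = promoteAfterTest(state, 'n1', false); // stays distant
const passed = promoteAfterTest(state, 'n1', true);  // moves to state.nodes
console.log(state.nodes.length, state.distantNodes.length); // prints "1 0"
```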
new client node connected
- sends a distant node message to every connected client
- sends a distant node message to the server the node is connected to with its primary client
- sends all the known nodes to the new client
distant_node sent to server if the distant node does not exist
valid_server_message()
- sends a distant_node message of itself to the distant node
- forwards the distant_node message to each connected client node
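The server-side handling above might look roughly like this. It is a sketch under assumptions: the `state` object, `sendToNode`, and the client `send` method are hypothetical stand-ins for the real sockets, and storing the new distant node before forwarding is assumed rather than stated in this README.

```javascript
// Handle a distant_node message that arrived on the server side.
function handleDistantNodeOnServer(state, msg) {
  const known = state.distantNodes.some((n) => n.node_id === msg.node_id);
  if (known) return false; // already known: nothing is sent or forwarded
  // Assumption: the node is recorded so later copies are recognised as known.
  state.distantNodes.push({ node_id: msg.node_id, ip: msg.ip, port: msg.port });
  // Introduce ourselves to the distant node.
  state.sendToNode(msg, { type: 'distant_node', ...state.self });
  // Forward the original message to every connected client node.
  for (const client of state.clients) client.send(msg);
  return true;
}

const sent = [];
const state = {
  self: { node_id: 'self', ip: '10.0.0.1', port: 9999 },
  distantNodes: [],
  clients: [
    { send: (m) => sent.push('client-1 <- ' + m.node_id) },
    { send: (m) => sent.push('client-2 <- ' + m.node_id) },
  ],
  sendToNode: (target, m) => sent.push(target.node_id + ' <- ' + m.node_id),
};
const msg = { type: 'distant_node', node_id: 'far', ip: '10.0.0.9', port: 9999 };
const first = handleDistantNodeOnServer(state, msg);  // true, three sends
const second = handleDistantNodeOnServer(state, msg); // false, no sends
console.log(first, second, sent.length); // prints "true false 3"
```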
distant_node sent to primary client
valid_primary_client_message()
- only adds the distant node if it does not already exist. This is what keeps a distant_node message from going through the whole network: distant nodes are not forwarded if received by the primary client, only if received by the server and not already known.
- removes any duplicate ip:port pairs that may have been re-added with a different node_id after restart
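The two rules for the primary client can be sketched as follows. This assumes "already exists" means a matching `node_id` and that distant nodes are stored as `{node_id, ip, port}` records; `addDistantNode` is a hypothetical helper, not a function from the library.

```javascript
// Add a distant node received via the primary client (never forwarded).
function addDistantNode(distantNodes, incoming) {
  // Rule 1: ignore a node_id that is already known.
  if (distantNodes.some((n) => n.node_id === incoming.node_id)) return distantNodes;
  // Rule 2: drop stale entries with the same ip:port but another node_id,
  // which is what a restarted node looks like.
  const kept = distantNodes.filter(
    (n) => !(n.ip === incoming.ip && n.port === incoming.port)
  );
  kept.push(incoming);
  return kept;
}

let distant = [{ node_id: 'a1', ip: '10.0.0.5', port: 9999 }];
// The node at 10.0.0.5:9999 restarted and now announces a new node_id.
distant = addDistantNode(distant, { node_id: 'b2', ip: '10.0.0.5', port: 9999 });
console.log(distant.map((n) => n.node_id)); // prints [ 'b2' ]
```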
- connects to server of remote node
- sends its node_id with all messages to be tested as `is_self`
- sends `connected_nodes` to be stored in `fragment_list`
- sends 20 `test_ping` messages that each require a `test_pong` response before the next in sequence and stores the data in an ordered set that prunes any data older than the 20th
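Keeping only the 20 most recent round trip times can be sketched with a plain array used as a sliding window. The helper names are hypothetical, and the window size is assumed to correspond to `max_ping_count`.

```javascript
const MAX_PING_COUNT = 20; // assumed to correspond to this.max_ping_count

// Append a round trip time and prune anything older than the 20th sample.
function recordRtt(rtts, rttMs) {
  rtts.push(rttMs);
  while (rtts.length > MAX_PING_COUNT) rtts.shift();
  return rtts;
}

// Average of the samples currently in the window, or null if there are none.
function averageRtt(rtts) {
  if (rtts.length === 0) return null;
  return rtts.reduce((sum, v) => sum + v, 0) / rtts.length;
}

const rtts = [];
for (let i = 1; i <= 25; i++) recordRtt(rtts, i); // samples 1..25, in ms
console.log(rtts.length, rtts[0], averageRtt(rtts)); // prints "20 6 15.5"
```

After 25 samples the first five have been pruned, so the window holds samples 6 through 25.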
- each node compares its `fragment_list` against non connected nodes and distant nodes
- if a non connected node is not in `fragment_list`, send the non connected node `{type: 'defragment', fragment_list_length: N}`
- a 50/50 race condition is prevented because `fragment_list_length` and the count of `fragment_list` are compared on the node receiving `fragment_list_length`. If `fragment_list_length` is equal to or larger than the count of `fragment_list` then the receiving node reconnects, otherwise the receiving node sends `{type: 'defragment_greater_count'}` and the sending node reconnects
- the node that would require the least work to reconnect (the node chosen to reconnect in the previous step) sends a `distant_node` message of the non connected node via its primary client, disconnects its primary client then connects to the non connected node
- this does not force any disconnects that result in a reconnect with fragmentation through all branches of the fragmented segment. Large networks are possible without the need to know every node.
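The comparison that decides which side reconnects can be sketched as below. Only the two message shapes come from the text above; the handler name `onDefragment` and its return value are assumptions made for the example.

```javascript
// Runs on the node that receives {type: 'defragment', fragment_list_length: N}.
// Returns what that node should do next.
function onDefragment(msg, localFragmentList) {
  if (msg.fragment_list_length >= localFragmentList.length) {
    // The sender's fragment is at least as large: the receiver reconnects.
    return { action: 'reconnect' };
  }
  // The receiver's fragment is larger: tell the sender to reconnect instead.
  return { action: 'reply', reply: { type: 'defragment_greater_count' } };
}

const smaller = onDefragment({ type: 'defragment', fragment_list_length: 5 }, ['a', 'b']);
const larger = onDefragment({ type: 'defragment', fragment_list_length: 1 }, ['a', 'b']);
console.log(smaller.action, larger.action); // prints "reconnect reply"
```

Because only the receiving node makes the comparison, both sides cannot decide to reconnect at the same time, including when the counts are equal.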
`type: is_self` messages allow NAT to work in IPv4 and IPv6 while preventing nodes from having duplicate entries.
node_ids are generated randomly by each node when it starts, so there is no requirement to maintain a pre-deployment list of node_ids.
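A minimal sketch of random node_id generation and the self check, using Node's built-in `crypto` module. The 16-byte length, the hex encoding, and both function names are assumptions; this README does not state the actual node_id format.

```javascript
const crypto = require('crypto');

// Generate a random node_id at startup (length and encoding are assumptions).
function generateNodeId() {
  return crypto.randomBytes(16).toString('hex');
}

// A node that receives its own node_id back has connected to itself,
// for example through a NAT address it did not recognise as its own.
function isSelf(localNodeId, remoteNodeId) {
  return localNodeId === remoteNodeId;
}

const myId = generateNodeId();
const otherId = generateNodeId();
console.log(myId.length, isSelf(myId, myId)); // prints "32 true"
```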
// advanced/non configurable options
this.max_test_failures = 5;
this.max_ping_count = 20;
this.clean_interval = 5000;
// if there is a better node to use as the primary, wait this long before disconnecting the existing primary client
this.better_primary_wait = 1000 * 60 * 20;
// a node with a latency lower than this * the primary node latency avg is classified as better
this.better_primary_latency_multiplier = .7;
// wait this long before purging nodes that are
// 1. unreachable
// 2. not updated in the fragment list
this.purge_node_wait = 1000 * 60 * 60;
// retest after a successful test at this interval
this.retest_wait_period = 1000 * 60 * 10;
// do not allow messages with a duplicate message_id more than this often
this.message_duplicate_expire = 1000 * 60 * 5;
// only defrag this often
this.defrag_wait_period = 1000 * 60 * 10;
// debug settings, each shows itself and all those below
// 0 no debugging output
// 1 show nodes
// 2 show messages and what node they are from
this.debug = 0;
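How `better_primary_latency_multiplier` and `better_primary_wait` could combine is sketched below. The two values are copied from the options above; the function names, and the choice to measure the wait from the moment a candidate was first seen as better, are assumptions.

```javascript
const BETTER_PRIMARY_WAIT = 1000 * 60 * 20;    // 20 minutes, from the options above
const BETTER_PRIMARY_LATENCY_MULTIPLIER = 0.7; // from the options above

// A candidate is "better" when its average latency is below 70% of the primary's.
function isBetterPrimary(candidateAvgMs, primaryAvgMs) {
  return candidateAvgMs < primaryAvgMs * BETTER_PRIMARY_LATENCY_MULTIPLIER;
}

// Switch only after the candidate has been better for the whole wait period.
// Assumption: betterSinceMs is when the candidate was first classified as better.
function shouldSwitchPrimary(candidateAvgMs, primaryAvgMs, betterSinceMs, nowMs) {
  return (
    isBetterPrimary(candidateAvgMs, primaryAvgMs) &&
    nowMs - betterSinceMs >= BETTER_PRIMARY_WAIT
  );
}

console.log(isBetterPrimary(60, 100)); // true: 60 is below 70% of 100
console.log(isBetterPrimary(80, 100)); // false
console.log(shouldSwitchPrimary(60, 100, 0, 1000 * 60 * 19)); // false: too early
console.log(shouldSwitchPrimary(60, 100, 0, 1000 * 60 * 20)); // true
```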