Skip to content

Latest commit

 

History

History
147 lines (111 loc) · 6.9 KB

README.md

File metadata and controls

147 lines (111 loc) · 6.9 KB

Open in GitHub Codespaces Linux/Mac/Windows build status

Distributed Computing

In distributed computing systems, a group of computers work together to achieve a common goal. For example, a group of computers might work together to analyze a large data set. In these types of computing systems, each computer manages a piece of the problem and interacts with the other computing systems by passing messages. Each computer handles a subset of the required operations, and some operations might require inputs computed by a different computer. By passing a message containing the required input, the operation can then be completed. These messages might contain information required to continue the computations, and so can become a bottleneck to efficient computation. By minimizing the number of messages required between computers we can also minimize the number of dependencies between the operations performed on different systems, making the overall computation faster and more efficient by minimizing waiting time.

Modeling the Problem as a Graph

To solve the problem of minimizing messaging between computers in a distributed computing system, we build a graph or network model. Each operation for the overall computation is represented by a node, or vertex, in the graph, and an edge between nodes indicates that there is a dependency between two operations. To minimize the number of messages passed, we would like to partition the operations amongst the available computers so that the number of messages between computers (or partitions) is minimized. Additionally, we would also like to balance the workload across our available computers by partitioning the operations evenly.

To solve this problem in our graph model, we are looking to partition the set of nodes into a fixed number of subsets of equal size so that the total number of edges between subsets is minimized. This is known as the graph k-partitioning problem. In the case where k = 2, it is straightforward to use binary variables to indicate the subsets for each operation and solve using a binary quadratic model, as shown in the graph partitioning code example. For k > 2, the problem becomes significantly more complex.

Usage

To run the demo, type:

python demo.py

Additional options are available to select different graphs to run the problem on. To see the full list of options, type:

python demo.py -h

During a successful run of the program, two images are produced and saved. The first is the original input graph, saved as input_graph.png.

Example Input

The second highlights the partition of the population into groups.

Example Output

Graphs Available

Several different types of graphs or networks are available for this demo using the options provided. These are all built using NetworkX graph generator functions, and the details of these functions can be found here.

  • partition: Partition graph; specify number of nodes, number of partitions, and inter- and intra-partition edge probabilities.
  • internet: Internet Autonomous System network; specify number of nodes and partitions.
  • rand-reg: A random d-regular graph; specify number of nodes and value for d.
  • ER: Erdos-Renyi random graph; specify number of nodes and edge probability.
  • SF: Barabasi-Albert scale-free graph; specify number of nodes and number of edges to add from a new node to any existing nodes.

The default graph is the partition graph on 100 nodes with 4 partitions with inter-partition edge probability of 0.5 and intra-partition edge probability of 0.001. The largest number of nodes times the number of partitions allowed for any problem instance can be at most 5,000.

Code Overview

The demo program formulates this graph k-partitioning problem as a constrained quadratic model (CQM), and solves it using the hybrid CQM solver.

Variables

The formulation of this problem defines a binary variable x for each pair (n, p), where n is a node in the graph and p is a partition. If the solution returns variable (n, p) = 1, then node n is assigned to partition p. Otherwise, if the solution returns variable (n, p) = 0, then node n is not assigned to partition p.

Objective

The objective for this problem is to minimize the number of inter-partition edges. We can formulate this as a binary quadratic expression that needs to be minimized by considering an arbitrary edge (i, j) between nodes i and j in the graph. For each partition p, we add the expression (i, p) + (j, p) - 2*(i, p)*(j, p) to decrease the overall cost when i and j are assigned to the same partition, and increase the overall cost when they are not. To see how this expression maps to these costs, we examine the following table which demonstrates the cost of edge (i, j), depending on whether i and j are each assigned to partition p.

(i, k) (j, k) edge (i,j) cost
0 0 intra 0
0 1 inter 1
1 0 inter 1
1 1 intra 0

Now that we have an expression for the appropriate cost for each edge and each partition, we simply sum over all edges and all partitions to build the objective function that will minimize the number of inter-partition edges in the entire graph.

Objective: minimize Σ(i,j) Σp x(i,p) + x(j,p) - 2*x(i,p)*x(j,p)

Constraints

One-Hot Constraint

Each node in our graph must be assigned to exactly one partition, so we must enforce a one-hot constraint on each node. That is, for each node i, we must have that the sum of all binary variables associated with i is equal to 1.

Constraint 1: Σk x(i,p) = 1, for each node i

Partition Size Constraint

To efficiently distribute the operational load across computers in our system, we would like the partitions to have equal size. If N is the total number of nodes in the graph and k is the number of partitions available, each partition should have size N/k. We enforce this by requiring that the sum of binary variables associated with each partition is equal to N/k. Note that this requires that N is evenly divisble by k, and so the demo file will adjust N as needed to enforce this requirement.

Constraint 2: Σi x(i,p) = N/k, for each partition p.

License

Released under the Apache License 2.0. See LICENSE file.