Skip to content

Latest commit

 

History

History
executable file
·
608 lines (458 loc) · 18.6 KB

README.md

File metadata and controls

executable file
·
608 lines (458 loc) · 18.6 KB

Graph Theory - Neo4j

Document History

Version Date Update
1.0 2018-01-13 Creation.
1.1 2018-01-16 Connected and regular graphs, adjacency matrix.

Table of content

Introduction

This workshop was inspired by the book Introduction to Graph Theory by Richard J. Trudeau.

I strongly recommend reading it to anyone who is interested in graph theory, but doesn't know where to start from.

Cover reproduced with permission from Dover publications.

The challenge is to implement graph theory concepts using pure Neo4j Cypher query language, without the help of any libraries such as Awesome Procedures On Cypher (APOC).

Circular Graphs

A cycle graph or circular graph is a graph that consists of a single cycle, or in other words, some number of vertices connected in a closed chain.

Taken from Wikipedia.

A circular graph with n vertices is annotated Cn.

WITH 5 AS Count
WITH range(1, Count) AS items
WITH [x in tail(items) | [x - 1, x]] + [[last(items), head(items)]] AS tuples
FOREACH (
    tuple IN tuples |
    MERGE (s:Vertex {id: head(tuple)})
    MERGE (t:Vertex {id: last(tuple)})
    CREATE (s)-[:EDGE]->(t)
);

Output:

Added 5 labels, created 5 nodes, set 5 properties, created 5 relationships.

Cypher Query, Neo4j Console.

Complete Graphs

A complete graph is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge.

Taken from Wikipedia.

A complete graph with n vertices is annotated Kn.

WITH 5 AS Count
WITH range(1, Count) AS items
FOREACH (
    s_id IN items |
    FOREACH (
        t_id IN filter(item in items WHERE item <> s_id) |
        MERGE (s:Vertex {id: s_id})
        MERGE (t:Vertex {id: t_id})
        MERGE (s)-[:EDGE]-(t)
    )
);

Output:

Added 5 labels, created 5 nodes, set 5 properties, created 10 relationships.

Cypher Query, Neo4j Console.

Utility Graph

The utility graph is a reference to the mathematical puzzle known as the three utilities problem.

Suppose there are three cottages on a plane (or sphere) and each needs to be connected to the gas, water, and electricity companies. Without using a third dimension or sending any of the connections through another company or cottage, is there a way to make all nine connections without any of the lines crossing each other?

Taken from Wikipedia.

An utility graph with 6 vertices (3 cottages and 3 utilities) is annotated K3,3.

WITH 3 AS Count
WITH
    range(1, Count) AS items_h,
    range(Count + 1, Count * 2) AS items_u
FOREACH (
    item_h IN items_h |
    CREATE (s:House {id: item_h})
    FOREACH (
        item_u IN items_u |
        MERGE (t:Utility {id: item_u})
        MERGE (s)-[:EDGE]-(t)
    )
);

Output:

Added 6 labels, created 6 nodes, set 6 properties, created 9 relationships.

Cottages vertices are coloured in green, utilities in purple.

Cypher Query, Neo4j Console.

Random Graphs

The purpose is to create a randomly generated graph.

Random Graphs - Method 1

Connect each vertex to N other vertices (N being randomly chosen).

WITH
    // Number of nodes in graph
    20 AS NodeCount,
    // Limiting factor to determine adjacent nodes (out-degree)
    // AdjacentNodeCount = rand(1, NodeCount) / AdjacentNodeCountFactor
    5 AS AdjacentNodeCountFactor
WITH
    NodeCount,
    AdjacentNodeCountFactor,
    range(1, NodeCount) AS items
FOREACH (
    item_s in items |
    MERGE (s:Vertex {id: item_s})
    FOREACH (
        item_t in filter(
            y IN [
                x IN range(1, toInteger(round((rand() * (NodeCount - 1) + 1 / AdjacentNodeCountFactor))) /*Adjacent Nodes Count*/) |
                toInteger(round(rand() * (NodeCount - 1) + 1)) /*Target Node*/
            ] WHERE y <> item_s /*Avoid self-relationships*/
        ) |
        MERGE (t:Vertex {id: item_t})
        MERGE (s)-[:EDGE]-(t)
    )
);

Output:

Added 20 labels, created 20 nodes, set 20 properties, created 116 relationships.

Cypher Query v1, Neo4j Console.

Caveat: this method provides a limited control over the maximum degree of each vertex (the degree of an edge is the number of vertices connected to that edge).

Random Graphs - Method 2

Build vertices relationships from an adjacency matrix.

WITH
    // Number of nodes in graph
    20 AS NodeCount,
    // Probability that two vertices are adjacent (default = 1/2, as with unbiased flipped coin).
    1.0 / 10 AS AdjacencyProbability
WITH
    NodeCount,
    AdjacencyProbability,
    [x IN range(1, toInteger(NodeCount ^ 2)) | toInteger(rand() * (1 + AdjacencyProbability * 2))] AS AdjacencyMatrix
FOREACH (
    row IN range(1, NodeCount) |
    MERGE (s:Vertex {id: row})
    FOREACH (
        col IN [c IN range(1, NodeCount) WHERE c <> row | c] /*Avoid self-relationships*/ |
        FOREACH (
            // Pick coordinates of adjacent vertices.
            item_t IN filter(
                i IN [(row - 1) * NodeCount + col]
                WHERE AdjacencyMatrix[i] <> 0
             ) |
            MERGE (t:Vertex {id: col})
            MERGE (s)-[:EDGE]-(t)
        )
    )
);

Output:

Added 20 labels, created 20 nodes, set 20 properties, created 62 relationships

This method provides more control over the maximum degree of each vertex, using AdjacencyProbability value.

Cypher Query v2, Neo4j Console.

Implementation Details

Random numbers are generated using rand function, which returns a real number between 0 and 1.

Getting a random number between 1 and 15:

RETURN round(rand() * (15 - 1) + 1);

Cypher Query, Neo4j Console.

Getting a random number in set {0, 1} (Boolean):

RETURN toInteger(rand() * 2)

Cypher Query, Neo4j Console.

Subgraphs

A subgraph of a graph G is another graph formed from a subset of the vertices and edges of G.

Taken from Wikipedia.

A subgraph is obtained by selectively removing edges and vertices from a graph.

By opposition, a supergraph is obtained by selectively adding edges and vertices to a graph.

Taken from Introduction to Graph Theory.

The goal is to find a subgraph of a certain type :

Circular Subgraphs

MATCH
    (a)-[]-(b),
    (b)-[]-(c),
    (c)-[]-(d),
    (d)-[]-(e)
WHERE
    NOT a IN [b, c, d, e]
    AND NOT b IN [c, d, e]
    AND NOT c IN [d, e]
    AND NOT d IN [e]
RETURN *;

Cypher Query, Neo4j Console.

Complete Subgraphs

Complete Subgraphs - Method 1

MATCH
    (a)-[]-(b), (a)-[]-(c), (a)-[]-(d), (a)-[]-(e),
    (b)-[]-(c), (b)-[]-(d), (b)-[]-(e),
    (c)-[]-(d), (c)-[]-(e),
    (d)-[]-(e)
WHERE
    NOT a IN [b, c, d, e]
    AND NOT b IN [c, d, e]
    AND NOT c IN [d, e]
    AND NOT d IN [e]
RETURN *
LIMIT 1;

Cypher Query, Neo4j Console.

Caveat: the WHERE part of the query, used to guarantee distinct vertices, is repetitive.

Complete Subgraphs - Method 2

The following is an alternative query using Awesome Procedures On Cypher for the WHERE part.

// Alternative query using APOC
MATCH
    (a)-[]-(b), (a)-[]-(c), (a)-[]-(d), (a)-[]-(e),
    (b)-[]-(c), (b)-[]-(d), (b)-[]-(e),
    (c)-[]-(d), (c)-[]-(e),
    (d)-[]-(e)
WHERE size(apoc.coll.toSet([a, b, c, d, e])) = 5
RETURN *
LIMIT 1;

Cypher Query.

Utility Subgraphs

MATCH
    (h1)-[r1_1]-(u1),
    (h1)-[r1_2]-(u2),
    (h1)-[r1_3]-(u3),
    (h2)-[r2_1]-(u1),
    (h2)-[r2_2]-(u2),
    (h2)-[r2_3]-(u3),
    (h3)-[r3_1]-(u1),
    (h3)-[r3_2]-(u2),
    (h3)-[r3_3]-(u3)
WHERE
    NOT h1 IN [h2, h3, u1, u2, u3]
    AND NOT h2 IN [h3, u1, u2, u3]
    AND NOT h3 IN [u1, u2, u3]
    AND NOT u1 IN [u2, u3]
    AND NOT u2 IN [u3]
RETURN
    h1, h2, h3,
    u1, u2, u3,
    r1_1, r1_2, r1_3,
    r2_1, r2_2, r2_3,
    r3_1, r3_2, r3_3
LIMIT 1;

Cypher Query, Neo4j Console.

Caveat: as with the complete subgraph, the WHERE part of the query is repetitive, and can be simplified with the help of Awesome Procedures On Cypher, as shown in this alternative Cypher query.

Planar Graphs

In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. In other words, it can be drawn in such a way that no edges cross each other.

Taken from Wikipedia.

A graph is planar if it is isomorphic to a graph that has been drawn in a plane without edge-crossings. UG and K5 are examples of nonplanar graphs.

If some new vertices of degree 2 are added to some of the edges of a graph G, the resulting graph H is called an expansion of G.

Taken from Introduction to Graph Theory.

Identifying Nonplanar Graphs

Kuratowski's theorem states that:

Every nonplanar graph is a supergraph of an expansion of UG or K5.

The standard method consists in finding a subgraph that is an expansion of UG or K5 (as stated in pages 85-86 of Introduction to Graph Theory book).

As this method could lead to an never-ending task (the set of of expansions of a graph being non-finite), we are going to reason in a reverse-way.

Method:

  1. Reduce graph by removing all vertices (and its edges) having degree 2.
    • Find vertices (b) having degree 2: (a)-(b)-(c).
    • Substitute path by removing (b) and creating new edge between connected vertices: (a)-(c).
    • Repeat the operation until no more vertices are found.
  2. Find K5 or UG subgraphs.

Test Cases

The following is an expansion of K5 graph:

Red vertices were generated by graph expansion.

Cypher Query, Neo4j Console.

The following is an expansion of the Utility Graph:

Cypher Query, Neo4j Console.

Graph Reduction

Graph Reduction - Method 1

Our first method requires two scripts:

  1. Identification of vertices having degree 2.
  2. Vertices removal and edge substitution.

The all process must be repeated until no more vertices are found.

Step 1:

MATCH (n)-[r]-()
WITH n, COUNT(r) AS Degree
WHERE Degree = 2
WITH collect(DISTINCT n) AS Nodes
RETURN Nodes;

Cypher Query

Step 2:

MATCH (n)-[r]-()
WITH n, COUNT(r) AS Degree
WHERE Degree = 2
MATCH (a)-[r1]-(n)-[r2]-(b)
MERGE (a)-[:EDGE]-(b)
DETACH DELETE n;

Cypher Query

Caveat: this method requires some processing logic around Cypher queries execution, making it more complex to implement than if we had a single Cypher query.

Graph Reduction - Method 2

Our second method consists in:

  1. Getting all vertices involved in expanded paths (i.e. only containing vertices with degree 2).
  2. Deleting in-between vertices and their connected edges.
  3. Adding an edge between starting and ending vertices of each path.
// Paths with two or more edges
MATCH p = (s)-[*2..]-(t)
WITH nodes(p) AS PathNodes
WITH
    // Vertices in path
    PathNodes,
    // Starting and ending vertices
    [head(PathNodes), last(PathNodes)] AS ExternalNodes
WITH
    ExternalNodes,
    // In-between vertices...
    filter(n IN PathNodes WHERE NOT n IN ExternalNodes) AS InsideNodes
WHERE
    // ...having degree 2
    all(n IN InsideNodes WHERE size((n)-[]-()) = 2)
WITH
    InsideNodes,
    ExternalNodes
// Create edge between starting and ending vertices
MATCH (s) WHERE s = head(ExternalNodes)
MATCH (t) WHERE t = last(ExternalNodes)
MERGE (s)-[:EDGE]-(t)
// Remove in-between vertices 
FOREACH(
    n IN InsideNodes |
    DETACH DELETE n
);

Cypher Query, Neo4j Console for K5, Neo4j Console for UG.

This second method is our preferred one, as it only relies on Cypher, with no execution logic around.

Adjacency Matrix

An adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.

Taken from Wikipedia.

// Get all vertices.
MATCH (n)
WITH collect(n) AS Nodes
// For each vertices combination...
WITH [n IN Nodes |
    [m IN Nodes |
		// ...Check for edge existence.
    	CASE size((n)-[]-(m))
			WHEN 0 THEN 0
			ELSE 1
		END
    ]
] AS AdjacencyMatrix
// Unroll rows.
UNWIND AdjacencyMatrix AS AdjacencyRows
RETURN AdjacencyRows;

Output for the Utility Graph:

╒═══════════════╕
│"AdjacencyRows"│
╞═══════════════╡
│[0,1,0,1,1,0]  │
├───────────────┤
│[1,0,1,0,0,1]  │
├───────────────┤
│[0,1,0,1,1,0]  │
├───────────────┤
│[1,0,1,0,0,1]  │
├───────────────┤
│[1,0,1,0,0,1]  │
├───────────────┤
│[0,1,0,1,1,0]  │
└───────────────┘

Cypher Query, Neo4j Console.

Connected Graphs

A graph is connected when there is a path between every pair of vertices. In a connected graph, there are no unreachable vertices. A graph that is not connected is disconnected.

Taken from Wikipedia.

// Get all vertices.
MATCH (n)
WITH collect(n) AS Nodes
// For each vertices combination...
WITH [n IN Nodes |
    [m IN Nodes WHERE m <> n |
		// ...Check for path existence.
    	CASE length(shortestPath((n)-[*]-(m)))
			WHEN NULL THEN 0
			ELSE 1
		END
    ]
] AS PathMatrix
// Unroll connectivity matrix.
UNWIND PathMatrix AS PathRows
UNWIND PathRows AS PathCells
// Connectivity is verified if all vertices are connected
// (i.e. connectivity matrix only contains 1).
WITH count(DISTINCT PathCells) AS PathCellsCount
WITH CASE PathCellsCount
	WHEN 1 THEN "Connected"
	ELSE "Disconnected"
END AS Result
RETURN Result;

Cypher Query, Neo4j Console.

The connectivity verification can be modified as shown in this alternative Cypher query.

Regular Graphs

A regular graph is a graph where each vertex has the same number of neighbors; i.e. every vertex has the same degree.

Taken from Wikipedia.

MATCH (n)
WITH size((n)-[]-()) AS Degrees
WITH collect(DISTINCT Degrees) AS Degrees
WITH CASE size(Degrees)
    WHEN 1 THEN "Regular of degree " + Degrees[0]
    ELSE "Not regular"
END AS Result
RETURN Result;

Output for the Utility Graph:

Regular of degree 3

Cypher Query, Neo4j Console.

Conclusion

This workshop was the opportunity to demonstrate the potential of Neo4j Cypher query language in solving mathematical problems around graph theory.

There will hopefully be some additions as I'm still in the process of reading Introduction to Graph Theory book.