SQL2Gremlin teaches Apache TinkerPop's Gremlin graph traversal language using typical patterns found when querying data with SQL. The format of the Gremlin results will not necessarily match the format of the SQL results. While SQL can only provide results in a tabular form, Gremlin provides various ways to structure a result set. Next, the Gremlin queries demonstrated are for elucidatory purposes and may not be the optimal way to retrieve the desired data. If a particular query runs slow and an optimal solution is desired, please do not hesitate to ask for help on the Gremlin-users mailing list. Finally, the SQL examples presented make use of T-SQL syntax. MySQL users may not know some of the expressions (e.g. paging), but should be able to understand the purpose of the query.
If you would like to see other SQL2Gremlin translations using the Northwind dataset, please provide a ticket on the SQL2Gremlin issue tracker.
Note
|
Acknowledgement Gremlin artwork by Ketrina Yim — "safety first." |
To get started download the latest version of the Gremlin shell from www.tinkerpop.com and extract it. Then download the file northwind.groovy and start your Gremlin shell:
# find the latest Gremlin shell download
MIRRORS_URL=`curl -s http://tinkerpop.apache.org/ | grep -o 'http[s]*://.*\-console-.*\-bin\.zip'`
LATEST_URL=`curl -s $MIRRORS_URL | egrep -o 'http[s]?://[^"]*-console-[^"]*-bin.zip' | head -1`
LATEST_FILENAME=`echo ${LATEST_URL} | grep -o '[^/]*$'`
LATEST_DIRECTORY=`echo ${LATEST_FILENAME} | sed 's/-bin\.zip//'`
# download and extract the Gremlin shell
wget -q ${LATEST_URL}
unzip -q ${LATEST_FILENAME}
# start the Gremlin shell
wget -q http://sql2gremlin.com/assets/northwind.groovy -O /tmp/northwind.groovy
${LATEST_DIRECTORY}/bin/gremlin.sh -i /tmp/northwind.groovy
In your Gremlin shell create the Northwind graph, a graph traversal source and you’re ready to go:
gremlin> graph = NorthwindFactory.createGraph()
==>tinkergraph[vertices:3209 edges:6177]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:3209 edges:6177], standard]
The graph is now filled with vertices and edges. Vertices have a number of properties, depending on what they represent. The following properties are globally indexed for fast lookups:
VertexLabel | Property | Description |
---|---|---|
region |
name |
The code or name for a specific region. |
country |
name |
The code or name for a specific country. |
category |
name |
The name of a specific category. |
customer |
customerId |
The well-known Northwind customer identifier (e.g. ALFKI). |
This sample shows how to query all categories.
SQL
SELECT *
FROM Categories
Gremlin
g.V().hasLabel("category").valueMap()
References:
This sample shows how to query the names of all categories.
SQL
SELECT CategoryName
FROM Categories
Gremlin
g.V().hasLabel("category").values("name")
References:
This sample shows how to query the names and descriptions of all categories.
SQL
SELECT CategoryName, Description
FROM Categories
Gremlin
g.V().hasLabel("category").valueMap("name", "description")
References:
This sample shows how to query the length of the name of all categories.
SQL
SELECT LEN(CategoryName)
FROM Categories
Gremlin
g.V().hasLabel("category").values("name"). map {it.get().length()}
References:
This sample shows how to query all distinct lengths of category names.
SQL
SELECT DISTINCT LEN(CategoryName)
FROM Categories
Gremlin
g.V().hasLabel("category").values("name"). map {it.get().length()}.dedup()
References:
This sample shows how to query the length of the longest category name.
SQL
SELECT MAX(LEN(CategoryName))
FROM Categories
Gremlin
g.V().hasLabel("category").values("name"). map {it.get().length()}.max()
References:
This sample shows how to query all products having no unit in stock.
SQL
SELECT ProductName, UnitsInStock
FROM Products
WHERE UnitsInStock = 0
Gremlin
g.V().has("product", "unitsInStock", 0).valueMap("name", "unitsInStock")
References:
This sample shows how to query all products where the number of units on order is not 0.
SQL
SELECT ProductName, UnitsOnOrder
FROM Products
WHERE NOT(UnitsOnOrder = 0)
Gremlin
g.V().has("product", "unitsOnOrder", neq(0)). valueMap("name", "unitsOnOrder")
References:
This sample shows how to query all products with a minimum price of 5 and maximum price below 10.
SQL
SELECT ProductName, UnitPrice
FROM Products
WHERE UnitPrice >= 5 AND UnitPrice < 10
Gremlin
g.V().has("product", "unitPrice", between(5f, 10f)). valueMap("name", "unitPrice")
References:
This sample shows how to query all discontinued products that are still not out of stock.
SQL
SELECT ProductName, UnitsInStock
FROM Products
WHERE Discontinued = 1
AND UnitsInStock <> 0
Gremlin
g.V().has("product", "discontinued", true).has("unitsInStock", neq(0)). valueMap("name", "unitsInStock")
References:
This sample shows how to query all products ordered by unit price.
SQL
SELECT ProductName, UnitPrice
FROM Products
ORDER BY UnitPrice ASC
Gremlin
g.V().hasLabel("product").order().by("unitPrice", incr). valueMap("name", "unitPrice")
References:
This sample shows how to query all products ordered by descending unit price.
SQL
SELECT ProductName, UnitPrice
FROM Products
ORDER BY UnitPrice DESC
Gremlin
g.V().hasLabel("product").order().by("unitPrice", decr). valueMap("name", "unitPrice")
References:
This sample shows how to query the first 5 products ordered by unit price.
SQL
SELECT TOP (5) ProductName, UnitPrice
FROM Products
ORDER BY UnitPrice
Gremlin
g.V().hasLabel("product").order().by("unitPrice", incr).limit(5). valueMap("name", "unitPrice")
References:
This sample shows how to query the next 5 products (page 2) ordered by unit price.
SQL
SELECT Products.ProductName, Products.UnitPrice
FROM (SELECT ROW_NUMBER()
OVER (
ORDER BY UnitPrice) AS [ROW_NUMBER],
ProductID
FROM Products) AS SortedProducts
INNER JOIN Products
ON Products.ProductID = SortedProducts.ProductID
WHERE [ROW_NUMBER] BETWEEN 6 AND 10
ORDER BY [ROW_NUMBER]
Gremlin
g.V().hasLabel("product").order().by("unitPrice", incr).range(5, 10). valueMap("name", "unitPrice")
References:
This sample shows how to determine the most used unit price.
SQL
SELECT TOP(1) UnitPrice
FROM (SELECT Products.UnitPrice,
COUNT(*) AS [Count]
FROM Products
GROUP BY Products.UnitPrice) AS T
ORDER BY [Count] DESC
Gremlin
g.V().hasLabel("product").groupCount().by("unitPrice"). order(local).by(values, decr).select(keys).limit(local, 1)
References:
This sample shows how to query all products from a specific category.
SQL
SELECT Products.ProductName
FROM Products
INNER JOIN Categories
ON Categories.CategoryID = Products.CategoryID
WHERE Categories.CategoryName = 'Beverages'
Gremlin
g.V().has("name","Beverages").in("inCategory").values("name")
References:
This sample shows how to count the number of orders for each customer.
SQL
SELECT Customers.CustomerID, COUNT(Orders.OrderID)
FROM Customers
LEFT JOIN Orders
ON Orders.CustomerID = Customers.CustomerID
GROUP BY Customers.CustomerID
Gremlin
g.V().hasLabel("customer").match( __.as("c").values("customerId").as("customerId"), __.as("c").out("ordered").count().as("orders") ).select("customerId", "orders")
References:
This sample shows how to concatenate two result sets (customers whos company name starts with 'A' and customers whos company name starts with 'E').
SQL
SELECT [customer].[CompanyName]
FROM [Customers] AS [customer]
WHERE [customer].[CompanyName] LIKE 'A%'
UNION ALL
SELECT [customer].[CompanyName]
FROM [Customers] AS [customer]
WHERE [customer].[CompanyName] LIKE 'E%'
Gremlin
g.V().hasLabel("customer").union( filter {it.get().value("company")[0] == "A"}, filter {it.get().value("company")[0] == "E"}).values("company")
References:
This sample shows how to create new vertices and edges, how to update them and finally how to delete them.
SQL
INSERT INTO [Categories] ([CategoryName], [Description])
VALUES (N'Merchandising', N'Cool products to promote Gremlin')
INSERT INTO [Products] ([ProductName], [CategoryID])
SELECT TOP (1) N'Red Gremlin Jacket', [CategoryID]
FROM [Categories]
WHERE [CategoryName] = N'Merchandising'
UPDATE [Products]
SET [Products].[ProductName] = N'Green Gremlin Jacket'
WHERE [Products].[ProductName] = N'Red Gremlin Jacket'
DELETE FROM [Products]
WHERE [Products].[ProductName] = N'Green Gremlin Jacket'
DELETE FROM [Categories]
WHERE [Categories].[CategoryName] = N'Merchandising'
Gremlin
c = graph.addVertex(label, "category", "name", "Merchandising", "description", "Cool products to promote Gremlin") p = graph.addVertex(label, "product", "name", "Red Gremlin Jacket") p.addEdge("inCategory", c) g.V().has("product", "name", "Red Gremlin Jacket"). property("name", "Green Gremlin Jacket").iterate() p.remove() g.V().has("category", "name", "Merchandising").drop()
References:
This sample shows how to query all employees, their supervisors and their hierarchy level depending on where the employee is located in the supervisor chain.
SQL
WITH EmployeeHierarchy (EmployeeID,
LastName,
FirstName,
ReportsTo,
HierarchyLevel) AS
(
SELECT EmployeeID
, LastName
, FirstName
, ReportsTo
, 1 as HierarchyLevel
FROM Employees
WHERE ReportsTo IS NULL
UNION ALL
SELECT e.EmployeeID
, e.LastName
, e.FirstName
, e.ReportsTo
, eh.HierarchyLevel + 1 AS HierarchyLevel
FROM Employees e
INNER JOIN EmployeeHierarchy eh
ON e.ReportsTo = eh.EmployeeID
)
SELECT *
FROM EmployeeHierarchy
ORDER BY HierarchyLevel, LastName, FirstName
Gremlin (hierarchical)
g.V().hasLabel("employee").where(__.not(out("reportsTo"))). repeat(__.in("reportsTo")).emit().tree().by(map { def employee = it.get() employee.value("firstName") + " " + employee.value("lastName") }).next()
You can also produce the same tabular result that’s produced by SQL.
Gremlin (tabular)
g.V().hasLabel("employee").where(__.not(out("reportsTo"))). repeat(__.as("reportsTo").in("reportsTo").as("employee")).emit(). select(last, "reportsTo", "employee").by(map { def employee = it.get() employee.value("firstName") + " " + employee.value("lastName") })
References:
This sample shows how to determine the average total order value per month for each customer.
SQL
SELECT Customers.CompanyName,
COALESCE([1], 0) AS [Jan],
COALESCE([2], 0) AS [Feb],
COALESCE([3], 0) AS [Mar],
COALESCE([4], 0) AS [Apr],
COALESCE([5], 0) AS [May],
COALESCE([6], 0) AS [Jun],
COALESCE([7], 0) AS [Jul],
COALESCE([8], 0) AS [Aug],
COALESCE([9], 0) AS [Sep],
COALESCE([10], 0) AS [Oct],
COALESCE([11], 0) AS [Nov],
COALESCE([12], 0) AS [Dec]
FROM (SELECT Orders.CustomerID,
MONTH(Orders.OrderDate) AS [Month],
SUM([Order Details].UnitPrice * [Order Details].Quantity) AS Total
FROM Orders
INNER JOIN [Order Details]
ON [Order Details].OrderID = Orders.OrderID
GROUP BY Orders.CustomerID,
MONTH(Orders.OrderDate)) o
PIVOT (AVG(Total) FOR [Month] IN ([1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9],
[10],
[11],
[12])) AS [Pivot]
INNER JOIN Customers
ON Customers.CustomerID = [Pivot].CustomerID
ORDER BY Customers.CompanyName
Gremlin
months = new java.text.DateFormatSymbols().getShortMonths().toList(); [] rowTotal = {it.get().value("unitPrice") * it.get().value("quantity")}; [] g.V().hasLabel("customer").order().by("customerId", incr). where(out("ordered")).as("customer"). map { def m = g.V(it.get()).out("ordered"). group().by {new Date(it.value("orderDate")).getMonth()}. by(out("contains").map(rowTotal).sum()).next() (0..11).collectEntries {[months[it], m.containsKey(it) ? m[it] : 0]} }.as("totals").select("customer", "totals").by(id).by()
References:
This sample shows how to recommend 5 products for a specific customer. The products are chosen as follows:
-
determine what the customer has already ordered
-
determine who else ordered the same products
-
determine what others also ordered
-
determine products which were not already ordered by the initial customer, but ordered by the others
-
rank products by occurence in other orders
SQL
SELECT TOP (5) [t14].[ProductName]
FROM (SELECT COUNT(*) AS [value],
[t13].[ProductName]
FROM [customers] AS [t0]
CROSS APPLY (SELECT [t9].[ProductName]
FROM [orders] AS [t1]
CROSS JOIN [order details] AS [t2]
INNER JOIN [products] AS [t3]
ON [t3].[ProductID] = [t2].[ProductID]
CROSS JOIN [order details] AS [t4]
INNER JOIN [orders] AS [t5]
ON [t5].[OrderID] = [t4].[OrderID]
LEFT JOIN [customers] AS [t6]
ON [t6].[CustomerID] = [t5].[CustomerID]
CROSS JOIN ([orders] AS [t7]
CROSS JOIN [order details] AS [t8]
INNER JOIN [products] AS [t9]
ON [t9].[ProductID] = [t8].[ProductID])
WHERE NOT EXISTS(SELECT NULL AS [EMPTY]
FROM [orders] AS [t10]
CROSS JOIN [order details] AS [t11]
INNER JOIN [products] AS [t12]
ON [t12].[ProductID] = [t11].[ProductID]
WHERE [t9].[ProductID] = [t12].[ProductID]
AND [t10].[CustomerID] = [t0].[CustomerID]
AND [t11].[OrderID] = [t10].[OrderID])
AND [t6].[CustomerID] <> [t0].[CustomerID]
AND [t1].[CustomerID] = [t0].[CustomerID]
AND [t2].[OrderID] = [t1].[OrderID]
AND [t4].[ProductID] = [t3].[ProductID]
AND [t7].[CustomerID] = [t6].[CustomerID]
AND [t8].[OrderID] = [t7].[OrderID]) AS [t13]
WHERE [t0].[CustomerID] = N'ALFKI'
GROUP BY [t13].[ProductName]) AS [t14]
ORDER BY [t14].[value] DESC
Gremlin
g.V().has("customer", "customerId", "ALFKI").as("customer"). out("ordered").out("contains").out("is").aggregate("products"). in("is").in("contains").in("ordered").where(neq("customer")). out("ordered").out("contains").out("is").where(without("products")). groupCount().order(local).by(values, decr).select(keys).limit(local, 5). unfold().values("name")
References: