Skip to content

Latest commit

 

History

History
814 lines (643 loc) · 24.9 KB

sql2gremlin.asciidoc

File metadata and controls

814 lines (643 loc) · 24.9 KB
SQL2Gremlin

Introduction

SQL2Gremlin teaches Apache TinkerPop's Gremlin graph traversal language using typical patterns found when querying data with SQL. The format of the Gremlin results will not necessarily match the format of the SQL results. While SQL can only provide results in a tabular form, Gremlin provides various ways to structure a result set. Next, the Gremlin queries demonstrated are for elucidatory purposes and may not be the optimal way to retrieve the desired data. If a particular query runs slow and an optimal solution is desired, please do not hesitate to ask for help on the Gremlin-users mailing list. Finally, the SQL examples presented make use of T-SQL syntax. MySQL users may not know some of the expressions (e.g. paging), but should be able to understand the purpose of the query.

If you would like to see other SQL2Gremlin translations using the Northwind dataset, please provide a ticket on the SQL2Gremlin issue tracker.

Note

Acknowledgement

Gremlin artwork by Ketrina Yim — "safety first."

Northwind Graph Model

Northwind Graph Model

Getting Started

To get started download the latest version of the Gremlin shell from www.tinkerpop.com and extract it. Then download the file northwind.groovy and start your Gremlin shell:

# find the latest Gremlin shell download
MIRRORS_URL=`curl -s http://tinkerpop.apache.org/ | grep -o 'http[s]*://.*\-console-.*\-bin\.zip'`
LATEST_URL=`curl -s $MIRRORS_URL | egrep -o 'http[s]?://[^"]*-console-[^"]*-bin.zip' | head -1`
LATEST_FILENAME=`echo ${LATEST_URL} | grep -o '[^/]*$'`
LATEST_DIRECTORY=`echo ${LATEST_FILENAME} | sed 's/-bin\.zip//'`

# download and extract the Gremlin shell
wget -q ${LATEST_URL}
unzip -q ${LATEST_FILENAME}

# start the Gremlin shell
wget -q http://sql2gremlin.com/assets/northwind.groovy -O /tmp/northwind.groovy
${LATEST_DIRECTORY}/bin/gremlin.sh -i /tmp/northwind.groovy

In your Gremlin shell create the Northwind graph, a graph traversal source and you’re ready to go:

gremlin> graph = NorthwindFactory.createGraph()
==>tinkergraph[vertices:3209 edges:6177]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:3209 edges:6177], standard]

The graph is now filled with vertices and edges. Vertices have a number of properties, depending on what they represent. The following properties are globally indexed for fast lookups:

VertexLabel Property Description

region

name

The code or name for a specific region.

country

name

The code or name for a specific country.

category

name

The name of a specific category.

customer

customerId

The well-known Northwind customer identifier (e.g. ALFKI).

Select

Select all

This sample shows how to query all categories.

SQL

SELECT *
  FROM Categories

Gremlin

g.V().hasLabel("category").valueMap()

References:

Select single column

This sample shows how to query the names of all categories.

SQL

SELECT CategoryName
  FROM Categories

Gremlin

g.V().hasLabel("category").values("name")

References:

Select multiple columns

This sample shows how to query the names and descriptions of all categories.

SQL

SELECT CategoryName, Description
  FROM Categories

Gremlin

g.V().hasLabel("category").valueMap("name", "description")

References:

Select calculated column

This sample shows how to query the length of the name of all categories.

SQL

SELECT LEN(CategoryName)
  FROM Categories

Gremlin

g.V().hasLabel("category").values("name").
      map {it.get().length()}

References:

Select distinct values

This sample shows how to query all distinct lengths of category names.

SQL

SELECT DISTINCT LEN(CategoryName)
  FROM Categories

Gremlin

g.V().hasLabel("category").values("name").
      map {it.get().length()}.dedup()

References:

Select scalar value

This sample shows how to query the length of the longest category name.

SQL

SELECT MAX(LEN(CategoryName))
  FROM Categories

Gremlin

g.V().hasLabel("category").values("name").
      map {it.get().length()}.max()

References:

Filtering

Filter by equality

This sample shows how to query all products having no unit in stock.

SQL

SELECT ProductName, UnitsInStock
  FROM Products
 WHERE UnitsInStock = 0

Gremlin

g.V().has("product", "unitsInStock", 0).valueMap("name", "unitsInStock")

References:

Filter by inequality

This sample shows how to query all products where the number of units on order is not 0.

SQL

SELECT ProductName, UnitsOnOrder
  FROM Products
 WHERE NOT(UnitsOnOrder = 0)

Gremlin

g.V().has("product", "unitsOnOrder", neq(0)).
      valueMap("name", "unitsOnOrder")

References:

Filter by value range

This sample shows how to query all products with a minimum price of 5 and maximum price below 10.

SQL

SELECT ProductName, UnitPrice
  FROM Products
 WHERE UnitPrice >= 5 AND UnitPrice < 10

Gremlin

g.V().has("product", "unitPrice", between(5f, 10f)).
      valueMap("name", "unitPrice")

References:

Multiple filter conditions

This sample shows how to query all discontinued products that are still not out of stock.

SQL

SELECT ProductName, UnitsInStock
  FROM Products
 WHERE Discontinued = 1
   AND UnitsInStock <> 0

Gremlin

g.V().has("product", "discontinued", true).has("unitsInStock", neq(0)).
     valueMap("name", "unitsInStock")

References:

Ordering

Order by value ascending

This sample shows how to query all products ordered by unit price.

SQL

  SELECT ProductName, UnitPrice
    FROM Products
ORDER BY UnitPrice ASC

Gremlin

g.V().hasLabel("product").order().by("unitPrice", incr).
      valueMap("name", "unitPrice")

References:

Order by value descending

This sample shows how to query all products ordered by descending unit price.

SQL

  SELECT ProductName, UnitPrice
    FROM Products
ORDER BY UnitPrice DESC

Gremlin

g.V().hasLabel("product").order().by("unitPrice", decr).
      valueMap("name", "unitPrice")

References:

Paging

Limit number of results

This sample shows how to query the first 5 products ordered by unit price.

SQL

  SELECT TOP (5) ProductName, UnitPrice
    FROM Products
ORDER BY UnitPrice

Gremlin

g.V().hasLabel("product").order().by("unitPrice", incr).limit(5).
      valueMap("name", "unitPrice")

References:

Paged result set

This sample shows how to query the next 5 products (page 2) ordered by unit price.

SQL

   SELECT Products.ProductName, Products.UnitPrice
     FROM (SELECT ROW_NUMBER()
                    OVER (
                      ORDER BY UnitPrice) AS [ROW_NUMBER],
                  ProductID
             FROM Products) AS SortedProducts
       INNER JOIN Products
               ON Products.ProductID = SortedProducts.ProductID
    WHERE [ROW_NUMBER] BETWEEN 6 AND 10
 ORDER BY [ROW_NUMBER]

Gremlin

g.V().hasLabel("product").order().by("unitPrice", incr).range(5, 10).
      valueMap("name", "unitPrice")

References:

Grouping

Group by value

This sample shows how to determine the most used unit price.

SQL

  SELECT TOP(1) UnitPrice
    FROM (SELECT Products.UnitPrice,
                 COUNT(*) AS [Count]
            FROM Products
        GROUP BY Products.UnitPrice) AS T
ORDER BY [Count] DESC

Gremlin

g.V().hasLabel("product").groupCount().by("unitPrice").
      order(local).by(values, decr).select(keys).limit(local, 1)

References:

Joining

Inner join

This sample shows how to query all products from a specific category.

SQL

    SELECT Products.ProductName
      FROM Products
INNER JOIN Categories
        ON Categories.CategoryID = Products.CategoryID
     WHERE Categories.CategoryName = 'Beverages'

Gremlin

g.V().has("name","Beverages").in("inCategory").values("name")

References:

Left join

This sample shows how to count the number of orders for each customer.

SQL

    SELECT Customers.CustomerID, COUNT(Orders.OrderID)
      FROM Customers
 LEFT JOIN Orders
        ON Orders.CustomerID = Customers.CustomerID
  GROUP BY Customers.CustomerID

Gremlin

g.V().hasLabel("customer").match(
  __.as("c").values("customerId").as("customerId"),
  __.as("c").out("ordered").count().as("orders")
).select("customerId", "orders")

References:

Miscellaneous

Concatenate

This sample shows how to concatenate two result sets (customers whos company name starts with 'A' and customers whos company name starts with 'E').

SQL

SELECT [customer].[CompanyName]
  FROM [Customers] AS [customer]
 WHERE [customer].[CompanyName] LIKE 'A%'
 UNION ALL
SELECT [customer].[CompanyName]
  FROM [Customers] AS [customer]
 WHERE [customer].[CompanyName] LIKE 'E%'

Gremlin

g.V().hasLabel("customer").union(
  filter {it.get().value("company")[0] == "A"},
  filter {it.get().value("company")[0] == "E"}).values("company")

References:

Create, Update and Delete

This sample shows how to create new vertices and edges, how to update them and finally how to delete them.

SQL

INSERT INTO [Categories] ([CategoryName], [Description])
     VALUES (N'Merchandising', N'Cool products to promote Gremlin')

INSERT INTO [Products] ([ProductName], [CategoryID])
     SELECT TOP (1) N'Red Gremlin Jacket', [CategoryID]
       FROM [Categories]
      WHERE [CategoryName] = N'Merchandising'

UPDATE [Products]
   SET [Products].[ProductName] = N'Green Gremlin Jacket'
 WHERE [Products].[ProductName] = N'Red Gremlin Jacket'

DELETE FROM [Products]
 WHERE [Products].[ProductName] = N'Green Gremlin Jacket'

DELETE FROM [Categories]
 WHERE [Categories].[CategoryName] = N'Merchandising'

Gremlin

c = graph.addVertex(label, "category",
          "name", "Merchandising",
          "description", "Cool products to promote Gremlin")

p = graph.addVertex(label, "product",
          "name", "Red Gremlin Jacket")

p.addEdge("inCategory", c)

g.V().has("product", "name", "Red Gremlin Jacket").
      property("name", "Green Gremlin Jacket").iterate()

p.remove()
g.V().has("category", "name", "Merchandising").drop()

References:

CTE

Recursive query

This sample shows how to query all employees, their supervisors and their hierarchy level depending on where the employee is located in the supervisor chain.

SQL

WITH EmployeeHierarchy (EmployeeID,
                        LastName,
                        FirstName,
                        ReportsTo,
                        HierarchyLevel) AS
(
    SELECT EmployeeID
         , LastName
         , FirstName
         , ReportsTo
         , 1 as HierarchyLevel
      FROM Employees
     WHERE ReportsTo IS NULL

     UNION ALL

    SELECT e.EmployeeID
         , e.LastName
         , e.FirstName
         , e.ReportsTo
         , eh.HierarchyLevel + 1 AS HierarchyLevel
      FROM Employees e
INNER JOIN EmployeeHierarchy eh
        ON e.ReportsTo = eh.EmployeeID
)
  SELECT *
    FROM EmployeeHierarchy
ORDER BY HierarchyLevel, LastName, FirstName

Gremlin (hierarchical)

g.V().hasLabel("employee").where(__.not(out("reportsTo"))).
      repeat(__.in("reportsTo")).emit().tree().by(map {
        def employee = it.get()
        employee.value("firstName") + " " + employee.value("lastName")
      }).next()

You can also produce the same tabular result that’s produced by SQL.

Gremlin (tabular)

g.V().hasLabel("employee").where(__.not(out("reportsTo"))).
      repeat(__.as("reportsTo").in("reportsTo").as("employee")).emit().
      select(last, "reportsTo", "employee").by(map {
        def employee = it.get()
        employee.value("firstName") + " " + employee.value("lastName")
      })

References:

Complex

Pivots

This sample shows how to determine the average total order value per month for each customer.

SQL

    SELECT Customers.CompanyName,
           COALESCE([1], 0)  AS [Jan],
           COALESCE([2], 0)  AS [Feb],
           COALESCE([3], 0)  AS [Mar],
           COALESCE([4], 0)  AS [Apr],
           COALESCE([5], 0)  AS [May],
           COALESCE([6], 0)  AS [Jun],
           COALESCE([7], 0)  AS [Jul],
           COALESCE([8], 0)  AS [Aug],
           COALESCE([9], 0)  AS [Sep],
           COALESCE([10], 0) AS [Oct],
           COALESCE([11], 0) AS [Nov],
           COALESCE([12], 0) AS [Dec]
      FROM (SELECT Orders.CustomerID,
                   MONTH(Orders.OrderDate)                                   AS [Month],
                   SUM([Order Details].UnitPrice * [Order Details].Quantity) AS Total
              FROM Orders
        INNER JOIN [Order Details]
                ON [Order Details].OrderID = Orders.OrderID
          GROUP BY Orders.CustomerID,
                   MONTH(Orders.OrderDate)) o
     PIVOT (AVG(Total) FOR [Month] IN ([1],
                                       [2],
                                       [3],
                                       [4],
                                       [5],
                                       [6],
                                       [7],
                                       [8],
                                       [9],
                                       [10],
                                       [11],
                                       [12])) AS [Pivot]
INNER JOIN Customers
        ON Customers.CustomerID = [Pivot].CustomerID
  ORDER BY Customers.CompanyName

Gremlin

months = new java.text.DateFormatSymbols().getShortMonths().toList(); []
rowTotal = {it.get().value("unitPrice") * it.get().value("quantity")}; []

g.V().hasLabel("customer").order().by("customerId", incr).
      where(out("ordered")).as("customer").
      map {
        def m = g.V(it.get()).out("ordered").
                  group().by {new Date(it.value("orderDate")).getMonth()}.
                          by(out("contains").map(rowTotal).sum()).next()
        (0..11).collectEntries {[months[it], m.containsKey(it) ? m[it] : 0]}
      }.as("totals").select("customer", "totals").by(id).by()

References:

Recommendation

This sample shows how to recommend 5 products for a specific customer. The products are chosen as follows:

  • determine what the customer has already ordered

  • determine who else ordered the same products

  • determine what others also ordered

  • determine products which were not already ordered by the initial customer, but ordered by the others

  • rank products by occurence in other orders

SQL

  SELECT TOP (5) [t14].[ProductName]
    FROM (SELECT COUNT(*) AS [value],
                 [t13].[ProductName]
            FROM [customers] AS [t0]
     CROSS APPLY (SELECT [t9].[ProductName]
                    FROM [orders] AS [t1]
              CROSS JOIN [order details] AS [t2]
              INNER JOIN [products] AS [t3]
                      ON [t3].[ProductID] = [t2].[ProductID]
              CROSS JOIN [order details] AS [t4]
              INNER JOIN [orders] AS [t5]
                      ON [t5].[OrderID] = [t4].[OrderID]
               LEFT JOIN [customers] AS [t6]
                      ON [t6].[CustomerID] = [t5].[CustomerID]
              CROSS JOIN ([orders] AS [t7]
                          CROSS JOIN [order details] AS [t8]
                          INNER JOIN [products] AS [t9]
                                  ON [t9].[ProductID] = [t8].[ProductID])
                   WHERE NOT EXISTS(SELECT NULL AS [EMPTY]
                                      FROM [orders] AS [t10]
                                CROSS JOIN [order details] AS [t11]
                                INNER JOIN [products] AS [t12]
                                        ON [t12].[ProductID] = [t11].[ProductID]
                                     WHERE [t9].[ProductID] = [t12].[ProductID]
                                       AND [t10].[CustomerID] = [t0].[CustomerID]
                                       AND [t11].[OrderID] = [t10].[OrderID])
                     AND [t6].[CustomerID] <> [t0].[CustomerID]
                     AND [t1].[CustomerID] = [t0].[CustomerID]
                     AND [t2].[OrderID] = [t1].[OrderID]
                     AND [t4].[ProductID] = [t3].[ProductID]
                     AND [t7].[CustomerID] = [t6].[CustomerID]
                     AND [t8].[OrderID] = [t7].[OrderID]) AS [t13]
           WHERE [t0].[CustomerID] = N'ALFKI'
        GROUP BY [t13].[ProductName]) AS [t14]
ORDER BY [t14].[value] DESC

Gremlin

g.V().has("customer", "customerId", "ALFKI").as("customer").
      out("ordered").out("contains").out("is").aggregate("products").
      in("is").in("contains").in("ordered").where(neq("customer")).
      out("ordered").out("contains").out("is").where(without("products")).
      groupCount().order(local).by(values, decr).select(keys).limit(local, 5).
      unfold().values("name")

References: