Skip to content
This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

Latest commit

 

History

History
369 lines (257 loc) · 20.6 KB

eva_101.md

File metadata and controls

369 lines (257 loc) · 20.6 KB

How do I get started using Eva today?

There are a number of answers to that question but the easiest way to getting started is to run the Clojure REPL, obtain a connection to the in-memory database, transact your schema and data, and then start running queries against that data.

A Clojure... REPL?

If you are not familiar with the concept of a REPL, it is simply a Read–eval–print loop.

From the terminal, run:

brew install leiningen

Leiningen is the easiest way to start using Clojure. From the directory you checked out the Eva repository, launch the REPL with:

lein repl

In-Memory Database

Once you are inside the REPL we will define a variable which will obtain a connection to the in-memory database.

(def conn (eva/connect {:local true}))

Basically we've defined a variable conn whose value is the output of the call to the connect function in the eva namespace with the argument {:local true}. {:local true} allows us to create an in-memory database.

A Schema

Similarly to a traditional RDBMS, Eva has a schema used to describe a set of attributes that can be associated with entities. Let us start with something simple.

(def schema [
{:db/id (eva/tempid :db.part/db)
 :db/ident :book/title
 :db/doc "Title of a book"
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}

{:db/id (eva/tempid :db.part/db)
 :db/ident :book/year_published
 :db/doc "Date book was published"
 :db/valueType :db.type/long
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}

{:db/id (eva/tempid :db.part/db)
 :db/ident :book/author
 :db/doc "Author of a book"
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}

{:db/id (eva/tempid :db.part/db)
 :db/ident :author/name
 :db/doc "Name of author"
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}
 ])

For those of you familiar with schema-based databases you probably get the gist of what we are trying to do here. That is, define a schema containing books and authors.

:db/id is a keyword used to reference the entity-id for the data we are about to transact.

eva/tempid returns a temporary id used to reference this datom within the context of the transaction. Tempids are resolved to permanent unique id[entifiers] upon completion of the transaction.

:db.part/db represents the partition name and is passed to the tempid function.

:db/ident is used to give the attribute a name that we can use to easily reference it later.

:db/doc allows us to define a docstring for this particular attribute.

:db/valueType refers to the type of data used by this entity. A full list of supported types can be found here. Worth calling out is the :db.type/ref type, which allows you to reference other entities.

:db.install/_attribute simply states we want to install this entity into the db.part/db partition.

Now that we've defined our schema we need to transact it into the database.

Transact!

Transact allows you to make assertions and retraction where assertions are analogous to a traditional insert/update, and retractions are analogous to a traditional delete.

Remember that the variable conn contains our connection to the database. With the database schema stored in the schema variable, transacting it into the database is as simple as:

@(eva/transact conn schema)

Note: The transaction is asynchronous, returning a future. The @ dereferences this future, blocking until there is a value to return.

The response of a successful transact is a map with four keys, :db-before, :db-after, :tempids, and :tx-data. :db-before and :db-after contains snapshots of the database before and after the transaction. :tempids contains a mapping of temporary ids that occurred as part of the transaction to their corresponding permanent ids. :tx-data contains the individual #datom vectors that were inserted into the database.

Alright, now let's add some data! First we define a variable called first-book:

(def first-book [[:db/add (eva/tempid :db.part/user) :book/title "First Book"]])

This variable will transact a single fact, which we call a datom, into the database. All datoms are made up of the 5-tuple, [eid attr val tx added?]. In this case, eid corresponds to (eva/tempid :db.part/user), attr -> :book/title, and val -> "First Book". tx and added? are filled in for us implicitly. tx refers to the id of the transaction entity that is created as part of every successful transaction and added? is simply a boolean value indicating whether this fact was added or retracted. The datom is the smallest unit of data that can be manipulated in the database.

:db/add is the keyword used to indicate an upsertion (insert/update) of data. Unlike our schema, :db/id is not required here as it is implicit when adding data in list form (more on that later). Instead of the db partition, which we used for our schema, we now use the :db.part/user partition.

Transact this data with:

@(eva/transact conn first-book)

Now for a query.

We have transacted some data, now let's write a basic query to go and get it.

(def get-book '[:find ?b 
                :where [?b :book/title "First Book"]])

So we defined a variable called get-book, and you are probably wondering what that ' means. Wrapping a form with a single quote will return the form without evaluating it as code. Try defining the variable without the ' and see what happens.

Next is the query, [:find ?b :where [?b :book/title "First Book"]]. There is a lot to take in here so we'll break it down piece-by-piece. First of all, every query you write needs to be wrapped in a vector ([...]). The query starts with the :find keyword followed by a number of logic variables (lvar for short) denoted with a ?. The :where clause follows and, similarly to SQL, is used to restrict the query results.

The tuple [?b :book/title "First Book"] is called a data pattern. All querying is essentially matching that pattern to the datom 5-tuple we discussed earlier ([eid attr val tx added?]). In this case we are asking for all of the entity ids (?b) which have the attribute :book/title with value "First Book". What about tx and added?, why don't they appear in the clause? Simply, if not present they are replaced with implicit blanks. Expanding the tuple to its full form would yield, [?b :book/title "First Book" _ _]. We'll talk more about blanks later.

Before we can run this query we need to obtain a database snapshot which can be accomplished with:

(def db (eva/db conn))

Passing our connection (conn) as the argument to eva/db we get back the current database snapshot. This value represents the state of the database at a given point in time and, given the same query, will always return the same result.

Once we have that we can run our query with:

(eva/q get-book db)

We should see an entity-id as the result.

Transacting a small dataset

Run each of the following statements individually in the REPL.

(def dataset
[{:db/id (eva/tempid :db.part/user -1) :author/name "Martin Kleppman"}
{:db/id (eva/tempid :db.part/user -2) :book/title "Designing Data-Intensive Applications" :book/year_published 2017 :book/author (eva/tempid :db.part/user -1)}

{:db/id (eva/tempid :db.part/user -3) :author/name "Aurelien Geron"}
{:db/id (eva/tempid :db.part/user -4) :book/title "Hands-On Machine Learning" :book/year_published 2017 :book/author (eva/tempid :db.part/user -3)}

{:db/id (eva/tempid :db.part/user -5) :author/name "Wil van der Aalst"}
{:db/id (eva/tempid :db.part/user -6) :book/title "Process Mining: Data Science in Action" :book/year_published 2016 :book/author (eva/tempid :db.part/user -5)}
{:db/id (eva/tempid :db.part/user -7) :book/title "Modeling Business Processes: A Petri-Net Oriented Approach" :book/year_published 2011 :book/author (eva/tempid :db.part/user -5)}

{:db/id (eva/tempid :db.part/user -8) :author/name "Edward Tufte"}
{:db/id (eva/tempid :db.part/user -9) :book/title "The Visual Display of Quantitative Information" :book/year_published 2001 :book/author (eva/tempid :db.part/user -8)}
{:db/id (eva/tempid :db.part/user -10) :book/title "Envisioning Information" :book/year_published 1990 :book/author (eva/tempid :db.part/user -8)}

{:db/id (eva/tempid :db.part/user -11) :author/name "Ramez Elmasri"}
{:db/id (eva/tempid :db.part/user -12) :book/title "Operating Systems: A Spiral Approach" :book/year_published 2009 :book/author (eva/tempid :db.part/user -11)}
{:db/id (eva/tempid :db.part/user -13) :book/title "Fundamentals of Database Systems" :book/year_published 2006 :book/author (eva/tempid :db.part/user -11)}])
(def dataset2
[{:db/id (eva/tempid :db.part/user -14) :author/name "Steve McConnell"}
{:db/id (eva/tempid :db.part/user -15) :book/title "Code Complete: A Practical Handbook of Software Construction" :book/year_published 2004 :book/author (eva/tempid :db.part/user -14)}
{:db/id (eva/tempid :db.part/user -16) :book/title "Software Estimation: Demystifying the Black Art" :book/year_published 2006 :book/author (eva/tempid :db.part/user -14)}
{:db/id (eva/tempid :db.part/user -17) :book/title "Rapid Development: Taming Wild Software Schedules" :book/year_published 1996 :book/author (eva/tempid :db.part/user -14)}
{:db/id (eva/tempid :db.part/user -18) :book/title "Software Project Survival Guide" :book/year_published 1997 :book/author (eva/tempid :db.part/user -14)}
{:db/id (eva/tempid :db.part/user -19) :book/title "After the Gold Rush: Creating a True Profession of Software Engineering" :book/year_published 1999 :book/author (eva/tempid :db.part/user -14)}

{:db/id (eva/tempid :db.part/user -20)  :author/name "Don Miguel Ruiz"}
{:db/id (eva/tempid :db.part/user -21) :book/title "The Four Agreements: A Practical Guide to Personal Freedom" :book/year_published 2011 :book/author (eva/tempid :db.part/user -20)}

{:db/id (eva/tempid :db.part/user -22) :author/name "Charles Petzold"}
{:db/id (eva/tempid :db.part/user -23) :book/title "Code: The Hidden Language of Computer Hardware and Software"
:book/year_published 2000 :book/author (eva/tempid :db.part/user -22)}

{:db/id (eva/tempid :db.part/user -24) :author/name "Anil Maheshwari"}
{:db/id (eva/tempid :db.part/user -25) :book/title "Data Analytics Made Accessible" :book/year_published 2014 :book/author (eva/tempid :db.part/user -24)}

{:db/id (eva/tempid :db.part/user -26) :author/name "Jeremy Anderson"}
{:db/id (eva/tempid :db.part/user -27) :book/title "Professional Clojure" :book/year_published 2016 :book/author (eva/tempid :db.part/user -26)}])

In the above, we define two vectors (dataset and dataset2) of maps containing data that we will transact into the database. One thing you might notice is that we omitted the :db/add keyword entirely. When you are transacting a single piece of data in a list, as we did above, you are limited to adding or retracting a specific fact about an entity. When transacting data in a map, the :db/add is implied and you can include any number of attribute/value pairs. Obviously the map form method is preferable when transacting larger amounts of data.

Another thing worth mentioning is that we are now passing a negative number as the second argument to all of our eva/tempid calls. Given the same partition and negative number, each invocation will return the same temporary id. The reason we do this is so that we can use that tempid to add references between entities in a single transaction. In this example we are using the tempid for the author entity as the reference value for the :book/author attribute.

Transact this with:

@(eva/transact conn dataset)
@(eva/transact conn dataset2)

Note: The only reason we split this into dataset and dataset2 is due to a limitation in the REPL where large pastes result in odd failures.

More queries!

First we are going to demonstrate how we can parameterize our original query (def get-book '[:find ?b :where [?b :book/title "First Book"]]) so we can pass the title of the book we want to find into the query.

Modify our previous data structure with:

(def get-book '[:find ?b 
                :in $ ?t 
                :where [?b :book/title ?t]])

There are two things to note here. Firstly we've introduced the :in keyword, which provides the query with input parameters. The $ is used specifically to denote the db (passed in implicity if the :in clause is omitted). We've also added a second logic variable to replace the hard-coded book title. Now we can use this query to find the entity id of the book titled "Envisioning Information".

(eva/q get-book db "Envisioning Information")

Wait a second, that should have returned a result. Remember, db is an immutable value of the database. We modified the database since the last time we got that db value, so we need to get it again with:

(def db (eva/db conn))

Run the query again and we should get the result we desire. As much fun as entity ids are, there is a lot more we can do with the query engine so let's keep going.

Find book titles released in 2017

(eva/q '[:find ?title
         :where
         [?b :book/year_published 2017]
         [?b :book/title ?title]]
       db)

In this example, we use a logic variable ?title to find the title of books released in 1990. One interesting thing to note about this query is that ?b is used twice. When a logic variable is used more than once it must represent the same entity in every clause in order to satisfy the set of clauses. This is also referred to as unification. In SQL, this is roughly equivalent to joining the years and titles on the shared entity ids.

Find the author of a particular book

(eva/q '[:find ?name
         :where
         [?b :book/title "Designing Data-Intensive Applications"]
         [?b :book/author ?a]
         [?a :author/name ?name]] db)

In this query we are following a relationship from a book to its author, and then finding and returning the name of the author. We bind the logic variable ?a to the entity id of the author associated with the book. Then we use that entity id to get the name of the actual author.

We can just as easily reverse this query and get all of the books published by a certain author.

(eva/q '[:find ?books
         :where
         [?b :book/title ?books]
         [?b :book/author ?a]
         [?a :author/name "Steve McConnell"]] db)

How do I get a full entity?

Up to this point we have been querying data by unifying individual values after our :find clause to those in our :where clause. Using the Pull API, we can make the following call (replace eid with the result of our very first query (eva/q get-book db "Envisioning Information").

(eva/pull db '[*] <eid>)

The Pull API expects a db as its first argument, similar to how we pass db to our queries. The second argument [*] is a pattern, where you can specify which attributes you would like to have returned. Using * as your pattern indicates you want all of the attributes for this particular entity. The final argument is the id of the entity you are trying to get.

Blanks

Get all books in the database

(eva/q '[:find ?name
         :where
         [_ :book/title ?name]] db)

Instead of using a logic variable to bind the id of a book entity we simply use _. The underscore is equivalent to a wildcard and will match anything, so every entity id for this attribute will be returned by the query.

Other ":in"puts

As we mentioned before, the use of the :in clause allows us to pass inputs to our queries. A number of data types are supported, such as:

Tuples

:in $ [?a ?b] where you would pass a single tuple of logic variables to a query.

Collections

:in $ [?a ...] where you would pass a collection of inputs (ie: ["Book1" "Book2" "BookN"]) to the query.

Relations:

:in $ [[?a ?b]] where you would pass a set of tuples to the query.

What about transaction entities?

Whenever you transact data into Eva, a transaction entity is created as well. The transaction entity contains the instant (timestamp) a transaction is committed, which is useful to know in some circumstances.

Find when a book was committed to the database

(eva/q '[:find ?timestamp
         :where
         [_ :book/title "Process Mining: Data Science in Action" ?tx]
         [?tx :db/txInstant ?timestamp]] db)

It is an important realization that :db keywords can be queried the same way as user-defined schema. Another thing in the above example that may look weird is that we have four arguments in one of our datom clauses where previously we've only used three. Remember that the datom is a 5-tuple [eid attr val tx added?]. In this case ?tx binds to the tx portion of the datom, which is what we are trying to query for.

Predicates

Find book titles and their year released prior to 2005

(eva/q '[:find ?book ?year
         :where
         [?b :book/title ?book]
         [?b :book/year_published ?year]
         [(< ?year 2005)]] db)

The (< ?year 2005) clause is called a predicate, and filters the result set to only include the results which satisfy the predicate. Any Clojure function or Java method can be used as a predicate.

Find books older than "Software Project Survival Guide"

(eva/q '[:find ?book ?y1
         :where
         [?b1 :book/title ?book]
         [?b1 :book/year_published ?y1]
         [?b2 :book/title "Software Project Survival Guide"]
         [?b2 :book/year_published ?y2]
         [(< ?y1 ?y2)]] db)

Aggregates

What if we want to find the oldest or the newest book? Datalog supports a number of aggregate functions, including min, max, avg and sum.

(eva/q '[:find (min ?year)
         :where
         [_ :book/year_published ?year]] db)

Rules

Throughout this tutorial, if we wanted to the get the author for a particular book we'd need to write the same two where clauses every time. This gets old really fast, and can get quite tedious for more complicated queries. Rules offer a way to abstract away reusable components of a query.

(def rules '[[(book-author ?book ?name)
              [?b :book/title ?book]
              [?b :book/author ?a]
              [?a :author/name ?name]]])

(eva/q '[:find ?name
         :in $ %
         :where
         (book-author "Modeling Business Processes: A Petri-Net Oriented Approach" ?name)]
       db rules)

We've created a rule in this example called book-author. The part contained within the first vector (...) is called the head of the rule, containing the name and any logic variables. The rest of the rule looks exactly like the where clauses we've seen up to this point. The ?book and ?name variables can be used for both input and output. For example, if you provide a value for ?book the output will be the author of that book. Vice versa, if you provide a value for ?name you will get back the titles of all the books for that author. If you provide a value for neither ?book or ?name the query will return all the possible combinations in the database. To use a rule in a query firstly we need to pass the rule into the :in clause using the % symbol. Secondly we call the rule in one of our :where clauses like (book-author "Modeling Business Processes: A Petri-Net Oriented Approach" ?name).

Congrats

It is our hope that this tutorial has been successful in bootstrapping your basic data entry and querying abilities. Really though, this tutorial only scratches the surface. There is so much more to learn. For example, did you know that rules can call themselves? And what about getting a database snapshot from a previous point in time? Or, how do I take this knowledge and implement it in an actual app? All are good questions but fall out of the scope of this tutorial (although it is our hope we can dig into these topics at a later date).

Have questions, ideas or topics for more tutorials in the future? Create an issue and we'll do our best to respond.