Graph Database options

Hosted Services

VM-based solutions

While there are no right or wrong answers around this, here are a few popular graph database engines to consider, and why you might consider them.

###Neo4j

Neo4j provides a free Community and for-pay Enterprise edition. The Community edition has a free deployment already on Azure Marketplace, so testing it out should be easy. The Enterprise edition is suitable for running large-scale graphs.

Deployments are limited to a single machine.
- Multi-machine deployments are used only for HA clustering.
- PB-scale data would require partitionable graphs and a custom partitioning strategy/implementation.
Cypher query language makes it easy to operate on the graphs.
- Server-side marshalling means queries are efficient.
- Custom query language means vendor lock-in.

###OrientDB

OrientDB is a similar to Neo4j with a free Community edition and a for-pay Enterprise edition. One minor difference is that the free edition is fully Open-Source under the Apache2 license (technically, Neo4j's Community edition is OSS, but under GPL so typically not enterprise-friendly). The Community edition has a free deployment already on Azure Marketplace, so testing it should be easy.

Supports storage of full "JSON" documents (akin to e.g. Mongo or DocDB).
Sharded, multi-master support means support for graphs of much larger size than Neo4j.
Query syntax is, literally, SQL.
- Minor extensions to the language to support graph concepts.
Supports Write-ahead Logging, coupled with ACID transactions, allowing better fault-tolerance than many NoSQL stores.
Community edition is not "nerfed" - supports HA deployments, no limits on scale.

###Titan

Titan is an Open-Source, Apache2-licensed Graph DB, with a unique slant. It has a couple of features to recommend it, but is still relatively new and has just been acquired by DataStax - this likely means stronger long-term support, but may limit support for non-DataStax storage backbone support. There is no Azure Marketplace build for Titan, so deploying it will require more work. A simple testing instance is easy to deploy to an Azure Linux VM, though, but may not give you any indication of "real world" deployments.

Titan is a Graph DB "layer" over a canonical store.
- Out-of-the-box support for Cassandra, HBase, or BDBs.
- Fault-tolerance and CAP-related guarantees depend on the underlying store.
- Transaction layer provides flexible guarantees, from strong ACID to eventual consistency.
- Much higher scalability than single-node Graph DBs like Neo4j since persistence layer scales independently.
Query layer is built on standard TinkerPop stack.
- Gremlin query language has a more procedure-oriented format, but is standard and provides good fan-out capabilities.
- Blueprints provides a consistent REST API for all supporting DBs.
- TinkerPop is not yet widely adopted, so while in theory it provides shelter from vendor lock-in, in practice that guarantee is less secure. Neo4j and OrientDB both claim compliance.

##Picking a Graph Store

There are no hard-and-fast rules to picking one of these over the other, but a few questions you should ask to guide yourself in the right direction.

Are you migrating from an existing store? Is it Key/Value or Document-based?

If you're migrating from an existing Document-based store, OrientDB has the benefit of allowing you to keep your existing document schema while moving it into the new data store. If you have an existing K/V store, moving to any Graph DB should be possible, or you can consider a hybrid solution as outlined in the main GraphDB document.
Do you require Enterprise-level support?

In this case, you should consider Neo4j or OrientDB, as both have a built-in Enterprise edition and support model. Titan has Enterprise-level support, and that is even stronger now with the purchase by DataStax, but they don't have a focused Enterprise-edition so are lacking Enterprise-level tooling in some areas.
Is your data "big data"?

Are your graphs Petabytes in size, with hundreds of millions of nodes and edges? If so, you should consider OrientDB or Titan - both of which can scale horizontally, in different ways.
Is your workload write-heavy? Will this be your primary store?

If this is the case, OrientDB is a solid choice due to write-ahead logs and ACID support as well as multi-master sharded deployments.
How willing are you to trade performance for fault-tolerance?

In some models (batch-loaded, high-read/low-write, BI applications), you might favor a design suited to high performance queries where fault-tolerance is not as important. Titan backed by Berkeley DBs can be a high-performance solution in these cases. However, before optimizing for performance, it makes sense to benchmark on your own scenarios - Neo4j and OrientDB also perform quite well on big iron servers.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

graphdb-options.md

graphdb-options.md

Graph Database options

Hosted Services

VM-based solutions

Files

graphdb-options.md

Latest commit

History

graphdb-options.md

File metadata and controls

Graph Database options

Hosted Services

VM-based solutions