
Split Titan and Gremlin - impossible to scale #10

Closed
fridex opened this issue Jun 14, 2017 · 15 comments

fridex (Contributor) commented Jun 14, 2017

Our current stack looks like the following:

DynamoDB - Titan - Gremlin

We would like to split Titan and Gremlin into two standalone containers, which would allow us to scale Gremlin independently. Note that running multiple Titan instances talking to the same DynamoDB tables results in data inconsistencies and data corruption, as stated in "Titan Limitations":

Running multiple Titan instances on one machine backed by the same storage backend (distributed or local) requires that each of these instances has a unique configuration for storage.machine-id-appendix. Otherwise, these instances might overwrite each other leading to data corruption. See Graph Configuration for more information.

Source: http://titan.thinkaurelius.com/wikidoc/0.3.1/Titan-Limitations.html
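
Just to illustrate what the quoted limitation means in practice, here is a minimal, hypothetical sketch of generating per-instance Titan properties with unique storage.machine-id-appendix values. The key names follow the DynamoDB storage backend documentation; the values are placeholders, not our actual settings.

```python
# Hypothetical sketch only: each Titan instance sharing the same storage
# backend needs its own storage.machine-id-appendix, otherwise instances may
# overwrite each other (per the 0.3.1 docs quoted above).
BASE_CONFIG = {
    # Key names follow the awslabs dynamodb-titan storage backend docs;
    # values are placeholders, not our deployment settings.
    "storage.backend": "com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager",
    "storage.dynamodb.client.endpoint": "https://dynamodb.us-east-1.amazonaws.com",
}


def titan_properties(instance_index):
    """Render a per-instance properties file with a unique machine-id-appendix."""
    config = dict(BASE_CONFIG)
    config["storage.machine-id-appendix"] = instance_index
    return "\n".join("{}={}".format(key, value) for key, value in sorted(config.items()))


# Two Titan replicas backed by the same DynamoDB tables would each need their own file:
print(titan_properties(0))
print(titan_properties(1))
```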

fridex changed the title from "Split Titan and Gremlin" to "Split Titan and Gremlin - impossible to scale" on Jun 14, 2017
tuxdna (Contributor) commented Jun 14, 2017

Totally agree with having only one TitanGraph instance.

msrb (Member) commented Jun 14, 2017

My understanding (which may be completely wrong) is that Gremlin is just a thin wrapper sitting in front of Titan. So ideally we would like to be able to scale Titan as well, as Titan is the component that does all the heavy lifting.

Can we scale Titan as well?

fridex (Contributor, Author) commented Jun 14, 2017

My understanding (which may be completely wrong) is that Gremlin is just a thin wrapper sitting in front of Titan. So ideally we would like to be able to scale Titan as well, as Titan is the component that does all the heavy lifting.

If I understand the overall architecture correctly, Gremlin also performs query strategy evaluation and path traversal computations to optimize queries, so we would probably gain something by scaling Gremlin (besides possible connection reuse?).

Can we scale Titan as well?

I think this will not be doable - based on the configuration section, it is not possible to let two or more Titan instances talk to the same graph - see storage.machine-id-appendix [1], as stated above.

[1] http://titan.thinkaurelius.com/wikidoc/0.3.1/Graph-Configuration.html

@krishnapaparaju

Do we have the current issues documented (for ingestion into the Gremlin layer)?

@krishnapaparaju

What kind of issues are we getting into when more workers / threads / containers run in parallel and ingest into Gremlin? Any action plan can be based on this.

fridex (Contributor, Author) commented Jun 14, 2017

What kind of issues are we getting into when more workers / threads / containers run in parallel and ingest into Gremlin?

We wanted to scale Gremlin when we were under heavy load (right now we can only scale Gremlin+Titan together, as they live in one container). It took quite a while to store/retrieve data from the graph database. I will need to check the communication parameters or find the bottleneck there. Also, on the UI level there are retries, as it takes time to get data.

Currently the main issue is the data_model importer, which is an API service that simply pushes data into the graph. There was a plan to remove this API service (saving one container in the deployment) and let workers communicate with Gremlin directly.

From my perspective, there is a nice opportunity here to write a really small library that would help us with Gremlin communication, serializing query results, and constructing queries (it could directly use the schemas we maintain for task results). It could be used on both the worker side and the API side to unify work with the graph; a rough sketch follows below. This would also address other concerns we currently have.
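
A very rough sketch of what such a helper could look like, assuming Gremlin Server is exposed over its HTTP channelizer (default port 8182); the URL, environment variable, and function name here are made up for illustration:

```python
# Hypothetical sketch of a tiny Gremlin helper shared by workers and the API.
# Assumes Gremlin Server's HTTP endpoint; names and URL are placeholders.
import os

import requests

GREMLIN_URL = os.environ.get("GREMLIN_URL", "http://localhost:8182")


def execute(query, bindings=None):
    """POST a Gremlin query string with optional bindings and return the result data."""
    payload = {"gremlin": query}
    if bindings:
        payload["bindings"] = bindings
    response = requests.post(GREMLIN_URL, json=payload)
    response.raise_for_status()
    return response.json()["result"]["data"]


# Bindings keep values out of the query string, which would also make query
# construction from the task-result schemas easier to generate.
packages = execute("g.V().has('ecosystem', eco).valueMap()", bindings={"eco": "maven"})
```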

@miteshvp (Contributor)

Running multiple Titan instances on one machine backed by the same storage backend (distributed or local) requires that each of these instances has a unique configuration for storage.machine-id-appendix. Otherwise, these instances might overwrite each other leading to data corruption. See Graph Configuration for more information.

Source: http://titan.thinkaurelius.com/wikidoc/0.3.1/Titan-Limitations.html

We are using Titan 1.0.0; that document relates to a pretty old version of Titan.

@krishnapaparaju

Which mode are we using in production for Titan? The multiple or single item model?

miteshvp (Contributor) commented Jun 16, 2017

Which mode are we using in production for Titan? The multiple or single item model?

@krishnapaparaju - yes, we are using the multiple item model.

@krishnapaparaju

Thanks @miteshvp @tuxdna. I am sure MULTI would have issues on the write side. Are there any side effects we would get into if we switch to SINGLE? If not, I would start with (a) changing to the SINGLE model and (b) observing the WRITE-CAPACITY-UNITS-USED details at AWS (a rough sketch for pulling that metric is below).
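
For (b), the corresponding CloudWatch metric appears to be ConsumedWriteCapacityUnits; a rough boto3 sketch, where the table name is a placeholder and the period/window are arbitrary:

```python
# Rough sketch: pull consumed write capacity units for one DynamoDB table
# from CloudWatch. Table name is a placeholder; adjust period/window as needed.
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedWriteCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "edgestore"}],  # placeholder table name
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Sum"],
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```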

@krishnapaparaju

Once these issues are resolved, we might want to move to JanusGraph (again with the AWS plugin); this would protect our prior investments in both Gremlin (code) and DynamoDB (deployment). I don't think we need to do anything related to JanusGraph anytime soon; this is more of an FYI about moving to a DB which is actively maintained, with very minimal rework.

https://github.com/awslabs/dynamodb-janusgraph-storage-backend

miteshvp (Contributor) commented Jun 19, 2017

Are there any side effects we would get into if we switch to SINGLE?

  1. This means rewriting the entire graph in SINGLE mode from scratch (the data-model config keys are sketched below).
  2. A limit of 400 KB per node, which is still OK for our nodes.
  3. Graphs with a low vertex degree and a low number of items per index can take advantage of this implementation. We have a rather big skew towards fewer indices but more items.
  4. We will definitely save time on writes (initial graph loads), but in previous experimentation with reads there was hardly any difference in fetch times. That was one of the reasons for not going ahead and re-writing the entire graph in SINGLE mode.

cc @krishnapaparaju
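
For reference, a rough sketch of where the item model is selected: the DynamoDB storage backend takes a per-store data-model key. The key format follows the awslabs backend documentation; the store names listed here are an assumption for illustration, and SINGLE is subject to the 400 KB item limit mentioned above.

```python
# Hypothetical sketch: render the per-store data-model lines for the Titan
# properties file. Store names are an assumption; the key format follows the
# awslabs dynamodb-titan storage backend docs.
STORES = ["edgestore", "graphindex"]  # assumed subset of the backend's stores


def data_model_properties(model):
    """Render data-model config lines for each store (SINGLE or MULTI)."""
    assert model in ("SINGLE", "MULTI")
    return "\n".join(
        "storage.dynamodb.stores.{}.data-model={}".format(store, model)
        for store in STORES
    )


# SINGLE packs each row into one DynamoDB item (subject to the 400 KB limit);
# MULTI spreads a row across multiple items.
print(data_model_properties("SINGLE"))
```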

@krishnapaparaju

@miteshvp thanks. We can start by experimenting with the SINGLE model (on a different instance) to check whether we do better on the WRITE side. Good to know that changing to SINGLE would not lead to any READ-side challenges. Based on the results from that experiment, we can plan a course of action.

@miteshvp (Contributor)

Closing this issue w.r.t. #11.
