Improve Graph Sync #11
There was already an implementation that used asynchronous HTTP requests from workers, but I insisted on changing it. One of the disadvantages of such an implementation is, as you stated, the loss of log control and everything log-related. Making a request without verifying that it actually succeeded can lead to issues that are really hard to debug and track. Another problem is that these async requests are usually issued from separate threads, which makes evaluation dependent on thread scheduling. If we just fire an async request and immediately return from the call, this could easily lead to the hard-to-track failures described above.
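To make the difference concrete, here is a minimal sketch (the endpoint, payload and function names are hypothetical, not the project's actual code) contrasting a fire-and-forget request issued from a worker thread with a synchronous request whose outcome is checked before the task returns:

```python
import threading
import requests

GREMLIN_HTTP = "http://localhost:8182"  # hypothetical Gremlin HTTP endpoint


def import_fire_and_forget(payload):
    """Fire-and-forget: returns immediately; connection errors, HTTP 5xx
    responses and Gremlin-level failures happen later on another thread
    and never reach the task's log or result."""
    threading.Thread(
        target=requests.post,
        args=(GREMLIN_HTTP,),
        kwargs={"json": payload},
        daemon=True,
    ).start()


def import_checked(payload):
    """Synchronous: blocks until the server answers, so failures surface in
    the task itself and can be logged and retried by the worker machinery."""
    response = requests.post(GREMLIN_HTTP, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()
```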
And not to forget... even with async requests, you can easily DoS the remote API server with such an approach. I still think that we should get rid of the data importer, as it really does not make sense to keep it, and create a standalone Selinon task that gives us transparent horizontal scaling (see the sketch below).
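A rough sketch of such a standalone task, assuming Selinon's SelinonTask base class; the task name, endpoint and payload here are hypothetical rather than the project's actual code:

```python
import requests
from selinon import SelinonTask

GREMLIN_HTTP = "http://localhost:8182"  # hypothetical Gremlin HTTP endpoint


class GraphImporterTask(SelinonTask):
    """Hypothetical standalone task pushing one EPV's results to the graph.

    Running the import as an ordinary Selinon task means it consumes its own
    queue, so ingestion can be scaled horizontally just by adding workers,
    and failures are logged and retried by the normal worker machinery.
    """

    def run(self, node_args):
        payload = {
            "ecosystem": node_args["ecosystem"],
            "name": node_args["name"],
            "version": node_args["version"],
        }
        # Synchronous call so a failed import fails the task visibly.
        response = requests.post(GREMLIN_HTTP, json=payload, timeout=60)
        response.raise_for_status()
        return {"imported": True}
```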
Yes, this is possible and we already have a pending patch for that in fabric8-analytics/fabric8-analytics-worker#63. EDIT: also note that this does not solve this issue; it has happened many times that all cluster nodes were stuck on graph imports, which occupied all available workers so we were not able to analyse anything.
One option is that we could track these errors at the data_importer layer rather than in the worker processes.
Again, we could handle this as part of the unknown scenario.
Any API server, for that matter, faces this challenge. We might encounter it on our core API server as well.
Yup, that is what I meant initially: whether it is possible to unblock other workers. Thanks for clarifying. So we might have to remove that as an option.
@fridex - what about invoking data_importer in multi-worker mode? Do you think it might be helpful to perform a small load test?
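If we do try a small load test, something like the sketch below would do; it assumes the data importer exposes an HTTP ingestion endpoint, and the URL, path and payload are made up:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical data importer endpoint; adjust URL, path and payload to the real API.
IMPORTER_URL = "http://localhost:9192/api/v1/ingest"
PAYLOAD = {"ecosystem": "maven", "name": "junit:junit", "version": "4.12"}


def one_request(_):
    """Issue a single ingestion request and report its outcome."""
    try:
        response = requests.post(IMPORTER_URL, json=PAYLOAD, timeout=30)
        return response.status_code
    except requests.RequestException as exc:
        return type(exc).__name__


def load_test(concurrency=20, total=200):
    """Fire `total` requests with `concurrency` parallel clients and report
    how many succeeded versus failed or errored."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, range(total)))
    ok = sum(1 for r in results if r == 200)
    failures = sorted({str(r) for r in results if r != 200})
    print(f"{ok}/{total} requests returned 200; failures: {failures}")


if __name__ == "__main__":
    load_test()
```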
How do you want to track these issues when you don't even know whether the connection was actually established?
How do you want to handle application-related issues at the transport layer?
I don't think this will help - anyway, that would be scaling vertically, not horizontally. I created PRs this morning that separate the worker for graph imports so other workers can continue with analyses (now we can scale the worker that does ingestion independently; a rough illustration follows below). I still think we should remove the data importer completely and move this logic into core worker tasks.
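Since Selinon dispatches over Celery, the separation can be illustrated with Celery's routing configuration; the queue and task names below are hypothetical and not the actual contents of the PRs:

```python
from celery import Celery

app = Celery("worker")

# Route the (hypothetical) graph import task to its own queue so that a
# dedicated pool of workers consumes it, while analysis queues stay free.
app.conf.task_routes = {
    "GraphImporterTask": {"queue": "graph_import"},
}

# The dedicated ingestion workers would then be started against that queue
# only, e.g.:  celery -A worker worker -Q graph_import --concurrency=2
```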
This should be handled as part of the unknown scenario, but we may not know what went wrong.
This is about a short-term solution, an easy fix, and faster turnarounds.
I am not saying we don't need this, but even if it is part of the core worker, we still rely on gremlin-http for the last-mile insertion into the graph. This is where we could keep track of all the comments and reasons for future reference.
We are talking about TCP (the transport layer of the OSI/ISO model) and application-level logic. How does the unknown scenario - a package unknown to our analyses - relate to this?
I think we have introduced a lot of "short-term" solutions. We should start doing things properly.
Yes, exactly. That way the bottleneck is not the API server written by us, but services that were designed to deal with heavy load and large data sets. I suppose Gremlin was chosen because of this.
Some performance benchmark results are available at https://docs.google.com/spreadsheets/d/1ojvQwhWzpxKBF77X8EGukRi2O0f2dPD_g0rFivZp8To/edit#gid=0. I have also raised an issue for SINGLE item model read/write resulting in a high number of storage exceptions.
This issue collates all the points that could be helpful in improving graph writes. There are three ways to solve the issue.
cc @msrb @krishnapaparaju @samuzzal-choudhury