Improve Jobs API concurrency #2220
From @fridex on February 2, 2018 10:31
BTW see https://github.com/fabric8-analytics/fabric8-analytics-jobs/blob/master/f8a-jobs.py#L109-L112
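For context, the linked lines start the scheduler inside the API process itself. A minimal stdlib sketch of that pattern (names are hypothetical, not the actual f8a-jobs code) shows why it breaks under multiple workers: the "only once" guard is process-local, so every WSGI worker that imports the module starts its own scheduler.

```python
import threading

# Hypothetical sketch of a scheduler started inside the API process.
# The guard below is process-local: if the WSGI server forks N workers,
# each worker starts its own copy of the "single" scheduler.
_scheduler_started = False
_guard = threading.Lock()

def start_scheduler_once():
    """Start the background scheduler at most once *per process*."""
    global _scheduler_started
    with _guard:
        if _scheduler_started:
            return False  # already running in this process
        # the real service would construct and start APScheduler here
        _scheduler_started = True
        return True
```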
From @tuxdna on February 2, 2018 10:35
There are two options:
As @fridex mentioned above, we also have to ensure that there is only one jobs scheduler running in either case. Interesting mix of constraints here :-)
Having the scheduler run as part of the server process seems to be the limiting factor here. I think it would be much better to separate the two components (API and scheduler). This hidden scheduler is an easy way to shoot ourselves in the foot.
With `server='flask'`
With `server='gevent'`
In the comment above I switched between `server='flask'` and `server='gevent'`. The next thing I am trying out is a per-container lock file that ensures exactly one process/thread has an active scheduler.
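One way to realize such a per-container lock file is `flock()`: the kernel releases the lock automatically when the holding process exits, so it cannot outlive the container's processes. A sketch, assuming a Linux container and a hypothetical lock path:

```python
import fcntl

LOCK_PATH = "/tmp/f8a-jobs-scheduler.lock"  # hypothetical location

def try_acquire_scheduler_lock(path=LOCK_PATH):
    """Return the open lock file if this caller won the lock, else None.

    flock() locks are released by the kernel when the file is closed or
    the holding process dies, so a crashed worker frees the lock.
    """
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f  # keep this object alive for as long as the scheduler runs
    except BlockingIOError:
        f.close()
        return None
```

Only the process holding the returned file object would start the scheduler; the others would serve API traffic only.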
@tuxdna I realized that we have a persistent volume attached to the service at the moment. That's something we will need to tackle separately, but unfortunately it means that even if we find a workaround for the scheduler here, we still won't be able to easily scale the service up (increase the number of replicas).
@msrb That's correct. The root cause for not being able to increase the replicas is the way containers are started and stopped. At startup we can definitely acquire a lock (the lock could be an arbitrary mechanism, not just a file), but at shutdown (e.g. when the container dies) we cannot guarantee release of that lock, i.e. we can end up with a stale lock that blocks every new replica.
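The stale-lock failure mode above can be mitigated by treating the lock as a lease rather than a permanent claim: the holder refreshes the lock file's mtime as a heartbeat, and a lock whose mtime is too old is considered abandoned and stolen. A rough sketch, with hypothetical path and timeout:

```python
import os
import time

STALE_AFTER = 60.0  # seconds without a heartbeat before the lock is stale

def acquire(path, stale_after=STALE_AFTER):
    """Try to take the lock file, breaking it if the holder stopped heartbeating."""
    while True:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, str(os.getpid()).encode())
            os.close(fd)
            return True
        except FileExistsError:
            # A container that died never unlinks its lock file; an old
            # mtime means the holder is gone, so break the lock and retry.
            if time.time() - os.path.getmtime(path) > stale_after:
                os.unlink(path)
                continue
            return False

def heartbeat(path):
    """The current holder must call this periodically to keep the lease."""
    os.utime(path)
```

Note this sketch still has a race window between `getmtime` and `unlink`; a production version would need an atomic rename, a database row lock, or a coordination service instead.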
Another approach I am considering is to completely separate out the scheduler service. The scheduler and the APIs are coupled together into one service, and that is what causes this entanglement. I believe separating the two is the right thing to do.
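A decoupled scheduler could be as simple as a dedicated container whose entry point is nothing but a scheduling loop, while the API pods never start one. A stdlib sketch of such a loop (the real service uses APScheduler; the names here are hypothetical):

```python
import sched
import time

def run_jobs_periodically(jobs, period, rounds):
    """Run every job in `jobs` once per period, for `rounds` periods.

    In a standalone scheduler container this loop would be the whole
    process; the API service would only read/write job definitions in
    shared storage and never schedule anything itself.
    """
    s = sched.scheduler(time.time, time.sleep)
    for r in range(rounds):
        for priority, job in enumerate(jobs):
            s.enter(r * period, priority, job)
    s.run()
```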
@tuxdna yeah, decoupling the scheduler seems to be the best approach. But that alone will not allow us to increase the number of replicas, because the persistent volume can only be attached to a single pod. That's out of scope for this issue though.
We will not focus on this now. Scaling this service up would mean rewriting a significant portion of it.
The Jobs API is now deprecated, so this issue is irrelevant.
From @tuxdna on February 2, 2018 10:27
We are currently encountering Gateway Timeouts when Jobs API calls take a very long time to process. This is already a known issue: fabric8-analytics/fabric8-analytics-jobs#164
Another issue I noticed is that the number of API workers is currently set to 1. This causes other clients to wait until a request completes or fails. We can improve the concurrency of the API workers to resolve this.
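To illustrate the effect of a single worker: with one worker, slow requests serialize and every other client waits; with several, they overlap. A self-contained sketch using a thread pool as a stand-in for WSGI workers:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i):
    time.sleep(0.1)  # stand-in for a slow Jobs API call
    return i

def total_wall_time(workers, n_requests=4):
    """Wall-clock time to serve n_requests with the given worker count."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle_request, range(n_requests)))
    return time.monotonic() - start
```

With `workers=1`, four 0.1 s requests take roughly 0.4 s end to end; with `workers=4` they finish in roughly 0.1 s. The actual fix would be raising the worker count in the WSGI server's configuration (e.g. gunicorn's `--workers`), but only once the single-scheduler problem discussed above is solved.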
Copied from original issue: fabric8-analytics/fabric8-analytics-jobs#247