Limit number of lumi sections per block in WMAgent #10264
Comments
Further information is provided in #11057.
Yet another problem was reported here: #11316. Raising the priority of this issue and moving it under the Q4/2022 board.
@amaltaro, could you please provide details on how to address this issue? In particular
I think that in order to work on this issue, an expert should provide all relevant details on block/file creation, along with pointers to the relevant codebase and a clarification of the logic for dealing with files/blocks that have a large lumi content.
As reported in the DBS issue linked above (and here dmwm/dbs2go#84), we should target a limit of 1M lumis per block. If possible, 0.5M would be even better, as it would keep the response time of that DBS API lower.
Impact of the new feature
WMAgent
Is your feature request related to a problem? Please describe.
As discussed in this issue:
#10237
there could be times when we create blocks with millions of lumi sections, which makes the block injection a very expensive process for the overall infrastructure (agent, cmsweb frontends and DBS backends).
Currently, there is a limit at the global workqueue level which defines MC GQEs not to have more than 400k lumis:
https://github.com/dmwm/deployment/blob/master/workqueue/config.py#L33
However, since the block above had almost 2 million lumis, I suspect that merge jobs can cross the boundary of workqueue elements, grouping files only by workflow + task + outputmodule + datatier and the site where the unmerged files are available.
Describe the solution you'd like
Given that the ~1.8M lumi sections required around 120MB of data to be transferred from the agent to the DBSServer, we should likely cut that number by half or two thirds and set the result as a configurable limit on the number of lumi sections allowed inside the same DBS block.
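As a rough back-of-the-envelope check of the numbers above, the sketch below scales the reported payload linearly with the lumi count. The linear scaling is an assumption (the real payload also depends on file metadata), and the function name `estimated_payload_mb` is hypothetical, not part of WMCore.

```python
# Figures reported in this issue: ~1.8M lumis needed ~120MB of transfer.
REPORTED_LUMIS = 1_800_000
REPORTED_PAYLOAD_MB = 120


def estimated_payload_mb(n_lumis):
    """Estimate the block payload in MB, ASSUMING it scales linearly with lumis."""
    return REPORTED_PAYLOAD_MB * n_lumis / REPORTED_LUMIS


# Cutting the lumi count by half or by two thirds:
half = estimated_payload_mb(900_000)        # roughly 60 MB
two_thirds = estimated_payload_mb(600_000)  # roughly 40 MB
```

Under this assumption, a 600k limit would keep a single block injection at about a third of the payload that caused trouble.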
So, we need to implement this logic in DBS3Upload such that it does not allow a block to go beyond a given number of lumis (defaulting to 600k), unless, of course, a single file already has more than 600k lumis on its own. Whenever adding a file would push the currently open block past the limit, a new block should be created instead.
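The rule described above could be sketched as follows. This is a minimal illustration, not the actual DBS3Upload code: the `Block` class, the `assign_files_to_blocks` function and the `(lfn, n_lumis)` input format are all hypothetical, and the real component would also need to group by workflow, task, output module, datatier and site.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

DEFAULT_MAX_LUMIS_PER_BLOCK = 600_000  # default proposed in this issue


@dataclass
class Block:
    """Hypothetical stand-in for an open DBS block."""
    files: List[str] = field(default_factory=list)
    lumi_count: int = 0


def assign_files_to_blocks(
    files: Iterable[Tuple[str, int]],
    max_lumis: int = DEFAULT_MAX_LUMIS_PER_BLOCK,
) -> List[Block]:
    """Group (lfn, n_lumis) pairs into blocks without exceeding max_lumis.

    A file that exceeds the limit on its own still goes into a block,
    but that block will contain only that file.
    """
    blocks: List[Block] = []
    current = None
    for lfn, n_lumis in files:
        # Open a new block if there is none yet, or if adding this file
        # to a non-empty block would cross the limit.
        if current is None or (current.files and current.lumi_count + n_lumis > max_lumis):
            current = Block()
            blocks.append(current)
        current.files.append(lfn)
        current.lumi_count += n_lumis
    return blocks


example = assign_files_to_blocks(
    [("a", 300_000), ("b", 300_000), ("c", 1), ("d", 700_000), ("e", 10)]
)
# Files "a" and "b" share a block; "d" is oversized and ends up alone.
```

Making `max_lumis` a parameter mirrors the request for a configurable limit; where that value would live in the agent configuration is left open here.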
Describe alternatives you've considered
The alternative is to do nothing and keep fighting these blocks on demand, a handful of times a year.
Additional context
none