
Limit number of lumi sections per block in WMAgent #10264

Open
amaltaro opened this issue Feb 4, 2021 · 4 comments



amaltaro commented Feb 4, 2021

Impact of the new feature
WMAgent

Is your feature request related to a problem? Please describe.
As discussed in this issue:
#10237

there are times when we create blocks with millions of lumi sections, which makes block injection a very expensive process for the overall infrastructure (agent, cmsweb frontends and DBS backends).

Currently, there is a limit at the global workqueue level which prevents MC global workqueue elements (GQEs) from containing more than 400k lumis:
https://github.com/dmwm/deployment/blob/master/workqueue/config.py#L33

However, since the block above had almost 2 million lumis, I suspect merge jobs can cross workqueue-element boundaries, since they only consider the workflow + task + output module + datatier, plus the site where the unmerged files are available.

Describe the solution you'd like
Given that ~1.8M lumi sections required around 120MB of data to be transferred from the agent to the DBS server, we should likely cut that by a half or two thirds and set the result as a configurable limit on the number of lumi sections allowed inside a single DBS block.
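As a rough back-of-the-envelope check (using only the ~120MB / ~1.8M figures reported above; the per-lumi cost is an estimate, not a measured DBS number):

```python
# Rough estimate of DBS injection payload size per lumi section,
# based on the numbers reported above (~120 MB for ~1.8 M lumis).
payload_mb = 120.0
num_lumis = 1_800_000

bytes_per_lumi = payload_mb * 1024 * 1024 / num_lumis  # roughly 70 bytes/lumi

# Expected payload if the block were capped at 600k lumis (one third of 1.8M)
capped_payload_mb = 600_000 * bytes_per_lumi / (1024 * 1024)
print(f"~{bytes_per_lumi:.0f} bytes/lumi, ~{capped_payload_mb:.0f} MB at 600k lumis")
```

So a 600k-lumi cap would bring the worst-case payload down to roughly 40MB, in line with the "half or two thirds" reduction suggested above.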

So we need to implement this logic in DBS3Upload such that a block is not allowed to go beyond a given number of lumis (defaulting to 600k), unless a single file by itself already has more than 600k lumis. Whenever the open block hits that limit, a new block should be created.
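A minimal sketch of that splitting logic (this is not the actual DBS3Upload/DBSUploadPoller code; the function name, the `(file_name, num_lumis)` tuples, and the `max_lumis_per_block` parameter are all hypothetical):

```python
def assign_files_to_blocks(files, max_lumis_per_block=600_000):
    """Group files into blocks so that no block exceeds the lumi limit.

    `files` is a list of (file_name, num_lumis) tuples. A single file with
    more lumis than the limit still gets a block of its own, since a file
    cannot be split across DBS blocks.
    """
    blocks = []
    current_block, current_lumis = [], 0
    for name, nlumis in files:
        # Close the open block if adding this file would exceed the limit,
        # unless the block is empty (i.e. this file alone is over the limit).
        if current_block and current_lumis + nlumis > max_lumis_per_block:
            blocks.append(current_block)
            current_block, current_lumis = [], 0
        current_block.append(name)
        current_lumis += nlumis
    if current_block:
        blocks.append(current_block)
    return blocks
```

For example, with files of 400k, 300k, 700k and 100k lumis and a 600k limit, each file ends up in its own block: the 700k file exceeds the limit on its own, and neither neighbour pair fits together under 600k.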

Describe alternatives you've considered
The alternative is to do nothing and keep fighting these blocks on demand, a handful of times a year.

Additional context
none

@amaltaro (Contributor, Author)

Further information provided in: #11057


amaltaro commented Oct 5, 2022

Yet another problem reported here: #11316
this time with a payload (from json.dumps) of almost 500MB!

Raising the priority of this issue and moving it under the Q4/2022 board.


vkuznet commented Oct 7, 2022

@amaltaro, could you please provide details on how to address this issue? In particular:

  • which code is responsible for the block decision? E.g., DBS3Upload.py comes from the WMComponent/DBS3Buffer module, and the actual work seems to be done in the DBSUploadPoller.py module
  • if I understand the logic correctly, a block already has files, and therefore lumis, in it when it is created; your statement "it does not allow a block to go beyond a given # of lumis (default to 600k), of course, unless that file itself already has more than 600k lumis" should therefore be clarified as follows:
    • where is block creation done
    • who puts files into the block, and how
    • who controls the number of lumis in a file
    • what to do if a file has already been created with a large number of lumis
    • what to do with a block that already has more lumis than the limit, i.e. should it be deleted (then how?), should it be recreated (then how?), etc.

I think that in order to work on this issue, an expert should provide all relevant details on block/file creation, along with pointers to the relevant codebase and a clarification of the logic for dealing with a large lumi content in a file/block.

@amaltaro (Contributor, Author)

As reported in the DBS issue linked above (and here: dmwm/dbs2go#84), we should target a limit of 1M lumis per block. If possible, 0.5M would be even better and would keep the response time of that DBS API lower.
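If the limit is made configurable, the agent configuration could carry it along these lines (a sketch only: the `maxLumisPerBlock` attribute name is hypothetical, not an existing WMAgent option, and `SimpleNamespace` stands in for the real agent configuration object):

```python
# Hypothetical agent configuration sketch: cap the number of lumis per
# DBS block. The attribute name maxLumisPerBlock is illustrative only.
from types import SimpleNamespace

config = SimpleNamespace(DBS3Upload=SimpleNamespace())
config.DBS3Upload.maxLumisPerBlock = 1_000_000  # 0.5M would keep DBS responses even faster
```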
