
CMSSW 15 won't support microarch-v2 #12168

Open
mmascher opened this issue Nov 15, 2024 · 81 comments

@mmascher
Member

Impact of the new feature
It was decided at the O&C week that starting from CMSSW 15 we won't build CMSSW for microarch v2. v1 has already been excluded and is not supported anymore. We need to make sure that jobs carry a requirement for microarch x86_64-v[34] when they need to run CMSSW 15.

Additional context
Microarch can be found on the slots as:

[mmascher@vocms4100 ~]$ condor_status -af Microarch | sort | uniq -c
     15 x86_64-v1
  12810 x86_64-v2
 113836 x86_64-v3
  38921 x86_64-v4
@mmascher
Member Author

mmascher commented Nov 15, 2024

Adding an email from antonio about this:

Hi Phat,

Thanks, we discussed this in the SI meeting today. Marco will open an issue, where we have to discuss basically 2 items:

  • how (and who) should be doing the mapping between CMSSW versions and desired micro arch? In my opinion, this should happen in WM, which translates the request constraints into actual resource requirements.

  • where should we implement the matchmaking constraint for the micro arch, as general condition on the pilots, or rather as additional requirements for the jobs? We discussed it could be done in WM, if this is exclusive to production jobs, or rather insert it as a general condition in the slot matching requirements, if we expect this to somehow also affect analysis jobs in the future.

Cheers,
Antonio

@srimanob

Thanks @mmascher

More discussion is ongoing, as sites are concerned that their users may not be able to analyze Run 3 data with newer CMSSW. The current proposal is below; item (2) is on the WM+SI side.

==========

(1) Building CMSSW with Dual-Version Compatibility (v2 + v3):

  • We continue to build CMSSW with both v2 and v3, allowing cmssw/scram to select the version that best matches the available CPU architecture.
  • This approach enables analyzers using resources that only support v2 to still access and analyze data with the latest CMSSW versions.
  • All external dependencies are built with v2, maintaining compatibility for systems limited to this architecture. The situation will be the same as HLT at this moment.

(2) Central production:

  • By implementing a matching condition, jobs submitted with CMSSW_15_0_0 (and later) are directed to sites that support v3-only CPUs.
  • Sites restricted to v2 CPUs will continue to receive jobs for Run-2 and Run-3 campaigns (2022, 2023, 2024) as long as we maintain the current campaign setup. This ensures that all sites remain actively engaged in ongoing data processing without requiring immediate hardware upgrades.

(3) Validation:

  • Starting with CMSSW_15_0_0_pre1, we switch from v2 to v3 for validation and maintain validated v3 versions moving forward.
  • The High-Level Trigger (HLT) will benefit from this by not needing to perform separate validation for v3, simplifying the validation process and reducing redundancy.

@smuzaffar

Hello, as @srimanob mentioned, we discussed this in detail during yesterday's core sw meeting: CMSSW_15_0_X will support dual micro-arch for x86_64, with x86-64-v3 being the default micro-architecture.

How can we make sure that Central Production/RelVals use x86-64-v3 compatible resources for the 15.0.X and later release cycles? In the future, SCRAM can set an environment variable to expose the default micro-arch of a release so that Central Production/RelVals can use that in the job requirements.

@belforte
Member

@mmascher (there's no hook for Antonio here). Please link to the discussion issue in SI mentioned in Antonio's mail.

Of course support is needed for analysis jobs as well.

Now we have multiple threads on this same topic (this one, the one in CRABClient mentioned above, the SI one, and the discussion in CMSSW https://indico.cern.ch/event/1482012/ ). I'd rather have a single document where people comment and which quickly converges to the specification for the solution.
Can SW-Core do that? @smuzaffar

@smuzaffar

@belforte , why not just use this issue as a single doc then :-)

About "mapping between CMSSW versions and desired micro arch", I think it will be easier if SCRAM set this via an environment variable. This way you do not need to hardcode this mapping in WM. e.g in CMSSW_15_0_X scram can set CMSSW_DEFAULT_MICRO_ARCH=x86-64-v3 and Relvals/Central production use it to request resources. For Central production/Relvals we prefer to use default microarch.

For analysis jobs, where some users might want to use v2 resources, SCRAM can set the CMSSW_MICRO_ARCHS env variable to a comma-separated list of micro-archs, and CRAB can make use of that to request the correct resources.

In case these env variables are not set (e.g. for CMSSW_14_2 and earlier), just do what we are doing now.

FYI @makortel
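A sketch of how a submit tool might consume these proposed variables; the variable names and fallback order come from this comment, and the helper below is otherwise an assumption:

```python
import os

# Hypothetical consumer of the proposed SCRAM env variables; names taken
# from the comment above, behaviour for missing variables is an assumption.
def desired_microarchs(environ=os.environ):
    archs = environ.get("CMSSW_MICRO_ARCHS")
    if archs:
        # analysis case: explicit comma-separated list of usable micro-archs
        return [a.strip() for a in archs.split(",")]
    default = environ.get("CMSSW_DEFAULT_MICRO_ARCH")
    # production case: single default micro-arch; pre-15 releases set nothing
    return [default] if default else []

print(desired_microarchs({"CMSSW_MICRO_ARCHS": "x86-64-v2,x86-64-v3"}))
```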

@belforte
Member

Assuming that CMSSW_MICRO_ARCHS is set after cmsenv, adding a classAd MY.DESIRED_MICROARCH="x86-64-v2,x86-64-v3,..." is easy.
And I presume that's what SI will give us. I do not want to make site lists based on their hardware. Also some sites may have a mix, right ? @mmascher do you agree ?

I worry about the overall picture. Of course we should try to make it fully transparent for as many users as possible. Hopefully only those who develop for CMSSW_15 on v2 machines will have to do something. So I have to worry about what other changes to make in CRAB besides the above.

my questions are more like

  • does this affect where DM (and CRAB tape recall) place datasets ?
  • are those v2 machines grouped at specific sites or scattered all over ?
  • are there sites which are mostly/only v2 ?
  • what about jobs which run at sites other than where the data is ?
  • will it matter which release a dataset was produced with ? Should we at least make sure that all data produced with CMSSW_15+ lands at sites with e.g. >50% v3 machines ?

@makortel

One question to WMCore (say @amaltaro @anpicci) came up in the core software meeting yesterday: does WMCore (production, not CRAB) run cmsRun from a release area, or does it create a developer area (cmsrel CMSSW_X_Y_Z / scram project CMSSW_X_Y_Z) first?

@makortel

Another question to PdmV (@AdrianoDee @DickyChant @miquork): on what sites are the RelVals being run nowadays?

In yesterday's core software meeting we had a feeling that the solution on WMCore side to limit production (including RelVal) jobs to x86-64-v3 might not be in place by CMSSW_15_0_0_pre1 (scheduled to be built on December 10), where core software is presently planning to switch the default from v2 to v3 (and where this change would be validated). We assumed PdmV would be able to add the necessary resource matchmaking criteria at the submission time, but then wondered if already the set of sites (e.g. FNAL?) used for RelVals would, in practice, guarantee x86-64-v3-only hardware.

@AdrianoDee

@makortel currently only at FNAL. We could re-include CERN now that the data taking has ended. But we have no urgency or specific need to do so (FNAL resources are more than enough).

@belforte
Member

stupid Q. What about v4 ? Will v3-built exe run there ?

@smuzaffar

stupid Q. What about v4 ? Will v3-built exe run there ?

yes. Any exe built for vN should be able to run on vN+x (where x>=0). So a v2 exe should be able to run on v2, v3 and v4

@makortel

stupid Q. What about v4 ? Will v3-built exe run there ?

yes. Any exe built for vN should be able to run on vN+x (where x>=0). So a v2 exe should be able to run on v2, v3 and v4

Yeah, today we run v2 binaries on all v2, v3, and v4 hardware.

@belforte
Member

I know that what Matti asked was not a question meant for CRAB. But since the CRAB wrapper sets up the env for cmsRun with the same tools as WMA does, I looked at the code and confirmed my observation that a developer area is created and then the environment settings from scramv1 run -sh are copied into the wrapper memory and passed to subprocesses as needed. Of course Alan can still correct me if I got it wrong.

@makortel

Thanks @belforte. So both WMA and CRAB behave similarly in this regard.

@smuzaffar Reading your slide 5 again from yesterday, do I understand correctly that in the case that a job lands on a v2-only node and sets up the developer area, that developer area would set up the full multi-arch behavior including the selection of best microarchitecture (v2 in this case) by scram?

@smuzaffar

smuzaffar commented Nov 28, 2024

Yes @makortel, (though not implemented yet) the idea is that if one creates a dev area on a v2-only node then scram should automatically

  • enable multi-microarch support when scram project is called
  • set env for best microarch and in this case it should be v2

All cmssw jobs should work fine as long as no part (shared libs/plugins) of cmssw was (re)built in the dev area before submitting the job. So if Central Production/RelVal jobs just create a cmssw dev area and run cmsDriver/cmsRun, it should work fine (regardless of the micro-arch of the node where it runs).

The problem I see is for crab jobs where users check out part of cmssw, build, and submit. If someone submits a job with v3-only builds (default microarch) and it then lands on a v2 node, all the user libs will be ignored.

@mapellidario
Member

Stefano's conclusions are correct

confirmed my observation that a developer area is created and then the environment settings from scramv1 run -sh are copied into the wrapper memory and passed to subprocesses as needed.

However, the technical details are slightly different. WMCore runs cmssw from WMCore/WMSpec/Steps/Executors/CMSSW.py, and uses Scram.py for the pset tweak and a bash script inside a popen for cmsRun. An example of the arguments is [1].

Long story short, wmcore creates a developer area

$SCRAM_COMMAND project $SCRAM_PROJECT $CMSSW_VERSION

then loads into the env the output of scramv1 runtime -sh

eval `$SCRAM_COMMAND runtime -sh`

then executes cmsRun

$EXECUTABLE -j $JOB_REPORT $CONFIGURATION 2>&1 &


[1]

ref: https://cmsweb.cern.ch/reqmgr2/config?name=cmsunified_task_BTV-Run3Summer22NanoAODv12-00044__v1_T_241118_130440_5190

.steps.cmsRun1.application.section_('controls')
.steps.cmsRun1.application.section_('multicore')
.steps.cmsRun1.application.multicore.eventStreams = 0
.steps.cmsRun1.application.multicore.numberOfCores = 4
.steps.cmsRun1.application.section_('configuration')
.steps.cmsRun1.application.configuration.cacheName = 'reqmgr_config_cache'
.steps.cmsRun1.application.configuration.configId = 'fe55d6061c26d6e60fd635da09b033a0'
.steps.cmsRun1.application.configuration.retrieveConfigUrl = 'https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/fe55d6061c26d6e60fd635da09b033a0/configFile'
.steps.cmsRun1.application.configuration.pickledarguments = b'(dp0\nVglobalTag\np1\nV132X_mcRun3_2022_realistic_v3\np2\ns.'
.steps.cmsRun1.application.configuration.configCacheUrl = 'https://cmsweb.cern.ch/couchdb'
.steps.cmsRun1.application.configuration.section_('arguments')
.steps.cmsRun1.application.configuration.arguments.globalTag = '132X_mcRun3_2022_realistic_v3'
.steps.cmsRun1.application.configuration.configUrl = 'https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/fe55d6061c26d6e60fd635da09b033a0'
.steps.cmsRun1.application.section_('setup')
.steps.cmsRun1.application.setup.scramCommand = 'scramv1'
.steps.cmsRun1.application.setup.cmsswVersion = 'CMSSW_13_2_6_patch2'
.steps.cmsRun1.application.setup.scramProject = 'CMSSW'
.steps.cmsRun1.application.setup.softwareEnvironment = ''
.steps.cmsRun1.application.setup.buildArch = None
.steps.cmsRun1.application.setup.scramArch = ['el8_amd64_gcc11']
.steps.cmsRun1.application.section_('gpu')
.steps.cmsRun1.application.gpu.gpuRequirements = None
.steps.cmsRun1.application.gpu.gpuRequired = 'forbidden'
.steps.cmsRun1.application.section_('command')
.steps.cmsRun1.application.command.configurationHash = None
.steps.cmsRun1.application.command.executable = 'cmsRun'
.steps.cmsRun1.application.command.arguments = ''
.steps.cmsRun1.application.command.configuration = 'PSet.py'
.steps.cmsRun1.application.command.psetTweak = None
.steps.cmsRun1.application.command.configurationPickle = 'PSet.pkl'

@anpicci
Contributor

anpicci commented Nov 28, 2024

@makortel FYI, Dario provided a reply for your question targeting WMCore

@anpicci
Contributor

anpicci commented Nov 28, 2024

By the way, any development from WMCore on this topic will likely materialize in the next quarter, since in my understanding we still lack a common consensus about where to implement the binding to the specific microarchitecture, and our effort is committed to finalizing Q3 dev issues and urgent operational activities. According to the discussion here, I would support @belforte's proposal, and I am not convinced WM is the right place to introduce this binding.

@belforte
Member

Thanks @smuzaffar for your summary. Let me rephrase from our (WM) POV to make sure I understood:

  • SI will be in charge of running jobs on machines which match the microarch version specified (or a later one)
  • WMA will need to request v3 for CMSSW_15 and later.
  • CRAB will need to figure out (from SCRAM_MICRO_ARCHITECTURE) where the user application can run and ask for the proper microarch.
  • We still do not know where those v2-only resources are and whether there are fully v2-only sites, etc., so we can't yet think about how to deal with microarch-aware data placement or microarch-aware data access over WAN (a new twist on "overflow" ?)

@mmascher
Member Author

Hello all,

Just for my understanding, is this a use case that will happen very often, or is it something that, in general, is going to change rarely? I see WMAgent and CRAB are defining respectively:

CMSSW_Versions = "CMSSW_14_2_0_pre4"
CRAB_JobSW = "CMSSW_9_2_6"

It's very easy to add a job requirement like (for example):

ifthenelse(regexp("^CMSSW_1[5-9].*", CMSSW_Versions), Microarch=="x86_64-v3" || Microarch=="x86_64-v4", True)

We can possibly turn this into a "table" of CMSSW versions and lists of supported architectures with some nested ifs.

I would still provision machines solely based on DESIRED_Sites, otherwise Factory ops will need to check all the worker nodes at all sites, and update the static factory xml configuration file with the information about sites.
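The regexp-based mapping above could be prototyped in plain Python before being encoded as a ClassAd expression. A minimal sketch: the ">=15 → v3" rule comes from this thread, the helper name and the fallback level 0 are assumptions.

```python
import re

# Hypothetical version-to-microarch lookup; only the CMSSW 15+ -> v3 rule is
# from this thread, the name and the 0 ("no requirement") fallback are guesses.
def min_microarch_for(cmssw_version):
    m = re.match(r"^CMSSW_(\d+)_", cmssw_version)
    if m and int(m.group(1)) >= 15:
        return 3  # CMSSW 15+ is built for x86_64-v3 by default
    return 0      # older releases: no microarch requirement

print(min_microarch_for("CMSSW_15_0_0_pre1"))  # 3
print(min_microarch_for("CMSSW_14_2_0_pre4"))  # 0
```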

@smuzaffar

thanks @mmascher , your suggestion will work if we only allow use of the default micro-arch of a release. So yes, for CMSSW_15_0 and above it can always request resources with x86_64-v3 (I think something like Microarch>="x86_64-v3" should also work).

But we also need to allow running CMSSW_15_0 on Microarch>="x86_64-v2" if explicitly requested by the user. As @belforte mentioned above, support for something like MY.DESIRED_MICROARCH=x86_64-v2, which can then be converted to Microarch>="x86_64-v2", should allow us to use v2 resources too.

@belforte
Member

belforte commented Nov 28, 2024

yes @mmascher , as @smuzaffar said, the CRAB situation is trickier.
Maybe we can have a CMSSW-version-dependent default, but if MY.DESIRED_MICROARCH is present it needs to be respected. I think I can avoid specifying a list and only indicate the minimum required value for the microarch.

I hope too that we can avoid using the microarch in the provisioning, but if some jobs have a requirement for v2 and v2 is a scarce resource, it may not work; see GPUs e.g. We need some understanding of where those v2's are to define policies that will work for us. I'd rather not try to handle this "a priori".
Does anybody have a clue where those v2 machines are ? A small fraction here and there will not matter; whole sites are a different story.

@smuzaffar

I can avoid to specify a list, but only indicate the minimum required value for microarch

@belforte , I was thinking the same, and I think the scram env variable can help you there. Instead of setting a comma-separated list, scram can just set the minimum required micro-arch.

@smuzaffar

smuzaffar commented Dec 2, 2024

@mmascher , I see this in the Job.submit file

# These attributes help gWMS decide what platforms this job can run on; see https://twiki.cern.ch/twiki/bin/view/CMSPublic/CompOpsMatchArchitecture
+REQUIRED_ARCH = "X86_64"
+DESIRED_CMSDataset = undefined

So how hard would it be to add support for +DESIRED_MICROARCH, which can then be converted to Microarch>="${DESIRED_MICROARCH}" ?

We want to enable multi-arch with x86-64-v3 as default for 15.0.0.pre1 (which should be built on 12 Dec). If this cannot be implemented before 15.0.0.pre1, then I am fine with the CMSSW version to microarch mapping

ifthenelse(regexp("^CMSSW_(2[0-9]+|1[5-9]+)_.*", CMSSW_Versions), Microarch>="x86_64-v3", True)

At least this will allow us to validate 15.0.0.pre1 for x86-64-v3

@mmascher
Member Author

mmascher commented Dec 2, 2024

@belforte , we can do something similar to ARCH, i.e., set in the job ad:

+REQUIRED_MICROARCH = "any" #default value if not set in the jobad

or

+REQUIRED_MICROARCH = "x86_64-v3,x86_64-v4" #job selects specific list of micro architectures

Then add the necessary matchmaking expressions in the machine start expression.

@smuzaffar

smuzaffar commented Dec 2, 2024

thanks @mmascher , this will help.
By the way, do you really need a comma-separated list of microarchs, i.e. +REQUIRED_MICROARCH = "x86_64-v3,x86_64-v4", for matchmaking? This means that for cmssw x86-64-v2 one has to set +REQUIRED_MICROARCH = "x86_64-v2,x86_64-v3,x86_64-v4", as v2 can run on any machine with v2 and above.

@belforte
Member

belforte commented Dec 2, 2024

the problem with that list is not just that it is long and hard to read (well... once coded it will be consumed by machines, until debugging is needed ☹️ ), but that as soon as a new, compatible microarch version is added, we do not match that hardware until we modify the code. @mmascher how difficult would it be to parse the Microarch machine classAd to extract the version number so that

  • job indicates e.g. +REQUIRED_MINIMUM_MICROARCH_VERSION = 3
  • execution slots are matched on Target.Microarch-version >= MY.REQUIRED_MINIMUM_MICROARCH_VERSION

?

@stlammel

fully agree Stefano! - Stephan

@amaltaro
Contributor

As mentioned in this comment:
https://indico.cern.ch/event/1487904/

I was considering using it as a single integer, so that comparison/evaluation in glideinWMS can be simple.

However, if @mmascher and others think that a construction like MY.REQUIRED_MINIMUM_MICROARCH="x86-64-v3" is going to be more solid, I see no reason why WMCore should do something different from CRAB. So we will just use the same convention.

@belforte
Member

belforte commented Dec 10, 2024 via email

@mmascher
Member Author

Ok, let's go with the integer solution. Here is the constraint, I gave it a test.

MICROARCH_CONSTRAINT = (REQUIRED_MINIMUM_MICROARCH <= int(substr(split(Microarch,"-")[1],1)))

It could either be added as a machine requirement in the glideinWMS frontend, or as a job requirement by WMAgent when it writes the JDL.

Should I add it to the frontend?
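For reference, the ClassAd expression above extracts the level from strings like x86_64-v3. A pure-Python rendering of the same parsing, assuming Microarch always has the x86_64-vN shape:

```python
# Mirrors the ClassAd expression: split(Microarch, "-")[1] -> "v3",
# substr(..., 1) -> "3", int(...) -> 3.
def microarch_level(microarch):
    return int(microarch.split("-")[1][1:])

def matches(required_minimum, microarch):
    # the slot matches when its level meets the job's minimum
    return required_minimum <= microarch_level(microarch)

print(microarch_level("x86_64-v3"))  # 3
print(matches(3, "x86_64-v2"))       # False
print(matches(3, "x86_64-v4"))       # True
```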

@belforte
Member

thanks @mmascher . My preference is that matchmaking requirements are set in the gWMS, like done e.g. for DESIRED_Sites. Having requirements in multiple places is confusing.
But maybe I'm missing something... I see that CRAB jobs specify Requirements = stringListMember(TARGET.Arch, REQUIRED_ARCH), to which HTC adds requirements on OpSys and other things. In other words: I am confused :-)

@belforte
Member

@mmascher does this mean that the default is not "any" anymore, as you wrote earlier ?
What should it be ? 0 ?

@mmascher
Member Author

Hi @belforte ,

If we go with REQUIRED_MINIMUM_MICROARCH as an integer, I would not set "any" as the default. The final constraint I tested in ITB has a protection for REQUIRED_MINIMUM_MICROARCH being defined. So you can either leave it undefined, or set it to 0 as the default behaviour. Either way all slots will be matched.

@amaltaro
Contributor

@mmascher one more question. Should we define this job classad only if the cpu architecture is x86-64? Or can we always have it defined, regardless of the architecture, and only update the minimum microarch according to the CMSSW version?

From what I can tell, production jobs could be requesting multiple architectures:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/BossAir/Plugins/SimpleCondorPlugin.py#L641

will the microarchitecture settings keep this functionality intact? In other words, for a job setting, for example:

REQUIRED_MINIMUM_MICROARCH=2
REQUIRED_ARCH="X86_64,aarch64"

will it still manage to match either resource type?

@srimanob

Hi @amaltaro
I assume if we use ARM (aarch64), we should keep "any" for the moment. We don't have much experience with the microarch levels for ARM. Thanks.

@mmascher
Member Author

I can add a protection for the machine microarch as well. As far as I know it is not defined for aarch64, so in this case we can allow jobs to run regardless of their REQUIRED_MINIMUM_MICROARCH. Does this sound ok?

@amaltaro
Contributor

Yes Marco, I think this would be ideal for the moment. In other words, the desired behavior I have in mind is:

if REQUIRED_ARCH != X86_64
  disregard REQUIRED_MINIMUM_MICROARCH
else
  MICROARCH >=  REQUIRED_MINIMUM_MICROARCH  # assuming that default is REQUIRED_MINIMUM_MICROARCH=0

Does it sound good to you? @mmascher @belforte
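The pseudocode above could be sketched as a plain predicate; this reflects the assumed semantics of the final matchmaking expression, not the expression itself:

```python
# Sketch of the agreed behavior (assumptions: slot_microarch_level is None
# where Microarch is undefined, e.g. aarch64 slots).
def slot_matches(required_min_microarch, slot_arch, slot_microarch_level):
    if slot_arch != "X86_64":
        return True  # non-x86_64 slot: disregard REQUIRED_MINIMUM_MICROARCH
    return slot_microarch_level >= required_min_microarch

print(slot_matches(3, "aarch64", None))  # True
print(slot_matches(3, "X86_64", 2))      # False
print(slot_matches(0, "X86_64", 2))      # True (default 0 matches all slots)
```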

@belforte
Member

that's implicit in the fact that we specify the microarch with a number. Should we at some point need the same functionality for ARM, or whatever else, even if still a number, it will likely be a different one.
I assumed that Marco is prepared to add the proper amount of if REQUIRED_ARCH at that point.
Of course we can always change on the client side as well... as discussed already, it is difficult to be future-proof against any possible future !

@belforte
Member

the agreed behavior has been implemented in CRAB and deployed on the production server, but it is only accessible using crab-dev for CRABClient. crab-prod (i.e. crab) still ignores the microarchitecture.
I'd rather not touch the crab-prod client until after the holidays.

@khurtado
Contributor

khurtado commented Feb 1, 2025

CC: @amaltaro
Just to make sure I understand. We need something like this:

if 'x86_64' not in REQUIRED_ARCH.split(","):
  ad['My.REQUIRED_MINIMUM_MICROARCH'] = "0"
else:
  ad['My.REQUIRED_MINIMUM_MICROARCH'] = str(getScramMicroArch())

where getScramMicroArch() gets the value of SCRAM_MIN_SUPPORTED_MICROARCH from the scram environment.
And in order to get the scram environment we need to use:
scram project && eval `$SCRAM_COMMAND runtime -sh`

Is this correct?
We do have this class:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMRuntime/Tools/Scram.py

But so far, we only use it at the worker node (runtime level) in the CMSSW executor.
Do I understand correctly that, since we need to inject the value of SCRAM_MIN_SUPPORTED_MICROARCH as a classad, we need to get the scram environment at the agent level?
We do not have cvmfs at the agent-level, so we can't simply use the Scram() object.

We could alternatively use the value of cmsswVersion and define "3" for CMSSW>=15.x and "0" otherwise. But this is more of a workaround that may not age well.

https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/BossAir/Plugins/SimpleCondorPlugin.py#L633-L634C53

We could also define a variable that is part of the request config, e.g. self.step.application.setup.scramMinMicroArch, but setting the value has the same issue (no cvmfs to run scram) as above; it would need to be done manually (CMSSW 15.x -> "3", etc.)

What about StepChain, where we run multiple CMSSW versions?
Do we make it so that if there are:
CMSSW_12, CMSSW_15, CMSSW_14 -> then we require v3, since there is one CMSSW 15.x?
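One possible answer to the StepChain question, sketched under the assumption that each release maps to an integer minimum level (e.g. 3 for CMSSW 15+, 0 otherwise): require the strictest of the per-step minimums.

```python
# Assumed input: a per-release minimum microarch level (see lead-in); a
# StepChain with mixed releases then needs the maximum of those minimums.
def stepchain_min_microarch(release_levels):
    return max(release_levels.values(), default=0)

levels = {"CMSSW_12_4_0": 0, "CMSSW_15_0_0": 3, "CMSSW_14_2_0": 0}
print(stepchain_min_microarch(levels))  # 3
```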

As a side note, the CRAB client has its own class, but that client runs on nodes with cvmfs access that have already cmsenv'd a CMSSW release, so the procedure is different.
https://github.com/dmwm/CRABClient/blob/8b8da2a40b2ee3bd8a9df03613ff291608262b56/src/python/CRABClient/JobType/ScramEnvironment.py#L107

So in short, I think the alternatives are

  1. Hardcode the conditions (CMSSW releases >=15 use "v3"), or
  2. Request cvmfs access on the agent nodes so we can use cvmfs+scram with the CMSSW release versions to get the value of SCRAM_MIN_SUPPORTED_MICROARCH.

I'm trying to see if there is anything I'm missing here and whether there is a better approach.

@belforte
Member

belforte commented Feb 1, 2025

A (hopefully not needed) warning:
I do not know how you further manipulate ad[ ] before this is handed to HTCondor, but make sure that eventually the ad is an integer, i.e.
condor_q <somejob> -l |grep REQUIRED_MINIMUM_MICROARCH
is e.g.
REQUIRED_MINIMUM_MICROARCH = 2
not
REQUIRED_MINIMUM_MICROARCH = "2"

It just happened to me to get lost in the variable manipulation in old CRAB code and end up with the latter so jobs were not matching for a day 😭 https://mattermost.web.cern.ch/cms-o-and-c/pl/z83bwi135in3jqgd6jd8zu545r
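A minimal illustration of this pitfall, with a hypothetical formatting helper (not actual CRAB/WMCore code):

```python
# Hypothetical helper showing the integer-vs-string difference in a job ad.
def format_ad(name, value):
    if isinstance(value, int):
        return f"{name} = {value}"   # unquoted: works in >= comparisons
    return f'{name} = "{value}"'     # quoted string: the job never matches

print(format_ad("REQUIRED_MINIMUM_MICROARCH", 2))
print(format_ad("REQUIRED_MINIMUM_MICROARCH", "2"))
```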

@khurtado
Contributor

khurtado commented Feb 3, 2025

@belforte Thank you, yes. When manipulating the ad dictionary, "2" should become an integer. Strings are created via classad.quote()

@khurtado
Contributor

khurtado commented Feb 3, 2025

Just had a chat with Alan via MM:
@smuzaffar Given that we do not have CVMFS access in the agents and hence cannot use scram at that level, can we get a web-based document like this:

https://cmssdt.cern.ch/SDT/cgi-bin/ReleasesXML?anytype=1&anyarch=1

With the SCRAM_MIN_SUPPORTED_MICROARCH information for each CMSSW version?

Otherwise, we would likely just be hardwiring v3 for CMSSW 15 or greater.

@belforte
Member

belforte commented Feb 3, 2025

why can't you have CVMFS? It is very easy to mount it in the VMs where you run the agents (via puppet e.g.) and to bind it into the docker containers. And it may be useful for other things as well.

@amaltaro
Contributor

amaltaro commented Feb 4, 2025

Mounting CVMFS is probably a small portion of this development. When it involves SCRAM, I feel like it always demands more effort than expected. So I am all in favor of having clean and easy-to-read information provided from upstream, if possible.

@smuzaffar

Hi @amaltaro,
SCRAM_MIN_SUPPORTED_MICROARCH is a dynamic thing. SCRAM changes it depending on the system where it is setting up the cmssw env. Note that CMSSW_15_0_X and above supports two micro-archs:

  • x86-64-v3 which is default and should be used when system has x86-64-v3 support
  • x86-64-v2 which is an additional microarch and scram falls back to it if system does not support x86-64-v3

So if you ask me what the minimum supported micro-arch for CMSSW_15_0_X is, my answer would be x86-64-v2. But we want to target x86-64-v3 resources for CMSSW_15_0_X, and the remaining <5% of x86-64-v2 resources can be used by old cmssw releases.

I would suggest that I add an extra field in https://cmssdt.cern.ch/SDT/cgi-bin/ReleasesXML?anytype=1 e.g.

<project label="CMSSW_15_0_0_pre3" type="Development" state="Announced" default_micro_arch="x86-64-v3"/>

I think you are already using ReleasesXML, so you can extract the default micro-arch information from this. For older releases where we do not have any micro-arch defined in the ReleasesXML, you can always use the minimum x86-64-v2. Will that be enough to cover your use case?
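Assuming the proposed default_micro_arch attribute lands in ReleasesXML as shown above, a consumer could be sketched like this (the XML shape is taken from the example; the fallback to None for releases lacking the attribute is an assumption):

```python
import xml.etree.ElementTree as ET

# Assumed miniature of ReleasesXML with the proposed default_micro_arch field.
SAMPLE = """<projects>
  <project label="CMSSW_15_0_0_pre3" type="Development" state="Announced" default_micro_arch="x86-64-v3"/>
  <project label="CMSSW_14_2_0" type="Production" state="Announced"/>
</projects>"""

def default_microarch(xml_text):
    # map each release label to its default micro-arch (None if not declared)
    return {p.get("label"): p.get("default_micro_arch")
            for p in ET.fromstring(xml_text).iter("project")}

archs = default_microarch(SAMPLE)
print(archs["CMSSW_15_0_0_pre3"])  # x86-64-v3
print(archs["CMSSW_14_2_0"])       # None
```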

@smuzaffar

For older release where we do not have any micro-arch defined in the ReleasesXML you can always use the minimum x86-64-v2

OR for such releases (where ReleasesXML does not provide micro-arch information) it is better not to add any requirement on micro-arch

@belforte
Member

belforte commented Feb 8, 2025

not add any requirement on micro-arch

this is done by setting REQUIRED_MINIMUM_MICROARCH=0

@khurtado
Contributor

@smuzaffar
So, basically, even if CMSSW 15 has support for v2 and SCRAM_MIN_SUPPORTED_MICROARCH can yield "v2", we want to match resources with "v3" support, right?

If so, yes please, expanding that xml would work. I would need to update the first comment to make this clear.

@smuzaffar

@khurtado , tools which do not have access to scram/cvmfs (e.g. the production system) should use the ReleasesXML, and if they find default_micro_arch set for a release they should use it and request resources for that micro-arch.

Tools, like crab, which submit jobs from a scram developer area should use the SCRAM_MIN_SUPPORTED_MICROARCH environment variable.

Does this make sense?

@khurtado
Contributor

@smuzaffar Yes, that makes sense.
Please let me know once ReleasesXML is expanded with the new value. Thank you!
