
Fragments and Cards for DY->TauTau #3812

Open · wants to merge 9 commits into master

Conversation

@lucasrussell01
Contributor

Cards and fragment for DY->TauTau

  • Fragment with filters for different final states
  • Cards added for the DY->TauTau 1-4 jet samples

@lviliani
Contributor

lviliani commented Dec 4, 2024

Hi, the common background team will take care of this sample (@Cvico @sihyunjeon @agrohsje).
Or is there anything specific you want to include here?

@sihyunjeon
Collaborator

sihyunjeon commented Dec 4, 2024

For which analysis or POG is this? Is this related to tau polarization?
What is the need for having separate tau dedicated sample? (if it is not for tau polarization)

@lucasrussell01
Contributor Author

For which analysis or POG is this? Is this related to tau polarization? What is the need for having separate tau dedicated sample? (if it is not for tau polarization)

Hi! This would be for the Higgs->TauTau CP analysis; we have found that in our most sensitive channels the stat uncertainty on the DY MC component is of order 10%. We would like to reduce this if possible, but are conscious of the fact that we can't order too many events, so we want to make gridpacks for DY->TauTau and will have filters in the fragment.

We will discuss this with HLepRare next week and report back to you. I just opened the gridpack PR so that it's ready if we decide to go ahead with the ordering.

@sihyunjeon
Collaborator

sihyunjeon commented Jan 9, 2025

hi, do you really need njet split case for this?

in any case
what you can still do is
take existing DY gridpacks (from central production)
add LHE filters on top of it to keep events that include pdgid=15 (taus)
use your hadronizer filter for further stat optimization.

instead of making all the gridpacks separately from scratch

and with this way -- your cross section after filtering is not going to be super large so you can just do sth like (cross section) x (lumi) x (5-10) = (nevents to request) -- and you don't have to split the njets

unless the analysis is built on a signal region with sth like "2 taus + 3 or more jets"
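The sizing rule of thumb above, (cross section) x (lumi) x (5-10), can be sketched as follows. All numbers here are hypothetical placeholders for illustration, not values from this request:

```python
# Back-of-the-envelope event-count estimate for a filtered sample:
# n_events ~ (filtered cross section) x (integrated luminosity) x (safety factor 5-10).

def events_to_request(xsec_pb: float, lumi_fb: float, factor: float = 5.0) -> int:
    """xsec in pb, integrated lumi in fb^-1; 1 pb = 1000 fb."""
    return int(xsec_pb * 1000.0 * lumi_fb * factor)

# e.g. a hypothetical 10 pb filtered cross section, 35 fb^-1, and a 5x safety factor:
print(events_to_request(10.0, 35.0))  # 1750000
```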

@lucasrussell01
Contributor Author

hi, do you really need njet split case for this?

in any case what you can still do is take existing DY gridpacks (from central production) add LHE filters on top of it to keep events that include pdgid=15 (taus) use your hadronizer filter for further stat optimization.

instead of making all the gridpacks separately from scratch

and with this way -- your cross section after filtering is not going to be super large so you can just do sth like (cross section) x (lumi) x (5-10) = (nevents to request) -- and you don't have to split the njets

unless the analysis is built on a signal region with sth like "2 taus + 3 or more jets"

Hi, we are going to need more of the higher jet multiplicities than the lower ones, which was the reason for the different gridpacks/cards.
Thank you for the advice about using filters, in this case I've already made the gridpacks though.

We will be either requesting NLO or LO samples, depending on the outcome of our request with HiggsLepRare, and I can delete the cards we won't use (everything is here for now so the Higgs MC contacts etc can see)

@lucasrussell01
Contributor Author

lucasrussell01 commented Jan 16, 2025

Hi @sihyunjeon @vlimant,

I have discussed these requests with the HLepRare convenor @kandrosov - as well as the HLepRare MC contact @maravin . We decided that we will go ahead (via the Higgs PAG) with the production of the NLO DY->TauTau Filtered samples for the 2022 and 2023 eras as they will be very beneficial to all Higgs->TauTau early Run 3 analyses.

We have all of the necessary gridpacks etc produced for these already, would it be possible to please merge this PR so that the cards are stored on genproductions?

I can remove the LO cards if that is an issue.

Please let me know how to proceed.

Best,
Lucas

@lviliani
Contributor

Hi Lucas, for 2024 production we have already injected an inclusive DY tautau NLO sample, as well as the 0J bin (have a look here).

@sihyunjeon can comment, but I think the other jet bins will also be submitted when possible.

Are these what you need? If so, I think the easiest is to just clone them into 2022 and 2023 campaigns.

@lucasrussell01
Contributor Author

Hi @lviliani,

For 22/23 we need to have our own requests as we have a HepMC filter that is drastically reducing the number of events that we need to order. We're also on quite a short timescale (we want to try and have these ready for Moriond), so I think the easiest is to proceed via the HIG PAG for these eras?

It is good to know for the 24 samples, we will review if we need to order more stats for 2024 (and further) at a later point.

@lviliani
Contributor

Ok, you have to add the fragment with the filter also for the NLO samples then. I only see the one for the LO MLM sample here.

I think there's also a difference regarding the models wrt standard samples, because you are using sm-no_b_mass and loop_sm-no_b_mass instead of sm-ckm_no_b_mass and loop_sm-ckm_no_b_mass.
So it means you are assuming a diagonal CKM matrix.
Can you please comment further on this?
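For reference, the model difference being asked about would show up in the MadGraph5_aMC@NLO process card roughly like this. These are illustrative lines only, not the actual cards from this PR:

```
# Model used for the standard common-background samples (CKM mixing included):
import model loop_sm-ckm_no_b_mass
# Model used in this PR (diagonal CKM matrix assumed):
# import model loop_sm-no_b_mass

generate p p > ta+ ta- [QCD]
output DYto2Tau_example
```

For a pure Z/gamma* -> tautau final state the CKM choice mainly matters for consistency with the central samples, as noted below.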

@sihyunjeon
Collaborator

What is the required statistics?

@lucasrussell01
Contributor Author

lucasrussell01 commented Jan 17, 2025

Ok, you have to add the fragment with the filter also for the NLO samples then. I only see the one for the LO MLM sample here.

I think there's also a difference regarding the models wrt standard samples, because you are using sm-no_b_mass and loop_sm-no_b_mass instead of sm-ckm_no_b_mass and loop_sm-ckm_no_b_mass. So it means you are assuming a diagonal CKM matrix. Can you please comment further on this?

Ok, thank you! I will add the fragment shortly, sorry I missed that.

With regards to the model I'm using, I took what was here, which I thought was the standard sample for 22/23? This uses loop_sm-no_b_mass; if it is not the standard sample, could you please let me know where the cards are located?

@lucasrussell01
Contributor Author

What is the required statistics?

With the filters we are applying, the breakdown is:
2022: 8.3 M for 0J, 11.8 M for 1J, 27.8 M for 2J
2022EE: 27.6 M for 0J, 39.2 M for 1J, 92.4 M for 2J
2023: 18.4 M for 0J, 26.2 M for 1J, 61.7 M for 2J
2023BPix: 9.8 M for 0J, 14.0 M for 1J, 32.9 M for 2J

The HIG MC contact is in the process of creating the necessary prepIDs

@sihyunjeon
Collaborator

Do you have the rough filter efficiencies?

@lucasrussell01
Contributor Author

Do you have the rough filter efficiencies?

Yep, here is a summary of what I got running the GenXSecAnalyzer. HLepRare MC are running validation on the 22 requests already, so I imagine they should also get numbers for these soonish.

[attached image: filter efficiency summary table]

@lucasrussell01
Contributor Author

@lviliani I have now added the NLO fragment for 0J - I can add the other ones too if you would like (1 and 2J), but they are identical except for the path to the gridpack.

@lucasrussell01
Contributor Author

Ok, you have to add the fragment with the filter also for the NLO samples then. I only see the one for the LO MLM sample here.

I think there's also a difference regarding the models wrt standard samples, because you are using sm-no_b_mass and loop_sm-no_b_mass instead of sm-ckm_no_b_mass and loop_sm-ckm_no_b_mass. So it means you are assuming a diagonal CKM matrix. Can you please comment further on this?

Ok, thank you! I will add the fragment shortly, sorry I missed that.

With regards to the model I'm using, I took what was here, which I thought was the standard sample for 22/23?

This uses loop_sm-no_b_mass; if it is not the standard sample, could you please let me know where the cards are located?

@lviliani @sihyunjeon Could you please confirm whether the cards stored on the genproductions GitHub for the central samples are indeed different to what was used in 22/23? (i.e. diagonal vs non-diagonal CKM)

If we should be using the CKM model I will remake the gridpacks and check filters etc so that there is some consistency (and we can stitch together with central)

@lviliani
Contributor

Hi, I think the CKM model is what has been used for common backgrounds.
It should not have a big effect I think, it was mainly for consistency.
But I let @sihyunjeon comment here.

@sihyunjeon
Collaborator

Several other settings (like the pdgid values for SM parameters) are also different. Again, I would just recycle the existing fragments from the bottom three prepids in [1] and add the filter on top. This is also helpful in case people need to stack up two different samples if they are lacking stats. mZ=91 GeV vs mZ=90 GeV doesn't make a huge difference, but I am not sure what else could still be different in the run card.

[1] https://cms-pdmv-prod.web.cern.ch/mcm/requests?dataset_name=DYto2L-2Jets_MLL-50_*J_TuneCP5_13p6TeV_amcatnloFXFX*&prepid=GEN-Run3Summer23wmLHEGS*&page=0&shown=4398046773375

@sihyunjeon
Collaborator

@lviliani if you prefer this to be done in the central bkg framework, i can work on it next week

@lucasrussell01
Contributor Author

lucasrussell01 commented Jan 21, 2025

Several other settings (like the pdgid values for SM parameters) are also different. Again, I would just recycle the existing fragments from the bottom three prepids in [1] and add the filter on top. This is also helpful in case people need to stack up two different samples if they are lacking stats. mZ=91 GeV vs mZ=90 GeV doesn't make a huge difference, but I am not sure what else could still be different in the run card.

[1] https://cms-pdmv-prod.web.cern.ch/mcm/requests?dataset_name=DYto2L-2Jets_MLL-50_*J_TuneCP5_13p6TeV_amcatnloFXFX*&prepid=GEN-Run3Summer23wmLHEGS*&page=0&shown=4398046773375

Thanks @sihyunjeon

I'm on holiday this week, but on Sunday or Monday I can copy the fragments you linked to make sure everything is the same. Should I just untar the gridpacks that are used there to check the run/proc cards etc?

Again I initially copied everything from what I saw on the genproductions GitHub - this just isn't up to date then I guess?

@lviliani
Contributor

Can't you just use the same gridpacks linked by @sihyunjeon?
I mean, just clone those requests in McM and modify the fragment to add your filter.

@lucasrussell01
Contributor Author

Can't you just use the same gridpacks linked by @sihyunjeon?

I mean, just clone those requests in McM and modify the fragment to add your filter.

Not quite, because I think those are DY->LL (all leptons), not TauTau. But if there are NLO 0,1,2J DY->TauTau available that are compatible with the 22/23 datasets we could just add a filter, yeah.

@lviliani
Contributor

One possibility is to follow the approach used here as an example:
https://cms-pdmv-prod.web.cern.ch/mcm/requests?prepid=GEN-RunIII2024Summer24wmLHEGS-00053

You can see that in this case we start from a DY->LL gridpack and we select the TauTau final state with a filter.
On top of that you could add also your filter.

@sihyunjeon
Collaborator

hi, do you really need njet split case for this?

in any case what you can still do is take existing DY gridpacks (from central production) add LHE filters on top of it to keep events that include pdgid=15 (taus) use your hadronizer filter for further stat optimization.

instead of making all the gridpacks separately from scratch

and with this way -- your cross section after filtering is not going to be super large so you can just do sth like (cross section) x (lumi) x (5-10) = (nevents to request) -- and you don't have to split the njets

unless the analysis is built on a signal region with sth like "2 taus + 3 or more jets"

@lviliani's comment is also what I meant should be done.
You can add LHE filters (with tau pdgid=15) and then on top of this add your HepMC filters.
The cards in this git repo are not always up to date (you might even be able to find multiple DY)
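The recycled-gridpack approach being suggested would make the fragment look roughly like the sketch below. This is only an illustration: it assumes the standard LHEGenericFilter plugin, and the parameter values and the placeholder HepMC filter module should be checked against the actual CMSSW release and fragments used.

```python
# Sketch of the suggested fragment structure: keep the central DY->LL gridpack,
# select tau-pair events at LHE level, then apply the analysis HepMC filter.
import FWCore.ParameterSet.Config as cms

# LHE-level selection of events containing taus (pdgid 15)
lheTauFilter = cms.EDFilter("LHEGenericFilter",
    src = cms.InputTag("externalLHEProducer"),
    NumRequired = cms.int32(2),
    ParticleID = cms.vint32(15, -15),
    AcceptLogic = cms.string("GT"),  # check accepted logic values in the plugin docs
)

# The analysis-specific HepMC/gen-level filter would be configured here, e.g. a
# hypothetical module: tauFinalStateFilter = cms.EDFilter("...", ...)

ProductionFilterSequence = cms.Sequence(lheTauFilter)  # * tauFinalStateFilter
```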

@lucasrussell01
Contributor Author

hi, do you really need njet split case for this?

in any case what you can still do is take existing DY gridpacks (from central production) add LHE filters on top of it to keep events that include pdgid=15 (taus) use your hadronizer filter for further stat optimization.

instead of making all the gridpacks separately from scratch

and with this way -- your cross section after filtering is not going to be super large so you can just do sth like (cross section) x (lumi) x (5-10) = (nevents to request) -- and you don't have to split the njets

unless the analysis is built on a signal region with sth like "2 taus + 3 or more jets"

@lviliani's comment is also what I meant should be done.

You can add LHE filters (with tau pdgid=15) and then on top of this add your HepMC filters.

The cards in this git repo are not always up to date (you might even be able to find multiple DY)

Hi @lviliani @sihyunjeon

Thanks for the example, I will give it a go on Monday (I'm on holiday this week and have no laptop). Thank you very much for all your help :)

@lucasrussell01
Contributor Author

lucasrussell01 commented Jan 27, 2025

Hi @lviliani @sihyunjeon ,

I'm now using the central DY->LL gridpacks with a filter as suggested, so that all of the custom parameters etc are the same. I have left the updated fragments in this PR.

I have a last question about the McM validation script on these fragments. I've attached the summary printout at the bottom of this message.

The matching efficiency and HepMC filter efficiencies are essentially the same as what I had measured using my dedicated gridpacks which is good.

I don't see any effect of the LHE filter in the printed efficiencies near the cross section summary table; I just see that the number of events in the summary table is 1/3 of the total tested (the 10k written lower down). This 1/3 factor appearing, I think, means that the LHE filter is in fact working as expected - I was just wondering if there is anywhere I should expect to see a printout?

Btw I also notice the filter efficiency "[TO BE USED IN MCM]" is 1.0 still - does this mean the HepMC filter efficiency should not need to be included in the filter efficiency section of the request on McM?

The total efficiency at the very bottom (0.0429) seems to be matching eff * HepMC eff * LHE eff, which makes sense to me.

Does this mean that the filter efficiency put in the McM request should be HepMC eff * LHE eff (1/3) ?

------------------------------------
GenXsecAnalyzer:
------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Overall cross-section summary 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Process         xsec_before [pb]                passed  nposw   nnegw   tried   nposw   nnegw   xsec_match [pb]                 accepted [%]     event_eff [%]
0               6.296e+03 +/- 6.822e+00         2964    2636    328     3374    3042    332     5.362e+03 +/- 4.462e+01         85.2 +/- 0.7    87.8 +/- 0.6
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Total           6.296e+03 +/- 6.822e+00         2964    2636    328     3374    3042    332     5.362e+03 +/- 4.462e+01         85.2 +/- 0.7    87.8 +/- 0.6
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Before matching: total cross section = 6.296e+03 +- 6.822e+00 pb
After matching: total cross section = 5.362e+03 +- 4.462e+01 pb
Matching efficiency = 0.9 +/- 0.0   [TO BE USED IN MCM]
HepMC filter efficiency (taking into account weights)= (2.61673e+06) / (1.77109e+07) = 1.477e-01 +- 8.300e-03
HepMC filter efficiency (event-level)= (429) / (2964) = 1.447e-01 +- 6.462e-03
Filter efficiency (taking into account weights)= (2.61673e+06) / (2.61673e+06) = 1.000e+00 +- 0.000e+00
Filter efficiency (event-level)= (429) / (429) = 1.000e+00 +- 0.000e+00    [TO BE USED IN MCM]

After filter: final cross section = 7.922e+02 +- 4.499e+01 pb
After filter: final fraction of events with negative weights = 1.026e-01 +- 1.502e-03
After filter: final equivalent lumi for 1M events (1/fb) = 7.975e-01 +- 3.632e-02
TimeReport> Time report complete in 3995.63 seconds
 Time Summary: 
 - Min event:   0.0012188
 - Max event:   12.8796
 - Avg event:   0.347886
 - Total loop:  3989.65
 - Total init:  5.97395
 - Total job:   3995.63
 - EventSetup Lock: 0
 - EventSetup Get:  0
 Event Throughput: 2.50648 ev/s
 CPU Summary: 
 - Total loop:     3904.45
 - Total init:     4.17779
 - Total extra:    0
 - Total children: 322.064
 - Total job:      3908.63
 Processing Summary: 
 - Number of Events:  10000
 - Number of Global Begin Lumi Calls:  100
 - Number of Global Begin Run Calls: 1


=============================================

MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 GenXSecAnalyzer      -w GenXSecAnalyzer:                      17       17
    2 LogicError           -w Pythia8Concurren                       1        1
    3 TimeReport           -e AfterModEndJob                         1        1
    4 fileAction           -s ExternalLHEProdu                       4        4
    5 fileAction           -s ExternalLHEProdu                       1        1

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 GenXSecAnalyzer      EndJob           EndJob           EndJob
    2 LogicError           Run: 1 Stream: 0                  
    3 TimeReport           EndJob                            
    4 fileAction           Run: 1           Run: 1           Run: 1
    5 fileAction           End Run: 1                        

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
Warning                18                  18
Error                   1                   1
System                  5                   5

dropped waiting message count 0
Thanks for using LHAPDF 6.4.0. Please make sure to cite the paper:
  Eur.Phys.J. C75 (2015) 3, 132  (http://arxiv.org/abs/1412.7420)
Validation report of TAU-Run3Summer22EEwmLHEGS-00012 sequence 1/1
Processed events: 10000
Produced events: 429
Threads: 1
Peak value RSS: 1599.84 MB
Peak value Vsize: 2329.18 MB
Total size: 275.002 MB
Total job time: 3995.63 s
Total CPU time: 3908.63 s
Event throughput: 2.50648
CPU efficiency: 97.82 %
Size per event: 656.4150 kB
Time per event: .3989 s
Filter efficiency percent: 4.29000000 %
Filter efficiency fraction: .0429000000

@lviliani
Contributor

Dear @lucasrussell01, good question.
Actually, I think the output of the GenXSecAnalyzer in this case is not correct, meaning that you should NOT put 1 as filter efficiency in McM, as mistakenly reported in the printout.
After discussing with @DickyChant and @bbilin, the actual filter efficiency to be put in McM should be 0.0429, i.e. the number that you get in the report at the end.
My understanding is that the matching efficiency field in McM is not used at all.
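For what it's worth, the 0.0429 can be reconstructed from the event counts in the printout above, interpreting the summary-table counts as the post-LHE-filter sample:

```python
# Reconstructing the 0.0429 from the GenXSecAnalyzer counts quoted above:
# 10000 events processed, 3374 reached the matching stage after the LHE tau
# filter (~1/3), 2964 of those passed jet matching, and 429 passed the HepMC filter.
processed, passed_lhe, passed_matching, passed_hepmc = 10000, 3374, 2964, 429

lhe_eff = passed_lhe / processed             # ~0.337 (the ~1/3 factor)
matching_eff = passed_matching / passed_lhe  # ~0.879 (event-level)
hepmc_eff = passed_hepmc / passed_matching   # ~0.145

total_eff = lhe_eff * matching_eff * hepmc_eff  # telescopes to 429/10000
print(round(total_eff, 4))  # 0.0429 -- the value to enter in McM
```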

@lucasrussell01
Contributor Author

Dear @lucasrussell01, good question. Actually, I think the output of the GenXSecAnalyzer in this case is not correct, meaning that you should NOT put 1 as filter efficiency in McM, as mistakenly reported in the printout. After discussing with @DickyChant and @bbilin, the actual filter efficiency to be put in McM should be 0.0429, i.e. the number that you get in the report at the end. My understanding is that the matching efficiency field in McM is not used at all.

Ok great, thank you for the confirmation! Good to know.
