Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AFIT-107: new RCT bad flags for FT0 #13698

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from
Open

Conversation

afurs
Copy link
Collaborator

@afurs afurs commented Nov 15, 2024

No description provided.

Copy link
Contributor

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and removes <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1

@afurs
Copy link
Collaborator Author

afurs commented Nov 15, 2024

@jotwinow @andreasmolander please review it

jotwinow
jotwinow previously approved these changes Nov 16, 2024
Copy link
Collaborator

@knopers8 knopers8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I will let @JianLIUhep have the final call, but this contradicts what QC flags are for. The flags should define the effect on the usability of the data for analyzers. The "comment" is used to explain what went wrong. Here you are mistaking cause with effect.

This idea was presented many times for various audiences and I never received objection for this proposal. If you need a bunch of "const strings" for specifying the cause of bad data, I would rather propose that we allow for a set of predefined comments. The flags themselves should stay for the analyzers to programmatically determine whether they can use a given timespan of data or not.

@knopers8
Copy link
Collaborator

Also, all of those flags are detector-specific. Do they have to be? If we continue this road, we will end up with tens or hundreds of flags which mean the same, but for different detectors.

@afurs
Copy link
Collaborator Author

afurs commented Nov 18, 2024

The "comment" is used to explain what went wrong. Here you are mistaking cause with effect.

Indeed, I exactly put into comments explanation of what went wrong, e.g. Bad FT0 Data Consistency -> means that FT0 haven't sent data from some modules - explanation as it is.

The flags themselves should stay for the analyzers to programmatically determine whether they can use a given timespan of data or not.

I agree, for instance Bad FT0 Time Ofsset Calibration -> means that FT0's averaged time will be incorrect in AOD -> collision time and z-vertex are incorrect. But FT0's trigger data could be fine, and then analyzer shall decide does he need FT0's time or not.

Also, all of those flags are detector-specific. Do they have to be? If we continue this road, we will end up with tens or hundreds of flags which mean the same, but for different detectors.

BadFT0DataConsistency, BadFT0HWSettings, BadFT0RecoSettings - also can be used for FV0 and FDD. First two because of common electronics. Third - reco principle almost the same among all three FIT detectors.

BadFT0TimeOfssetCalibration, BadFT0TimeSlewingCalibration - they are only specific for FT0.

P.S. I got from Elena Botta request to implement bad quality flags for FT0 for RCT

@knopers8
Copy link
Collaborator

Indeed, I exactly put into comments explanation of what went wrong, e.g. Bad FT0 Data Consistency -> means that FT0 haven't sent data from some modules - explanation as it is.

Apologies, I assume you are familiar with the documentation of this module. By "comment" I meant the comment field of QualityControlFlag. I see you even used it yourself to add some details: https://github.com/AliceO2Group/QualityControl/blob/aaf01a6592ad6a026aa31d8b74df5796360c6388/Modules/FIT/FT0/src/FractionCheck.cxx#L109 , which is good.

Would you be able to come up with a phrase which describes how inconsistent FT0 data affects AODs?

I agree, for instance Bad FT0 Time Ofsset Calibration -> means that FT0's averaged time will be incorrect in AOD -> collision time and z-vertex are incorrect. But FT0's trigger data could be fine, and then analyzer shall decide does he need FT0's time or not.

First of all, please correct the typo, it's "Offset", not "Ofsset". Then, as far as I understand, the effect on data is "Incorrect time offset", while "bad time offset calibration" is the cause. Does this sound reasonable to you?

BadFT0DataConsistency, BadFT0HWSettings, BadFT0RecoSettings - also can be used for FV0 and FDD. First two because of common electronics. Third - reco principle almost the same among all three FIT detectors.

Why not for any detector? Anyway, the assigned flags are associated with a detector and I can imagine that any detector has hardware that can be set up incorrectly. This being said, BadHWSettings and BadRecoSettings seem again like causes, not effects. These could be stated in QualityControlFlag::mComment (and the corresponding field in RCT).

P.S. I got from Elena Botta request to implement bad quality flags for FT0 for RCT

That's perfectly fine and I am happy that there is a proposal to extend this list. However, new flags have to be carefully reviewed before adding them, because whatever we will add, it will stay with us for the rest of Run 3&4 at least.

@afurs
Copy link
Collaborator Author

afurs commented Nov 18, 2024

@knopers8 Ok, I got it. Effect as flag, not cause. A have to reconsider my last commit. I guess the most questions will disappear after.

Concerning data inconsistency in FIT, I see several items for FT0's (also can be used for FV0 and FDD):

  1. Bad amplitude;
  2. Bad time;
  3. Bad BCID;
  4. Bad trigger signals

@JianLIUhep
Copy link
Contributor

Hi @afurs,

We’d like to clarify a few points to ensure we’re on the same page.

Our original email was intended to gather information on which of the existing 13 flags are being used (or planned to be used) for each detector. We don’t intend to add new flags unless strictly necessary. These details could be added as comments to the Bad flag, and we could then communicate to PWGs that, for FT0, the "Bad" flag encompasses all four possibilities (amplitude, time, BCID, trigger).

Please don’t hesitate to reach out if you have any further questions.

Thank you,
Jian and Elena

cc @knopers8

@afurs
Copy link
Collaborator Author

afurs commented Nov 18, 2024

Hello @JianLIUhep , ok, I guess I understood your point. Anyway, I provided first version of bad flags related to FT0. After discussion with @knopers8 I realized that maybe better to use list of 4 items which I presented above. For me any variant is OK.

I checked original email again:
We are interested in understanding if bad flags need to be split in different groups meaning that one group is for runs that are for sure not useful for analyses and another one is for runs that could be anyway of interest.
I would say that in first version of the code: BadRecoSettings, BadFT0TimeOfssetCalibration and BadFT0TimeSlewingCalibration - can be fixed in next pass after broken one (second group)
In case of BadHWSettings and BadDataConsistency - depends on certain situation, could be first or second group.

Please let me know which variant you would like to have. Also we can move discussion to the related ticket https://its.cern.ch/jira/browse/AFIT-107

@alibuild
Copy link
Collaborator

alibuild commented Dec 4, 2024

Error while checking build/O2/fullCI_slc9 for 43099b7 at 2025-01-28 03:34:

## sw/BUILD/O2-sim-challenge-test-latest/log
./sim-challenge.logDetected critical problem in logfile ft0reco.log
./sim-challenge.logft0reco.log:[10463:internal-dpl-ccdb-backend]: [03:34:34][ERROR] Exception while running: Fatal error. Rethrowing.
./sim-challenge.logft0reco.log-[10463:internal-dpl-ccdb-backend]: [03:34:34][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-ft0-reco-workflow, device shutting down. Reason: Fatal error
./sim-challenge.log[10463:internal-dpl-ccdb-backend]: [03:34:33][ERROR] CCDBDownloader CURL transfer error - Timeout was reached
./sim-challenge.log[10463:internal-dpl-ccdb-backend]: [03:34:33][ERROR] CcdbDownloader finished transfer http://alice-ccdb.cern.ch/CTP/Calib/OrbitReset for 1546300800000 (agent_id: alimetal01.cern.ch-1738031672-ZPpHnZ) with http code: 0
./sim-challenge.log[10463:internal-dpl-ccdb-backend]: [03:34:33][ERROR] File CTP/Calib/OrbitReset could not be retrieved. No more hosts to try.
./sim-challenge.log[10463:internal-dpl-ccdb-backend]: [03:34:33][FATAL] Unable to find object CTP/Calib/OrbitReset/1546300800000
./sim-challenge.log[10463:internal-dpl-ccdb-backend]: [03:34:34][ERROR] Exception while running: Fatal error. Rethrowing.
./sim-challenge.log[10463:internal-dpl-ccdb-backend]: [03:34:34][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-ft0-reco-workflow, device shutting down. Reason: Fatal error
./sim-challenge.log[ERROR] Workflow crashed - PID 10463 (internal-dpl-ccdb-backend) did not exit correctly however it's not clear why. Exit code forced to 128.
./sim-challenge.log[ERROR]  - Device internal-dpl-ccdb-backend: pid 10463 (exit 128)
./sim-challenge.log[INFO]    - First error: [03:34:33][FATAL] Unable to find object CTP/Calib/OrbitReset/1546300800000
./sim-challenge.log[ERROR] SEVERE: Device internal-dpl-ccdb-backend (10463) had at least one message above severity 5: Unable to find object CTP/Calib/OrbitReset/1546300800000
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/37}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/38}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/40}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/42}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/43}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/45}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/46}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/47}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/48}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/49}
./digi.log[5649:SimReader]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/37}
./digi.log[5649:SimReader]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/38}
./digi.log[5649:SimReader]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/40}
./digi.log[5649:SimReader]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/42}
./digi.log[5649:SimReader]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/43}
./digi.log[5649:SimReader]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/45}
./digi.log[5649:SimReader]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/46}
./digi.log[5649:SimReader]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/47}
./digi.log[5649:SimReader]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/48}
./digi.log[5649:SimReader]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/49}
./digi.log[5650:internal-dpl-ccdb-backend]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/37}
./digi.log[5650:internal-dpl-ccdb-backend]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/38}
./digi.log[5650:internal-dpl-ccdb-backend]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/40}
./digi.log[5650:internal-dpl-ccdb-backend]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/42}
./digi.log[5650:internal-dpl-ccdb-backend]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/43}
./digi.log[5650:internal-dpl-ccdb-backend]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/45}
./digi.log[5650:internal-dpl-ccdb-backend]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/46}
./digi.log[5650:internal-dpl-ccdb-backend]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/47}
./digi.log[5650:internal-dpl-ccdb-backend]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/48}
./digi.log[5650:internal-dpl-ccdb-backend]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/49}
./digi.log[5648:internal-dpl-clock]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/37}
./digi.log[5648:internal-dpl-clock]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/38}
./digi.log[5648:internal-dpl-clock]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/40}
./digi.log[5648:internal-dpl-clock]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/42}
./digi.log[5648:internal-dpl-clock]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/43}
[0 more errors; see full log]

Full log here.

@alibuild
Copy link
Collaborator

alibuild commented Dec 12, 2024

Error while checking build/O2/fullCI for 43099b7 at 2025-01-01 06:41:

## sw/BUILD/O2-latest/log
c++: error: unrecognized command-line option '--rtlib=compiler-rt'
c++: error: unrecognized command-line option '--rtlib=compiler-rt'


## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:


## sw/BUILD/O2-full-system-test-latest/log
Detected critical problem in logfile reco_NOGPU.log
reco_NOGPU.log:[33025:calib-phos-l1phase]: [05:41:03][ERROR] Exception while running: Fatal error. Rethrowing.
[33025:calib-phos-l1phase]: [05:41:03][ERROR] Requested resource does not exist: http://alice-ccdb.cern.ch//PHS/Calib/BadMap/1735710063292/
[33025:calib-phos-l1phase]: [05:41:03][FATAL] Got nullptr from CCDB for path PHS/Calib/BadMap and timestamp 1735710063292
[33025:calib-phos-l1phase]: [05:41:03][ERROR] Exception while running: Fatal error. Rethrowing.
[33025:calib-phos-l1phase]: [05:41:03][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-phos-calib-workflow, device shutting down. Reason: Fatal error
[ERROR] Workflow crashed - PID 33025 (calib-phos-l1phase) did not exit correctly however it's not clear why. Exit code forced to 128.

Full log here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants