Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert boost hists to ROOT TGraphs #17

Merged
merged 9 commits into from
Mar 21, 2024
Merged

Conversation

apaasch
Copy link
Collaborator

@apaasch apaasch commented Mar 8, 2024

Task structure to store boost histogram objects as TGraphAsymmErrors in rootfiles.
Main goal is to do this indepently from Root using uproot.
This is not yet possible but there is ongoing work from the uproot team addressing this issue.
Updates can be followed in this PR from uproot5.

As long as this feature is not yet implement the conversion is done in two steps:

  1. Extract values from the boost histogram and store as arrays, which can be used as an input for a TGraph object. Store the dictionary as a pickel file.
  2. Extract the arrays from the pickle file and use them as input for a TGraphAsymmError. This step is done independently from CF, since no Root module is implemented in the Sandboxes.

In the long term, the second step is included in the first one and remove the dependency from Root.

Additional

  • Move category IDs to base task

Copy link
Member

@dsavoiu dsavoiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @apaasch. It looks OK to me so far, I only have a few comments related to code style and variable naming, but this should be easy to fix. One should also maybe clarify if the convert_to_root.py script is meant to run while columnflow is sourced or completely outside (it would also be good to add this information to the file).

dijet/tasks/root.py Outdated Show resolved Hide resolved
dijet/tasks/base.py Outdated Show resolved Hide resolved
@@ -57,8 +60,8 @@ def get_datasets(self) -> tuple[list[str], bool]:
# check that at least one config dataset matched
if not dataset_insts_from_process:
raise RuntimeError(
"no single dataset found in config matching "
f"process `{self.branch_data.process}`"
"no single dataset found in config matching ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No comma needed here since the string continues on the next line.

Suggested change
"no single dataset found in config matching ",
"no single dataset found in config matching "

Copy link
Collaborator Author

@apaasch apaasch Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -68,8 +71,8 @@ def get_datasets(self) -> tuple[list[str], bool]:
# check that at least one user-supplied dataset matched
if not datasets_filtered:
raise RuntimeError(
"no single user-supplied dataset matched "
f"process `{self.branch_data.process}`"
"no single user-supplied dataset matched ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"no single user-supplied dataset matched ",
"no single user-supplied dataset matched "

Copy link
Collaborator Author

@apaasch apaasch Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

dijet/tasks/jer.py Outdated Show resolved Hide resolved
dijet/tasks/jer.py Outdated Show resolved Hide resolved
dijet/tasks/root.py Outdated Show resolved Hide resolved
print(f"{arg}: {getattr(args, arg)}")

# Construct the file path
base_path = "/nfs/dust/cms/user/paaschal/WorkingArea/Analysis/JERC/DiJet/data/dijet_store/analysis_dijet/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this script be run inside a columnflow environment (i.e. after sourcing setup.sh)? If so, it would be good to avoid hardcoding the path here and use the environment variables insteda (e.g. os.getenv("CF_STORE_LOCAL")).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can be run within the CF environment. You suggestion will be added to the next commit.

apaasch and others added 3 commits March 12, 2024 10:11
Redo flake8 changes
Change naming scheme and structur in root task to be more efficient and readable.

Co-authored-by: Daniel Savoiu <[email protected]>
Copy link
Collaborator Author

@apaasch apaasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, all your suggestion have been added.

print(f"{arg}: {getattr(args, arg)}")

# Construct the file path
base_path = "/nfs/dust/cms/user/paaschal/WorkingArea/Analysis/JERC/DiJet/data/dijet_store/analysis_dijet/"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can be run within the CF environment. You suggestion will be added to the next commit.

@@ -57,8 +60,8 @@ def get_datasets(self) -> tuple[list[str], bool]:
# check that at least one config dataset matched
if not dataset_insts_from_process:
raise RuntimeError(
"no single dataset found in config matching "
f"process `{self.branch_data.process}`"
"no single dataset found in config matching ",
Copy link
Collaborator Author

@apaasch apaasch Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -68,8 +71,8 @@ def get_datasets(self) -> tuple[list[str], bool]:
# check that at least one user-supplied dataset matched
if not datasets_filtered:
raise RuntimeError(
"no single user-supplied dataset matched "
f"process `{self.branch_data.process}`"
"no single user-supplied dataset matched ",
Copy link
Collaborator Author

@apaasch apaasch Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

Copy link
Member

@dsavoiu dsavoiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! Looks good to me, but there is one small change I would make to the convert_to_root.py script (see below).

dijet/scripts/convert_to_root.py Outdated Show resolved Hide resolved
Copy link
Member

@dsavoiu dsavoiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good to me now. Let's merge!

@dsavoiu dsavoiu merged commit 2dcdeaa into uhh-cms:master Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants