Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added RNTuple read/write prototype #47402

Open
wants to merge 1 commit into
base: CMSSW_15_1_RNTUPLE_X
Choose a base branch
from

Conversation

Dr15Jones
Copy link
Contributor

@Dr15Jones Dr15Jones commented Feb 19, 2025

PR description:

Added first version of RNTupleSource and RNTupleOutputModule.

PR validation:

Code compiles. Local checks work.

resolves cms-sw/framework-team#1159

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @Dr15Jones for CMSSW_15_1_RNTUPLE_X.

It involves the following packages:

  • FWCore/Services (core)
  • FWIO/RNTuple (****)
  • IOPool/Common (core)

The following packages do not have a category, yet:

FWIO/RNTuple
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@fwyzard, @makortel, @missirol, @wddgit this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 19, 2025

cms-bot internal usage

@Dr15Jones
Copy link
Contributor Author

please test

@makortel
Copy link
Contributor

@hahnjo Here is the PR we discussed earlier today.

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-82d1d5/44519/summary.html
COMMIT: 57cd25f
CMSSW: CMSSW_15_1_RNTUPLE_X_2025-02-18-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47402/44519/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

Copy link
Contributor

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not a full review, just some comments from a quick look over the code)


auto file = TFile::Open(vm["file"].as<std::string>().c_str(), "r");

auto ntuple = RNTupleReader::Open(*file->Get<ROOT::RNTuple>("Events"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note here and everywhere else: The caller is responsible for deleteing the pointer returned by Get<ROOT::RNTuple> (because it's not inheriting from TObject and thus no automatic memory-management)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So could this be handled inside ROOT? One can have TFile::Get have different return types for different classes via

#include <memory>
#include <concepts>
struct RNTuple;

struct TFile {
    template<typename T>
    requires requires {requires !std::same_as<T, RNTuple>;}
    T* Get();

    template<typename T>
    requires requires {requires std::same_as<T, RNTuple>;}
    std::unique_ptr<T> Get();

};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will discuss. Note however that TFile::Get always hands out memory ownership for classes that don't inherit from TObject and the caller is responsible for deleteing it to avoid memory leaks.

Also just to point out, you don't need to keep the ROOT::RNTuple anchor alive, a local std::unique_ptr<ROOT::RNTuple> anchor(file->Get<ROOT::RNTuple>("Events")); is sufficient.


void RNTupleOutputFile::write(EventForOutput const& e) {
{
auto rentry = events_->CreateEntry();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you bind all fields manually, you can use CreateRawEntry to avoid RNTuple allocating all values. In general, REntry objects can be re-used so you may want to create one raw entry during setup and then just re-bind all pointers for each write.

//if (rntuple_->branchType() == InEvent) {
// CMS-THREADING For the primary input source calls to this function need to be serialized
//InputFile::reportReadBranch(inputType_, std::string(br->GetName()));
//}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this commented out code still useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to do such reporting in the future so keeping it is probably a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this as a future task

events_(file_.get(), "Events", "EventAuxiliary", options(iOpt)) {}

std::vector<ParentageID> RNTupleInputFile::readParentage() {
auto parentageTuple = RNTupleReader::Open(*file_->Get<ROOT::RNTuple>("Parentage"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually we'll probably want to throw edm::Exception if an RNTuple object does not exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a future task for these checks.

std::unique_ptr<RNTupleInputFile> file_;
std::vector<input::RNTupleDelayedReader> runReaders_;
std::vector<input::RNTupleDelayedReader> lumiReaders_;
std::vector<input::RNTupleDelayedReader> eventReaders_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about ProcessBlock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be done at a future time

nullptr;

//If a fatal exception happens we need to make a copy so we can
// rethrow that exception on other threads. This avoids TTree
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be updated to reference RNTuple, not TTree (although it is likely the latter has the same issue)

Added RNTupleSource and RNTupleOutputModule
@Dr15Jones Dr15Jones force-pushed the rntupleSourceTest_CMSSW_15_1_RNTUPLE_2024-02-16 branch from 57cd25f to 36ca324 Compare February 24, 2025 19:26
@cmsbuild
Copy link
Contributor

Pull request #47402 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-82d1d5/44617/summary.html
COMMIT: 36ca324
CMSSW: CMSSW_15_1_RNTUPLE_X_2025-02-23-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47402/44617/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants