feat: implement TGraph-writing #1144
Another candidate may be a python dataclass with |
As discussed in #1142 (comment), we may create confusion by allowing an object that can be converted to both a TTree and a TGraph. |
Hello @jpivarski! I am a student of @grzanka, and I would like to join this issue. I have begun to understand how different objects are transformed so that they can be saved to a ROOT file, and I believe I have a good grasp of it. I have started to implement a prototype of a function, to_TGraph(), which, given two arrays of values (and some other arguments), returns a writable TGraph object. It works, but more still needs to be done (especially TGraphErrors and TGraphAsymmErrors support):

    import uproot

    def to_TGraph(
        fName,
        fTitle,
        fNpoints,
        fX,
        fY,
        fFunctions,  # not used
        fHistogram,  # not used
        fMinimum,
        fMaximum,
        fLineColor=602,
        fLineStyle=1,
        fLineWidth=1,
        fFillColor=0,
        fFillStyle=1001,
        fMarkerColor=1,
        fMarkerStyle=1,
        fMarkerSize=1.0,
    ):
        # TNamed base (name and title), which itself derives from TObject
        tobject = uproot.models.TObject.Model_TObject.empty()
        tnamed = uproot.models.TNamed.Model_TNamed.empty()
        tnamed._deeply_writable = True
        tnamed._bases.append(tobject)
        tnamed._members["fName"] = fName
        tnamed._members["fTitle"] = fTitle

        # Line, fill, and marker attribute bases
        tattline = uproot.models.TAtt.Model_TAttLine_v2.empty()
        tattline._deeply_writable = True
        tattline._members["fLineColor"] = fLineColor
        tattline._members["fLineStyle"] = fLineStyle
        tattline._members["fLineWidth"] = fLineWidth

        tattfill = uproot.models.TAtt.Model_TAttFill_v2.empty()
        tattfill._deeply_writable = True
        tattfill._members["fFillColor"] = fFillColor
        tattfill._members["fFillStyle"] = fFillStyle

        tattmarker = uproot.models.TAtt.Model_TAttMarker_v2.empty()
        tattmarker._deeply_writable = True
        tattmarker._members["fMarkerColor"] = fMarkerColor
        tattmarker._members["fMarkerStyle"] = fMarkerStyle
        tattmarker._members["fMarkerSize"] = fMarkerSize

        # The TGraph itself: attach the bases, then fill its own members
        tGraph = uproot.models.TGraph.Model_TGraph_v4.empty()
        tGraph._bases.append(tnamed)
        tGraph._bases.append(tattline)
        tGraph._bases.append(tattfill)
        tGraph._bases.append(tattmarker)
        tGraph._members["fNpoints"] = fNpoints
        tGraph._members["fX"] = fX
        tGraph._members["fY"] = fY
        # fFunctions - do I need it? Probably not, because it is an array of zero items
        # fHistogram - do I need it? Probably not, because it is a nullptr
        tGraph._members["fMinimum"] = fMinimum
        tGraph._members["fMaximum"] = fMaximum
        return tGraph

Now the issue is deciding what we want to interpret as a TGraph. After some thought, I came up with a few solutions:
In my opinion, solution number 5 is the best one. It gives the most flexibility with little risk of unintentional conversion. What do you think? Which solution would suit best? Or maybe you have your own vision that would be better? I would like to know your opinion! |
It can't be a dict because all dicts are already interpreted as TTree (as a set of TBranch names to arrays). Adding attributes, as in point 5, is somewhat cumbersome—it would have to be extra lines of Python, hard to do interactively—and hidden. You would just need to know the name, and even if you know that there is a special name, you might need to look it up to get the right spelling. How about a DataFrame? It can be recognized if it has column names like DataFrames (just Pandas or all of the new ones, Polars, CuDF, Modin? Can they be recognized in bulk?) are the basic way of describing scatter-point data in Python, and the TGraph* classes are the basic way of describing scatter-point data in ROOT. |
Isn't there the same problem with a DataFrame as with a dictionary? Right now a pandas DataFrame is converted into a dictionary and then interpreted as a TTree. uproot5/src/uproot/writing/identify.py Lines 53 to 59 in 0a997cd
If I understand correctly, you suggest adding logic so that whenever a DataFrame has columns with given names {x, y, errors_x, etc.}, it is recognized as a TGraph (or TGraphErrors, TGraphAsymmErrors)?
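To make the column-name idea concrete, here is a minimal sketch of how a set of column names could select the graph class. The function name `graph_class_for` and every column name other than `x` and `y` are assumptions made for illustration; the thread does not settle on a naming scheme, and this is not Uproot's code.

```python
# Hypothetical error-column names; only "x" and "y" are taken from the thread.
SYMMETRIC_ERRORS = {"x_errors", "y_errors"}
ASYMMETRIC_ERRORS = {
    "x_errors_low", "x_errors_high",
    "y_errors_low", "y_errors_high",
}


def graph_class_for(columns):
    """Return the ROOT class name suggested by the column names, or None."""
    cols = set(columns)
    if not {"x", "y"} <= cols:
        return None  # not point data; the caller would fall back to TTree
    if cols & ASYMMETRIC_ERRORS:
        return "TGraphAsymmErrors"
    if cols & SYMMETRIC_ERRORS:
        return "TGraphErrors"
    return "TGraph"
```

A rule like this is easy to state, but it also shows the risk raised in the thread: any table that merely happens to contain columns called `x` and `y` would stop being written as a TTree.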
|
We could catch the DataFrame type before asking if it's a Mapping. This would take away some of the types that are currently being identified as TTrees, but only accidentally. There need to be simple, easy-to-remember criteria for what will become a TTree and what will become a TGraph*, and DataFrame vs. other Mapping could be that criterion. Presence of a special attribute is more subtle. |
I think that always converting a DataFrame into a TGraph is a good idea, because DataFrames usually do not have variable-size entries in a column (at least in pandas), so they could rightly be interpreted as [x, y] (+errors) coordinates. However, wouldn't that change break existing code that converts a pandas DataFrame into a TTree? I would still propose the flag |
I had forgotten that DataFrame was a major, recommended way to make TTrees (not just an accident of DataFrames being mappings). Okay. But still, setting a flag like What about this?

    import pandas as pd
    import uproot

    f = uproot.recreate("/tmp/stuff.root")
    df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
    f["ttree"] = df
    f["tgraph"] = uproot.as_TGraph(df)

Then the user only needs to remember a function named This function could just add the flag, and a DataFrame/namedtuple with that flag is recognized as a TGraph. So it's possible to do both. |
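The explicit-wrapper idea can be sketched without any Uproot internals. In this sketch, `GraphRequest`, `as_graph`, and `identify` are hypothetical names chosen for illustration; it only shows the pattern of marking an object at the call site so that the same data can be written either way, and it makes no claim about how `uproot.as_TGraph` is implemented.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphRequest:
    """Hypothetical marker wrapper: 'write this table as a graph'."""
    data: Mapping


def as_graph(data):
    # The user opts in explicitly; nothing about the data itself changes.
    return GraphRequest(data)


def identify(obj):
    # The wrapper is checked first, so plain mappings keep their meaning.
    if isinstance(obj, GraphRequest):
        return "TGraph"
    if isinstance(obj, Mapping):
        return "TTree"
    raise TypeError(f"unrecognized type: {type(obj).__name__}")


table = {"x": [1, 2, 3], "y": [4, 5, 6]}
kinds = (identify(table), identify(as_graph(table)))
```

Because the choice is visible at the assignment site, existing code that writes mappings as TTrees keeps working unchanged.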
Okay, there is a slight misconception about the flag. What I meant was something like this:

    implicit_DataFrame_TGraph_convertion = True

    def set_implicit_DataFrame_TGraph_convertion(value: bool):
        if not isinstance(value, bool):
            raise ValueError("Flag must be boolean")
        global implicit_DataFrame_TGraph_convertion
        implicit_DataFrame_TGraph_convertion = value

and then in the add_to_directory function:

    if implicit_DataFrame_TGraph_convertion and uproot._util.is_dataframe(obj):
        if uproot._util.from_module(obj, "pandas"):
            import pandas
            [...]

So it is up to the user which behaviour they want: implicit DataFrame conversion to TGraph or not. But let's move on. Now I will refer to point 2 of my proposal.

    import pandas as pd
    import uproot

    f = uproot.recreate("/tmp/stuff.root")
    df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
    f["ttree"] = df
    f["tgraph"] = uproot.as_TGraph(df)

On second thought, there is something like this: uproot5/src/uproot/models/TGraph.py Lines 29 to 344 in 0a997cd
So |
I didn't realize that you were suggesting a global flag, but we shouldn't do that for multiple reasons. (What if the user wants to write TTrees and TGraphs? What if the user is not using Uproot directly, but through a library and that library changes this state? What if the user is using a library of a library, and one level does it and another doesn't know that it has happened?) But having |
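The concern about shared state can be shown with a small, self-contained sketch. All names here (`_implicit_graph`, `set_implicit_graph`, `third_party_helper`) are hypothetical stand-ins, not Uproot code; the point is only that a module-level flag changed in one place silently alters behaviour everywhere else.

```python
_implicit_graph = False  # module-level state, as in the global-flag proposal


def set_implicit_graph(value):
    global _implicit_graph
    _implicit_graph = bool(value)


def identify(obj):
    # The result depends on hidden state, not only on the argument.
    return "TGraph" if _implicit_graph else "TTree"


def third_party_helper():
    # Stand-in for a library that flips the flag and never restores it.
    set_implicit_graph(True)


table = {"x": [1, 2, 3], "y": [4, 5, 6]}
before = identify(table)
third_party_helper()
after = identify(table)
```

The same call on the same object gives different results before and after the helper runs, which is the library-of-a-library problem described above.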
Hey @jpivarski! I implemented |
To simplify the git topology, please open a PR for inclusion into Is your branch derived from this branch? If so, then I can close this PR in favor of yours and @grzanka's commits should still be in it, so that the final PR will count as being co-authored by the two of you. |
Yes, my branch is derived from this one. |
This is a draft. It needs TGraphErrors, TGraphAsymmErrors, tests of writing to actual files and reading them back with ROOT, and maybe some high-level interface so that we can
for a Python object XYZ that would be interpreted as a TGraph (or TGraphErrors, or TGraphAsymmErrors). What should that XYZ be? Not raw NumPy arrays, since a tuple of NumPy arrays is already interpreted as a histogram. Does it need to be some kind of uproot.make_TGraph function? It would be nice to have something more Pythonic.
Should any Pandas DataFrame with columns named x and y be interpreted as a TGraph? A Pandas DataFrame would supply a title.
Thoughts, @grzanka?