Binned dataframes trouble with branches names with full stops #134

Open
asnaylor opened this issue Aug 19, 2020 · 0 comments

Data from branches with a full stop in the name cannot be binned without renaming the branch. This is related to previous issues #95 and #54. Solving this would also provide a temporary workaround for #132, because the previous method of using fast_carpenter.Define to rename the variable to one without a full stop doesn't work for variables of type ULong_t.

Versions:

python==3.8.5
fast-carpenter==0.18.2
pandas==1.1.0
numpy==1.19.1
numexpr==2.7.1
coffea==0.6.42

Attempting to bin a vector<int> with this config produces no errors, but fast_carpenter writes an empty CSV output file:

stages:
    - output: fast_carpenter.BinnedDataframe
 
output:
    binning:
        - {in: pulsesTPC.peakTime_ns} #vector<int>

Adding the previous workaround of fast_carpenter.Define to the config results in a KeyError:

stages:
    - define_vars: fast_carpenter.Define
    - output: fast_carpenter.BinnedDataframe
 
define_vars:
    variables:
        - peak_time_ns: pulsesTPC.peakTime_ns 

output:
    binning:
        - {in: pulsesTPC.peakTime_ns} #vector<int>
        - {in: peak_time_ns}
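
For reference, the renaming workaround used in define_vars can be sketched as a small helper that builds dot-free aliases for dotted branch names. This is only an illustration: `dot_free_alias` and `define_variables` are hypothetical names, not part of fast_carpenter, and the underscore separator is an assumption. As noted above, the Define approach itself reportedly fails for ULong_t branches, so this does not address #132.

```python
from typing import Dict, List


def dot_free_alias(branch_name: str, sep: str = "_") -> str:
    """Return a copy of the branch name with every full stop replaced by `sep`."""
    return branch_name.replace(".", sep)


def define_variables(branches: List[str]) -> List[Dict[str, str]]:
    """Build entries shaped like the `variables` list of the Define stage above.

    Each entry maps a dot-free alias (hypothetical naming scheme) to the
    original dotted branch name.
    """
    return [{dot_free_alias(name): name} for name in branches]


entries = define_variables(["pulsesTPC.peakTime_ns"])
print(entries)
```

The resulting list mirrors the structure of the `variables` block in the config; whether binning on the alias alone succeeds is not something this sketch can confirm.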

Full Traceback:

concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 139, in __call__
    out = self.function(*args, **kwargs)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 833, in _work_function
    raise e
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 794, in _work_function
    out = processor_instance.process(df)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/backends/coffea.py", line 64, in process
    work.event(chunk)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/summary/binned_dataframe.py", line 217, in event
    binned_values = _bin_values(data, dimensions=self._bin_dims,
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/summary/binned_dataframe.py", line 267, in _bin_values
    bins = data.groupby(final_bin_dims, observed=observed)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/pandas/core/frame.py", line 6504, in groupby
    return DataFrameGroupBy(
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 525, in __init__
    grouper, exclusions, obj = get_grouper(
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 777, in get_grouper
    raise KeyError(gpr)
KeyError: 'pulsesTPC.peakTime_ns'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/bin/fast_carpenter", line 8, in <module>
    sys.exit(main())
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/__main__.py", line 64, in main
    results, _ = backend.execute(sequence, datasets, args)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/backends/coffea.py", line 100, in execute
    out = run_uproot_job(coffea_datasets, 'events', fp, executor, executor_args=exe_args)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 1068, in run_uproot_job
    executor(chunks, closure, wrapped_out, **exe_args)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 567, in futures_executor
    _futures_handler(futures, accumulator, status, unit, desc, add_fn, tailtimeout)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 197, in _futures_handler
    add_fn(output, finished.pop().result())
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
KeyError: 'pulsesTPC.peakTime_ns'
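
The traceback ends in pandas' groupby, which raises a KeyError when the requested key is not a column of the dataframe. The sketch below is a minimal illustration of that behaviour in pandas alone, assuming a hypothetical stand-in dataframe; it does not reproduce fast_carpenter's internals. It suggests that pandas itself accepts column names containing a full stop, so the error likely means the dotted column never reached the dataframe passed to groupby.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe that reaches groupby; the real one
# is built inside fast_carpenter and is not shown in the traceback.
df = pd.DataFrame({"peak_time_ns": [10, 20, 20], "weight": [1.0, 1.0, 1.0]})


def try_groupby(frame, key):
    """Return group sizes as a dict, or the KeyError text if the column is absent."""
    try:
        return frame.groupby(key).size().to_dict()
    except KeyError as err:
        return f"KeyError: {err}"


# The dotted name is not a column of df, so groupby raises KeyError.
missing = try_groupby(df, "pulsesTPC.peakTime_ns")

# Once a column with the dotted name exists, the same groupby call succeeds.
renamed = df.rename(columns={"peak_time_ns": "pulsesTPC.peakTime_ns"})
dotted = try_groupby(renamed, "pulsesTPC.peakTime_ns")

print(missing)
print(dotted)
```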