Evaluate using awkwardarray for variable length arrays #223

Open
maxnoe opened this issue Jan 13, 2021 · 4 comments

Comments

@maxnoe
Member

maxnoe commented Jan 13, 2021

We should evaluate whether we can use awkward to speed things up in the places where we currently use lists of lists or arrays of arrays to hold variable-length data.

https://awkward-array.org/quickstart.html

@kosack

kosack commented Jan 13, 2021

I tried some experiments using it in ctapipe a year or so ago, but decided in the end that it would require too much redesign, although that depends on where and how heavily we would use it. It also has implications (and benefits) for the data format, e.g. the ability to write these variable-length arrays to parquet.

It's interesting technology though, and could help solve the problem of event-wise vs bunch-of-event processing. Ideally we would like to support the latter and do away with all explicit loops over events for efficiency purposes, but the current design makes that difficult. My main issue with awkward was just that it was not very stable, but now that there is a 1.x release, that is encouraging.

Places where it could be interesting to explore using it would be:

  • internal representation of merged telescope event data (avoid loops over telescopes, perhaps)
  • storing sparse waveforms and images (data volume reduced)
  • replace event structure entirely, potentially allowing algorithms to be applied easily to bunches of events (no event loop)
  • instrument model

@maxnoe
Member Author

maxnoe commented Jan 13, 2021

Here I was mainly talking about the places where simtel array uses variable-length data, which have quite a large performance impact when reading eventio files.

These are at the moment:

  • pixel sector information in eventio/simtel/parsing.pyx, which we currently parse into a list[array.array('h')]
  • zero-suppressed ADC samples (currently no files are produced like this)
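
For illustration, here is a minimal sketch of the flat-buffer-plus-offsets layout that awkward uses for variable-length lists (its ListOffsetArray), written with only the standard-library array module so it runs without extra dependencies. The helper names flatten and get_sublist and the sample sector values are made up for this example; this is not the actual parsing code in eventio/simtel/parsing.pyx.

```python
from array import array


def flatten(sublists):
    """Pack variable-length int16 sublists into one flat buffer plus offsets."""
    content = array("h")
    offsets = array("q", [0])  # offsets[i]:offsets[i + 1] delimits sublist i
    for sub in sublists:
        content.extend(sub)
        offsets.append(len(content))
    return content, offsets


def get_sublist(content, offsets, i):
    """Return sublist i as a slice of the flat buffer."""
    return content[offsets[i]:offsets[i + 1]]


# Hypothetical stand-in for per-pixel sector lists of different lengths
sectors = [array("h", [0, 1]), array("h", [2]), array("h"), array("h", [1, 3, 4])]
content, offsets = flatten(sectors)
```

With this layout the whole structure is two contiguous buffers instead of one Python object per sublist, which is the property that would let a parser fill it without allocating a list of small arrays. Whether that actually pays off for the eventio reader is exactly what would need to be measured.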

@kosack

kosack commented Jan 13, 2021

Perhaps a similar issue should be opened for pyeventio, since there it's clearly useful.

@maxnoe
Member Author

maxnoe commented Jan 13, 2021

Sorry, I misclicked. I intended this to be an eventio issue.

@maxnoe maxnoe transferred this issue from cta-observatory/ctapipe Jan 13, 2021