LambdaTransformer (Scaper compatibility) #127
base: main
I don't think so? Let me know if things need clarification.
Also fixed typo and small bug in base.py
Finished all but one unit test, and ready to code
This reverts commit 56cb062.
I divided it into separate commits after getting low-key shamed during the marl meeting. 😝 (Justin has told me that I should squash commits when contributing. My bad.)
One thing I want to add is a
It'd also be good to be able to gather arbitrary sandbox data as well. I'm not sure if this fits in the scope of this transformer or if it'd be better to create a simpler, dedicated transformer for that purpose.
Codecov Report
All modified and coverable lines are covered by tests ✅

| | main | #127 | +/- |
|---|---|---|---|
| Coverage | 99.69% | 99.71% | +0.02% |
| Files | 22 | 24 | +2 |
| Lines | 1299 | 1425 | +126 |
| Hits | 1295 | 1421 | +126 |
| Misses | 4 | 4 | |
What does this implement/fix? Explain your changes.
This adds a general-purpose transformer that can be used to load and transform arbitrary observations with Pump.
I built this for the purpose of extracting Scaper annotations, but it's not Scaper specific.
Here's a super simple example:
Assuming a 5-second sample with a single 1-second event starting 2 seconds in, the data dict would look like this:
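As a rough illustration of frame-wise encoding for that scenario (a hypothetical sketch, not the transformer's real output), the snippet below marks which frames overlap the event. The helper name `encode_event` and the rate of 4 frames per second are assumptions chosen for readability; Pump transformers normally derive their frame grid from a sample rate and hop length instead.

```python
import math


def encode_event(start, duration, total_duration, frames_per_second=4):
    """Return a per-frame 0/1 activity list for a single event.

    Hypothetical helper: frame i covers [i / fps, (i + 1) / fps) and is
    marked active (1) when it overlaps the event interval [start, start + duration).
    """
    n_frames = int(math.ceil(total_duration * frames_per_second))
    end = start + duration
    activity = []
    for i in range(n_frames):
        frame_start = i / frames_per_second
        frame_end = (i + 1) / frames_per_second
        # Half-open interval overlap test between the frame and the event.
        overlaps = frame_start < end and frame_end > start
        activity.append(1 if overlaps else 0)
    return activity


# 5-second clip with one 1-second event starting 2 seconds in.
data = {"activity": encode_event(start=2.0, duration=1.0, total_duration=5.0)}
print(data["activity"])  # 20 frames, frames 8-11 active
```

With 4 frames per second, the event spans frames 8 through 11, so the list has 8 zeros, 4 ones, and 8 zeros.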
Filtering Observations
`query` can be used flexibly with a wide range of values. The type of `query` should roughly match the observation values, and it is applied recursively through dicts and lists. So if the observation `value` is a dict and you want to query based on keys, build `query` as a dict with keys matching `value`. If `value` is a list and you want to condition element-wise, make `query` a list. If `value` is a single string, use a string. You can also use a set to check membership for hashable types. At any level, you can use a callable instead, and it will be passed the data at that point in the structure.
The query fails if any condition is False.
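The matching rules above can be sketched as a small recursive function. This is a hypothetical, stand-alone reading of those rules, not the PR's implementation: the name `matches`, the choice that a list query must have the same length as the value, and the trimmed Scaper-style observation dict are all assumptions.

```python
from typing import Any


def matches(query: Any, value: Any) -> bool:
    """Hypothetical recursive matcher; every condition must hold."""
    if callable(query):
        # A callable receives the data at this level of the structure.
        return bool(query(value))
    if isinstance(query, dict):
        # Dict query: each key must exist in value and match recursively.
        return isinstance(value, dict) and all(
            key in value and matches(sub_query, value[key])
            for key, sub_query in query.items()
        )
    if isinstance(query, list):
        # List query: element-wise conditions (assumed: same length as value).
        return (
            isinstance(value, list)
            and len(query) == len(value)
            and all(matches(q, v) for q, v in zip(query, value))
        )
    if isinstance(query, (set, frozenset)):
        # Set query: membership test, so value must be hashable.
        return value in query
    # Anything else (e.g. a string): plain equality.
    return query == value


# Trimmed, Scaper-style observation value (illustrative only).
obs_value = {"label": "dog_bark", "role": "foreground", "snr": 6.0}

print(matches({"role": "foreground"}, obs_value))            # True: key-wise equality
print(matches({"label": {"dog_bark", "siren"}}, obs_value))  # True: set membership
print(matches({"snr": lambda snr: snr > 10}, obs_value))     # False: callable condition
print(matches(["a", {"b", "c"}], ["a", "c"]))                # True: element-wise list
```

Checking `callable` first means a function can stand in for any sub-query, which is what lets a query drop down to arbitrary logic at any depth.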
Here are some valid query examples:
Aggregating interval windows
And you can have a bit more control.
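The windowing idea can be sketched roughly as follows, with a `reduce` callable receiving the list of events for each hop window. This is a hypothetical, stand-alone illustration rather than the transformer's code: `aggregate_windows`, the `(start, duration, label)` event tuples, and the rule that an event belongs to every window it overlaps are all assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (start time in seconds, duration in seconds, label)
Event = Tuple[float, float, str]


def aggregate_windows(
    events: Sequence[Event],
    total_duration: float,
    hop: float,
    reduce: Callable[[List[Event]], object],
) -> List[object]:
    """Call `reduce` once per hop window and collect the results.

    Window i spans [i * hop, (i + 1) * hop). `reduce` receives every event
    that overlaps that window, which may be an empty list.
    """
    n_windows = int(math.ceil(total_duration / hop))
    results = []
    for i in range(n_windows):
        win_start, win_end = i * hop, (i + 1) * hop
        in_window = [
            event
            for event in events
            if event[0] < win_end and event[0] + event[1] > win_start
        ]
        results.append(reduce(in_window))
    return results


events = [(2.0, 1.0, "dog_bark"), (2.5, 2.0, "siren")]

# Number of active events per 1-second window.
counts = aggregate_windows(events, total_duration=5.0, hop=1.0, reduce=len)
# Sorted set of labels per window.
labels = aggregate_windows(
    events, 5.0, 1.0, reduce=lambda evs: sorted({label for _, _, label in evs})
)
print(counts)  # [0, 0, 2, 1, 1]
print(labels)
```

Because `reduce` sees whole events, the same loop can produce counts, label sets, or any other per-window summary without changing the windowing code.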
`reduce(x)` is fed, one window at a time, a list of all the events within each hop-window interval.
Real life
And finally, here's how I'm currently using it:
Any other comments?
As of right now, the `all_time_stretch` field won't work with a slicer, because all `None` fields are interpreted as a time dimension. I see how this makes sense for the `structure` transformer, but I'm not sure how to reconcile it with returning array values. Maybe it's never really necessary, but part of me thinks it would be a nice option to have (returning an array for each interval) if we want to support as many use cases as possible.
This could also probably use some more safeguards to prevent people from doing bad things, but at the moment I'm not sure what those would be, so for now I think it's okay to leave things open-ended.
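To make the slicing problem concrete, here is a toy slicer that crops every axis declared as `None` to the same frame range. It is a hypothetical mock of the behavior described above, not Pump's actual slicer; the `naive_slice` name, the field shapes, and the three time-stretch values are made up for illustration.

```python
import numpy as np


def naive_slice(data, shapes, start, stop):
    """Hypothetical slicer: crop every axis declared as None to [start, stop).

    Mimics the behavior described above, where any None in a field's
    declared shape is treated as the time axis.
    """
    out = {}
    for key, array in data.items():
        index = tuple(
            slice(start, stop) if dim is None else slice(None)
            for dim in shapes[key]
        )
        out[key] = array[index]
    return out


# Frame-wise field (time axis first, one column per label) and a per-event field.
shapes = {"activity": (None, 2), "all_time_stretch": (None,)}
data = {
    "activity": np.zeros((20, 2)),
    # Variable-length list of per-event values; NOT aligned with frames.
    "all_time_stretch": np.array([1.0, 0.9, 1.1]),
}
cropped = naive_slice(data, shapes, start=8, stop=12)
print(cropped["activity"].shape)          # (4, 2)
print(cropped["all_time_stretch"].shape)  # (0,): the per-event values are lost
```

The frame-aligned field is cropped correctly, while the per-event array is sliced as if its entries were frames, which is why an array-valued field like this needs some marker other than `None` for its variable-length axis.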