LambdaTransformer (Scaper compatibility) #127

beasteers · 2019-10-08T18:35:15Z

What does this implement/fix? Explain your changes.

This adds a general-purpose transformer that can be used to load and transform arbitrary observations with Pump.

I built this for the purpose of extracting Scaper annotations, but it's not Scaper specific.

Here's a super simple example:

trans = pumpp.task.LambdaTransformer(
        name='scaper', namespace='scaper',
        fields=['snr', 'label'], 
        query={'role': 'foreground'})

Assuming a 5-second sample with a single 1-second event starting 2 seconds in, the data dict would look like this:

{
    'scaper/snr': [np.nan, np.nan, 10, np.nan, np.nan],
    'scaper/label': ['', '', 'something', '', '']
}
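
To make the mapping from events to frames concrete, here is a minimal sketch in plain NumPy. It is not the transformer's code: `sample_field`, the event layout (`time`, `duration`, `value`), and the one-frame-per-second rate are all assumptions made for illustration.

```python
import numpy as np

def sample_field(events, field, n_frames, frame_rate=1.0, fill=np.nan):
    """Place each event's value into the frames that its interval covers.

    Hypothetical helper: frames not covered by any event keep `fill`.
    """
    out = [fill] * n_frames
    for ev in events:
        start = int(np.floor(ev['time'] * frame_rate))
        stop = int(np.ceil((ev['time'] + ev['duration']) * frame_rate))
        for i in range(max(start, 0), min(stop, n_frames)):
            out[i] = ev['value'][field]
    return out

# One 1-second foreground event starting 2 seconds into a 5-second clip.
events = [{
    'time': 2.0, 'duration': 1.0,
    'value': {'snr': 10, 'label': 'something', 'role': 'foreground'},
}]

data = {
    'scaper/snr': sample_field(events, 'snr', n_frames=5),
    'scaper/label': sample_field(events, 'label', n_frames=5, fill=''),
}
```

Only frame 2 is covered by the event, so every other frame keeps the fill value (NaN for numbers, an empty string for labels).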

Filtering Observations

query can be used flexibly with a wide range of values. The type of query should roughly match the structure of the observation value, and it is applied recursively through dicts and lists. If the observation value is a dict and you want to filter on its keys, make query a dict whose keys match those in value. If value is a list and you want to condition element-wise, make query a list. If value is a single string, use a string. You can also use a set to check membership for hashable types.

At any level of nesting, you can use a callable instead, and it will be passed the data at that level.

The match fails if any condition is False.

Here are some valid query examples:

query={'label': 'AAA|BBB'} # either AAA or BBB
query={'label': lambda label: 'A' in label, 'role': 'foreground'} # arbitrary condition
query={'pitch_shift': lambda pitch: pitch and pitch > 3} # whatever you want
query={'pitch_shift': {1, 2}} # pitch_shift is in set
query=lambda d: d['pitch_shift'] == -3 or 'Thunk' in d['label'] # callable gets the full dict

# Or say observation values are just strings
query=lambda label: 'Dog' in label
query={'Dog bark', 'Hum', 'Honk'} # label is in set
query='[^0-9]+'

# Or a list field of shape (None, 4,)
query=[5, 6, lambda x: x // 2 < 25, {8, 9, 10}] # mix and match
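
The matching rules described above could be sketched roughly as follows. This is an illustrative reimplementation, not the query util added in this PR: the name `match_query_sketch` is hypothetical, and treating string queries as regular expressions matched against the whole value is an assumption based on the `'AAA|BBB'` and `'[^0-9]+'` examples.

```python
import re

def match_query_sketch(query, value):
    """Hypothetical matcher: True only if every condition in `query` holds."""
    if callable(query):
        # A callable receives whatever data sits at this level of nesting.
        return bool(query(value))
    if isinstance(query, dict):
        # Recurse key by key; keys missing from the value are passed as None.
        return isinstance(value, dict) and all(
            match_query_sketch(q, value.get(k)) for k, q in query.items())
    if isinstance(query, (list, tuple)):
        # Element-wise conditions on a list-valued observation.
        return (isinstance(value, (list, tuple))
                and len(query) == len(value)
                and all(match_query_sketch(q, v) for q, v in zip(query, value)))
    if isinstance(query, (set, frozenset)):
        # Membership test (value must be hashable).
        return value in query
    if isinstance(query, str):
        # Assumption: strings are regexes that must match the full value.
        return isinstance(value, str) and re.fullmatch(query, value) is not None
    return query == value

obs = {'label': 'AAA', 'role': 'foreground', 'pitch_shift': 2}

ok_regex = match_query_sketch({'label': 'AAA|BBB'}, obs)
ok_set = match_query_sketch({'pitch_shift': {1, 2}}, obs)
bad_role = match_query_sketch(
    {'label': lambda label: 'A' in label, 'role': 'background'}, obs)
ok_list = match_query_sketch(
    [5, 6, lambda x: x // 2 < 25, {8, 9, 10}], [5, 6, 40, 9])
```

Checking callables first keeps the dispatch simple, since dicts, lists, sets, and strings are never callable.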

Aggregating interval windows

You can also control how events are aggregated. reduce(events) is called once per hop window and receives a list of all the events within that interval.

trans = pumpp.task.LambdaTransformer(
        name='scaper', namespace='scaper',
        fields=[
            'label', # from schema
            ('mean_time_stretch', (None, 1), np.float64), # custom field
            ('all_time_stretch', (None, None), np.float64), # variable number of events
        ],
        query={'role': 'foreground'},
        multi=True, reduce=lambda events: {
            # sorted() keeps the joined label order deterministic
            'label': ','.join(sorted(set(e['label'] for e in events))),
            'mean_time_stretch': np.mean([e['time_stretch'] for e in events]),
            'all_time_stretch': [e['time_stretch'] for e in events],
        })
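
As a rough picture of what "fed a list of events per hop window" means, here is a standalone sketch. It does not use pumpp: `reduce_windows`, `summarize`, the event layout, and the overlap rule (an event belongs to every window it overlaps) are all assumptions for illustration.

```python
import numpy as np

def reduce_windows(events, n_frames, hop, reduce):
    """Call `reduce` once per hop window with the events overlapping it."""
    rows = []
    for i in range(n_frames):
        t0, t1 = i * hop, (i + 1) * hop
        active = [ev['value'] for ev in events
                  if ev['time'] < t1 and ev['time'] + ev['duration'] > t0]
        rows.append(reduce(active))
    return rows

def summarize(values):
    """Hypothetical reducer mirroring the fields in the example above."""
    stretches = [v['time_stretch'] for v in values]
    return {
        'label': ','.join(sorted({v['label'] for v in values})),
        # Guard the empty window so np.mean is never called on an empty list.
        'mean_time_stretch': float(np.mean(stretches)) if stretches else np.nan,
        'all_time_stretch': stretches,
    }

events = [
    {'time': 0.5, 'duration': 1.0, 'value': {'label': 'dog', 'time_stretch': 1.0}},
    {'time': 1.0, 'duration': 1.0, 'value': {'label': 'horn', 'time_stretch': 2.0}},
]

rows = reduce_windows(events, n_frames=3, hop=1.0, reduce=summarize)
```

In this toy data the first window sees only the dog event, the second sees both events, and the third is empty, which is why a reducer should decide what an empty window returns.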

Real-life usage

And finally, here's how I'm currently using it:

pumpp.task.LambdaTransformer(
    'scaper', 'scaper',
    ['snr', 'label', 'source_file', ('fault', (None, 1), np.bool_)],
    query={'label': fault_label},
    reduce=lambda e: dict(e or {}, fault=e and ~np.isnan(e['snr']))
)
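
To unpack what that reduce lambda returns, here is the same expression evaluated on a few hand-made inputs. The inputs are invented, and the idea that reduce receives either a single matching event dict or None (when nothing matched) is an assumption inferred from the `e or {}` guard.

```python
import numpy as np

# Same expression as in the call above, bound to a name for inspection.
reduce_fault = lambda e: dict(e or {}, fault=e and ~np.isnan(e['snr']))

# A matching event with a real SNR: fields are copied and fault is truthy.
hit = reduce_fault({'snr': 10.0, 'label': 'fault_a'})

# A matching event whose SNR is NaN: fault is falsy.
no_snr = reduce_fault({'snr': np.nan, 'label': 'fault_a'})

# No event at all: `e and ...` short-circuits, so fault is None.
miss = reduce_fault(None)
```

Note that the no-event case yields `fault=None` rather than `False`, so it is worth confirming that the declared `np.bool_` field handles that value the way you expect.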

Any other comments?

As of right now, the all_time_stretch field won't work with a slicer because all None dimensions are interpreted as time dimensions. I see how this makes sense for the structure transformer, but I'm not sure how to reconcile it with returning array values. Maybe it's never really necessary, but part of me thinks it would be a nice option to have (returning an array for each interval) if we want to support as many use cases as possible.

This could probably also use some more safeguards to prevent people from doing bad things, but at the moment I'm not sure what those would be, so for now I think it's okay to leave things open-ended.

beasteers · 2019-10-08T18:40:32Z

Is there anything else in this PR that needs high-level commentary before I dig in for a proper CR?

I don't think so? Let me know if things need clarification

beasteers · 2019-10-08T21:06:49Z

I divided it into separate commits after getting low-key shamed during the marl meeting. 😝

(Justin has told me that I should squash commits when contributing. my bad)

beasteers · 2019-10-09T14:35:12Z

One thing I want to add is a static parameter, which will return a single value for the entire annotation. This would be useful for extracting the background source_file, for example.

beasteers · 2019-10-14T22:30:52Z

It'd also be good to be able to gather arbitrary sandbox data as well. I'm not sure if this fits in the scope of this transformer or if it'd be better to create a simpler, dedicated transformer for that purpose.

codecov-commenter · 2022-04-14T13:36:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.71%. Comparing base (4a67bdf) to head (c588f58).
Report is 5 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #127      +/-   ##
==========================================
+ Coverage   99.69%   99.71%   +0.02%     
==========================================
  Files          22       24       +2     
  Lines        1299     1425     +126     
==========================================
+ Hits         1295     1421     +126     
  Misses          4        4

Flag	Coverage Δ
unittests	`99.71% <100.00%> (+0.02%)`	⬆️

tomxi and others added 19 commits October 8, 2019 16:35

started some skeletons for task.KeyTransformer

5586cc4

Also fixed typo and small bug in base.py

Fixed a bug in task/base.py introduced by me

1fa187b

FInished all but 1 unit tests, and ready to code

Finished _encode_key_str and passed test

2425f26

commit ready to submit for PR

f233494

fixed some small errors after CR by bmcfee

7a54c0c

incorporated suggestions from bmcfee

a4deec5

Increased test coverage

6184621

Implemented KeyTagTransformer and tests

cef94da

increased test coverage

5e4d0a6

Added the leading tone to minor key profiles

a8f0bfc

post CR by @bmcfee

1563690

added lambda transformer

56cb062

Revert "added lambda transformer"

58b64e1

This reverts commit 56cb062.

added task defaults

5a62aa0

added lambda transformer

bcf2bae

added import in __init__

024d018

added query util

e352b2f

added match_query test

68f923a

added lambda transformer tests

1a7460f

beasteers force-pushed the lambdatrans branch from ecaaa67 to 1a7460f on October 8, 2019, 21:00

Merge branch 'master' into lambdatrans

efc3658

bmcfee added this to the 0.6.0 milestone Apr 14, 2022

Merge branch 'main' into lambdatrans

c588f58

bmcfee removed this from the 0.6.0 milestone Apr 14, 2022
