-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
alternate AnnotationData backend #149
Conversation
Okay, this PR is now functional in that all tests pass again. So if you never touched a The new implementation replaces the Under the hood, This namedtuple implementation also makes individual observations immutable, which I see as a Good Thing. If you want to modify observations, it's best to make a new annotation object and re-insert them after modification. Note that this is the reverse of how the You can iterate over observations in the usual way: for obs in ann.data:
print(obs.time, obs.duration, obs.value, obs.confidence) @justinsalamon and I kicked around the idea of having an Pretty-printing is the last major hurdle, but I think we can get over that fairly easily. |
jams/core.py
Outdated
|
||
Fields must be `['time', 'duration', 'value', 'confidence']`. | ||
class AnnotationData(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
honest question: does inheriting directly from SortedListWithKey
buy us anything? This object owning an obs
object feels like an unnecessary abstraction, but I'm not sure if this would result in a weird interface.
thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, I'm thinking it's a bad idea to extend from other peoples' classes because it makes the API encapsulation boundary porous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough ... I guess JamsFrame inherited from DataFrame, but maybe that was half the issue?
if we're not going to inherit from it, I'd propose making obs some level of private (dunders?) so that users are discouraged from using it as part of the API. However, I don't like having to wrap a bunch of pre-existing functionality ... SortedListWithKey is quite fully loaded.
Also worth mentioning here: What gets added to an AnnotationData object? Observation
s or the fields of an observation? or, said differently, who is responsible for creating the Observation object, the user or the AnnotationData object? because, if it's the former, I think that's an argument for inheritance.
related pop quiz: in Jams, who creates an Annotation when adding it to a JAMS object, the user or the owning object? (because we should follow that paradigm for consistency)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're not going to inherit from it, I'd propose making obs some level of private (dunders?) so that users are discouraged from using it as part of the API.
Yeah, I'd be down with that. I'd say sunder not dunder, ie data._obs
.
Also worth mentioning here: What gets added to an AnnotationData object? Observations or the fields of an observation? or, said differently, who is responsible for creating the Observation object, the user or the AnnotationData object? because, if it's the former, I think that's an argument for inheritance.
It depends: if you're adding a bunch of records at once, it's typically in deserialized form (ie after a json parse), so it'd be list-of-dict/dict-of-list in.
The core add routine takes the fields for a single observation record and packs them in an Observation
for adding.
I don't see what this has to do with inheritance though?
related pop quiz: in Jams, who creates an Annotation when adding it to a JAMS object, the user or the owning object? (because we should follow that paradigm for consistency)
Should we? I see the AnnotationData
class as relatively private, because it diverges somewhat from the schema. Annotation
provides an interface for adding observations, and that's all a user should need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about this:
Another option is to drop AnnotationData
entirely, keep ann.data
as a sorted container object, and bubble up the rest of the functionality for adding observations, iterations, etc to Annotation
. This would be a more radical change in the API than I'd originally intended, but it seems like maybe a good idea provided we're willing to break stuff anyway.
jams/core.py
Outdated
return '<{}: {:d} observations>'.format(self.__class__.__name__, | ||
len(self)) | ||
|
||
def __iter__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example, we wouldn't need to implement this method, or L601
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or L604, i think?
jams/core.py
Outdated
frame = frame[cls.fields()] | ||
def add_observation(self, time=None, duration=None, value=None, | ||
confidence=None): | ||
idx = self.obs.bisect_key(time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if nothing else, I think we can use SortedListWithKey
directly (docs). I poked around, and calling add
sorts items in-line, so self.obs.add(Observation(...))
should work.
additionally, i think if AnnotationData is a sublcass of SortedListWithKey
, then we might subclass add
directly? and maybe call it ObservationArray
? idk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that. Maybe I did it wrong, but it doesn't allow you to insert items in a way that violates sort order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jams/core.py
Outdated
value=values, | ||
confidence=confidences) | ||
else: | ||
return [dict(time=o.time, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if Observation
is an object rather than a NamedTuple, might we get some of this for free?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of what? I mean, we can always do dict(obs._asdict())
, but that doesn't solve the dict-of-lists/list-of-dicts problem resulting from sparse/dense observation lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"some of what": the serialization waltz, but yea, sparse vs dense makes this tricky.
jams/core.py
Outdated
@@ -1016,10 +892,10 @@ def trim(self, start_time, end_time, strict=False): | |||
# We do this rather than copying and directly manipulating the | |||
# annotation' data frame (which might be faster) since this way trim is | |||
# independent of the internal data representation. | |||
for idx, obs in self.data.iterrows(): | |||
for obs in self.data.obs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be keen to simplify this and make the object directly iterable, by a subclass or whatever
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh wait, it is. I'd remove .obs
to dig in the preferred interaction paradigm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about Annotation
or AnnotationData
?
The latter is iterable (wrapping the sorted list); the former is not and wasn't previously. We could make it iterable, but that sounds like a change better suited to a separate PR.
jams/nsconvert.py
Outdated
@@ -137,12 +140,38 @@ def can_convert(annotation, target_namespace): | |||
return False | |||
|
|||
|
|||
def pop_data(annotation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like this should be a method of AnnotationData
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I just had that thought too. Will migrate in a bit.
well, first off, 💯 for a PR / RFC that deletes more code than it adds. I do actually like that this gets us away from pandas (maybe as an optional dependency?), but we only trade dependencies. Is the development velocity on question: didn't pandas provide some neat wins for muda? shifting annotations / observations in time and whatnot? |
I think we still need pandas as a dependency for loading lab/csv files (utils, import scripts, etc). But it's good to get the hacky dataframe dependencies out from under the jams object hierarchy. Sortedcontainers is a much simpler project, and seems reasonably mature at this point.
It did, but that's not a huge deal. It's better IMO to make observations immutable and put a little more friction into annotation manipulation. Muda will need a little retrofitting, but it's probably about an afternoon's worth of work. |
Note: coverage is down here because I added html rendering for |
Would it be a good/bad idea, since we still have the pandas dependency anyhow, to do viz by exporting to a dataframe and then using the pandas built in viz? This could be nice in that you could use the variety of viz options offered by the pandas API for dataframes (e.g. peak(), head(), etc.) |
I don't think we should rely on it, but it's always an option. The dataframe converter is done anyway, so you can do df = ann.data.to_dataframe()
df.head(10) etc. |
Amending my previous statement: I don't think we can consistently make |
@ejhumphrey I think this one's stable for eyeballs again. Coverage drop is primarily due to html rendering not being covered, but I'd prefer to push that to #93 and implement it across the board. |
@@ -159,7 +165,13 @@ def note_midi_to_hz(annotation): | |||
'''Convert a pitch_midi annotation to pitch_hz''' | |||
|
|||
annotation.namespace = 'note_hz' | |||
annotation.data.value = 440 * (2.0 ** ((annotation.data.value - 69.0)/12.0)) | |||
data = annotation.pop_data() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can neither articulate why nor propose something "better", but the semantics of this feel weird for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you object to the operation itself, or just the naming?
'value': None, | ||
'confdence': None} | ||
# Bypass the safety checks in append | ||
ann.data.add(Observation(time=data['time'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks so much nicer
Couple thoughts, minor apologies if / when relevance is a stretch:
|
I'd prefer to keep the strict coverage check in place, just so that we're aware of it at all times.
Gone outright: any previous access to the The for idx, obs in enumerate(ann):
... though there's almost never a good reason to want the index itself. |
Hashed out in person with @ejhumphrey -- this one's good to go. |
This is work in progress, but implements an alternate backend for annotation data.
TODO: