alternate AnnotationData backend #149

bmcfee · 2017-05-09T21:29:47Z

This is work in progress, but implements an alternate backend for annotation data.

TODO:

trim
slice
deep serialization (eg for vector format)
pretty printing
purge all pandas references
settle on final object hierarchy / naming conventions
update converter scripts and documentation / examples

bmcfee · 2017-05-10T18:28:42Z

Okay, this PR is now functional in that all tests pass again. So if you never touched a JamsFrame object directly in application code (and there are few reasons to do so), then everything should just work.
That said, the design is still up for discussion, and I'm particularly seeking input from @justinsalamon and @ejhumphrey.

The new implementation replaces the JamsFrame object by an AnnotationData object, which at the schema level covers both DenseObservation and SparseObservationList, though not using JObjects because I don't think we should have sandbox support in the observation list.

Under the hood, AnnotationData encapsulates the observations with an ordered (by time) list of sparse observations. Observations are implemented as named tuples (time, duration, value, confidence), again not using JObject so that we can preserve slots and field ordering.

This namedtuple implementation also makes individual observations immutable, which I see as a Good Thing. If you want to modify observations, it's best to make a new annotation object and re-insert them after modification. Note that this is the reverse of how the DataFrame implementation worked: the individual observations there were mutable (it's easy to modify entries in-place), but the container itself was much more rigid (adding and deleting observations was difficult). I think the mutable-container/immutable-observation model makes a lot more sense for the common cases (e.g., adding observations to an existing annotation).
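To make the immutable-observation point concrete, here is a minimal sketch using only the standard library. The field names come from the description above; the sample values are made up for illustration, and this is not the actual jams code.

```python
from collections import namedtuple

# Field order follows the (time, duration, value, confidence) layout described above.
Observation = namedtuple('Observation',
                         ['time', 'duration', 'value', 'confidence'])

obs = Observation(time=0.0, duration=1.0, value='C:maj', confidence=0.9)

# In-place modification is rejected because namedtuple fields are read-only.
try:
    obs.value = 'G:maj'
    mutated = True
except AttributeError:
    mutated = False

# To "modify" an observation, build a new one; the original is untouched.
shifted = obs._replace(time=obs.time + 0.5)
```

The modified copy would then be inserted into a fresh annotation rather than written back over the original.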

You can iterate over observations in the usual way:

for obs in ann.data:
    print(obs.time, obs.duration, obs.value, obs.confidence)

@justinsalamon and I kicked around the idea of having an export-to-dataframe method, which should be pretty trivial to cook up. Similarly, we might want to consider adding other mir_eval-style converters, such as to_event_values for instantaneous observations.
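As a rough illustration of what an event-style converter could look like, the sketch below splits a list of observations into parallel time and value sequences. The function name is borrowed from the comment above, but its signature and return type here are assumptions, and plain lists stand in for whatever array type a real implementation would return.

```python
from collections import namedtuple

Observation = namedtuple('Observation',
                         ['time', 'duration', 'value', 'confidence'])


def to_event_values(observations):
    """Hypothetical helper: return parallel lists of times and values."""
    times = [obs.time for obs in observations]
    values = [obs.value for obs in observations]
    return times, values


# Instantaneous events (duration 0); sample data is made up.
beats = [Observation(0.5, 0.0, 1, None),
         Observation(1.0, 0.0, 2, None),
         Observation(1.5, 0.0, 3, None)]

times, values = to_event_values(beats)
```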

Pretty-printing is the last major hurdle, but I think we can get over that fairly easily.

ejhumphrey · 2017-05-10T18:37:54Z

jams/core.py


-            Fields must be `['time', 'duration', 'value', 'confidence']`.
+class AnnotationData(object):


honest question: does inheriting directly from SortedListWithKey buy us anything? This object owning an obs object feels like an unnecessary abstraction, but I'm not sure if this would result in a weird interface.

thoughts?

At this point, I'm thinking it's a bad idea to extend from other peoples' classes because it makes the API encapsulation boundary porous.

fair enough ... I guess JamsFrame inherited from DataFrame, but maybe that was half the issue?

if we're not going to inherit from it, I'd propose making obs some level of private (dunders?) so that users are discouraged from using it as part of the API. However, I don't like having to wrap a bunch of pre-existing functionality ... SortedListWithKey is quite fully loaded.

Also worth mentioning here: What gets added to an AnnotationData object? Observations or the fields of an observation? or, said differently, who is responsible for creating the Observation object, the user or the AnnotationData object? because, if it's the former, I think that's an argument for inheritance.

related pop quiz: in Jams, who creates an Annotation when adding it to a JAMS object, the user or the owning object? (because we should follow that paradigm for consistency)

if we're not going to inherit from it, I'd propose making obs some level of private (dunders?) so that users are discouraged from using it as part of the API.

Yeah, I'd be down with that. I'd say sunder not dunder, ie data._obs.

Also worth mentioning here: What gets added to an AnnotationData object? Observations or the fields of an observation? or, said differently, who is responsible for creating the Observation object, the user or the AnnotationData object? because, if it's the former, I think that's an argument for inheritance.

It depends: if you're adding a bunch of records at once, it's typically in deserialized form (ie after a json parse), so it'd be list-of-dict/dict-of-list in.

The core add routine takes the fields for a single observation record and packs them in an Observation for adding.

I don't see what this has to do with inheritance though?

related pop quiz: in Jams, who creates an Annotation when adding it to a JAMS object, the user or the owning object? (because we should follow that paradigm for consistency)

Should we? I see the AnnotationData class as relatively private, because it diverges somewhat from the schema. Annotation provides an interface for adding observations, and that's all a user should need.

Thinking more about this:

Another option is to drop AnnotationData entirely, keep ann.data as a sorted container object, and bubble up the rest of the functionality for adding observations, iterations, etc to Annotation. This would be a more radical change in the API than I'd originally intended, but it seems like maybe a good idea provided we're willing to break stuff anyway.

ejhumphrey · 2017-05-10T18:38:33Z

jams/core.py

+        return '<{}: {:d} observations>'.format(self.__class__.__name__,
+                                                len(self))
+
+    def __iter__(self):


for example, we wouldn't need to implement this method, or L601

or L604, i think?

ejhumphrey · 2017-05-10T18:47:26Z

jams/core.py

-        frame = frame[cls.fields()]
+    def add_observation(self, time=None, duration=None, value=None,
+                        confidence=None):
+        idx = self.obs.bisect_key(time)


if nothing else, I think we can use SortedListWithKey directly (docs). I poked around, and calling add sorts items in-line, so self.obs.add(Observation(...)) should work.

additionally, i think if AnnotationData is a subclass of SortedListWithKey, then we might subclass add directly? and maybe call it ObservationArray? idk?

I tried that. Maybe I did it wrong, but it doesn't allow you to insert items in a way that violates sort order.

ya, you can't insert or append if it violates order, but add should sort.
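The distinction between an order-preserving add and a positional insert can be sketched with the standard library alone. The class below is a hypothetical stand-in for a key-sorted container, not the sortedcontainers implementation: it only shows that add finds the right slot by bisecting on time, so callers never choose a position themselves.

```python
import bisect
from collections import namedtuple

Observation = namedtuple('Observation',
                         ['time', 'duration', 'value', 'confidence'])


class TimeSortedList(object):
    """Hypothetical container that keeps observations ordered by time."""

    def __init__(self):
        self._keys = []    # observation times, kept sorted
        self._items = []   # observations, in the same order as _keys

    def add(self, obs):
        # Locate the insertion point from the key, then insert in both lists.
        idx = bisect.bisect_right(self._keys, obs.time)
        self._keys.insert(idx, obs.time)
        self._items.insert(idx, obs)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


data = TimeSortedList()
for t in (2.0, 0.5, 1.0):
    data.add(Observation(time=t, duration=0.0, value=None, confidence=None))

times = [obs.time for obs in data]
```

Observations arrive out of order here but are iterated in time order, which is the behavior the thread expects from add.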

ejhumphrey · 2017-05-10T18:48:46Z

jams/core.py

+                        value=values,
+                        confidence=confidences)
+        else:
+            return [dict(time=o.time,


if Observation is an object rather than a NamedTuple, might we get some of this for free?

Some of what? I mean, we can always do dict(obs._asdict()), but that doesn't solve the dict-of-lists/list-of-dicts problem resulting from sparse/dense observation lists.
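The two serialized shapes mentioned here can be sketched as follows. The helper names to_sparse and to_dense are hypothetical and only illustrate the list-of-dicts versus dict-of-lists layouts; they are not the jams serialization routines.

```python
from collections import namedtuple

Observation = namedtuple('Observation',
                         ['time', 'duration', 'value', 'confidence'])


def to_sparse(observations):
    """List-of-dicts layout: one record per observation."""
    return [dict(obs._asdict()) for obs in observations]


def to_dense(observations):
    """Dict-of-lists layout: one column per field."""
    return {field: [getattr(obs, field) for obs in observations]
            for field in Observation._fields}


records = [Observation(0.0, 1.0, 'a', 0.9),
           Observation(1.0, 1.0, 'b', 0.8)]

sparse = to_sparse(records)
dense = to_dense(records)
```

Converting a single observation with _asdict covers the sparse case, but the dense case needs a pass over the whole container, which is why a per-observation method alone does not settle it.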

"some of what": the serialization waltz, but yea, sparse vs dense makes this tricky.

ejhumphrey · 2017-05-10T18:49:53Z

jams/core.py

@@ -1016,10 +892,10 @@ def trim(self, start_time, end_time, strict=False):
        # We do this rather than copying and directly manipulating the
        # annotation' data frame (which might be faster) since this way trim is
        # independent of the internal data representation.
-        for idx, obs in self.data.iterrows():
+        for obs in self.data.obs:


I would be keen to simplify this and make the object directly iterable, by a subclass or whatever

oh wait, it is. I'd remove .obs to dig in the preferred interaction paradigm

Are you talking about Annotation or AnnotationData?

The latter is iterable (wrapping the sorted list); the former is not and wasn't previously. We could make it iterable, but that sounds like a change better suited to a separate PR.

ejhumphrey · 2017-05-10T18:52:18Z

jams/nsconvert.py

@@ -137,12 +140,38 @@ def can_convert(annotation, target_namespace):
    return False


+def pop_data(annotation):


feels like this should be a method of AnnotationData?

Yeah, I just had that thought too. Will migrate in a bit.

ejhumphrey · 2017-05-10T18:55:42Z

well, first off, 💯 for a PR / RFC that deletes more code than it adds.

I do actually like that this gets us away from pandas (maybe as an optional dependency?), but we only trade dependencies. Is the development velocity on sortedcontainers lower than pandas, and that's why it's a win?

question: didn't pandas provide some neat wins for muda? shifting annotations / observations in time and whatnot?

bmcfee · 2017-05-10T19:04:40Z

I do actually like that this gets us away from pandas (maybe as an optional dependency?), but we only trade dependencies. Is the development velocity on sortedcontainers lower than pandas, and that's why it's a win?

I think we still need pandas as a dependency for loading lab/csv files (utils, import scripts, etc). But it's good to get the hacky dataframe dependencies out from under the jams object hierarchy.

Sortedcontainers is a much simpler project, and seems reasonably mature at this point.

question: didn't pandas provide some neat wins for muda? shifting annotations / observations in time and whatnot?

It did, but that's not a huge deal. It's better IMO to make observations immutable and put a little more friction into annotation manipulation. Muda will need a little retrofitting, but it's probably about an afternoon's worth of work.

bmcfee · 2017-05-10T20:43:20Z

Note: coverage is down here because I added html rendering for AnnotationData. I don't think unit tests are critical on that function.

justinsalamon · 2017-05-10T20:52:52Z

Note: coverage is down here because I added html rendering for AnnotationData.

Would it be a good/bad idea, since we still have the pandas dependency anyhow, to do viz by exporting to a dataframe and then using the pandas built in viz? This could be nice in that you could use the variety of viz options offered by the pandas API for dataframes (e.g. peak(), head(), etc.)

bmcfee · 2017-05-10T21:28:05Z

Would it be a good/bad idea, since we still have the pandas dependency anyhow, to do viz by exporting to a dataframe and then using the pandas built in viz?

I don't think we should rely on it, but it's always an option. The dataframe converter is done anyway, so you can do

df = ann.data.to_dataframe()
df.head(10)

etc.

bmcfee · 2017-05-13T14:22:44Z

Amending my previous statement: I don't think we can consistently make Annotation implement the Collection ABC because it would change the semantics inherited from JObject for __len__ and __contains__, which both apply to the object's __dict__. OTOH there's no problem in just implementing __iter__.
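A small sketch of the conflict, under the assumption that the base class defines __len__ and __contains__ over the instance __dict__ as described. The class names are hypothetical stand-ins for JObject and Annotation.

```python
class JObjectLike(object):
    """Hypothetical base: length and membership refer to attributes."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __len__(self):
        return len(self.__dict__)

    def __contains__(self, key):
        return key in self.__dict__


class AnnotationLike(JObjectLike):
    """Adds iteration over observations without touching len/contains."""

    def __iter__(self):
        return iter(self.data)


ann = AnnotationLike(namespace='beat', data=[0.5, 1.0, 1.5])

n_fields = len(ann)               # counts attributes, not observations
has_namespace = 'namespace' in ann
observations = list(ann)          # iteration walks the observations
```

Here len(ann) reports the number of attributes while iteration yields observations, so the object iterates cleanly but does not behave like a collection of its observations.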

bmcfee · 2017-05-15T19:12:23Z

@ejhumphrey I think this one's stable for eyeballs again.

Coverage drop is primarily due to html rendering not being covered, but I'd prefer to push that to #93 and implement it across the board.

ejhumphrey · 2017-05-19T15:52:38Z

jams/nsconvert.py

@@ -159,7 +165,13 @@ def note_midi_to_hz(annotation):
    '''Convert a pitch_midi annotation to pitch_hz'''

    annotation.namespace = 'note_hz'
-    annotation.data.value = 440 * (2.0 ** ((annotation.data.value - 69.0)/12.0))
+    data = annotation.pop_data()


i can neither articulate why nor propose something "better", but the semantics of this feel weird for some reason.

Do you object to the operation itself, or just the naming?
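For reference, the rebuild pattern under discussion might look like the sketch below: take the existing observations, convert each value with the MIDI-to-Hz formula from the removed line, and produce new observations. Plain lists stand in for the annotation, and the exact behavior of pop_data is not assumed here.

```python
from collections import namedtuple

Observation = namedtuple('Observation',
                         ['time', 'duration', 'value', 'confidence'])


def midi_to_hz(midi):
    """Same formula as the removed line: A4 (MIDI 69) maps to 440 Hz."""
    return 440.0 * (2.0 ** ((midi - 69.0) / 12.0))


# Made-up MIDI note observations.
old = [Observation(0.0, 0.5, 69.0, None),
       Observation(0.5, 0.5, 81.0, None)]

# Observations are immutable, so build converted copies instead of
# assigning to a value column in place.
new = [obs._replace(value=midi_to_hz(obs.value)) for obs in old]
```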

ejhumphrey · 2017-05-19T15:53:34Z

tests/namespace_tests.py

-                           'value': None,
-                           'confdence': None}
+        # Bypass the safety checks in append
+        ann.data.add(Observation(time=data['time'],


this looks so much nicer

ejhumphrey · 2017-05-19T16:00:32Z

Couple thoughts, minor apologies if / when relevance is a stretch:

Not concerned with the coverage drop; does coveralls have a setting that says "don't worry about drops if the total coverage is over a certain level" (say, 98?)
It has since occurred to me that all of my other projects using jams have things like for idx, obs in ann.data.iterrows() ... is this a common pattern (or am I weird); and if it is common, are we throwing deprecation warnings for this usage? or is it gone outright?

bmcfee · 2017-05-19T16:45:43Z

Not concerned with the coverage drop; does coveralls have a setting that says "don't worry about drops if the total coverage is over a certain level" (say, 98?)

I'd prefer to keep the strict coverage check in place, just so that we're aware of it at all times.

It has since occurred to me that all of my other projects using jams have things like for idx, obs in ann.data.iterrows() ... is this a common pattern (or am I weird); and if it is common, are we throwing deprecation warnings for this usage? or is it gone outright?

Gone outright: any previous access to the data field is not guaranteed to work in general. We throw deprecation warnings for some things in 0.2.3, but it was too much of a mess to do it in full generality without also flagging internal access that was going to change anyway.

The idx, obs pattern can be recovered by

for idx, obs in enumerate(ann):
    ...

though there's almost never a good reason to want the index itself.

bmcfee · 2017-05-19T18:28:20Z

Hashed out in person with @ejhumphrey -- this one's good to go.

bmcfee added 7 commits May 9, 2017 17:28

working on alternate AnnotationData backend

182894e

fixing up tests

d0196f8

ported over nsconvert

67a74f1

fixed slicing

19aeb2a

fixed display and sonification

cc5b4f2

allow importing annotationdata as Observations as well as dicts

e3ec170

purged jamsframe

52a970b

bmcfee changed the title ~~[WIP] alternate AnnotationData backend~~ [RFC] alternate AnnotationData backend May 10, 2017

bmcfee requested review from ejhumphrey and justinsalamon May 10, 2017 18:29

bmcfee added enhancement pyjams labels May 10, 2017

bmcfee added this to the 0.3.0 milestone May 10, 2017

ejhumphrey reviewed May 10, 2017


bmcfee added 4 commits May 10, 2017 15:14

moved pop_data to annotation

21f8d7d

simplified add_observation

05c99d2

added dataframe export

1d9be64

docstrings, repr_html

a5efa99

bmcfee mentioned this pull request May 10, 2017

Pretty printing: _repr_html_ and _repr_svg_ #93

Closed

bmcfee added 2 commits May 13, 2017 10:10

added type check to observation container index

1c403ef

removed container check

b499c30

bmcfee added 4 commits May 13, 2017 10:27

removed annotation.__len__ override

7cc38dc

removed nottest from beat tracking eval

aaeb321

added Annotation.to_event_values

fa81d8e

switched pattern eval to event_values

a67f0b3

bmcfee mentioned this pull request May 13, 2017

0.2.3 -> 0.3.0 shims #153

Merged

bmcfee and others added 12 commits May 13, 2017 19:19

Merge branch 'master' into drop-pandas

92cdaa0

recovering from merge weirdness

8e49244

getting test coverage up

039fce0

getting test coverage up

05f6460

fixed a type output error in to_interval_values

f852a4a

linting

61147ac

linting and style

6aa4460

removed timedelta_to_float

b965a31

removed dangling entry from util docstring index

cf81444

upgrading docs

c483fcc

cleaning up docstring examples

55c2cd5

more cleaning of docs

f41f717

bmcfee changed the title ~~[RFC] alternate AnnotationData backend~~ [CR needed] alternate AnnotationData backend May 17, 2017

ejhumphrey reviewed May 19, 2017


ejhumphrey approved these changes May 19, 2017


bmcfee merged commit 237d7c1 into master May 19, 2017

bmcfee changed the title ~~[CR needed] alternate AnnotationData backend~~ alternate AnnotationData backend May 19, 2017
