Schema refactor #92

Open
bmcfee opened this issue Dec 14, 2015 · 5 comments
Labels: enhancement, schema (Issues pertaining to schema definitions)

Comments

@bmcfee
Contributor

bmcfee commented Dec 14, 2015

Rehashing #40 after a conversation with @ejhumphrey

There are good arguments for splitting the JAMS schema into smaller pieces that can be shared and repurposed. Specifically, a database (e.g., a mongodb key-value store) for managing JAMS collections could be more reasonably structured (and more easily searchable) if it contained individual annotation objects (indexed by track id) rather than full JAMS objects.

I propose that we refactor the jams schema so that annotations can exist independently of the JAMS file format. Of course, the JAMS file format will still use annotation definitions, so there should be no observable difference in the way JAMS files work*; put another way, the API for JAMS files stays the same, and all the changes would be under the hood.

Digging in a bit more, the current schema looks like:

jams_schema
`- JAMS
   `- FileMetadata
   |  `- [more stuff]
   `- Annotations
   |  `- [more stuff]
   `- Sandbox

and the refactored schema might look like:

jams_common
`- Sandbox

jams_annotation
`- Annotations
   `- [more stuff]

jams_metadata
`- FileMetadata
   `- [more stuff]

jams_file
`- JAMS
   `- jams_metadata.FileMetadata
   `- jams_annotation.Annotations
   `- jams_common.Sandbox

What do folks think?

To make this happen, we'd have to get a better handle on json-schema inheritance, but I think it's totally possible.
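
As a rough illustration of the split proposed above: JSON Schema has no inheritance as such, and reuse is usually expressed through `$ref` (optionally combined with `allOf`). The sketch below uses plain dicts and a toy resolver, so it needs no schema library. The module names come from the proposal; the reference strings, the `definitions` layout, and the field contents are placeholder assumptions, not the real JAMS schema.

```python
import json

# Hypothetical stand-alone schema fragments, keyed by the module names
# proposed in this issue. Field contents are placeholders only.
SCHEMAS = {
    "jams_common": {
        "definitions": {"Sandbox": {"type": "object"}},
    },
    "jams_annotation": {
        "definitions": {
            "Annotations": {"type": "array", "items": {"type": "object"}},
        },
    },
    "jams_metadata": {
        "definitions": {"FileMetadata": {"type": "object"}},
    },
    "jams_file": {
        "type": "object",
        "properties": {
            "file_metadata": {"$ref": "jams_metadata#/definitions/FileMetadata"},
            "annotations": {"$ref": "jams_annotation#/definitions/Annotations"},
            "sandbox": {"$ref": "jams_common#/definitions/Sandbox"},
        },
    },
}


def resolve(node):
    """Recursively inline every {"$ref": "<schema>#/<pointer>"} node."""
    if isinstance(node, dict):
        if "$ref" in node:
            name, _, pointer = node["$ref"].partition("#")
            target = SCHEMAS[name]
            # Walk the slash-separated pointer into the referenced schema.
            for key in pointer.strip("/").split("/"):
                target = target[key]
            return resolve(target)
        return {key: resolve(value) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item) for item in node]
    return node


# The file-level schema, with all shared pieces inlined.
flat = resolve(SCHEMAS["jams_file"])
print(json.dumps(flat, indent=2))
```

With this layout, `jams_annotation` can be used on its own (for example, to validate database records), while `jams_file` resolves to the same structure as a monolithic schema.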

  • We might have to tweak the schema ids, which might require a slight modification to the spec. Not sure about this yet.

@ejhumphrey
Collaborator

This is related closely enough that it doesn't warrant a new issue: I'd like to revisit / upvote a conversation about how identifiers / named entities are referenced in JAMS. For example, I'd like to tag a single annotation as being produced by some unique identifier, so that I can search a collection for all annotations performed by the same entity (human or algorithm). We've got the annotator dict, but it's a little too unconstrained to encourage any convention.

@bmcfee
Contributor Author

bmcfee commented Aug 18, 2016

I'm not sure that fits under the scope of JAMS per se; remember the headaches about filenames in #5? We eventually decided that that's better handled at the application level -- for better or worse. I suspect that indexing annotation sources will have similar difficulties.

OTOH, if we do want to add support for foreign-key indexing (for tracks, annotators, etc), maybe it's worth reopening that discussion?

@urinieto
Contributor

Could we simply add a new identifier field in the annotator dictionary that
is basically a unique hash produced by the annotator name, email,
affiliation, etc?
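
A minimal sketch of that idea, assuming the annotator dictionary carries `name`, `email`, and `affiliation` keys (JAMS does not prescribe these; they are hypothetical here) and that SHA-256 over a normalized JSON encoding is an acceptable hash:

```python
import hashlib
import json


def annotator_id(annotator):
    """Derive a stable identifier from selected annotator fields."""
    # Hypothetical choice of fields; missing keys are treated as empty.
    fields = ("name", "email", "affiliation")
    # Normalize so that case and surrounding whitespace do not change the hash.
    canonical = {k: str(annotator.get(k, "")).strip().lower() for k in fields}
    # sort_keys gives a deterministic byte string regardless of dict order.
    payload = json.dumps(canonical, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


a = annotator_id({"name": "Jane Doe", "email": "jane@example.org"})
b = annotator_id({"email": " JANE@example.org ", "name": "jane doe"})
print(a == b)
```

One caveat with this design: the identifier changes whenever any of the hashed fields changes (a new affiliation, say), so it identifies a particular description of the annotator rather than the annotator.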


@ejhumphrey
Collaborator

I don't want to necessarily tell users what the namespace should be, but I
think we could benefit from some standardization.

@bmcfee
Contributor Author

bmcfee commented Aug 18, 2016

Maybe go rosetta-style? Let identifiers be a list of strings of the form id_space:id_string?

That will at least validate for syntax. If you want semantic validation, that's up to a separate indexing structure that should live outside of jams.

For example, the SALAMI annotators could be identified by salami:0001 or somesuch. Similarly for annotation tools (org:software:version -> qmul:sonic-visualiser:1.2, qmul:tony:2.0, jku:madmom:0.14.1, etc), and filenames could just be standard urls.
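
Syntax-only validation along those lines could look like the sketch below. The allowed character set and the `parse_identifier` helper are assumptions made for illustration, not part of JAMS; URL-style identifiers are deliberately not covered by this pattern and would need separate handling.

```python
import re

# Hypothetical syntax: a namespace followed by one or more colon-separated,
# non-empty segments made of letters, digits, '_', '.', or '-'.
IDENTIFIER = re.compile(r"[A-Za-z0-9_.\-]+(?::[A-Za-z0-9_.\-]+)+")


def parse_identifier(identifier):
    """Split 'id_space:id_string' into its two parts, checking syntax only."""
    if not IDENTIFIER.fullmatch(identifier):
        raise ValueError(f"not a valid identifier: {identifier!r}")
    # Everything after the first colon belongs to id_string.
    id_space, _, id_string = identifier.partition(":")
    return id_space, id_string


for example in ("salami:0001", "qmul:sonic-visualiser:1.2", "jku:madmom:0.14.1"):
    print(parse_identifier(example))
```

Whether `salami:0001` refers to a real annotator is not checked here; as noted above, that kind of semantic validation would live in a separate index outside of jams.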

bmcfee modified the milestones: 0.3.0, 0.4.0 (May 11, 2017)
This was referenced May 10, 2018
bmcfee added the schema (Issues pertaining to schema definitions) label (Aug 12, 2019)