WIP: RWC POP melody parser #12

Open
wants to merge 6 commits into
base: master

@justinsalamon
Contributor Author

Hey everyone, do you think we could have a folder inside parsers called "data", which contains a folder per parser (when needed) with any data files the parser requires? In my case the RWC Pop melody metadata lives online in a table, which I've converted to a csv file, so I'd like to keep this csv file in the repo.

@justinsalamon
Contributor Author

On a related note, this table also includes the track duration (resolution in seconds only though). @bmcfee do you think it'd make sense to populate the duration field of the JAMS file based on this table?

I could do it from the audio, but RWC is distributed via CDs (at least it was when I got a copy), which means the exact duration of the audio I have might depend on whatever software I used to rip the CDs in the first place, and I have no way of recovering that information. I also don't have the original CDs at hand. So I was thinking the durations reported online might be my best bet in this case?

@bmcfee
Contributor

bmcfee commented Feb 22, 2016

> So I was thinking the durations reported online might be my best bet in this case?

Yeah, in this case, I think that's the best you can hope for.

RWC is distributed on CDs, but I thought they were data (not audio) CDs? I may be mistaken though.

@justinsalamon
Contributor Author

> RWC is distributed on CDs, but I thought they were data (not audio) CDs? I may be mistaken though.

I don't remember TBH, you may very well be right. Still, I don't remember by which process I obtained the WAV files that I have... so I'll stick to the times reported online for now.

Data copied directly from the table that can be found online at:
https://staff.aist.go.jp/m.goto/RWC-MDB/rwc-mdb-p.html

Implemented process_folder, create_jams, fill_annotation_metadata and fill_file_metadata

@justinsalamon
Contributor Author

I think this is good to go, anyone care to CR it before I merge the PR?

@justinsalamon
Contributor Author

p.s. - I still need to get permission to include the converted jams files in the repo, working on that.

jam.file_metadata.title = metadata['Title'][n_track]

d_str = metadata['Length'][n_track]
jam.file_metadata.duration = (float(d_str.split(":")[0]) * 60 +

This is pretty dirty. Since we're already using pandas here, we can do something like this instead:

In [4]: pd.Timedelta('0:2:34').total_seconds()
Out[4]: 154.0

(The only problem is that you'll have to pad in hours, but I generally prefer that to reinventing second-parsing.)
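
To make the padding step concrete, here is a minimal sketch of how that could look. It assumes the `Length` column holds strings of the form `M:SS` (which the `split(":")` in the diff suggests, but which isn't confirmed here); the helper name `length_to_seconds` is made up for illustration and is not part of the PR.

```python
import pandas as pd


def length_to_seconds(length_str):
    """Convert an 'M:SS' (or 'H:MM:SS') string to seconds as a float.

    Hypothetical helper: assumes two-field strings are minutes:seconds.
    """
    length_str = length_str.strip()
    # pd.Timedelta wants hours:minutes:seconds, so pad a zero hours field
    # when only minutes and seconds are given.
    if length_str.count(":") == 1:
        length_str = "0:" + length_str
    return pd.Timedelta(length_str).total_seconds()


print(length_to_seconds("2:34"))
```

The duration assignment would then reduce to something like `jam.file_metadata.duration = length_to_seconds(d_str)`.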

@bmcfee
Contributor

bmcfee commented Feb 25, 2016

@justinsalamon C has been R'ed. Mostly minor style issues, though the tempfile cleanup is a real problem. Otherwise LGTM.
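
The tempfile code in question isn't shown in this thread, so the following is only a generic sketch of one common way to guarantee cleanup, not the parser's actual code. The function name `with_scratch_dir` and the `work` callback are hypothetical.

```python
import shutil
import tempfile


def with_scratch_dir(work):
    """Call work(path) with a fresh temporary directory and always remove it.

    Hypothetical helper: the try/finally ensures the directory is deleted
    even if work() raises.
    """
    tmp_dir = tempfile.mkdtemp()
    try:
        return work(tmp_dir)
    finally:
        # ignore_errors avoids masking the original exception during cleanup
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Using `mkdtemp` with `try`/`finally` works on both Python 2 and 3; on Python 3 only, `tempfile.TemporaryDirectory()` as a context manager does the same job.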
