This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Can I test with custom data? #51

Open
589hero opened this issue May 12, 2022 · 6 comments

Comments

@589hero

589hero commented May 12, 2022

Hi, I am interested in this exciting project, and I am trying to test it with our custom dataset by reproducing the format of the original data. However, I ran into some difficulties and have the questions below.

  1. Is there no way to use custom datasets at all?
  2. Is there any code to calculate the dataset elements below?
  • I want to know how to get "note_active_velocities", "note_active_frame_indices", "power_db", "note_onsets", and "note_offsets", but there is no code for this in the repository.

Thank you for reading!

@lukewys
Contributor

lukewys commented May 12, 2022

Huge thanks for your interest in our work!

First, sorry that training on custom datasets is still hacky. As for your questions:

  1. Yes, you can refer to issue #46 ("Technical limitations in processing arbitrary datasets?") for how to hack the code to train on your own dataset.
  2. The code for the URMP dataloader is at https://github.com/magenta/ddsp/blob/main/ddsp/training/data.py#L495. "note_active_velocities", "note_active_frame_indices", and "power_db" are not used in training MIDI-DDSP (you can search the codebase; we do not use those keys). So you can create a new dataloader class that excludes those keys.
  3. "note_onsets" and "note_offsets" mark the frames where a note has its onset and offset. In the original tfrecord, they are binary with shape [T_frame, 128], indicating which note turns on/off at which frame. But in MIDI-DDSP, since we deal with monophonic playing, I convert them into binary tensors of shape [T_frame], indicating at which frames a note turns on/off.
    You can look into https://github.com/magenta/ddsp/blob/main/ddsp/training/data.py#L495 for more details. The tensors that are finally used are the outputs of _reshape_tensors() for the keys MIDI-DDSP uses.
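A minimal NumPy sketch of the [T_frame, 128] to [T_frame] conversion described in point 3, assuming a frame is marked whenever any pitch has an onset (or offset) there. The function name `collapse_to_monophonic` is hypothetical, and the actual DDSP/MIDI-DDSP code may implement this step differently.

```python
import numpy as np


def collapse_to_monophonic(roll: np.ndarray) -> np.ndarray:
    """Collapse a [T_frame, 128] binary onset/offset roll into a [T_frame] vector.

    A frame is marked 1.0 if any pitch has an onset (or offset) at that frame.
    Assumes monophonic material, so at most one pitch is set per frame.
    """
    if roll.ndim != 2 or roll.shape[-1] != 128:
        raise ValueError(f"expected shape [T_frame, 128], got {roll.shape}")
    # Reduce over the pitch axis: any nonzero entry in a frame marks that frame.
    return (roll > 0).any(axis=-1).astype(np.float32)


# Toy example: 6 frames, one note at MIDI pitch 60 from frame 1 to frame 4.
num_frames = 6
onsets = np.zeros((num_frames, 128), dtype=np.float32)
offsets = np.zeros((num_frames, 128), dtype=np.float32)
onsets[1, 60] = 1.0
offsets[4, 60] = 1.0

print(collapse_to_monophonic(onsets).tolist())   # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(collapse_to_monophonic(offsets).tolist())  # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```

Inside a `tf.data` pipeline, a comparable reduction for binary inputs would be `tf.reduce_max(roll, axis=-1)`; whether the original loader does exactly this is not confirmed here.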

Hope this helps.

I will probably work on updating this codebase to support arbitrary datasets, but I don't know exactly when I will have the time.

@589hero
Author

589hero commented May 16, 2022

Thank you for the detailed reply!

@adagio715

Hi! I think "note_active_frame_indices" is actually used in model training, because this feature is used to calculate data['midi'] (https://github.com/magenta/ddsp/blob/7cb3c37f96a3e5b4a2b7e94fdcc801bfd556021b/ddsp/training/data.py#L540), which is a necessary feature when training the synthesis generator. I tried training a synthesis generator without "note_active_frame_indices" (more specifically, I assigned random numbers to data['midi'] when creating the dataset), and the resulting model doesn't work.
Could you double-check whether "note_active_frame_indices" is needed? I was also wondering what this feature means; could you explain it? Thanks!

@lukewys
Contributor

lukewys commented Dec 17, 2022

Hi, note_active_frame_indices is a binary tensor containing note-activity information. The tensor has shape [num_frame, 128], and if note_active_frame_indices[i, j] is 1, pitch j is on at the i-th frame. Applying argmax over the last (-1) dimension turns note_active_frame_indices into data['midi'].

data['midi'] is a 1D integer tensor of shape [num_frames] where each item is the MIDI pitch number at that frame. MIDI-DDSP relies on data['midi'] both for the MIDI input and for finding note boundaries, so it is crucial for training the model.
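A NumPy sketch of the argmax step described above, plus one hypothetical way to read segment boundaries off data['midi']. The helper names `active_roll_to_midi` and `segment_starts` are illustrative, not MIDI-DDSP functions, and the boundary rule (a new segment starts wherever the pitch value changes) is an assumption: it would not separate two repeated notes of the same pitch with no silent frame between them, and silent frames come out as pitch 0.

```python
import numpy as np


def active_roll_to_midi(note_active: np.ndarray) -> np.ndarray:
    """Map a [num_frames, 128] note-activity roll to a [num_frames] pitch vector."""
    # argmax over the last (pitch) axis picks the active pitch in each frame.
    # Caveat: a frame with no active note is all zeros, so argmax returns 0 there.
    return np.argmax(note_active, axis=-1).astype(np.int32)


def segment_starts(midi: np.ndarray) -> np.ndarray:
    """Return the frame indices where the pitch value changes (a new segment starts)."""
    if midi.size == 0:
        return np.array([], dtype=np.int64)
    # Frame 0 always starts a segment; afterwards any change in pitch starts one.
    changed = np.concatenate([[True], midi[1:] != midi[:-1]])
    return np.flatnonzero(changed)


# Toy roll: pitch 60 for frames 0-2, pitch 62 for frames 3-4, silence for frames 5-7.
note_active = np.zeros((8, 128), dtype=np.float32)
note_active[0:3, 60] = 1.0
note_active[3:5, 62] = 1.0

midi = active_roll_to_midi(note_active)
print(midi.tolist())                  # [60, 60, 60, 62, 62, 0, 0, 0]
print(segment_starts(midi).tolist())  # [0, 3, 5]
```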

Hope this helps.

Best

@adagio715

Hi! Thanks a lot for your reply. That makes sense. So the difference between note_active_frame_indices and note_onsets is that note_active_frame_indices indicates that a note "is being played", while note_onsets indicates that a note "starts". Did I get this right?

A small follow-up question about the content of note_active_frame_indices: in the provided URMP tfrecords, I checked the tensor values of this feature before it was reshaped into [num_frame]. I found that for each frame, the 128-d array contains 127 zeros and one integer value (something like 86, 87, 427, 428, ...), instead of 127 zeros and a "1". I don't think this affects data['midi'], but I was wondering why those integer values are used instead of a simple "1" :)) Could you explain, if you know why?
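The intuition that these integer values do not change data['midi'] can be checked with a small NumPy example, assuming each frame holds exactly one positive entry and 127 zeros as described. The pitches are made up for illustration; the example does not explain where the integers come from.

```python
import numpy as np

# Two versions of the same 4-frame activity roll: one stores 1s, the other stores
# arbitrary positive integers in the active slots (values taken from the comment
# above; the pitches are hypothetical).
pitches = [67, 67, 69, 71]
values = [86, 87, 427, 428]

binary_roll = np.zeros((4, 128), dtype=np.int64)
integer_roll = np.zeros((4, 128), dtype=np.int64)
for frame, (pitch, value) in enumerate(zip(pitches, values)):
    binary_roll[frame, pitch] = 1
    integer_roll[frame, pitch] = value

# argmax depends only on where the largest entry sits, not on its magnitude,
# so with one positive entry and 127 zeros per frame both rolls give the same pitches.
midi_from_binary = np.argmax(binary_roll, axis=-1)
midi_from_integers = np.argmax(integer_roll, axis=-1)

print(midi_from_binary.tolist())                             # [67, 67, 69, 71]
print(np.array_equal(midi_from_binary, midi_from_integers))  # True
```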

Thanks!

@lukewys
Contributor

lukewys commented Dec 18, 2022 via email
