
Config/Model Checkpoint Pairing #47

Open
MorenoLaQuatra opened this issue Oct 24, 2024 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@MorenoLaQuatra

Hi!

First of all, thank you for your excellent work! I am currently trying to use pre-trained WavTokenizer models and I wanted to confirm the correct pairing between the available configuration files and the model checkpoints before proceeding with my experiments.

Based on the README and available configuration files, I have made the following assumptions for pairing each config with its corresponding model checkpoint. Could you please confirm if this is correct?

  1. wavtokenizer_smalldata_frame40_3s_nq1_code4096_dim512_kmeans200_attn.yaml

    • Should refer to models using 40 tokens per second.
  2. wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml

    • Should refer to models using 75 tokens per second.

Inferred Pairing:

  • wavtokenizer_smalldata_frame40_3s_nq1_code4096_dim512_kmeans200_attn.yaml should be paired with the following models:

    • WavTokenizer-small-600-24k-4096
    • WavTokenizer-large-600-24k-4096
  • wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml should be paired with the following models:

    • WavTokenizer-small-320-24k-4096
    • WavTokenizer-medium-320-24k-4096
    • WavTokenizer-large-320-24k-4096

Could you please confirm if this pairing is correct or if any adjustments are needed?

Thank you for your help!

@jishengpeng
Owner


Yes, you are right. If you only want to run inference, no adjustments are needed.
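For convenience, the pairing confirmed above can be captured in a small lookup helper. This is hypothetical code, not part of the repo; the checkpoint and config names are taken verbatim from this thread, and the `-600-` vs. `-320-` naming convention is inferred from the pairing listed above:

```python
# Hypothetical helper encoding the config/checkpoint pairing confirmed in
# this thread. The "-600-" vs. "-320-" segment of the checkpoint name
# distinguishes the 40-token/s and 75-token/s variants.
CONFIG_40TPS = "wavtokenizer_smalldata_frame40_3s_nq1_code4096_dim512_kmeans200_attn.yaml"
CONFIG_75TPS = "wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml"

def config_for_checkpoint(checkpoint: str) -> str:
    """Return the config file matching a WavTokenizer checkpoint name."""
    if "-600-" in checkpoint:
        return CONFIG_40TPS
    if "-320-" in checkpoint:
        return CONFIG_75TPS
    raise ValueError(f"unknown checkpoint naming scheme: {checkpoint}")
```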

@jishengpeng jishengpeng added the documentation Improvements or additions to documentation label Oct 24, 2024
@MorenoLaQuatra
Author

Thanks! If I may ask: for the large model trained on speech/audio/music, the provided checkpoint is WavTokenizer-large-speech-75token. That seems to be speech only, am I wrong?

@jishengpeng
Copy link
Owner


WavTokenizer-Large-unify-40token supports speech, audio, and music. WavTokenizer-large-speech-75token supports speech only for now.

@MorenoLaQuatra
Author

Thank you. The link here shows an empty repo; do you plan to release the model?

@agonzalezd

Greetings. I will comment here since this issue is related to the problem I am facing.

The WavTokenizer.from_pretrained method throws errors when trying to use the models on HuggingFace, since it looks for a config.yaml file that does not exist in the repositories. It also looks for a pytorch_model.bin, which does not exist either.

When I instead try to use these config files by calling wavtokenizer = WavTokenizer.from_hparams('configs/wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml'), I get another exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...]/WavTokenizer/decoder/pretrained.py", line 55, in from_hparams
    feature_extractor = instantiate_class(args=(), init=config["feature_extractor"])
                                                        ~~~~~~^^^^^^^^^^^^^^^^^^^^^
KeyError: 'feature_extractor'
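As I read the traceback, the YAML itself parses fine but has no top-level feature_extractor section, which decoder/pretrained.py indexes unconditionally. A minimal reproduction with a plain dict standing in for the parsed YAML (the helper name and config contents here are hypothetical, mirroring the line in the traceback):

```python
# Stand-in for the lookup on line 55 of decoder/pretrained.py, which does
# instantiate_class(args=(), init=config["feature_extractor"]).
def get_feature_extractor_init(config: dict) -> dict:
    return config["feature_extractor"]

# A training-style config without that section (hypothetical contents):
training_config = {"model": {"init_args": {"frame_rate": 75}}}

try:
    get_feature_extractor_init(training_config)
except KeyError as err:
    print(f"KeyError: {err}")  # same failure as in the traceback above
```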

So either these config files are not suitable for loading the pretrained models for inference, or I am doing something wrong.

I believe the examples provided in the main README.md are for inference with models trained by users, not the pretrained models. Could you provide a working example of using the pretrained models uploaded to HuggingFace for inference?

Thanks in advance
