
Config/Model Checkpoint Pairing #47

Open
MorenoLaQuatra opened this issue Oct 24, 2024 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@MorenoLaQuatra

Hi!

First of all, thank you for your excellent work! I am currently trying to use pre-trained WavTokenizer models and I wanted to confirm the correct pairing between the available configuration files and the model checkpoints before proceeding with my experiments.

Based on the README and available configuration files, I have made the following assumptions for pairing each config with its corresponding model checkpoint. Could you please confirm if this is correct?

  1. wavtokenizer_smalldata_frame40_3s_nq1_code4096_dim512_kmeans200_attn.yaml

    • Should refer to models using 40 tokens per second.
  2. wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml

    • Should refer to models using 75 tokens per second.

Inferred Pairing:

  • wavtokenizer_smalldata_frame40_3s_nq1_code4096_dim512_kmeans200_attn.yaml should be paired with the following models:

    • WavTokenizer-small-600-24k-4096
    • WavTokenizer-large-600-24k-4096
  • wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml should be paired with the following models:

    • WavTokenizer-small-320-24k-4096
    • WavTokenizer-medium-320-24k-4096
    • WavTokenizer-large-320-24k-4096

Could you please confirm if this pairing is correct or if any adjustments are needed?

Thank you for your help!

@jishengpeng
Owner


Yes, you are right. If you only want to run inference, no adjustments are needed.
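For convenience, the pairing confirmed above can be captured in a small lookup helper. This is hypothetical code, not part of the repo; the checkpoint and config names are taken verbatim from this thread, and the `-600-` vs. `-320-` naming convention is inferred from the pairing listed above:

```python
# Hypothetical helper encoding the config/checkpoint pairing confirmed in
# this thread. The "-600-" vs. "-320-" segment of the checkpoint name
# distinguishes the 40-token/s and 75-token/s variants.
CONFIG_40TPS = "wavtokenizer_smalldata_frame40_3s_nq1_code4096_dim512_kmeans200_attn.yaml"
CONFIG_75TPS = "wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml"

def config_for_checkpoint(checkpoint: str) -> str:
    """Return the config file matching a WavTokenizer checkpoint name."""
    if "-600-" in checkpoint:
        return CONFIG_40TPS
    if "-320-" in checkpoint:
        return CONFIG_75TPS
    raise ValueError(f"unknown checkpoint naming scheme: {checkpoint}")
```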

@jishengpeng jishengpeng added the documentation Improvements or additions to documentation label Oct 24, 2024
@MorenoLaQuatra
Author

Thanks! If I may ask: for the large model trained on speech/audio/music, the provided checkpoint is WavTokenizer-large-speech-75token. That seems to be speech only, am I wrong?

@jishengpeng
Copy link
Owner


WavTokenizer-Large-unify-40token supports speech, audio, and music. WavTokenizer-large-speech-75token supports speech only for now.

@MorenoLaQuatra
Author

Thank you. The link here shows an empty repo; do you plan to release the model?

@agonzalezd

Greetings. I will comment here since this issue is related to the problem I am facing.

The WavTokenizer.from_pretrained method throws errors when trying to use the models on HuggingFace, since it looks for a config.yaml file that does not exist in the repositories. It also looks for a pytorch_model.bin, which does not exist either.

When I instead try to use these config files by calling wavtokenizer = WavTokenizer.from_hparams('configs/wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml'), I get another exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...]/WavTokenizer/decoder/pretrained.py", line 55, in from_hparams
    feature_extractor = instantiate_class(args=(), init=config["feature_extractor"])
                                                        ~~~~~~^^^^^^^^^^^^^^^^^^^^^
KeyError: 'feature_extractor'
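As I read the traceback, the YAML itself parses fine but has no top-level feature_extractor section, which decoder/pretrained.py indexes unconditionally. A minimal reproduction with a plain dict standing in for the parsed YAML (the helper name and config contents here are hypothetical, mirroring the line in the traceback):

```python
# Stand-in for the lookup on line 55 of decoder/pretrained.py, which does
# instantiate_class(args=(), init=config["feature_extractor"]).
def get_feature_extractor_init(config: dict) -> dict:
    return config["feature_extractor"]

# A training-style config without that section (hypothetical contents):
training_config = {"model": {"init_args": {"frame_rate": 75}}}

try:
    get_feature_extractor_init(training_config)
except KeyError as err:
    print(f"KeyError: {err}")  # same failure as in the traceback above
```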

So either these config files are not suitable for loading the pretrained models for inference, or I am doing something wrong.

I believe the examples provided in the main README.md are for inference with models trained by users, not the pretrained models. Could you provide a working example of using the pretrained models uploaded to HuggingFace for inference?

Thanks in advance
