
Insight needed: Residual of code embedding in the high-dimensional space is not decreasing. #89

Open
jhauret opened this issue Aug 8, 2024 · 0 comments


Dear Authors,

Thank you for publishing your work and making your code available online. It is of great value to the audio community.

I was curious about how using more or fewer quantizers affects the distance between the continuous and quantized embeddings in the high-dimensional embedding space. So I wrote this code:

import dac
import torch
import torchaudio

# Download and load the pretrained 44.1 kHz, 8 kbps DAC model
model_path = dac.utils.download(model_type="44khz", model_bitrate="8kbps")
model = dac.DAC.load(model_path)
model.eval()

audio, sr = torchaudio.load("./audio_to_i/fileid_1888.flac")

with torch.no_grad():
    z = model.encoder(audio.unsqueeze(0))
    for i in range(9):
        # Quantize with i + 1 RVQ codebooks and measure the residual
        zq, codes, _, _, _ = model.quantizer(z, n_quantizers=i + 1)
        print(f"{i=} , {torch.norm(z - zq).item() = }")

And I was very surprised to see that the norm increases with i! Do you have any explanation?

I understand that the distance to code entries is computed in the 8-d low-dimensional space, but shouldn't the 1024-d residual still get smaller the more RVQ scales we use?
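To illustrate why I expected the norm to shrink, here is a toy residual VQ in NumPy (a minimal sketch, not DAC's actual quantizer: it has no learned projections, and each stage's codebook is built on the fly from the current residuals). Because every stage's codebook includes the mean of the current residuals, the total residual norm cannot grow from one stage to the next:

```python
import numpy as np

rng = np.random.default_rng(0)

def rvq(x, n_stages, codebook_size=32):
    """Toy residual VQ. Each stage quantizes the current residual with a
    codebook made of sampled residuals plus their mean, so the aggregate
    residual norm is non-increasing across stages."""
    residual = x.copy()
    norms = []
    for _ in range(n_stages):
        # Codebook: random residual samples + the residual mean
        idx = rng.choice(len(residual), size=codebook_size - 1, replace=False)
        codebook = np.vstack([residual[idx], residual.mean(axis=0)])
        # Nearest-neighbour assignment in the embedding space
        d = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        q = codebook[d.argmin(axis=1)]
        residual = residual - q
        norms.append(np.linalg.norm(residual))
    return norms

x = rng.normal(size=(256, 8))
norms = rvq(x, n_stages=9)
print(norms)  # non-increasing sequence
```

This is the behaviour I expected from DAC's RVQ in the 1024-d space, which is why the increasing norm surprised me.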

Note: I have also attached the audio I used in this test, along with some reconstructions using different numbers of RVQ scales, and the reconstruction works well. Download link: audio_to_i.zip
