About MCRVQ #8

Open
jishengpeng opened this issue Aug 16, 2024 · 0 comments
Labels
documentation Improvements or additions to documentation help wanted Extra attention is needed

Comments

@jishengpeng
Owner

We attempted to construct a discrete codec model that is more suitable for downstream speech language models.

Our objective is to put less information into the first channel of the codebook while making up for the missing information in a limited number of other channels. We consider that, within downstream speech language models, the first-layer quantizer of the codec model serves as an intermediary module that bridges the textual input and the subsequent quantizers.

Text inherently carries less information than speech. By judiciously reducing the information in the first-layer quantizer, it becomes easier to generate the first-layer codec tokens (the codes of the first quantizer) from text, because those tokens then have lower information content.

Therefore, we devised the Masked Channel Residual Vector Quantization (MCRVQ) mechanism, which uses masking to restrict the quantizers of the first three channels so that each learns only the compressed audio frame information in its specified space.
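
The masking idea can be illustrated with a small, self-contained sketch. This is not the implementation from this repository; it is one possible reading of the description above, under stated assumptions: the latent frame is split into `n_masked` equal contiguous slices, quantizer `i` of the first three sees only slice `i` through a binary mask, and any remaining quantizers perform ordinary residual vector quantization. The names `nearest_index` and `mcrvq_frame`, and the slicing scheme itself, are hypothetical.

```python
def nearest_index(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(code):
        return sum((v - c) ** 2 for v, c in zip(vec, code))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))


def mcrvq_frame(frame, codebooks, n_masked=3):
    """Quantize one latent frame: masked first stage, then plain residual VQ.

    Assumption (not confirmed by the source): the latent dimension is split
    into n_masked equal contiguous slices, and quantizer i only sees slice i.
    Returns (indices, quantized, residual).
    """
    dim = len(frame)
    if dim % n_masked != 0:
        raise ValueError("latent dimension must be divisible by n_masked")
    width = dim // n_masked

    indices = []
    quantized = [0.0] * dim

    # Masked stage: each of the first n_masked quantizers is restricted
    # to its own slice of the frame by a binary mask.
    for i in range(n_masked):
        mask = [1.0 if i * width <= d < (i + 1) * width else 0.0
                for d in range(dim)]
        masked_input = [f * m for f, m in zip(frame, mask)]
        k = nearest_index(masked_input, codebooks[i])
        indices.append(k)
        # Only the masked part of the chosen code contributes to the output.
        code = [c * m for c, m in zip(codebooks[i][k], mask)]
        quantized = [q + c for q, c in zip(quantized, code)]

    residual = [f - q for f, q in zip(frame, quantized)]

    # Remaining quantizers: ordinary residual VQ on what is left over.
    for codebook in codebooks[n_masked:]:
        k = nearest_index(residual, codebook)
        indices.append(k)
        code = codebook[k]
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]

    return indices, quantized, residual


# Toy usage: a 6-dimensional frame and three masked quantizers.
frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
codebooks = [
    [[0.0] * 6, [1.0, 2.0, 0.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 3.0, 4.0, 0.0, 0.0], [0.0] * 6],
    [[0.0] * 6, [0.0, 0.0, 0.0, 0.0, 5.0, 6.0]],
]
indices, quantized, residual = mcrvq_frame(frame, codebooks)
```

In this toy setup each masked quantizer can only explain its own slice, so no single one of the first three codes carries the whole frame. How the actual model defines the masks and combines the first three channels may differ from this sketch.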

@jishengpeng jishengpeng pinned this issue Aug 16, 2024
@jishengpeng jishengpeng added documentation Improvements or additions to documentation help wanted Extra attention is needed labels Aug 16, 2024