Hi, thanks for the great work; it's great to see new approaches to improving training efficiency. I'd like to request extending the `transformers` patch support to the Qwen2 series of models: the recent Qwen2.5 series, which shares the same architecture, has top-notch performance and could benefit from this technique given its vocabulary size of 151936.

I can help with a PR if needed, but as far as I can tell the code would be quite similar to the already supported Llama patch, with only minor changes.
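To make the ask concrete, here is a rough, untested sketch of what the Qwen2 patch could look like, assuming the patch works by swapping out `forward` so the loss is computed without materializing the full logit tensor over the vocabulary. `fused_linear_cross_entropy` is only a placeholder for the actual fused kernel; a naive reference version is included just so the snippet is self-contained.

```python
import torch.nn.functional as F
from transformers.modeling_outputs import CausalLMOutputWithPast
from transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM


def fused_linear_cross_entropy(hidden_states, lm_head_weight, labels):
    # Placeholder for the real fused kernel: same call signature, but this
    # reference version still materializes the logits, so it has none of the
    # memory savings.
    logits = hidden_states @ lm_head_weight.T
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )


def patched_qwen2_forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
    # Same decoder call as the stock Qwen2ForCausalLM.forward; only the loss path changes.
    outputs = self.model(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
    hidden_states = outputs[0]

    loss, logits = None, None
    if labels is not None:
        # Training path: loss computed straight from hidden states and the lm_head weight.
        loss = fused_linear_cross_entropy(hidden_states, self.lm_head.weight, labels)
    else:
        # Inference path unchanged: logits are still needed for generation.
        logits = self.lm_head(hidden_states)

    return CausalLMOutputWithPast(
        loss=loss, logits=logits, past_key_values=outputs.past_key_values
    )


def patch_qwen2():
    # Mirrors what I assume the existing Llama patch does: swap in the fused-loss forward.
    Qwen2ForCausalLM.forward = patched_qwen2_forward
```

Since Qwen2 uses the same decoder layout as Llama (pre-norm attention + MLP blocks, untied `lm_head`), the real change should mostly be pointing the existing patch machinery at the Qwen2 classes.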