[Hackathon 7th No.43] Improve TokenizerFast feature support, part 1 #9407
Thanks for your contribution!
451d07d to 6d95920
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9407      +/-   ##
===========================================
- Coverage    53.08%   52.96%   -0.12%
===========================================
  Files          687      689       +2
  Lines       109472   109412      -60
===========================================
- Hits         58114    57952     -162
- Misses       51358    51460     +102
5f1403e to 7161b60
cc: @DrownFish19 Could you please take another look at this PR? Thanks.
@@ -0,0 +1,131 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
Please add the HuggingFace copyright notice here as well.
("bloom", "BloomTokenizer"),
(
"bloom",
("BloomTokenizer", "BloomTokenizerFast" if is_tokenizers_available() else None),
I suggest breaking this across lines so the formatting stays consistent.
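One way the entry could be wrapped, as a minimal sketch. The `is_tokenizers_available` function below is a stand-in that only checks whether the `tokenizers` package is importable, and the list name `TOKENIZER_MAPPING_NAMES` is hypothetical; only the "bloom" entry itself comes from the diff.

```python
import importlib.util


def is_tokenizers_available() -> bool:
    # Stand-in for the project's helper (assumption): True if the
    # `tokenizers` package can be imported in this environment.
    return importlib.util.find_spec("tokenizers") is not None


# Hypothetical list name; the "bloom" entry mirrors the diff above,
# with each tuple element on its own line.
TOKENIZER_MAPPING_NAMES = [
    (
        "bloom",
        (
            "BloomTokenizer",
            "BloomTokenizerFast" if is_tokenizers_available() else None,
        ),
    ),
]
```

The fast class name falls back to `None` when `tokenizers` is missing, so callers have to handle both cases.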
"LlamaTokenizer": LlamaConverter,
"BertTokenizer": BertConverter,
}
SLOW_TO_FAST_CONVERTERS = {"LlamaTokenizer": LlamaConverter, "BertTokenizer": BertConverter}
Can the converters here be used generically? This can be verified in a follow-up.
Nothing new should have been added here for bloom: as far as I can see, on HF bloom only has a fast tokenizer and no slow-to-fast conversion flow.
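The point above can be sketched as a registry lookup: a tokenizer class with no slow-to-fast converter is simply absent from the mapping. This is an illustrative sketch only. The converter classes are empty placeholders, and `get_converter` is a hypothetical helper, not a function from the PR.

```python
class LlamaConverter:
    """Placeholder standing in for the real converter class."""


class BertConverter:
    """Placeholder standing in for the real converter class."""


# Same shape as the mapping in the diff; there is no "BloomTokenizer" key.
SLOW_TO_FAST_CONVERTERS = {
    "LlamaTokenizer": LlamaConverter,
    "BertTokenizer": BertConverter,
}


def get_converter(slow_tokenizer_class_name: str):
    """Hypothetical lookup: return the converter class or fail clearly."""
    try:
        return SLOW_TO_FAST_CONVERTERS[slow_tokenizer_class_name]
    except KeyError:
        raise ValueError(
            f"No slow-to-fast converter registered for {slow_tokenizer_class_name}"
        ) from None
```

Under this sketch, `get_converter("BloomTokenizer")` raises `ValueError`, which matches a tokenizer that ships only in its fast form.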
LGTM
PR types
Others
PR changes
Models
Description
Adds fast tokenizer support for Bloom. One question while I'm here: do I need to add a fast-tokenizer test for every def in the tests? Thanks 🙏