Tokenizers version 0.12.0 breaks punctfix #6

Open
Rasmusafj opened this issue Apr 1, 2022 · 0 comments


When using punctfix with tokenizers version 0.12.0, the following error occurs:

Stacktrace

    return self.fixer.punctuate(model_input)
  File "/usr/local/lib/python3.8/site-packages/punctfix/inference.py", line 177, in punctuate
    word_prediction_list = self.populate_word_prediction_with_labels(chunks, word_prediction_list)
  File "/usr/local/lib/python3.8/site-packages/punctfix/inference.py", line 111, in populate_word_prediction_with_labels
    output = self.pipe(" ".join(chunk_text))
  File "/usr/local/lib/python3.8/site-packages/transformers/pipelines/token_classification.py", line 189, in __call__
    return super().__call__(inputs, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1027, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/usr/local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1035, in run_single
    outputs = self.postprocess(model_outputs, **postprocess_params)
  File "/usr/local/lib/python3.8/site-packages/transformers/pipelines/token_classification.py", line 242, in postprocess
    grouped_entities = self.aggregate(pre_entities, aggregation_strategy)
  File "/usr/local/lib/python3.8/site-packages/transformers/pipelines/token_classification.py", line 334, in aggregate
    return self.group_entities(entities)
  File "/usr/local/lib/python3.8/site-packages/transformers/pipelines/token_classification.py", line 457, in group_entities
    entity_groups.append(self.group_sub_entities(entity_group_disagg))
  File "/usr/local/lib/python3.8/site-packages/transformers/pipelines/token_classification.py", line 408, in group_sub_entities
    "word": self.tokenizer.convert_tokens_to_string(tokens),
  File "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 535, in convert_tokens_to_string
    return self.backend_tokenizer.decoder.decode(tokens)
TypeError: Can't convert ['du'] to PyString
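To check whether an environment has the tokenizers release named above, something like the following might help. It is only a sketch: the helper names (`is_affected`, `installed_tokenizers_version`, `check_environment`) are made up for illustration and are not part of punctfix, and treating 0.12.0 as the only affected release is an assumption based solely on this report.

```python
from importlib import metadata

# Assumption: only the release named in this issue is treated as affected.
AFFECTED_VERSIONS = {"0.12.0"}


def is_affected(version: str) -> bool:
    """Return True if the version string matches the release reported in this issue."""
    return version.strip() in AFFECTED_VERSIONS


def installed_tokenizers_version():
    """Return the installed tokenizers version, or None if the package is missing."""
    try:
        return metadata.version("tokenizers")
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> str:
    """Describe whether the local tokenizers install matches the reported release."""
    version = installed_tokenizers_version()
    if version is None:
        return "tokenizers is not installed"
    if is_affected(version):
        return f"tokenizers {version} matches the version reported in this issue"
    return f"tokenizers {version} is not the version reported in this issue"


if __name__ == "__main__":
    print(check_environment())
```

If the check reports a match, excluding that release in the requirements (for example `tokenizers!=0.12.0`) may avoid the error, though that has not been verified here.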
