This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

ensure str for the case of bytes type. #223

Open
jnory wants to merge 1 commit into base: master

Conversation

jnory commented Jul 17, 2020

Hi,

I noticed that create_pretraining_data.py aborts with the following error:

  File "albert/create_pretraining_data.py", line 405, in <listcomp>
    for i in piece])):
AttributeError: 'int' object has no attribute 'lower'

The error occurs because the variable piece may be of type bytes in Python 3. Iterating over a bytes object yields int values, which have no lower() method.
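A minimal sketch of the cause, independent of the ALBERT code. The sample piece below is an illustrative assumption, not taken from the script; it only shows how iterating over bytes differs from iterating over str in Python 3.

```python
# Illustrative piece; the value is an assumption chosen for this demo.
piece_str = "Camden"
piece_bytes = piece_str.encode("utf-8")

# Iterating over str yields one-character strings, so .lower() works.
lowered_chars = [ch.lower() for ch in piece_str]

# Iterating over bytes yields ints, so .lower() is not available.
first_item = next(iter(piece_bytes))

try:
    [b.lower() for b in piece_bytes]
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(type(first_item).__name__)
print(error_message)
```

The same list comprehension that works on a str fails on the bytes version of the same piece, which matches the AttributeError in the traceback above.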

I'm using the sentencepiece tokenizer. A minimal input text that reproduces the problem is the following (the text comes from Wikipedia):

カムデンは(39.937195, -75.106186)に位置する。

This small PR fixes the problem by ensuring that piece is a str.
Please let me know if you notice any problems.
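The idea can be sketched as follows. This is not the actual diff of this PR: the helper name ensure_text, the UTF-8 default, and the sample pieces are assumptions made for illustration.

```python
def ensure_text(piece, encoding="utf-8"):
    """Hypothetical helper: return piece as str, decoding it if it is bytes."""
    if isinstance(piece, bytes):
        return piece.decode(encoding)
    return piece


# Sample pieces mixing bytes and str (illustrative values only).
pieces = ["カムデン".encode("utf-8"), "は", b"(39.937195"]

# Normalize every piece to str before any per-character processing.
normalized = [ensure_text(p) for p in pieces]

# Per-character calls such as .lower() are now safe on every piece.
lowered = ["".join(ch.lower() for ch in p) for p in normalized]
print(lowered)
```

Decoding once at the point where the piece is read keeps the rest of the per-character logic unchanged, and a piece that is already a str passes through untouched.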

Sincerely,

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


@googlebot googlebot added the cla: no Contributor License Agreement (no) label Jul 17, 2020
jnory commented Jul 17, 2020

@googlebot I signed it!

@googlebot

CLAs look good, thanks!


@googlebot googlebot added cla: yes Contributor License Agreement (yes) and removed cla: no Contributor License Agreement (no) labels Jul 17, 2020