This repository has been archived by the owner on Sep 6, 2022. It is now read-only.

Preprocessing issue when using token_type='character' #44

Open
ayushgarg70 opened this issue Sep 25, 2020 · 0 comments

I think the function `tf_vocab_encode(text, vocab_table)` in `encoding.py` should be implemented as:

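```python
import tensorflow as tf

def tf_voc_encode(vocab_table):

    def tf_vocab_encode(text):
        # Runs eagerly inside tf.py_function, so `text` has a concrete rank here.
        tokens = tf.strings.bytes_split(text)
        return vocab_table.lookup(tokens)

    # Tout must match the value dtype of vocab_table (tf.int32 here).
    return lambda text: tf.py_function(tf_vocab_encode, inp=[text], Tout=tf.int32)
```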
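To check the wrapper end to end, here is a minimal sketch that applies it inside `Dataset.map()`. The three-character vocabulary and the `default_value=0` for unknown characters are hypothetical stand-ins for whatever `encoding.py` actually builds; the table uses int32 values so that `Tout=tf.int32` matches.

```python
import tensorflow as tf


def tf_voc_encode(vocab_table):
    def tf_vocab_encode(text):
        # Executed eagerly by tf.py_function, so bytes_split sees a concrete rank.
        tokens = tf.strings.bytes_split(text)
        return vocab_table.lookup(tokens)

    return lambda text: tf.py_function(tf_vocab_encode, inp=[text], Tout=tf.int32)


# Hypothetical toy character vocabulary (illustration only).
vocab_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(["a", "b", "c"]),
        values=tf.constant([1, 2, 3], dtype=tf.int32),
    ),
    default_value=0,  # id used for characters missing from the vocabulary
)

dataset = tf.data.Dataset.from_tensor_slices(["abc", "cab", "ax"])
encoded = dataset.map(tf_voc_encode(vocab_table))  # map traces in graph mode

for ids in encoded:
    print(ids.numpy())  # [1 2 3], then [3 1 2], then [1 0]
```

One thing to watch: the output of `tf.py_function` carries no static shape, so later steps that rely on element shapes (such as `padded_batch`) may need an explicit `set_shape([None])` on the result inside the mapped function.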

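and then, in the function `get_encoder`, `encoder_fn` should be set as:

```python
encoder_fn = tf_voc_encode(vocab_table)
```

This is needed because `encoder_fn` is later passed as an argument to `_dataset.map()`, which always traces its function in graph mode. There the rank of `text` is not known statically, so `tf.strings.bytes_split` cannot handle it, and I was always getting the error: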

ValueError("input must have a statically-known rank.")
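A minimal sketch of where this error comes from, under the assumption that the rank of `text` was lost somewhere upstream (for example by an earlier `tf.py_function`): tracing `tf.strings.bytes_split` with an unknown-rank `tf.TensorSpec` fails with a `ValueError`. The sketch also shows an alternative that stays in graph mode, assuming every dataset element is a single scalar string: pin the static rank with `tf.ensure_shape` before splitting. `tf_voc_encode_graph` and the toy vocabulary are hypothetical names for illustration, not code from the repository.

```python
import tensorflow as tf


# Reproduction: shape=None means the rank of `text` is unknown at trace time.
@tf.function(
    input_signature=[tf.TensorSpec(shape=None, dtype=tf.string)], autograph=False
)
def split_unknown_rank(text):
    return tf.strings.bytes_split(text)


try:
    split_unknown_rank.get_concrete_function()  # tracing runs bytes_split
    trace_error = None
except ValueError as err:
    trace_error = err
print("tracing failed:", trace_error)


def tf_voc_encode_graph(vocab_table):
    """Hypothetical graph-mode alternative to the tf.py_function wrapper."""

    def encode(text):
        # Assumption: each element is one string, so its static rank is 0.
        text = tf.ensure_shape(text, [])
        return vocab_table.lookup(tf.strings.bytes_split(text))

    return encode


# Hypothetical toy character vocabulary (illustration only).
vocab_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(["a", "b", "c"]),
        values=tf.constant([1, 2, 3], dtype=tf.int32),
    ),
    default_value=0,
)

# Same unknown-rank signature as above; this time tracing succeeds.
graph_encode = tf.function(
    tf_voc_encode_graph(vocab_table),
    input_signature=[tf.TensorSpec(shape=None, dtype=tf.string)],
)
print(graph_encode(tf.constant("cab")).numpy())  # [3 1 2]
```

This keeps the encoding inside the graph and avoids the Python round trip of `tf.py_function`, but it only holds if every element really is a scalar string; `tf.ensure_shape` raises at run time otherwise.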
