Preprocessing issue when using token_type='character' #44

ayushgarg70 · 2020-09-25T00:35:33Z

I think the function tf_vocab_encode(text, vocab_table) in encoding.py should be implemented as:

def tf_voc_encode(vocab_table):
    def tf_vocab_encode(text):
        tokens = tf.strings.bytes_split(text)
        return vocab_table.lookup(tokens)

    return lambda text: tf.py_function(tf_vocab_encode, inp=[text], Tout=tf.int32)

The encoder_fn in the function get_encoder should then be set as:

encoder_fn = tf_voc_encode(vocab_table)

The encoder is later passed as an argument to _dataset.map(), which always runs in graph mode, so without this change I was always getting the error:

ValueError("input must have a statically-known rank.")
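For comparison, here is a minimal sketch of a different way to avoid the same error without tf.py_function. It is not the fix proposed above. It assumes each dataset element is a scalar string, so the rank can be declared with tf.ensure_shape before tf.strings.bytes_split runs in graph mode. The toy vocabulary, the generator, and the name make_char_encoder are hypothetical and only stand in for the project's real vocab_table and dataset.

```python
import tensorflow as tf

# Hypothetical toy vocabulary; the real vocab_table comes from the project.
keys = tf.constant(["a", "b", "c"])
values = tf.constant([0, 1, 2], dtype=tf.int64)
vocab_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys, values), default_value=-1)


def make_char_encoder(table):
    """Return a graph-compatible encoder for use with Dataset.map()."""
    def encode(text):
        # Assumption: every element is a scalar string. Declaring that here
        # gives bytes_split the statically-known rank it requires.
        text = tf.ensure_shape(text, [])
        tokens = tf.strings.bytes_split(text)
        return table.lookup(tokens)
    return encode


def gen():
    yield b"abc"
    yield b"cab"


# shape=None leaves the rank unknown, which is the situation that
# triggers the ValueError when bytes_split is traced inside map().
ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=None, dtype=tf.string))

encoded = [x.numpy().tolist() for x in ds.map(make_char_encoder(vocab_table))]
print(encoded)
```

Whether this applies depends on how the real dataset is built. If the elements are not scalar strings, the shape passed to tf.ensure_shape would need to change accordingly.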
