ftokenizer

Flutter Tokenizer for NLP models

Usage

Make sure to call init before using the tokenizer:

    await FTokenizer.init();

and to call dispose when you are done:

    FTokenizer.dispose();

If using it within an Isolate, make sure to call await FTokenizer.init(); at the beginning and FTokenizer.dispose(); before closing the Isolate.
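The Isolate lifecycle above could be sketched roughly as follows. This is a minimal sketch, not the package's documented pattern. The import path package:ftokenizer/ftokenizer.dart and the helper name runWithTokenizer are assumptions for illustration. Only FTokenizer.init() and FTokenizer.dispose() come from this README. Isolate.run requires Dart 2.19 or later.

```dart
import 'dart:async';
import 'dart:isolate';

// Assumed import path; check the package for the actual library name.
import 'package:ftokenizer/ftokenizer.dart';

/// Hypothetical helper: runs [work] in a new Isolate, initializing
/// FTokenizer at the start and disposing it before the Isolate closes.
Future<R> runWithTokenizer<R>(FutureOr<R> Function() work) {
  return Isolate.run(() async {
    // Each Isolate needs its own init call.
    await FTokenizer.init();
    try {
      // Tokenization calls happen inside [work].
      return await work();
    } finally {
      // Runs even if [work] throws, so dispose is never skipped.
      FTokenizer.dispose();
    }
  });
}
```

The try/finally block is a design choice here: it keeps the dispose call paired with init even when the work fails. The function passed as work must be sendable to the new Isolate, so a top-level or static function is the safest choice.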

FTokenizer uses rust_tokenizer. See the rust_tokenizer description:

Rust-tokenizer is a drop-in replacement for the tokenization methods from the Transformers library. It includes a broad range of tokenizers for state-of-the-art transformers architectures, including:

Sentence Piece (unigram model)

Sentence Piece (BPE model)

BERT

ALBERT

DistilBERT

RoBERTa

GPT

GPT2

ProphetNet

CTRL

Pegasus

MBart50

M2M100

NLLB

DeBERTa

DeBERTa (v2)

The WordPiece-based tokenizers include both single-threaded and multi-threaded processing. The Byte-Pair-Encoding tokenizers favor the use of a shared cache and are only available as single-threaded tokenizers.

Using the tokenizers requires manually downloading the files they need (vocabulary or merge files). These can be found in the Transformers library.
