ftokenizer

Flutter Tokenizer for NLP models

Usage

Call init before using the tokenizer:

    await FTokenizer.init();

and call dispose when you are finished with it:

    FTokenizer.dispose();

When using FTokenizer inside an Isolate, make sure to call await FTokenizer.init(); at the start of the Isolate and FTokenizer.dispose(); before the Isolate closes.
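The Isolate lifecycle described above could look roughly like the sketch below. It is a minimal illustration, not the package's documented pattern: the import path `package:ftokenizer/ftokenizer.dart` is an assumption, and the tokenization work itself is left as a placeholder because this README does not describe those calls. Only `FTokenizer.init()` and `FTokenizer.dispose()` come from the text above; `Isolate.run` is the standard `dart:isolate` helper.

```dart
import 'dart:isolate';

// Assumption: the package exposes FTokenizer from this import path.
import 'package:ftokenizer/ftokenizer.dart';

Future<void> main() async {
  // Isolate.run spawns a worker isolate, runs the callback there,
  // and completes when the callback finishes.
  await Isolate.run(() async {
    // Initialize the tokenizer at the start of the Isolate.
    await FTokenizer.init();
    try {
      // Placeholder: tokenization work would go here.
    } finally {
      // Dispose before the Isolate closes, even if the work throws.
      FTokenizer.dispose();
    }
  });
}
```

Wrapping the work in try/finally is a design choice of this sketch so that dispose still runs when the work fails.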

FTokenizer uses rust_tokenizer. From the rust_tokenizer description:

Rust-tokenizer is a drop-in replacement for the tokenization methods from the Transformers library. It includes a broad range of tokenizers for state-of-the-art transformers architectures, including:

Sentence Piece (unigram model)

Sentence Piece (BPE model)

BERT

ALBERT

DistilBERT

RoBERTa

GPT

GPT2

ProphetNet

CTRL

Pegasus

MBart50

M2M100

NLLB

DeBERTa

DeBERTa (v2)

The wordpiece-based tokenizers include both single-threaded and multi-threaded processing. The Byte-Pair-Encoding tokenizers favor the use of a shared cache and are only available as single-threaded tokenizers.

Using the tokenizers requires manually downloading the files each tokenizer needs (vocabulary or merge files). These can be found in the Transformers library.