# v0.1.0 - New models, interactive apps, overhauled benchmarks
This is a large release including many new features:
## New models
- Implemented support for arbitrary scikit-learn models via `SKLearnClassifier` and for TF-IDF as a baseline embedding approach via `TfidfEmbedder`.
- Implemented support for spaCy text categorizer models and spacy-transformers models via `SpaCyModel`.
- Upgraded `pytorch_transformers` v1.0.0 to `transformers` v2.4.1, which added support for several new models.
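
As a rough illustration of the kind of estimator `SKLearnClassifier` is meant to wrap, here is a plain scikit-learn pipeline combining TF-IDF features with a linear classifier. Only scikit-learn itself is used below; the gobbli wrapper's exact import path and constructor arguments are not reproduced here, so see the docs for the real API.

```python
# A scikit-learn pipeline of the sort SKLearnClassifier can wrap:
# TF-IDF features feeding a linear model. Plain scikit-learn, not
# gobbli-specific code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Toy data just to show the estimator trains and predicts end to end.
X_train = [
    "great product, works well",
    "works great, very happy",
    "terrible, broke after a day",
    "awful quality, do not buy",
]
y_train = ["positive", "positive", "negative", "negative"]

pipeline.fit(X_train, y_train)
print(pipeline.predict(["works great"]))
```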
## Interactive apps
gobbli now comes bundled with a few Streamlit apps that can be used to explore datasets, evaluate gobbli model performance, and generate local explanations for gobbli model predictions. See the docs for more information.
## Overhauled benchmarks
Completely overhauled the benchmark framework. Benchmark output is now stored as Markdown files, which are much easier to read on GitHub, and benchmarks can be selectively rerun when new models are added. Also added a "benchmark" for embeddings, which plots each model's embeddings in two dimensions, allowing a qualitative assessment of how well the model differentiates between the classes in the dataset. See the benchmark output folder.
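
The embedding benchmark itself isn't reproduced here, but the underlying idea is simple: project each model's document embeddings down to two dimensions and color the points by class. A minimal sketch of that technique using scikit-learn's PCA and matplotlib (illustrative only, not gobbli's benchmark code) might look like this:

```python
# Sketch of the embedding-benchmark idea: reduce high-dimensional document
# embeddings to 2-D and color by class to eyeball how well classes separate.
# Illustrative only; not gobbli's benchmark implementation.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for embeddings produced by a gobbli model: 200 docs x 512 dims.
embeddings = rng.normal(size=(200, 512))
labels = rng.integers(0, 3, size=200)  # three hypothetical classes

points = PCA(n_components=2).fit_transform(embeddings)

for cls in np.unique(labels):
    mask = labels == cls
    plt.scatter(points[mask, 0], points[mask, 1], label=f"class {cls}", alpha=0.6)
plt.legend()
plt.title("Document embeddings projected to 2-D")
plt.show()
```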
## Miscellaneous improvements
- Add new BERT weights from NCBI trained on PubMed data (`ncbi-bert-base-pubmed-uncased`, `ncbi-bert-base-pubmed-mimic-uncased`, `ncbi-bert-large-pubmed-uncased`, `ncbi-bert-large-pubmed-mimic-uncased`) (thanks @pmbaumgartner!)
- Upgrade fastText to a more recent version which supports autotuning parameters
- Add support for optional gradient accumulation in Transformer models, allowing smaller batch sizes and larger models while retaining performance (see the sketch after this list)
- Upgrade the USE implementation to the TensorFlow 2.0 version and add support for multilingual weights (`universal-sentence-encoder-multilingual`, `universal-sentence-encoder-multilingual-large`)
- Add a couple of utilities for inspecting and cleaning up disk usage
- Fix memory issues with USE model by batching input data
- Fix potential encoding issues with non-ASCII text in USE model
- Reuse static pretrained weights across instances of models instead of redownloading every time
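
As referenced above, gradient accumulation takes optimizer steps computed from several small batches, approximating a larger effective batch size. The PyTorch snippet below sketches the general technique; the model, data, and `accumulation_steps` value are placeholders, not gobbli's internal implementation.

```python
# Generic gradient-accumulation loop in PyTorch: losses from several small
# batches are accumulated before a single optimizer step, approximating a
# larger effective batch size. Illustrative only; not gobbli's internals.
import torch
import torch.nn as nn

model = nn.Linear(128, 2)            # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

accumulation_steps = 4               # effective batch = 4 x micro-batch size
# Placeholder micro-batches standing in for a real DataLoader.
batches = [(torch.randn(8, 128), torch.randint(0, 2, (8,))) for _ in range(16)]

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(batches):
    loss = loss_fn(model(inputs), targets)
    # Scale so the accumulated gradient matches one large-batch update.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```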