Releases: daulet/tokenizers

v1.20.2

07 Nov 21:15

What's Changed

  • feat: better error message when tokenizers lib mismatch by @daulet in #28
  • feat: FromPretrained to load tokenizer directly from HF by @berkayersoyy in #27

Full Changelog: v0.9.0...v1.20.2

v0.9.0

09 Aug 23:20

Full Changelog: v0.8.0...v0.9.0

v0.8.0

12 Jun 01:43
d503b5b

Breaking change:

The path to the compiled Rust library must now be specified via -ldflags. Setting the CGO_LDFLAGS environment variable is the most convenient way to avoid passing it on every build. See #18 for more details.
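As a rough sketch of the two options described above: the directory below is a hypothetical location for the compiled library (assumed here to be named libtokenizers.a), so adjust it to wherever the library actually lives. The per-build form is shown as a comment and assumes the path is forwarded to the external linker via -extldflags.

```shell
# Hypothetical directory holding the compiled Rust library; adjust to your setup.
TOKENIZERS_LIB_DIR="$HOME/lib/tokenizers"

# Option 1: set it once per shell session so cgo picks it up on every build.
export CGO_LDFLAGS="-L${TOKENIZERS_LIB_DIR}"

# Option 2: pass the path on each invocation through the external linker flags.
# go build -ldflags="-extldflags '-L${TOKENIZERS_LIB_DIR}'" ./...
```

Option 1 suits local development, since the variable can be exported from a shell profile. Option 2 keeps the path explicit in build scripts and CI configuration.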

What's Changed

  • Update to allow for platform dependent libs in CGO by @jmoney in #18

Full Changelog: v0.7.1...v0.8.0

v0.7.1

10 Apr 23:30
  • Update the core tokenizers library to the latest version, v0.15.2
  • Expose an init-time parameter that controls whether special tokens are encoded

Full Changelog: v0.7.0...v0.7.1

v0.7.0

07 Jan 00:38

What's Changed

  • support more attributes from the Encoding structure by @clems4ever in #5

Full Changelog: v0.6.1...v0.7.0

v0.6.1

09 Nov 23:26
  • Only changes Bazel target names

v0.6.0

09 Nov 02:01
5e367fe
  • Update the underlying core library to v0.14.1 (the latest at the time of release)
  • Support the Bazel build system so downstream projects can easily consume this library
  • Artifacts are smaller too, since the OpenSSL dependency was dropped

v0.5.1

22 Sep 16:19
315fa52
  • Fix tokenizer memory leak
  • Fix panic in encode/decode with invalid UTF-8 strings

v0.5.0

07 Jul 05:09
  • Encode now returns token string representations
  • Rust strings are now properly freed in Decode

v0.4.3

08 May 19:33
  • Release artifact for darwin-x86_64