v0.3

@FDecaYed FDecaYed released this 13 Feb 06:13
· 22 commits to main since this release

What’s Changed

New Features

  • CUDA 12 support
  • Automatic concatenation of multiple embedding tables for greatly improved speed
  • Support for model parallelism with user-defined custom Keras layers through the DistributedEmbedding wrapper
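
To illustrate why concatenating embedding tables can help, here is a minimal conceptual sketch in plain Python. It is not the library's implementation: the helper names `concat_tables` and `fused_lookup` are hypothetical, and it assumes that only tables with the same embedding width are fused. The idea shown is that several small tables become one larger table plus a list of row offsets, so many lookups turn into a single indexed gather.

```python
from itertools import accumulate


def concat_tables(tables):
    """Stack same-width tables row-wise (hypothetical helper).

    Returns the fused table and the starting row offset of each input table.
    """
    widths = {len(row) for table in tables for row in table}
    if len(widths) != 1:
        raise ValueError("only tables with the same embedding width can be fused")
    sizes = [len(table) for table in tables]
    # Offset of table i is the total number of rows in tables 0..i-1.
    offsets = [0] + list(accumulate(sizes))[:-1]
    fused = [row for table in tables for row in table]
    return fused, offsets


def fused_lookup(fused, offsets, table_ids, row_ids):
    """Look up (table, row) pairs with one pass over the fused table."""
    return [fused[offsets[t] + r] for t, r in zip(table_ids, row_ids)]


# Two toy tables with embedding width 2.
tables = [
    [[0.0, 0.1], [1.0, 1.1]],
    [[2.0, 2.1], [3.0, 3.1], [4.0, 4.1]],
]
fused, offsets = concat_tables(tables)
result = fused_lookup(fused, offsets, table_ids=[0, 1, 1], row_ids=[1, 0, 2])
```

In a real GPU setting the gain would come from launching one lookup kernel instead of one per table; the sketch only shows the index arithmetic involved.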

Improvements

  • Support for cases where the number of workers is greater than the number of tables.
  • In corner cases where different slices of a table are placed on the same worker, they are now merged into a single slice.
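
The slice merging described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the library's code: the function `merge_slices` is hypothetical, and slices are assumed to be half-open ranges `[start, end)` over one dimension of a table, recorded as `(worker, table, start, end)` tuples. Only contiguous slices of the same table on the same worker are merged.

```python
def merge_slices(placements):
    """Merge contiguous slices of the same table that share a worker.

    placements: iterable of (worker, table, start, end) tuples, where
    [start, end) is a half-open range. Hypothetical helper for illustration.
    """
    merged = []
    for worker, table, start, end in sorted(placements):
        if merged:
            prev_worker, prev_table, prev_start, prev_end = merged[-1]
            same_owner = prev_worker == worker and prev_table == table
            if same_owner and prev_end == start:
                # Extend the previous slice instead of adding a new one.
                merged[-1] = (worker, table, prev_start, end)
                continue
        merged.append((worker, table, start, end))
    return merged


placements = [
    (0, "a", 0, 4),
    (0, "a", 4, 8),   # contiguous with the slice above, same worker
    (1, "a", 8, 12),
    (1, "b", 0, 16),
]
merged = merge_slices(placements)
```

Here worker 0 ends up holding one slice of table `a` covering `[0, 8)` rather than two separate slices.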

Breaking Changes

  • Moved the submodule from CUB to NVIDIA Thrust for better compatibility

Bug Fixes

  • Better error handling in set_weight() when weights are not initialized
  • Better error handling when the global batch size is not divisible by the number of workers
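
As a rough illustration of the batch size constraint mentioned above, a user-side check might look like the following sketch. The function name `per_worker_batch_size` is hypothetical and is not part of the library; it only shows the divisibility condition, assuming each worker receives an equal share of the global batch.

```python
def per_worker_batch_size(global_batch_size, num_workers):
    """Return the local batch size, or raise if the split is uneven.

    Hypothetical helper: assumes the global batch is divided equally
    among all workers.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if global_batch_size % num_workers != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by number of workers {num_workers}"
        )
    return global_batch_size // num_workers


local_batch = per_worker_batch_size(global_batch_size=1024, num_workers=8)
```

Running such a check before building the model surfaces the problem early, instead of as a shape mismatch during training.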

Full Changelog: v0.2...v0.3