v0.3

FDecaYed released this 13 Feb 06:13


34cc5d7

What’s Changed

New Features

CUDA 12 support
Automatic concatenation of multiple embedding tables for greatly improved speed
Support for model parallelism with user-defined custom Keras layers through the DistributedEmbedding wrapper
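
The automatic concatenation listed above can be pictured with a small sketch. This is only an illustration of the general idea (stacking same-width tables row-wise and shifting each lookup by a per-table row offset), not the library's actual implementation. The helper names `concat_tables` and `fused_lookup` are hypothetical, and plain Python lists stand in for tensors.

```python
from itertools import accumulate


def concat_tables(tables):
    """Stack same-width tables row-wise; return (fused_table, row_offsets).

    Hypothetical helper for illustration only, not a library API.
    """
    widths = {len(row) for table in tables for row in table}
    if len(widths) != 1:
        raise ValueError("all tables must share one embedding width")
    # offsets[i] is the index of table i's first row inside the fused table.
    offsets = [0] + list(accumulate(len(t) for t in tables))[:-1]
    fused = [row for table in tables for row in table]
    return fused, offsets


def fused_lookup(fused, offsets, table_id, row_id):
    """Look up one row of one original table through the fused table."""
    return fused[offsets[table_id] + row_id]


table_a = [[0.0, 0.1], [1.0, 1.1]]
table_b = [[2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
fused, offsets = concat_tables([table_a, table_b])
row = fused_lookup(fused, offsets, table_id=1, row_id=2)
```

With one fused table, lookups that would otherwise touch several small tables can go through a single structure, which is one plausible reason such fusion helps speed.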

Improvements

Support for cases where the number of workers is greater than the number of tables.
In corner cases where different slices of a table are placed on the same worker, they are now merged into a single slice.
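
As a rough sketch of the slice merging described above, assume each placement is a tuple `(table_id, worker, start, end)` with a half-open range along the sliced dimension. This representation and the function name `merge_slices` are assumptions made for illustration; the library's internal bookkeeping may differ.

```python
from collections import defaultdict


def merge_slices(placements):
    """Merge touching or overlapping slices of one table on one worker.

    placements: list of (table_id, worker, start, end), half-open ranges.
    Hypothetical helper for illustration only, not a library API.
    """
    grouped = defaultdict(list)
    for table_id, worker, start, end in placements:
        grouped[(table_id, worker)].append((start, end))

    merged = []
    for (table_id, worker), ranges in sorted(grouped.items()):
        ranges.sort()
        cur_start, cur_end = ranges[0]
        for start, end in ranges[1:]:
            if start <= cur_end:
                # Contiguous or overlapping: extend the current slice.
                cur_end = max(cur_end, end)
            else:
                merged.append((table_id, worker, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((table_id, worker, cur_start, cur_end))
    return merged


placements = [(0, 0, 0, 32), (0, 0, 32, 64), (0, 1, 64, 128), (1, 1, 0, 16)]
result = merge_slices(placements)
```

Here the two slices of table 0 on worker 0 collapse into one slice covering `[0, 64)`, while the other placements are left unchanged.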

Breaking Changes

Moved the submodule from CUB to NVIDIA Thrust for better compatibility

Bug Fixes

Better error handling in set_weight() when weights are not initialized
Better error handling when the global batch size is not divisible by the number of workers
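
The divisibility requirement above can be checked up front in user code. The following is a minimal sketch of such a check, assuming the global batch is split evenly across workers; `per_worker_batch_size` is a hypothetical helper, not part of the library, and the error message is illustrative.

```python
def per_worker_batch_size(global_batch_size, num_workers):
    """Return the local batch size, or raise if the split is uneven.

    Hypothetical helper for illustration only, not a library API.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if global_batch_size % num_workers != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by number of workers {num_workers}"
        )
    return global_batch_size // num_workers


local_batch = per_worker_batch_size(1024, 8)
```

Running a check like this before building the model surfaces a misconfigured batch size early with a readable message.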

Full Changelog: v0.2...v0.3
