v0.3
What’s Changed
New Features
- CUDA 12 support
- Automatic concatenation of multiple embedding tables for greatly improved speed
- Support model parallel with user-defined custom Keras layer through the DistributedEmbedding wrapper
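
As a rough illustration of why concatenating embedding tables can help, the sketch below fuses several same-width tables into one and resolves lookups through per-table row offsets, so a single gather can serve all tables. This is a pure-Python toy under stated assumptions, not the library's implementation: `concat_tables` and `lookup` are hypothetical names, and the release notes do not say which tables are concatenated or how.

```python
# Illustrative sketch only; concat_tables and lookup are hypothetical helpers,
# not part of the library's API.
from itertools import accumulate


def concat_tables(tables):
    """Stack same-width tables row-wise; return the fused table and row offsets."""
    widths = {len(row) for table in tables for row in table}
    if len(widths) != 1:
        raise ValueError("all tables must share the same embedding width")
    # The offset of table i is the total row count of tables 0..i-1.
    offsets = [0] + list(accumulate(len(t) for t in tables))[:-1]
    fused = [row for table in tables for row in table]
    return fused, offsets


def lookup(fused, offsets, table_id, row_id):
    """Fetch a row of the original table from the fused table."""
    return fused[offsets[table_id] + row_id]


tables = [
    [[0.1, 0.2], [0.3, 0.4]],              # table 0: 2 rows
    [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]],  # table 1: 3 rows
]
fused, offsets = concat_tables(tables)
row = lookup(fused, offsets, table_id=1, row_id=2)  # last row of table 1
```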
Improvements
- Support cases where the number of workers is greater than the number of tables.
- In corner cases where different slices of a table are placed onto the same worker, they are now merged into a single slice.
Breaking Changes
- Moved submodule from CUB to NVIDIA Thrust for better compatibility
Bug Fixes
- Better error handling in set_weight() when weights are not initialized
- Better error handling when global batch size is not divisible by number of workers
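
The batch-size condition in the last fix can be checked in user code before training starts. The following is a minimal sketch, assuming the global batch is split evenly across workers; `per_worker_batch_size` is a hypothetical helper, not a function provided by the library, and the library's actual error message may differ.

```python
# Hypothetical helper for illustration; not part of the library's API.
def per_worker_batch_size(global_batch_size, num_workers):
    """Return the local batch size, or raise if the global batch cannot be split evenly."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if global_batch_size % num_workers != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by number of workers {num_workers}"
        )
    return global_batch_size // num_workers


local_batch = per_worker_batch_size(1024, 8)
```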
Full Changelog: v0.2...v0.3