[fix] Fix init_bucket failure caused by max_hbm being smaller than slice size #188

Merged
1 commit merged into NVIDIA-Merlin:master from the fix_init_bucket branch on Jun 13, 2024

Collaborator

@LinGeLin LinGeLin commented May 7, 2024

Fix the init_bucket failure that occurs when max_hbm is smaller than the slice size.

Error message:

tensorflow.python.framework.errors_impl.InternalError: external/hkv/include/merlin/core_kernels.cuh:206: CUDA error cudaErrorInvalidValue : invalid argument [Op:TFRA>HkvHashTableOfTensors] name: basic0_mht_1of1

Reproduction:

key_dtype: int64_t
value_dtype: float
dim: 12
init_capacity: 1024 * 1024 * 400
max_capacity: 1024 * 1024 * 600
max_hbm: 1024 * 1024 * 400
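
For a sense of scale, the reproduction settings can be plugged into a rough footprint calculation. This is an illustrative sketch only: it assumes max_hbm is a byte budget for value storage and that each value is `dim` 4-byte floats. `value_bytes` is a hypothetical helper, not part of HKV or TFRA, and the actual slice size HKV uses is not stated in this thread.

```python
# Rough value-storage footprint for the reproduction config above.
# Assumptions (not confirmed by this thread): max_hbm is in bytes,
# and each value occupies dim * sizeof(float) = dim * 4 bytes.

FLOAT_SIZE = 4  # bytes per float32 element


def value_bytes(capacity: int, dim: int, elem_size: int = FLOAT_SIZE) -> int:
    """Bytes needed to store `capacity` values of `dim` elements each."""
    return capacity * dim * elem_size


DIM = 12
INIT_CAPACITY = 1024 * 1024 * 400
MAX_CAPACITY = 1024 * 1024 * 600
MAX_HBM = 1024 * 1024 * 400

init_bytes = value_bytes(INIT_CAPACITY, DIM)
max_bytes = value_bytes(MAX_CAPACITY, DIM)

print(f"values at init_capacity: {init_bytes} bytes")
print(f"values at max_capacity:  {max_bytes} bytes")
print(f"max_hbm budget:          {MAX_HBM} bytes")
print(f"init footprint / max_hbm: {init_bytes / MAX_HBM:.0f}x")
```

Under these assumptions the value storage at init_capacity is 48 times the max_hbm budget, so most values cannot reside in HBM with this configuration.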

github-actions bot commented May 7, 2024

@LinGeLin LinGeLin force-pushed the fix_init_bucket branch from b1609d2 to 64ef833 Compare May 7, 2024 12:27
Member

rhdong commented May 7, 2024

/blossom-ci

Member

rhdong commented May 14, 2024

/blossom-ci

Member

@rhdong rhdong left a comment

LGTM

@rhdong rhdong force-pushed the fix_init_bucket branch from 64ef833 to 86be4e5 Compare June 13, 2024 02:50
Member

rhdong commented Jun 13, 2024

/blossom-ci

@rhdong rhdong merged commit c895d54 into NVIDIA-Merlin:master Jun 13, 2024
2 checks passed