Commit

Fix bug in criteo.py that leads to NaN problem
The original script simply added 3 to each dense value before taking the log. As a result, a value of -3 was shifted to 0 during preprocessing, and its log was -inf. This problem was mentioned in the issue facebookresearch/dlrm#363 (comment). I changed the preprocessing operation in the tsv_to_npys function to dense_np -= dense_np.min() - 2, which shifts the smallest value to 2 so that every input to the log is positive. With this change the Criteo Kaggle dataset is handled correctly.
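The difference between the two shifts can be sketched on a toy array. This is only an illustration under stated assumptions: the array values are made up, and the "old" line is reconstructed from the description above rather than copied from the previous version of criteo.py. The sketch also uses an out-of-place subtraction so both results can be compared, whereas the script applies the shift in place with -=.

```python
import numpy as np

# Hypothetical dense features; -3 is the value that broke the old shift.
dense_np = np.array([[-3.0, 0.0], [1.0, 5.0]], dtype=np.float32)

# Old behaviour as described in the commit message: add 3, then take the log.
# log(0) yields -inf, so the divide-by-zero warning is silenced here.
with np.errstate(divide="ignore", invalid="ignore"):
    old = np.log(dense_np + 3, dtype=np.float32)

# New behaviour: shift so that the smallest value becomes 2, then take the log.
shifted = dense_np - (dense_np.min() - 2)
new = np.log(shifted, dtype=np.float32)

print(old)  # contains -inf where the input was -3
print(new)  # every entry is finite
```

Because the shift is derived from the data's own minimum, the smallest input to the log is always 2, whatever the most negative value in the dataset is.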
TomekWei authored Jun 21, 2024
1 parent 842e087 commit 051ade8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion torchrec/datasets/criteo.py
@@ -258,7 +258,7 @@ def row_mapper(row: List[str]) -> Tuple[List[int], List[int], int]:
del labels

# Log is expensive to compute at runtime.
- dense_np -= (dense_np.min() - 2)
+ dense_np -= dense_np.min() - 2
dense_np = np.log(dense_np, dtype=np.float32)

# To be consistent with dense and sparse.