Floating point exception (core dumped) when using quantized embeddings with float32 dtype #2807

Open
tiankongdeguiji opened this issue Mar 12, 2025 · 3 comments

Comments

@tiankongdeguiji (Contributor) commented Mar 12, 2025

Using quantized embeddings with the float32 data type can lead to Floating point exception (core dumped). This can be reproduced by running python test_quant.py (below) in an environment with torchrec==1.1.0+cu124, torch==2.6.0+cu124, and fbgemm-gpu==1.1.0+cu124.

test_quant.py

import torch
import torchrec
from torch import nn
from torchrec import EmbeddingBagCollection
from torchrec.optim.optimizers import in_backward_optimizer_filter
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor
from torchrec.inference.modules import quantize_embeddings

large_table_cnt = 2
small_table_cnt = 2
large_tables = [
    torchrec.EmbeddingBagConfig(
        name="large_table_" + str(i),
        embedding_dim=64,
        num_embeddings=4096,
        feature_names=["large_table_feature_" + str(i)],
        pooling=torchrec.PoolingType.SUM,
    )
    for i in range(large_table_cnt)
]
small_tables = [
    torchrec.EmbeddingBagConfig(
        name="small_table_" + str(i),
        embedding_dim=64,
        num_embeddings=1024,
        feature_names=["small_table_feature_" + str(i)],
        pooling=torchrec.PoolingType.SUM,
    )
    for i in range(small_table_cnt)
]

class DebugModel(nn.Module):
    def __init__(self, device: torch.device):
        super().__init__()
        self.ebc = EmbeddingBagCollection(tables=large_tables + small_tables, device=device)
        self.linear = nn.Linear(64 * (small_table_cnt + large_table_cnt), 1)

    def forward(self, kjt: KeyedJaggedTensor):
        emb = self.ebc(kjt)
        return torch.mean(self.linear(emb.values()))
    
model = DebugModel(device=torch.device("cuda:0"))
# dtype=torch.qint8 completes without error; dtype=torch.float triggers the crash
quantize_embeddings(model, dtype=torch.float, inplace=True)
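
For reference, and as the comment in the script notes, the same call with an integer quantization dtype completes without error. A minimal workaround sketch, reusing model and the imports from test_quant.py above:

# Workaround sketch: quantizing the embedding tables to qint8 instead of
# requesting a float32 output dtype avoids the crashing kernel.
quantize_embeddings(model, dtype=torch.qint8, inplace=True)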
@tiankongdeguiji (Contributor, Author)

Hi @henrylhtsang @kausv @joshuadeng @PaulZhang12, could you take a look at this problem?

@tiankongdeguiji (Contributor, Author)

dtype=torch.float should not call FloatOrHalfToFusedNBitRowwiseQuantizedSBHalf; we add support for it in PR #2794.
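
A minimal sketch of the kind of dtype routing that comment describes (hypothetical helper, not torchrec's actual dispatch code): floating-point dtypes should keep a plain FP32/FP16 path, and only the integer quantization dtypes should go through the fused N-bit rowwise kernel.

import torch

# Hypothetical sketch, not torchrec's real code: route floating-point dtypes
# away from the N-bit rowwise quantization kernel the crash occurs in.
def select_embedding_quant_path(dtype: torch.dtype) -> str:
    if dtype in (torch.float, torch.half):
        # Plain FP32/FP16 rows: FloatOrHalfToFusedNBitRowwiseQuantizedSBHalf
        # should not be called for these dtypes.
        return "fp_passthrough"
    if dtype in (torch.qint8, torch.quint4x2):
        return "nbit_rowwise"
    raise ValueError(f"unsupported quantization dtype: {dtype}")

assert select_embedding_quant_path(torch.float) == "fp_passthrough"
assert select_embedding_quant_path(torch.qint8) == "nbit_rowwise"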

@iamzainhuda (Contributor)

taking a look!
