[QST] Categorifying nested lists in NVTabular and transformers4rec #792

maciekrtb · 2024-10-28T23:32:26Z

❓ Questions & Help

Details

Hello everyone! In my sequential recommendation dataset, every item is annotated with a list of categories (potentially with repeated values). A representative example:

data = [
    {"session_id": 1, "item_id-list": [101, 102, 103], "categories-list": [["A", "B"], ["C", "D"], ["E"]]},
    {"session_id": 2, "item_id-list": [201, 202], "categories-list": [["A"], ["F", "F"]]},
]

Is it possible to categorify the categories present above in a nested way so that:

the lists [[A, B], [C, D], ...] do not become separate tokens but remain lists of categorified elements (e.g. [[1, 2], [3, 4], [6]] and [[1], [5, 5]]), and
we can then feed those into EmbeddingBag downstream?

I've tried supplying the Dataset constructor with an appropriate schema, but that failed. I could also try flattening the lists, categorifying, and fusing them back, but this looks like an inefficient and bad idea.
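For reference, the flatten, categorify, and fuse-back fallback mentioned above could look roughly like the sketch below. It is plain Python and does not use NVTabular; the helper names `categorify_nested` and `to_bags` are hypothetical, and reserving index 0 (for padding) is an assumption. IDs are assigned in order of first appearance, so they will not necessarily match the IDs in the example above. The `(flat_ids, offsets)` pair mirrors the 1D input plus offsets layout that `torch.nn.EmbeddingBag` accepts, with one bag per item.

```python
from itertools import chain


def categorify_nested(sessions, start_index=1):
    """Encode nested category lists as integers, preserving the nesting.

    `sessions` is a list of sessions; each session is a list of items;
    each item is a list of category values. Index 0 is left unused on
    the assumption that it is reserved for padding.
    """
    # Build the vocabulary over all values, in order of first appearance.
    vocab = {}
    for session in sessions:
        for value in chain.from_iterable(session):
            if value not in vocab:
                vocab[value] = len(vocab) + start_index
    # Re-apply the mapping without changing the list structure.
    encoded = [[[vocab[v] for v in item] for item in session] for session in sessions]
    return encoded, vocab


def to_bags(encoded_session):
    """Flatten one session into (flat_ids, offsets), one bag per item."""
    flat_ids, offsets = [], []
    for item in encoded_session:
        offsets.append(len(flat_ids))  # start position of this item's bag
        flat_ids.extend(item)
    return flat_ids, offsets


categories = [
    [["A", "B"], ["C", "D"], ["E"]],
    [["A"], ["F", "F"]],
]
encoded, vocab = categorify_nested(categories)
flat_ids, offsets = to_bags(encoded[0])
```

Here `encoded` keeps the same shape as `categories`, and `flat_ids` with `offsets` describes the first session as three bags. Whether this is efficient enough at dataset scale is exactly the open question.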

maciekrtb added the status/needs-triage label Oct 28, 2024
