
[QST] Categorifying nested lists in NVTabular and transformers4rec #792

maciekrtb opened this issue Oct 28, 2024 · 0 comments

❓ Questions & Help

Details

Hello everyone! In my sequential recommendation dataset, every item is annotated with a list of categories (which may contain repeated values). Here is a representative example:

data = [
    {"session_id": 1, "item_id-list": [101, 102, 103], "categories-list": [["A", "B"], ["C", "D"], ["E"]]},
    {"session_id": 2, "item_id-list": [201, 202], "categories-list": [["A"], ["F", "F"]]}
]

Is it possible to categorify the categories above in a nested way, so that:

  • the inner lists ([A, B], [C, D], ...) do not become separate tokens but remain lists of categorified elements (e.g. [[1, 2], [3, 4], [6]] and [[1], [5, 5]]), and
  • the result can then be fed into torch.nn.EmbeddingBag downstream?
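To make the desired output concrete, here is a minimal plain-Python sketch (not an NVTabular API) that builds one vocabulary over all leaf categories and encodes each session while keeping the per-item grouping. The helper names build_vocab and encode_nested are hypothetical, and IDs are assigned in first-seen order starting at 1 (0 left free for padding); this is an assumption, and NVTabular's Categorify may assign IDs differently (for example by frequency, with reserved low IDs), so the numbers will not necessarily match the example in the question.

```python
from typing import Dict, Hashable, List

def build_vocab(sessions: List[List[List[Hashable]]]) -> Dict[Hashable, int]:
    """Map every leaf category to an integer ID in first-seen order.

    ID 0 is left unused so it can serve as padding (an assumption for
    illustration, not NVTabular behavior).
    """
    vocab: Dict[Hashable, int] = {}
    for session in sessions:
        for item_categories in session:
            for cat in item_categories:
                if cat not in vocab:
                    vocab[cat] = len(vocab) + 1
    return vocab

def encode_nested(session: List[List[Hashable]], vocab: Dict[Hashable, int]) -> List[List[int]]:
    # Replace each leaf with its ID while keeping one inner list per item.
    return [[vocab[cat] for cat in item_categories] for item_categories in session]

data = [
    {"session_id": 1, "item_id-list": [101, 102, 103], "categories-list": [["A", "B"], ["C", "D"], ["E"]]},
    {"session_id": 2, "item_id-list": [201, 202], "categories-list": [["A"], ["F", "F"]]},
]

vocab = build_vocab([row["categories-list"] for row in data])
encoded = [encode_nested(row["categories-list"], vocab) for row in data]
print(encoded)  # [[[1, 2], [3, 4], [5]], [[1], [6, 6]]]
```

Repeated values such as ["F", "F"] are kept as [6, 6], so a sum or mean bag embedding would count them twice.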

I've tried passing an appropriate schema to the Dataset constructor, but without success. I could also flatten the lists, categorify the flat values, and then reassemble the nesting, but this looks like an inefficient and bad idea.
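For reference, the flatten/encode/reassemble round trip can be sketched with a flat value list plus start offsets, one pair per session. This is plain Python, and flatten_with_offsets, unflatten and the vocabulary below are hypothetical names and values for illustration only. The (ids, offsets) pair has the same shape as the 1-D input and offsets arguments of torch.nn.EmbeddingBag, where each offset marks the start of one bag (here, one item).

```python
from typing import Hashable, List, Tuple

def flatten_with_offsets(nested: List[List[Hashable]]) -> Tuple[List[Hashable], List[int]]:
    """Flatten one session's per-item category lists into values plus start offsets."""
    flat: List[Hashable] = []
    offsets: List[int] = []
    for item_categories in nested:
        offsets.append(len(flat))  # start index of this item's bag
        flat.extend(item_categories)
    return flat, offsets

def unflatten(flat: List[int], offsets: List[int]) -> List[List[int]]:
    """Rebuild the nested lists from flat values and start offsets."""
    ends = offsets[1:] + [len(flat)]
    return [flat[start:end] for start, end in zip(offsets, ends)]

# Hypothetical vocabulary, chosen only for this illustration.
vocab = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}

flat, offsets = flatten_with_offsets([["A", "B"], ["C", "D"], ["E"]])
ids = [vocab[c] for c in flat]  # a single pass over the flat values
print(ids, offsets)             # [1, 2, 3, 4, 5] [0, 2, 4]
print(unflatten(ids, offsets))  # [[1, 2], [3, 4], [5]]
```

Under these assumptions, both directions are linear in the number of category values, and the offsets are exactly what an EmbeddingBag call such as bag(torch.tensor(ids), torch.tensor(offsets)) would need, so the nesting never has to be materialized as Python lists on the model side.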
