Sequences in the released Toys/Sports datasets are not sorted by timestamp #7

Open
jkdwxm opened this issue Dec 31, 2024 · 2 comments

Comments


jkdwxm commented Dec 31, 2024

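@chrisjtan @evison Hello, I compared your sequences with the ones released by P5. For Beauty/Yelp, the order within each sequence is identical (sorted by timestamp). For Toys/Sports, however, only some of the sequences match; the rest do show the problem reported in #3, namely that the Toys/Sports sequences follow the order of the raw data rather than the timestamps.
I am concerned that this error will affect further research by the community on the Toys/Sports datasets. I hope you can release the corrected datasets and the corresponding results. Thank you very much!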
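To make the difference between the two orderings concrete, here is a minimal sketch. It assumes the interactions are available as `(user, item, timestamp)` tuples; the function name `build_sequences` and the toy records are hypothetical and are not taken from this repository or from P5.

```python
from collections import defaultdict


def build_sequences(interactions, sort_by_time=True):
    """Group (user, item, timestamp) records into per-user item sequences.

    Hypothetical helper for illustration only.
    """
    per_user = defaultdict(list)
    for user, item, ts in interactions:
        per_user[user].append((ts, item))

    sequences = {}
    for user, events in per_user.items():
        if sort_by_time:
            # sorted() is stable, so interactions sharing a timestamp
            # keep their raw-file order relative to each other.
            events = sorted(events, key=lambda event: event[0])
        sequences[user] = [item for _, item in events]
    return sequences


# Made-up records listed in "raw file order".
raw = [
    ("u1", "B", 30),
    ("u1", "A", 10),
    ("u1", "C", 20),
    ("u2", "D", 5),
    ("u2", "E", 7),
]

by_file_order = build_sequences(raw, sort_by_time=False)
by_timestamp = build_sequences(raw, sort_by_time=True)
```

In this toy data, `u2` happens to be stored chronologically, so both orderings agree for that user, while `u1` differs. That mirrors the situation described above, where only part of the Toys/Sports sequences match.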

Collaborator

evison commented Jan 2, 2025

Dear @jkdwxm, many thanks for pointing this out. Indeed, in the original Amazon dataset, some sub-datasets are sorted by timestamp and some are not, and many people in the community may not have noticed this difference either. We will test on the sorted dataset. If you or anyone else has results on the sorted dataset to share, you are welcome to post them here. Thank you.


Author

jkdwxm commented Jan 2, 2025

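Thank you very much for your reply! I compared the Toys/Sports sequences preprocessed by RecBole with the sequences released by P5 and confirmed that P5 and RecBole are consistent.
In addition, P5 has released its ID maps, so it is easy to generate the user_sequence.txt file that you use. You may find it a useful reference.
Thanks again for your reply!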
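For anyone who wants to repeat this kind of check, a rough sketch follows. It assumes each line of a sequence file is a user ID followed by whitespace-separated item IDs, and that both files have already been mapped to the same ID space. The file layout, the function names, and the sample lines are all assumptions for illustration.

```python
from collections import Counter


def parse_sequences(lines):
    """Parse lines of the assumed form 'user item1 item2 ...' into a dict."""
    sequences = {}
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            sequences[parts[0]] = parts[1:]
    return sequences


def compare_sequences(ours, reference):
    """Classify each user in `reference` by how our sequence relates to it."""
    report = {"same_order": [], "order_differs": [], "items_differ": [], "missing": []}
    for user, ref_items in reference.items():
        items = ours.get(user)
        if items is None:
            report["missing"].append(user)
        elif items == ref_items:
            report["same_order"].append(user)
        elif Counter(items) == Counter(ref_items):
            # Same multiset of items, so only the ordering disagrees.
            report["order_differs"].append(user)
        else:
            report["items_differ"].append(user)
    return report


# Made-up sample lines standing in for two sequence files.
ours = parse_sequences(["u1 B A C", "u2 D E", "u3 F G"])
reference = parse_sequences(["u1 A C B", "u2 D E", "u3 F H", "u4 I"])
report = compare_sequences(ours, reference)
```

Users listed under `order_differs` are the ones whose sequences contain the same items but in a different order, which is the symptom reported in this issue.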
