want to pretrain on my own datasets #21
Comments
Hi: You can follow the same paradigm as preprocess_.py to parse your graph into our data format, and then run pretrain_.py over the parsed graph. Alternatively, if you want to merge our code into your own system, you can rewrite the data structure; everything else stays the same.
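To make the "parse your graph" step concrete, here is a minimal sketch of turning a raw typed edge list into an indexed heterogeneous graph. It is not the repository's data structure: `build_typed_graph`, the edge tuple layout, and the nested-dict layout are all hypothetical, chosen only to illustrate the idea of re-indexing nodes per type and grouping edges by node and relation type.

```python
from collections import defaultdict


def build_typed_graph(edges):
    """Index a typed edge list (hypothetical layout, not the repo's format).

    edges: iterable of (src_type, src_id, relation, dst_type, dst_id).
    Returns (node_index, adjacency).
    """
    # node_index[node_type][raw_id] -> contiguous integer index within that type
    node_index = defaultdict(dict)
    # adjacency[dst_type][src_type][relation][dst_idx] -> list of src_idx
    adjacency = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    )

    def index_of(node_type, raw_id):
        # Assign the next free index the first time a raw id is seen.
        table = node_index[node_type]
        if raw_id not in table:
            table[raw_id] = len(table)
        return table[raw_id]

    for src_type, src_id, relation, dst_type, dst_id in edges:
        s = index_of(src_type, src_id)
        d = index_of(dst_type, dst_id)
        adjacency[dst_type][src_type][relation][d].append(s)

    return node_index, adjacency


# Toy input, made up for illustration.
example_edges = [
    ("author", "a1", "writes", "paper", "p1"),
    ("author", "a2", "writes", "paper", "p1"),
    ("paper", "p2", "cites", "paper", "p1"),
]
node_index, adjacency = build_typed_graph(example_edges)
```

Whatever layout you end up with, the important part is that each node type has its own contiguous index space, so that per-type feature matrices can be looked up by index later.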
Thank you very much for your advice. I will try it~
My own dataset contains more than 10 million nodes. The paper says the OAG dataset contains more than 178 million nodes, but according to pretrain_OAG.py only about 1 million nodes are used for pretraining. Is that number right?
Since we use subgraph sampling during training, the size of the pretraining graph does not matter much. In my experiments I also tried the whole OAG dataset, but it is too big, so I didn't provide it on Google Drive. You can certainly use our code to pretrain on a very large dataset.
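As a rough illustration of why sampling decouples training cost from graph size, here is a simplified uniform neighbor sampler. It is a sketch only and not the sampler used in this repository; `sample_subgraph`, `num_hops`, and `fanout` are hypothetical names, and the graph is a plain homogeneous adjacency dict.

```python
import random


def sample_subgraph(neighbors, seeds, num_hops=2, fanout=3, rng=None):
    """Sample a small subgraph around `seeds` (illustrative, not the repo's sampler).

    neighbors: dict mapping node -> list of neighbor nodes.
    Each hop keeps at most `fanout` neighbors per frontier node, so the
    sampled size is bounded by the seeds, hops, and fanout, not by the
    total number of nodes in the graph.
    """
    rng = rng or random.Random(0)
    visited = set(seeds)
    frontier = list(seeds)
    edges = []
    for _ in range(num_hops):
        next_frontier = []
        for node in frontier:
            candidates = neighbors.get(node, [])
            picked = rng.sample(candidates, min(fanout, len(candidates)))
            for nb in picked:
                edges.append((node, nb))
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return visited, edges


# Toy ring graph with 1000 nodes, made up for illustration.
n = 1000
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
nodes, edges = sample_subgraph(ring, seeds=[0], num_hops=2, fanout=2)
```

With one seed, two hops, and a fanout of two, the sample can never exceed 1 + 2 + 4 nodes, whether the full graph has a thousand nodes or hundreds of millions.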
Thank you for your patient reply~ I will try it.
Hi, acbull~ I think this algorithm is very interesting and I really want to test it on my own graph dataset. Is there any advice or tips on how to prepare my own pretraining graph data? Thank you very much~~