load torch tensors in OGBDatasets #107
Can I work on this?
Sure. I don't remember for which specific dataset this was needed, though.
This problem can be seen in
I've been inactive due to university exams; I'll start working on this today.
Some of these problems have already been overcome here, including loading the ".pt" format using Pickle.jl, as discussed with @chengchingwen: https://github.com/yuehhua/GraphMLDatasets.jl/blob/65d6a2bb02d31569a64b47004a0c4b192739a066/src/preprocess.jl#L391
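For reference, a minimal sketch of what that loading step can look like, assuming Pickle.jl's `Torch.THload` (the mechanism used in the linked preprocess.jl) and a hypothetical file path:

```julia
using Pickle  # Pickle.jl can deserialize PyTorch-saved files without a PyTorch install

# Hypothetical path; the actual ".pt" filename depends on the dataset being loaded.
ptfile = joinpath("path", "to", "node_feat.pt")

# Torch.THload reads a torch.save'd tensor into a native Julia array.
feats = Pickle.Torch.THload(ptfile)
```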
Split tensors appear for edge-level tasks in OGB datasets. Loading link-property-prediction datasets differs from graph- or node-property ones, so we may need to change the OGBDataset API. Some options:

```julia
data = OGBDataset(name, split; dir)
```

But this has one obvious problem: loading any one split, e.g. `train`, would involve computing the other two (`val` and `test`), given the intertwined way the data is stored.

```julia
train_data, test_data, valid_data = OGBDataset(name; dir)
```

This can be ambiguous for non-split datasets and does not exactly match the other dataset APIs.

```julia
data = OGBDataset(name; dir)
train_split = split(data, :train)  # this may be a weird way to do it
# maybe something like
train_split = data[:train]
```

Also note that the representation for link tasks in OGBDataset will differ from node or graph tasks.
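The `data[:train]` syntax above could be supported by overloading `Base.getindex`; a sketch with hypothetical field names (not the actual OGBDataset internals):

```julia
struct OGBDataset
    name::String
    # :train / :valid / :test splits, computed once when the dataset is loaded
    splits::Dict{Symbol,Any}
end

# Enables data[:train], data[:valid], data[:test]
Base.getindex(d::OGBDataset, s::Symbol) = d.splits[s]
```

This keeps the constructor signature unchanged while making split access a cheap lookup after the one-time split computation.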
Also, the API for splits should be consistent across data sources; e.g., Cora and OGBDataset currently access training masks through different APIs.
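One way to get that consistency would be a single generic accessor that every dataset type implements. A hedged sketch, with hypothetical names that assume `Cora` and `OGBDataset` types are already defined elsewhere:

```julia
# Generic split accessor; each dataset type overloads it so users call the
# same function regardless of data source (hypothetical API, not the
# current one in GraphMLDatasets.jl).
trainmask(d) = error("trainmask not implemented for $(typeof(d))")

# e.g. a Cora-style dataset exposing a boolean mask field:
trainmask(d::Cora) = d.train_mask
# and an OGB-style dataset exposing precomputed splits:
trainmask(d::OGBDataset) = d.splits[:train]
```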
Some of the features of the OGBDataset are downloaded as torch tensors stored in the ".pt" format. They are currently ignored, but we could load them using Pickle.jl (e.g. see this comment).