Develop basic script to create RO-Crates from B2Share #11

Open
juliancervos opened this issue Mar 11, 2024 · 3 comments

@juliancervos
Collaborator

Hi @stopopol, could you upload to this repo the script you wrote for converting B2Share records into RO-Crates? Then we can continue the discussion and development here.

Background info

During a previous grassland pDT meeting, @stopopol proposed some ideas about uploading the data for the pDT to B2SHARE, and about how we could connect this with FDO and integrate it with BioDT. Because LTER has a domain there, the metadata could be harvested automatically, and we could add some metadata management to filter records by BioDT and its pDTs, among other ideas. For more info on this and the first steps we took, see the meeting page in the wiki.

juliancervos self-assigned this on Mar 11, 2024
@stopopol
Collaborator

@juliancervos
Collaborator Author

No problem! I had a look and changed some things (see 336501e on the b2share_script branch). In particular, I'd like to know what you think about a few points:

  • Moved crate = ROCrate() inside the loop so that each record is stored in a separate RO-Crate directory (i.e., as separate datasets). Is that change okay, or did you want to keep all records under a single collection RO-Crate?
  • Following your suggestion in a past WP5 meeting, I've been looking into adding a property to specify which RI the dataset comes from. Some existing Schema.org properties that could fit are sourceOrganization, producer, and publisher. For now, I added eLTER as the publisher of the datasets, but which one do you think fits best?
  • I introduced a quick fix for datasets with the / character in their name. For example, "Grassland Dynamics - Matschertal/Val Mazia (Italy) - 2009-2010" would result in an RO-Crate root directory called "Grassland Dynamics - Matschertal" containing a nested directory named "Val Mazia (Italy) - 2009-2010".
    • So instead, with crate.write(crate.name.replace('/', '-')) the forward slash is converted to a dash before writing (the name is still unchanged in the metadata).
    • I'm not sure how frequent / characters are in site names, but there are other ways to handle this, for example using the suffix of the PID to name the directory, or the DEIMS ID plus a timestamp.
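A rough sketch of the directory-naming options discussed above, in plain Python. dir_name_from_title mirrors the name.replace('/', '-') fix; dir_name_from_pid is a hypothetical helper for the PID-suffix alternative, and the example PID URL is made up. is_single_component is an illustrative check that a name would not create nested directories.

```python
from pathlib import PurePosixPath


def dir_name_from_title(title: str) -> str:
    # Quick fix from the script: a "/" in the title would otherwise be
    # treated as a path separator and create a nested directory.
    return title.replace("/", "-")


def dir_name_from_pid(pid: str) -> str:
    # Hypothetical alternative: use the last path segment of the record's
    # PID/URL, which should be unique per record.
    return pid.rstrip("/").rsplit("/", 1)[-1]


def is_single_component(name: str) -> bool:
    # True if `name` would create exactly one directory level.
    return len(PurePosixPath(name).parts) == 1


title = "Grassland Dynamics - Matschertal/Val Mazia (Italy) - 2009-2010"
print(is_single_component(title))                       # False
print(dir_name_from_title(title))
print(is_single_component(dir_name_from_title(title)))  # True
```

The title-based name stays human-readable. The PID-based name avoids any character issues but is less descriptive when browsing the output folder.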

Let me know what you think @stopopol !

@stopopol
Collaborator

Hi,

  • I'm completely ok with having separate RO-Crate directories
  • "publisher" seems like the most fitting option in the case of eLTER
  • The quick fix is perfectly fine. Slashes are not commonly used in site names, but they might occur every now and then.
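The agreed layout is one RO-Crate directory per record, with eLTER as publisher. The sketch below shows what that could produce, written in plain Python against the RO-Crate 1.1 JSON-LD structure rather than through ro-crate-py's ROCrate() and crate.write(). Several names here are assumptions: the record dict shape with "title"/"description" keys (the real B2Share API response is structured differently) and the local identifier "#elter" for the publisher entity. The sketch also omits root-dataset properties the spec expects, such as datePublished and license.

```python
import json
from pathlib import Path

# Hypothetical local identifier for the publisher entity; a real crate
# would preferably point to a persistent URL for eLTER.
ELTER_ID = "#elter"


def crate_dir_name(record: dict) -> str:
    # Same quick fix as in the script: "/" -> "-" to avoid nested dirs.
    return record["title"].replace("/", "-")


def build_crate_metadata(record: dict) -> dict:
    """Minimal RO-Crate 1.1 metadata for one record, with eLTER as publisher."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor required by the RO-Crate spec
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: the record itself (title kept unchanged)
                "@id": "./",
                "@type": "Dataset",
                "name": record["title"],
                "description": record.get("description", ""),
                "publisher": {"@id": ELTER_ID},
            },
            {"@id": ELTER_ID, "@type": "Organization", "name": "eLTER"},
        ],
    }


def write_crates(records, out_dir) -> list:
    """Write one RO-Crate directory per record; return the created paths."""
    created = []
    for record in records:
        crate_dir = Path(out_dir) / crate_dir_name(record)
        crate_dir.mkdir(parents=True, exist_ok=True)
        with open(crate_dir / "ro-crate-metadata.json", "w", encoding="utf-8") as f:
            json.dump(build_crate_metadata(record), f, indent=2, ensure_ascii=False)
        created.append(crate_dir)
    return created
```

Modelling the publisher as a separate Organization entity, rather than a plain string, leaves room to add an identifier or URL for the RI later without changing the dataset entries.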
