Spike: Discovery about integrating biomedical vocabularies into dataverse #8681

Closed
2 tasks
mreekie opened this issue May 9, 2022 · 13 comments
Labels
Feature: Controlled Vocabulary Includes both Internal and external controlled vocabularies NIH OTA DC Grant: The Harvard Dataverse repository: A generalist repository integrated with a Data Commons NIH OTA: 1.2.1 2 | 1.2.1 | Design and implement integration with controlled vocabularies | 5 prdOwnThis is an it... pm.GREI-d-1.2.1 NIH, yr1, aim2, task1: Design and implement integration with controlled voc

Comments

@mreekie

mreekie commented May 9, 2022

The objective is to figure out how to integrate biomedical vocabularies into Dataverse.
Three in particular have been discussed.

This is probably not a coding exercise. The team needs to figure out which metadata fields for biomedical will need to be included and where.

Definition of done:

  • For the purposes of this spike, the work will be limited to a two-week timeline.
    • Keep in contact with Len as work progresses.
  • What would it take to enable a metadata field, on one of our forms for creating a dataset or data collection, that accepts the contents of these three vocabularies as the choices for the field?
  • This is a POC to show that we can interact with these third party vocabularies.
  • Dataverse can work with external vocabularies.
  • Can it work with these three?

Context:

  • Needs a design session to start.
  • Opportunity to talk to Julie Goldman
  • Sonia asked that Julian be involved as lead
@mreekie mreekie added NIH OTA DC Grant: The Harvard Dataverse repository: A generalist repository integrated with a Data Commons spike labels May 9, 2022
@mreekie mreekie added Feature: Controlled Vocabulary Includes both Internal and external controlled vocabularies pm.Len labels May 9, 2022
@mreekie
Author

mreekie commented May 9, 2022

Priority meeting:

  • Created during today's priority meeting as part of the work for the NIH grant.
  • Len prioritized this as high

@mreekie
Author

mreekie commented May 11, 2022

Sprint

  • This is going to work in the same manner as the harvesting spike.
  • Needs digging into the three different vocabularies.
  • Determine whether these are internal or external.
  • What might be required to implement this?

Internal People that may have knowledge:

  • Julie Goldman. Can help us get started or help get us more contacts.
  • Len has another contact that he can provide.
  • Lisa Federer may also be able to help. May be able to get us in touch with more people at the NIH
  • Another contact: Slava
  • Mahmood Shad from Harvard Data Commons' Objective 2 may have further contacts through the Objective 2 group work.
  • Some of the people listed in this "List of Metadata Experts / Stakeholders Consulted" document might be helpful. For example, Tim Clark reviewed the current biomedical metadatablock when it was being designed.

From Len

  • The coopetition meetings seem to indicate that the NIH expectations are realistic.
  • We can start somewhere small and go from there.

First step:

  • Finding someone knowledgeable will be the first step in getting us going to help sort out what is most relevant.

@mreekie
Author

mreekie commented May 11, 2022

Phil

@mreekie
Author

mreekie commented May 11, 2022

Kevin notes on what we've done with this before.

  • hierarchy came up.
  • we have tried mapping to DDI notes.
  • Leonid - indexing?
  • export affected by that.

@mreekie
Author

mreekie commented May 11, 2022

Sprint

  • Sized as medium.
  • Check in with Len over the sprint.

@4tikhonov
Contributor

Hi @mreekie, I think the biggest problem is hierarchy support in metadata; the other issues were more or less covered during the implementation of the external CV support. The Dataverse interface should definitely be extended with hierarchy support. However, we've tried to solve this by introducing SKOS relationships between fields. Please check here: https://zenodo.org/record/5838132#.YoKTVS8Rol4
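
To make the hierarchy concern concrete, here is a minimal sketch of how a vocabulary whose concepts are linked by SKOS `broader` relations could be flattened into path strings for a flat picker. This is only an illustration: the `ex:` URIs, the labels, and the helper names `hierarchy_path` and `flat_choices` are all made up for the example, and it is not how Dataverse or the linked Zenodo record implements anything.

```python
# Toy SKOS-like vocabulary: each concept URI maps to its label and its
# skos:broader parent (None for a top concept).
# Assumption: all URIs and labels below are invented for illustration.
CONCEPTS = {
    "ex:root": {"label": "Disease", "broader": None},
    "ex:c1": {"label": "Infectious disease", "broader": "ex:root"},
    "ex:c2": {"label": "Viral infection", "broader": "ex:c1"},
}


def hierarchy_path(uri, concepts):
    """Return the labels from the top concept down to `uri`."""
    path = []
    seen = set()
    while uri is not None:
        if uri in seen:
            # Guard against malformed vocabularies with circular parents.
            raise ValueError(f"cycle in broader relations at {uri}")
        seen.add(uri)
        concept = concepts[uri]
        path.append(concept["label"])
        uri = concept["broader"]
    return list(reversed(path))


def flat_choices(concepts, sep=" > "):
    """Flatten the hierarchy into sorted display strings for a flat picker."""
    return sorted(sep.join(hierarchy_path(u, concepts)) for u in concepts)


if __name__ == "__main__":
    for choice in flat_choices(CONCEPTS):
        print(choice)
```

A flat list like this loses the ability to browse or collapse levels, which is one reason a deeply hierarchical vocabulary may need real interface support rather than a plain dropdown.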

@mreekie
Author

mreekie commented May 16, 2022

Hi @mreekie, I think the biggest problem is hierarchy support in metadata; the other issues were more or less covered during the implementation of the external CV support. The Dataverse interface should definitely be extended with hierarchy support. However, we've tried to solve this by introducing SKOS relationships between fields. Please check here: https://zenodo.org/record/5838132#.YoKTVS8Rol4

@4tikhonov I appreciate this. I'm not the right person to pick up this discussion on a technical level. I do know that the problem of hierarchy support in metadata came up during the sizing discussion. I will follow up and make sure that we reach out to you when this gets picked up and put in progress.

@mreekie
Author

mreekie commented May 25, 2022

Sprint

  • orphaned in OnDeck in pm.sprint.2022_05_11

@mreekie
Author

mreekie commented Jun 8, 2022

Sprint:

  • pm.sprint.2022_05_25 ended OnDeck

@mreekie
Author

mreekie commented Sep 22, 2022

This needs the right skillset.
This may need to be split.

  • What are the vocabularies?
    • Are they deeply hierarchical? If so, then we will have trouble supporting them out of the box.

Next step:

  • Devs are the wrong place to start with this.
  • Start with a post to the list.
  • Call a meeting or dedicate a community call.

@mreekie
Author

mreekie commented Sep 26, 2022

Reach out to Julie, who works at the Medical School as a research data librarian.
Mahmood may also be a good bridge person.

Where we are: we have three controlled vocabularies identified.

  • How hard are these to plug in if they are already there?

@mreekie
Author

mreekie commented Sep 26, 2022

Phil has done a demo of how to use external controlled vocabularies.
Leonid has also done work: the earlier spike on external controlled vocabularies.
The community had already implemented this.
We evaluated it (Leonid?) and it seems to work fine.

This may also tie into the use cases that Mahmood has worked on.

@mreekie
Author

mreekie commented Oct 18, 2022

It looks like this issue can be closed.
After our first meeting on the problem definition, we decided to go with three specific issues to get started.
We don't need to link this issue specifically, because all of the issues are linked to the deliverable.


2 participants