Re-factor models module #102

Open
73 tasks
skasberger opened this issue Feb 16, 2021 · 4 comments
Labels: pkg:models (models related activities) · prio:high · status:confirmed (Is a valid issue and will be moved forward soon.) · type:feature (New feature)

Comments

@skasberger
Member

skasberger commented Feb 16, 2021

Re-factor the models module.

Goals are:

  • integrate it better with other tools
  • integrate it better with API module
  • implement setters/getters
  • integrate with Pandas

Requirements

  • create Dataset with default metadatablocks when constructed
  • create Dataset with custom metadatablocks by passing mdb to Dataset construction
  • store internal information about the metadatablock inside the metadatablock object: mdb_type, mdb_version, date_created
  • visualize data flow and architecture
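
A minimal sketch of the first three requirements, for discussion only. The class names `MetadataBlock` and `Dataset`, the helper `default_metadatablocks()`, and the choice of `citation` as the only default block are assumptions, not existing pyDataverse API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class MetadataBlock:
    """Hypothetical metadatablock that carries its own bookkeeping info."""
    name: str
    mdb_type: str = "custom"       # e.g. "citation", "journal" or "custom"
    mdb_version: str = "0.1.0"     # semantic version string (assumed default)
    date_created: datetime = field(default_factory=datetime.now)
    fields: Dict[str, object] = field(default_factory=dict)


def default_metadatablocks() -> List[MetadataBlock]:
    # Assumption: only the citation block is attached by default.
    return [MetadataBlock(name="citation", mdb_type="citation", mdb_version="4.18.1")]


class Dataset:
    """Hypothetical Dataset that owns a set of metadatablocks."""

    def __init__(self, mdbs: Optional[List[MetadataBlock]] = None) -> None:
        # Fall back to the default blocks when none are passed in.
        blocks = default_metadatablocks() if mdbs is None else mdbs
        self.mdbs: Dict[str, MetadataBlock] = {mdb.name: mdb for mdb in blocks}

    def mdb(self, name: str) -> MetadataBlock:
        # Look up a single block by name; raises KeyError if it is missing.
        return self.mdbs[name]
```

With this shape, `Dataset()` yields the default blocks and `Dataset(mdbs=[MetadataBlock(name="geo")])` yields only the custom one.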

ACTIONS

0. Pre-Requisites

1. Research

Design

Schema

  • required, data type, unique, limits, formats (email, date), minItems
  • controlledVocabulary: subject, authorIdentifierScheme, contributorType, country, journalArticleType, language, publicationIDType
  • pydataverse models
    • metadatablocks: mdb.citation?
    • make mdbs easy to use
  • CSV
  • JSON
    • Dataverse Upload JSON default
    • Dataverse Download JSON default
    • DSpace
  • XML
    • DDI
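
The schema constraints listed above (required, data type, formats, minItems, controlledVocabulary) could be checked roughly as follows. This is a sketch under assumptions: the schema layout, the rule keys, the field names `contact_email` and `publication_date`, and the three-item subject vocabulary are all made up for illustration.

```python
import re
from datetime import date
from typing import Any, Dict, List

# Hypothetical hand-written schema. Field names and rule keys are assumptions;
# only "subject" is taken from the controlledVocabulary list above.
SCHEMA: Dict[str, Dict[str, Any]] = {
    "title": {"required": True, "type": str},
    "contact_email": {"required": True, "type": str, "format": "email"},
    "publication_date": {"type": str, "format": "date"},
    "subject": {
        "required": True,
        "type": list,
        "minItems": 1,
        # Illustrative subset, not the full controlled vocabulary.
        "controlledVocabulary": {"Social Sciences", "Physics", "Other"},
    },
}

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _format_ok(value: str, fmt: str) -> bool:
    if fmt == "email":
        return EMAIL_RE.match(value) is not None
    if fmt == "date":
        try:
            date.fromisoformat(value)  # expects YYYY-MM-DD
            return True
        except ValueError:
            return False
    return True  # unknown formats are not checked


def validate(data: Dict[str, Any], schema: Dict[str, Dict[str, Any]] = SCHEMA) -> List[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors: List[str] = []
    for name, rules in schema.items():
        if name not in data:
            if rules.get("required"):
                errors.append(f"{name}: required")
            continue
        value = data[name]
        if not isinstance(value, rules["type"]):
            errors.append(f"{name}: expected {rules['type'].__name__}")
            continue
        if "format" in rules and not _format_ok(value, rules["format"]):
            errors.append(f"{name}: invalid {rules['format']}")
        if "minItems" in rules and len(value) < rules["minItems"]:
            errors.append(f"{name}: needs at least {rules['minItems']} item(s)")
        vocab = rules.get("controlledVocabulary")
        if vocab is not None:
            for item in value:
                if item not in vocab:
                    errors.append(f"{name}: '{item}' not in controlled vocabulary")
    return errors
```

Returning a list of errors rather than raising on the first one is a design choice here, so a caller can report every problem in a record at once.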

Tools

Architecture

  • Idea: Create base class (ABC or normal), called BaseModel()
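
One possible shape for the base class idea, using `abc.ABC`. The method names `to_dict()`, `validate()` and `json()`, and the `alias`/`name` attributes on the example subclass, are assumptions chosen for this sketch rather than an agreed design.

```python
import json
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Any, Dict


class BaseModel(ABC):
    """Hypothetical shared base for Dataverse, Dataset and Datafile models."""

    def __init__(self) -> None:
        self._created_at = datetime.now()

    @property
    def created_at(self) -> datetime:
        return self._created_at

    @abstractmethod
    def to_dict(self) -> Dict[str, Any]:
        """Return the model's metadata as a plain dict."""

    @abstractmethod
    def validate(self) -> bool:
        """Return True when the metadata is complete enough to export."""

    def json(self) -> str:
        # Shared export path: validate first, then serialize.
        if not self.validate():
            raise ValueError(f"{type(self).__name__} failed validation")
        return json.dumps(self.to_dict(), sort_keys=True)


class Dataverse(BaseModel):
    """Example subclass with two assumed attributes."""

    def __init__(self, alias: str = "", name: str = "") -> None:
        super().__init__()
        self.alias = alias
        self.name = name

    def to_dict(self) -> Dict[str, Any]:
        return {"alias": self.alias, "name": self.name}

    def validate(self) -> bool:
        return bool(self.alias and self.name)
```

Using an ABC means `BaseModel()` itself cannot be instantiated, while a "normal" base class would allow it; which is preferable depends on whether a generic model object is ever needed.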

2. Plan

Prioritize

  • In:
    • Dataset Default Download JSON: import and export
    • Dataset Custom Download JSON: import and export
    • Dataset Upload JSON: import and export
    • CSV templates
  • Out:
    • DSpace
    • DDI XML
    • Custom JSON
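
"Import and export" for the Upload JSON can be read as a round trip between a flat dict and a nested document. The sketch below assumes the nested layout `datasetVersion` → `metadataBlocks` → block name → `fields`, with `typeName`, `typeClass`, `multiple` and `value` per field, which is how the Dataverse dataset JSON is commonly structured; only primitive fields are handled, and the function names are hypothetical.

```python
from typing import Any, Dict, List


def export_upload_json(flat: Dict[str, Any], block: str = "citation") -> Dict[str, Any]:
    """Wrap flat primitive fields into a nested, Dataverse-style dict."""
    fields: List[Dict[str, Any]] = []
    for type_name, value in flat.items():
        fields.append({
            "typeName": type_name,
            "typeClass": "primitive",            # compound fields are out of scope here
            "multiple": isinstance(value, list),
            "value": value,
        })
    return {"datasetVersion": {"metadataBlocks": {block: {"fields": fields}}}}


def import_upload_json(doc: Dict[str, Any], block: str = "citation") -> Dict[str, Any]:
    """Inverse of export_upload_json for primitive fields."""
    fields = doc["datasetVersion"]["metadataBlocks"][block]["fields"]
    return {f["typeName"]: f["value"] for f in fields if f["typeClass"] == "primitive"}
```

A round-trip check such as `import_upload_json(export_upload_json(flat)) == flat` is a natural first integration test for each supported format.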

3. Implement

  • Write tests
    • Integration tests
  • Write/Update code
    • Create base class (ABC or normal), called BaseModel()
    • Implement Dataverse Upload JSON mappings and schemas
    • Implement Dataverse Download JSON mappings and schemas
    • Prototype: CSV templates mappings and schemas
    • Prototype: DSpace mappings and schemas
    • Prototype: DDI XML mappings and schemas
  • Write/Update Docs
  • Write/Update Docstrings
  • Run pytest
  • Run tox
  • Run pylint
  • Run mypy

visualize data flow / Architecture

draw all functions, paths etc.

models.py

class Dataverse():
  .__created_at
  .json
  .validate
  .json()
  .metadata()
  .metadata
  .__get_dataverse_download_json()
  .__validate()

class Dataset():
  .__created_at
  .json
  .validate
  .json()
  .mdb()
  .mdbs
  .__get_dataverse_download_json()
  .__validate()

class Datafile():
  .__created_at
  .json
  .validate
  .json()
  .metadata()
  .metadata
  .dataframe
  .__get_dataverse_download_json()
  .__validate()

class BaseMetaDataBlock(ABC):
  self.__mdb_type = "custom" # options: `citation`, `journal` etc. and `custom`
  self.__mdb_version = "4.18.1" # for Dataverse mdb types: the Dataverse version as a semantic version string; for custom mdbs: a custom semantic version string
  self.__mdb_date_created = datetime.now() # timestamp of when the metadatablock object was created

class MetaDataBlock(BaseMetaDataBlock):
  .__created_at
  .__name

class MetaDataBlockEntry():
  .__created_at
  .__value
  .__multiple
  .__type_class
  .__class

class Roles():
  .__created_at

class Group():
  .__created_at

class User():
  .__created_at
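
To make the setters/getters goal concrete, here is how one class from the outline above might look with Python properties. It is a sketch only: the constructor signature and the rule that `multiple` must match the shape of `value` are assumptions, not part of the outline.

```python
from datetime import datetime
from typing import Any


class MetaDataBlockEntry:
    """Hypothetical entry; attribute names mirror the outline above."""

    def __init__(self, value: Any, multiple: bool = False, type_class: str = "primitive") -> None:
        self.__created_at = datetime.now()
        self.__multiple = multiple
        self.__type_class = type_class
        self.value = value  # goes through the setter below

    @property
    def created_at(self) -> datetime:
        # Read-only: no setter is defined for this property.
        return self.__created_at

    @property
    def multiple(self) -> bool:
        return self.__multiple

    @property
    def type_class(self) -> str:
        return self.__type_class

    @property
    def value(self) -> Any:
        return self.__value

    @value.setter
    def value(self, new_value: Any) -> None:
        # Assumed rule: `multiple` must agree with the shape of the value.
        if self.__multiple and not isinstance(new_value, list):
            raise TypeError("multiple=True requires a list value")
        if not self.__multiple and isinstance(new_value, list):
            raise TypeError("multiple=False requires a single value")
        self.__value = new_value
```

Keeping the double-underscore attributes private and exposing them through properties lets validation live in one place, at the cost of a little boilerplate per attribute.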

4. Follow Ups

  • Review
    • Code
    • Tests
    • Docs
@JR-1991
Member

JR-1991 commented Mar 15, 2023

Just discovered this issue, and the idea seems to align very well with what has already been done in EasyDataverse. The library also uses Pydantic and generates objects according to the metadatablock schemas found at api/metadatablocks/blockname.

Wouldn't it make sense to merge the functionality into PyDataverse? In my opinion, having a single Python library makes more sense since both are heading in the same direction. What do you think @skasberger @pdurbin @poikilotherm?

@pdurbin
Member

pdurbin commented Mar 15, 2023

I think a single library would be easier for the community, sure.

@skasberger
Member Author

Agree on that, especially as pyDataverse is very lightweight and is made to build upon other, more specialized services/functions.

@pdurbin
Member

pdurbin commented Mar 4, 2024

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python

@pdurbin pdurbin removed this from the v0.5.0 milestone Mar 4, 2024