Project: Kaggle (Croissant) #163

Open
5 of 12 tasks
cmbz opened this issue Jan 21, 2024 · 5 comments
Labels: Dataverse Project (issues related to Dataverse Project software)

Comments

@cmbz
Contributor

cmbz commented Jan 21, 2024

Overview

  • "Croissant is a high-level format for machine learning datasets that brings together four rich layers."
  • This issue tracks activities in our collaboration with the Kaggle team on Croissant.

Mission

  • Make datasets easier to find and work with for machine learning, at scale and by diverse stakeholders (e.g., AI engineers, AI ethicists, compliance managers, the interested public).

Vision

  • Croissant is the most convenient and widely used machine-readable format for ML-ready datasets.

Issues

Issues we will probably work on

Issues we've opened or are keeping an eye on

Depending on the outcome of these issues, we may enhance our Croissant implementation to cover additional use cases.

Related

cmbz self-assigned this Jan 21, 2024
cmbz changed the title from "Project: Kaggle (Croissant)" to "Collaboration: Kaggle (Croissant)" Jan 21, 2024
cmbz changed the title from "Collaboration: Kaggle (Croissant)" to "Project: Kaggle (Croissant)" Jan 21, 2024
cmbz added the "Dataverse Project" label (issues related to Dataverse Project software) Jan 21, 2024
@cmbz
Contributor Author

cmbz commented Jul 22, 2024

Status: July 2024

  • Merged the Croissant support PR.
  • Planned to announce support.
  • Met with Kaggle and talked through some options for integration.

@cmbz
Contributor Author

cmbz commented Aug 22, 2024

Status: August 2024

@cmbz
Contributor Author

cmbz commented Oct 8, 2024

Status: September 2024

  • Now that the Croissant exporter is in place and being indexed by Google, Harvard Dataverse is showing Croissant data in Google Dataset Search, as described in a mailing list post with a screenshot.
  • Exchanged emails with the Croissant Task Force about file formats and DDI-CDI.

@cmbz
Contributor Author

cmbz commented Nov 3, 2024

Status: October 2024

  • Dataverse was added to the official image of Croissant integrations.
  • Arofan Gregory from CODATA gave a talk about DDI-CDI to the Croissant working group.

@cmbz
Contributor Author

cmbz commented Nov 22, 2024

Status: November 2024

  • Discussions were initiated between the CODATA team and Dataverse and community participants to investigate synergies between the Croissant and DDI-CDI formats and to coordinate their respective efforts in these areas. These efforts are aligned with the GREI 2: Consistent Metadata objectives.
  • A meeting is planned for the coming months.
