Create dataset loader for MEDISCO #591

Closed
SamuelCahyawijaya opened this issue Apr 1, 2024 · 4 comments · Fixed by #654

Comments

@SamuelCahyawijaya
Collaborator

Dataloader name: medisco/medisco.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?medisco

Dataset: medisco
Description: MEDISCO is a Medical Indonesian Speech Corpus. The medical text corpus is collected from five Indonesian online medical consultation websites. From the text corpus, we created a speech corpus that consists of 360 sentences read by 13 speakers. In total, our speech corpus contains 731 medical terms and consists of 4,680 utterances with a total duration of 10 hours.
Subsets: Train, Test
Languages: ind
Tasks: Automatic Speech Recognition
License: GNU General Public License v3.0 (gpl-3.0)
Homepage: https://huggingface.co/datasets/mrqorib/MEDISCO
HF URL: https://huggingface.co/datasets/mrqorib/MEDISCO
Paper URL: https://ieeexplore.ieee.org/abstract/document/8629259
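
As a quick sanity check on the figures in the description, the sketch below assumes every one of the 13 speakers reads every one of the 360 sentences, which is consistent with the stated 4,680 utterances. The ID format (`spkNN_sentNNN`) and the helper `make_example_ids` are hypothetical and only illustrate one way to keep example IDs unique; they are not taken from the actual dataloader.

```python
from itertools import product

NUM_SENTENCES = 360  # from the dataset description
NUM_SPEAKERS = 13    # from the dataset description


def make_example_ids(num_speakers: int, num_sentences: int) -> list[str]:
    """Hypothetical ID scheme: one ID per (speaker, sentence) pair.

    Assumes each speaker reads each sentence exactly once.
    """
    return [
        f"spk{speaker:02d}_sent{sentence:03d}"
        for speaker, sentence in product(
            range(1, num_speakers + 1), range(1, num_sentences + 1)
        )
    ]


example_ids = make_example_ids(NUM_SPEAKERS, NUM_SENTENCES)

# 13 speakers x 360 sentences matches the 4,680 utterances in the description.
print(len(example_ids))
# Combining both indices means no two examples share an ID.
print(len(set(example_ids)) == len(example_ids))
```

Using the sentence index alone would yield only 360 distinct IDs across 4,680 utterances, so some speaker-level component is needed for uniqueness under this assumption.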
@SamuelCahyawijaya SamuelCahyawijaya converted this from a draft issue Apr 1, 2024
@akhdanfadh
Collaborator

#self-assign

@mrqorib
Contributor

mrqorib commented Apr 1, 2024

#self-assign

@akhdanfadh Sorry, would you mind giving this to me? This is my dataset 😆

@akhdanfadh
Collaborator

Sure! @mrqorib

@akhdanfadh akhdanfadh assigned mrqorib and unassigned akhdanfadh Apr 1, 2024
@mrqorib
Contributor

mrqorib commented Apr 1, 2024

@akhdanfadh Thanks! 😊

@holylovenia holylovenia added pr-ready A PR that closes this issue is Ready to be reviewed and removed staled-issue labels May 2, 2024
MJonibek pushed a commit that referenced this issue May 15, 2024
* add medisco dataloader

* fix example id to make it unique

* Update seacrowd/sea_datasets/medisco/medisco.py

Co-authored-by: Lj Miranda <[email protected]>

* fix formatting

---------

Co-authored-by: Lj Miranda <[email protected]>