This dataset was generated from data collected from users of the Yogera mobile app. It includes the metadata and links to the voice clips. The dataset covers five languages: Luganda, Lusoga, Lumasaba, Acholi and Runyankole-Rukiga.
The latest release (version 5.0.2) combines all data collected so far: Phase 1.0 (August to December 2023), Phase 1.1 (January to February 2024), Phase 2.0 (April to August 2024) and Phase 3.0 (October to December 10, 2024). Each record's phase is given in the phase column of the metadata, so the release can be split back into its phases.
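Because the phase is stored per record, a subset of the release can be selected with a simple filter. The sketch below assumes the metadata is a CSV file; the column names `clip_id`, `language` and `phase`, and phase values such as `"3.0"`, are hypothetical placeholders, so check them against the actual metadata before use.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata excerpt. The real file layout, column names and phase
# labels may differ; only the existence of a phase column comes from this README.
SAMPLE_METADATA = """clip_id,language,phase
c001,Luganda,1.0
c002,Acholi,2.0
c003,Lusoga,3.0
c004,Luganda,3.0
"""


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV metadata text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_by_phase(rows: list[dict[str, str]], phase: str) -> list[dict[str, str]]:
    """Keep only the rows whose (hypothetical) phase column equals `phase`."""
    return [row for row in rows if row["phase"] == phase]


def clips_per_language(rows: list[dict[str, str]]) -> Counter:
    """Count how many clips each language contributes to `rows`."""
    return Counter(row["language"] for row in rows)


rows = load_rows(SAMPLE_METADATA)
phase3 = filter_by_phase(rows, "3.0")
print(clips_per_language(phase3))
```

With a real metadata file, `load_rows` would be given the file's contents instead of the inline sample.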
Version | Date Released | Voice Clips | Recorded Hours | Approved Hours | Unique Voices | Transcribed | Reviewed |
---|---|---|---|---|---|---|---|
5.0.2 | Dec 10, 2024 | Link | 3,885.9 | 2,656.2 | 3,024 | 253.2 | 251.7 |
5.0.1 | Nov 20, 2024 | Link | 3,411.1 | 2,217.7 | 2,675 | 253.2 | 251.7 |
4.0.1 | Aug 13, 2024 | Link | 2,253.3 | 1,565.8 | 1,641 | 152.8 | 151.3 |
4.0.0 | Aug 07, 2024 | Link | 2,166.8 | 1,478.1 | 1,585 | 152.8 | 151.3 |
3.0.1 | Feb 07, 2024 | Link | 844.0 | 509.4 | 479 | 58.0 | 53.4 |
3.0.0 | Jan 18, 2024 | Link | 682.9 | 334.0 | 440 | 58.0 | 53.4 |
2.0.0 | Oct 31, 2023 | Link | 511.3 | 17.2 | 312 | 0 | 0 |
1.0.0 | Sep 20, 2023 | Link | 43 | 4 | 34 | 0 | 0 |
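Since the table reports both recorded and approved hours for each release, the share of recorded audio that was approved can be derived directly from it. The sketch below does this arithmetic using only the values in the table; it assumes that approved hours are a subset of recorded hours, which the table implies but does not state.

```python
# Recorded and approved hours per release, copied from the table above.
RELEASES: dict[str, tuple[float, float]] = {
    "5.0.2": (3885.9, 2656.2),
    "5.0.1": (3411.1, 2217.7),
    "4.0.1": (2253.3, 1565.8),
    "4.0.0": (2166.8, 1478.1),
    "3.0.1": (844.0, 509.4),
    "3.0.0": (682.9, 334.0),
    "2.0.0": (511.3, 17.2),
    "1.0.0": (43.0, 4.0),
}


def approval_share(recorded: float, approved: float) -> float:
    """Return the fraction of recorded hours that were approved."""
    if recorded <= 0:
        raise ValueError("recorded hours must be positive")
    return approved / recorded


shares = {version: approval_share(rec, app) for version, (rec, app) in RELEASES.items()}
for version, share in shares.items():
    print(f"{version}: {share:.1%} of recorded hours approved")
```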