This repository contains data files for bulk submission of Swahili sentences to Common Voice. The data was collected as part of a language dataset project funded by the Lacuna Fund.
The data is stored in the data folder, organised into batches that correspond to separate submissions.
Each batch is stored in a folder named with a number and the date the batch was created. A batch folder contains the validated sentences (in .xlsx or .txt format) and a separate subfolder holding the validation files.
Validation files should preferably be named after the validator, and a link to each file must be recorded in the validation report submitted with the pull request.
Each batch includes a description of its data, detailing the source and the licensing.
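As an illustration of the batch layout described above, the following Python sketch lists the batch folders and counts the sentences in each one. It is a minimal sketch under stated assumptions: batch folders sit directly inside the data folder, .txt files hold one sentence per line, and .xlsx files and the validation subfolder are skipped. The function names `list_batches` and `load_batch_sentences` are hypothetical and are not part of this repository.

```python
from pathlib import Path


def list_batches(data_dir: Path) -> list[Path]:
    """Return the batch folders directly inside the data folder, sorted by name.

    Note: this is a plain string sort, so "10_..." sorts before "2_...".
    """
    return sorted(p for p in data_dir.iterdir() if p.is_dir())


def load_batch_sentences(batch_dir: Path) -> list[str]:
    """Collect non-empty lines from the .txt files at the top level of a batch folder.

    Assumption (hypothetical): each .txt file holds one sentence per line.
    The glob is non-recursive, so files in subfolders such as the validation
    folder are not read; .xlsx files are ignored entirely.
    """
    sentences: list[str] = []
    for txt_file in sorted(batch_dir.glob("*.txt")):
        with txt_file.open(encoding="utf-8") as fh:
            sentences.extend(line.strip() for line in fh if line.strip())
    return sentences


if __name__ == "__main__":
    data_dir = Path("data")  # assumption: the script is run from the repository root
    if data_dir.is_dir():
        for batch in list_batches(data_dir):
            print(f"{batch.name}: {len(load_batch_sentences(batch))} sentences")
```

Reading .xlsx batches would need a third-party library such as openpyxl, which is why the sketch limits itself to the standard library and .txt files.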