Crowdom

Crowdom is a tool for simplifying data labeling.

Write plain Python code and launch data labeling without any knowledge of crowdsourcing or the underlying platform (Crowdom uses Toloka as the platform for publishing tasks to workers). Define the task you are solving and load source data with a few lines of code, choose a quality-cost-speed tradeoff in an interactive UI form, launch data labeling, and study the resulting labels in Pandas dataframes.

Crowdom uses ʎzy, a cloud workflow runtime, to run the data labeling workflow. This provides reliability (automatic retries on errors and the ability to relaunch data labeling without losing progress) and out-of-the-box data persistence.

Quickstart

We recommend starting with the image classification example, since it demonstrates the full data labeling workflow proposed in Crowdom, with detailed explanations for each step.

The other examples show what data labeling with Crowdom looks like for different types of tasks.

To get the benefits of running on ʎzy, see the ʎzy setup example.

Join our Telegram chat if you want to learn more about Crowdom or discuss your task with us.

Types of tasks

Tasks in Crowdom are divided into two types:

Classification tasks, which have a fixed set of labels as output.
Annotation tasks, for which the output has "unlimited" dimension.

In a typical classification task, the worker is asked to choose one of the pre-determined options. Side-by-side (SbS) comparison is a special case of a classification task.

In an annotation task, there may be many potential solutions, and more than one of them may be correct. Speech transcription and image annotation are examples of annotation tasks.
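
To make the distinction concrete, here is a minimal sketch in plain Python. It does not use Crowdom's API: the names `Animal` and `majority_vote` and all sample data are hypothetical and chosen only for illustration. In a classification task, worker answers come from a fixed label set, so they can be compared for exact equality and aggregated with, for example, a simple majority vote. In an annotation task, answers are free-form, so several distinct answers may all be acceptable.

```python
from collections import Counter
from enum import Enum


class Animal(Enum):
    """Hypothetical fixed label set for a classification task."""
    CAT = "cat"
    DOG = "dog"
    OTHER = "other"


def majority_vote(answers):
    """Return the most frequent answer and the share of workers who gave it."""
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)


# Classification: every answer is one of the pre-determined options.
classification_answers = [Animal.CAT, Animal.CAT, Animal.DOG]
label, agreement = majority_vote(classification_answers)
print(label.value, round(agreement, 2))

# Annotation: free-form output, so distinct answers are common
# and more than one of them may be correct.
transcription_answers = ["hello world", "Hello, world!", "hello word"]
distinct = set(transcription_answers)
print(len(distinct))
```

Exact-match voting works for the first case only because the output space is small and fixed. For free-form annotations, some other way of comparing or checking answers is needed, which is why the two task types are treated separately.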

Examples

The following table lists examples that demonstrate data labeling for different types of tasks, as well as other aspects of the data labeling workflow.

Examples are provided as .ipynb files located in this repository, but are displayed through nbviewer, which renders them more faithfully than GitHub.

The image classification and audio transcript examples also have .html versions. These examples present the full labeling workflow for the classification and annotation types of tasks, respectively. The .html version lets you collapse optional sections of the notebook to make the main steps of the workflow easier to follow, and it displays the contents of interactive widgets (for example, the quality-cost-speed tradeoff form).

Example	Full workflow	Function	Data types	Additionally
Image classification (HTML)	✅	Classification	Image
Audio transcript (HTML)	✅	Annotation	Audio, Text
Audio transcripts SbS		SbS	Audio, Text
Voice recording		Annotation	Text, Audio	Media output, checking of annotations by an ML model
Audio transcript, extended		Annotation	Text, Audio	Custom task UI, custom task duration calculation, first annotation attempts made by an ML model
MOS		Classification	Audio	MOS algorithm usage example
Audio questions		Classification	Audio	Output label set depending on the input data
Experts registration				Registration of your private expert workforce
Task update				Task update (instructions, UI, etc.)
ʎzy usage				ʎzy setup, parallel labelings
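
The MOS row in the table refers to the mean opinion score, which in its basic form is the arithmetic mean of the ratings that workers give to an item, commonly on a 1 to 5 scale. The sketch below shows only that basic computation, together with a normal-approximation 95% confidence interval. It is not Crowdom's MOS implementation, which may differ; the function name `mean_opinion_score` and the sample ratings are hypothetical.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    `ratings` is a list of numeric scores (for example 1..5) with at least
    two values. z=1.96 corresponds to an approximate 95% interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.mean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * std_err


ratings = [4, 5, 3, 4, 4]  # hypothetical worker scores for one audio item
mos, half_width = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

With only a handful of ratings per item the interval is wide, so collecting more ratings narrows the estimate at the price of higher labeling cost.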

Communication

Join our communities if you have questions about Crowdom or want to discuss your data labeling task.

Telegram
