BirdSet

Get Started

Devcontainer

You can use the devcontainer, which is configured as a git submodule:

git submodule update --init --recursive

Install dependencies

Either with conda and pip.

conda create -n birdset python=3.10
conda activate birdset
pip install -e .

Or poetry.

poetry install
poetry shell

Minimal Working Example

Tutorial Notebook

Prepare Data

from birdset.datamodule.base_datamodule import DatasetConfig
from birdset.datamodule.birdset_datamodule import BirdSetDataModule

# initialize the data module
dm = BirdSetDataModule(
    dataset=DatasetConfig(
        data_dir='data_birdset/HSN', # specify your data directory!
        dataset_name='HSN',
        hf_path='DBD-research-group/BirdSet',
        hf_name='HSN',
        n_classes=21,
        n_workers=3,
        val_split=0.2,
        task="multilabel",
        classlimit=500,
        eventlimit=5,
        sampling_rate=32000,
    ),
)

# prepare the data (download dataset, ...)
dm.prepare_data()

# set up the dataloaders
dm.setup(stage="fit")

# get the dataloaders
train_loader = dm.train_dataloader()

Prepare Model and Start Training

from lightning import Trainer
min_epochs = 1
max_epochs = 5
trainer = Trainer(min_epochs=min_epochs, max_epochs=max_epochs, accelerator="gpu", devices=1)

from birdset.modules.base_module import BaseModule
model = BaseModule(
    len_trainset=dm.len_trainset,
    task=dm.task,
    batch_size=dm.train_batch_size,
    num_epochs=max_epochs)

trainer.fit(model, dm)

Results (AUROC)

Title	Notes	PER	NES	UHH	HSN	NBP	POW	SSW	SNE	Overall	Code
BirdSet: A Multi-Task Benchmark For Classification In Avian Bioacoustics
BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics		0.70	0.90	0.75	0.86		0.83	0.62	0.69
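
As a reminder of what the AUROC values in this table measure, here is a minimal sketch. It is not BirdSet's evaluation code: the function names `binary_auroc` and `macro_auroc`, the unweighted averaging over classes, and the toy labels and scores are all assumptions made for illustration. For one class, AUROC is the probability that a randomly chosen positive recording receives a higher score than a randomly chosen negative one.

```python
from typing import Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC for one class via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(label_rows, score_rows) -> float:
    """Unweighted mean of per-class AUROC (rows = recordings, columns = classes)."""
    n_classes = len(label_rows[0])
    per_class = [
        binary_auroc([row[c] for row in label_rows], [row[c] for row in score_rows])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


# Toy multilabel example: 4 recordings, 2 classes (values are made up).
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.7, 0.1]]
print(macro_auroc(labels, scores))  # 0.875 (class 0: 0.75, class 1: 1.0)
```

The pairwise loop is quadratic in the number of recordings, which is fine for illustration; a metrics library would normally be used for real evaluation.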

Logging

Logs will be written to Weights & Biases by default.

Background noise

To enhance model performance, we mix in additional background noise downloaded from DCASE18. To download the files and convert them to the correct format, run the notebook 'download_background_noise.ipynb' in the 'notebooks' folder.
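
This README does not say how the noise is mixed in, so the following is only a sketch of one common approach: scaling a noise clip to a target signal-to-noise ratio (SNR) before adding it to the recording. The helper names `rms` and `mix_at_snr`, the SNR-based scaling, and the synthetic signals are assumptions for illustration, not BirdSet's augmentation code.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the signal-to-noise ratio is `snr_db` dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return list(clean)  # silent noise clip: nothing to mix in
    # SNR_dB = 20 * log10(rms(clean) / rms(scaled_noise)), solved for the noise gain.
    gain = rms(clean) / (noise_rms * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]


# Toy data: a 440 Hz tone and Gaussian noise, 0.1 s at 32 kHz
# (32000 is the sampling_rate from the data example; the signals are synthetic).
sampling_rate = 32000
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / sampling_rate) for t in range(3200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(3200)]
mixed = mix_at_snr(clean, noise, snr_db=10.0)
```

A lower `snr_db` makes the background louder relative to the recording; in an augmentation setting the value would typically be drawn at random per sample.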

Run experiments

Our experiments are defined in the configs/experiment folder. To run an experiment, use the following command in the directory of the repository:

python birdset/train.py experiment="EXPERIMENT_PATH"

Replace EXPERIMENT_PATH with the path to the desired experiment YAML config, relative to the experiment directory. For example, the following command trains an EfficientNet on HSN:

python birdset/train.py experiment="local/HSN/efficientnet.yaml"
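
To run several experiments one after another, the command can be assembled in a small script. This is a hedged sketch: `build_train_command` is a hypothetical helper, not part of the repository, and it only reproduces the command form `python birdset/train.py experiment=...` shown in this section.

```python
import shlex
from typing import List


def build_train_command(experiment: str, script: str = "birdset/train.py") -> List[str]:
    """Return the argument list for one training run (hypothetical helper)."""
    return ["python", script, f"experiment={experiment}"]


# Experiment paths are relative to the experiment config directory.
experiments = ["local/HSN/efficientnet.yaml"]

for exp in experiments:
    cmd = build_train_command(exp)
    print(shlex.join(cmd))
    # To actually launch training from the repository root, one could use:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, which is why the quotes around the experiment path are not needed here.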

Project structure

This repository is inspired by the Yet Another Lightning Hydra Template.

├── configs                     <- Hydra configuration files
│   ├── callbacks               <- Callbacks configs
│   ├── datamodule              <- Datamodule configs
│   ├── debug                   <- Debugging configs
│   ├── experiment              <- Experiment configs
│   ├── extras                  <- Extra utilities configs
│   ├── hydra                   <- Hydra settings configs
│   ├── logger                  <- Logger configs
│   ├── module                  <- Module configs
│   ├── paths                   <- Project paths configs
│   ├── trainer                 <- Trainer configs
│   ├── transformations         <- Transformations / augmentation configs
│   │
│   ├── main.yaml               <- Main config
│
├── data_birdset                <- Project data
├── dataset                     <- Code to build the BirdSet dataset
├── notebooks                   <- Jupyter notebooks
│
├── birdset                     <- Source code
│   ├── augmentations           <- Augmentations
│   ├── callbacks               <- Additional callbacks
│   ├── datamodules             <- Lightning datamodules
│   ├── modules                 <- Lightning modules
│   ├── utils                   <- Utility scripts
│   │
│   ├── main.py                 <- Run experiments
│
├── .gitignore                  <- List of files ignored by git
├── pyproject.toml              <- Poetry project file
├── requirements.txt            <- File for installing python dependencies
├── requirements-dev.txt        <- File for installing python dev dependencies
├── setup.py                    <- File for installing project as a package
└── README.md

Data pipeline

Our datasets are shared via HuggingFace Datasets in our BirdSet repository. First log in to HuggingFace with:

huggingface-cli login

For a detailed guide to using the BirdSet data pipeline and its many configuration options, see our comprehensive BirdSet Data Pipeline Tutorial.

Datamodule

The datamodules are defined in birdset/datamodule, and their configurations are stored under configs/datamodule. base_datamodule defines the main class, which can be inherited for specific datasets. It prepares the data in prepare_data and loads it in setup: prepare_data downloads the dataset, applies preprocessing, creates validation splits, and saves the data to disk; setup initializes the dataloaders and configures the data transformations.

The following steps are performed in prepare_data:

Data is downloaded from HuggingFace Datasets with _load_data
Data is preprocessed with _preprocess_data
Data is split into train, validation, and test sets with _create_splits
The length of the dataset is saved for later access
Data is saved to disk with _save_dataset_to_disk

The following steps are performed in setup:

Data is loaded from disk with _get_dataset, which also applies the transforms
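
The order of these steps can be illustrated with a toy class. This is a sketch, not the BirdSet implementation: only the method names prepare_data, setup, _load_data, _preprocess_data, _create_splits, _save_dataset_to_disk, and _get_dataset come from the description above. The class `ToyDataModule`, the `_transform` helper, the JSON storage, the fabricated records, and the choice to carve the validation split out of the training data are assumptions.

```python
import json
import random
import tempfile
from pathlib import Path


class ToyDataModule:
    """Toy stand-in that mirrors the step order described above (not the BirdSet class)."""

    def __init__(self, data_dir, val_split=0.2, seed=0):
        self.data_dir = Path(data_dir)
        self.val_split = val_split
        self.seed = seed
        self.len_trainset = None
        self.datasets = {}

    # prepare_data: load -> preprocess -> split -> record length -> save
    def prepare_data(self):
        data = self._load_data()
        data = self._preprocess_data(data)
        splits = self._create_splits(data)
        self.len_trainset = len(splits["train"])  # kept for later access
        self._save_dataset_to_disk(splits)

    def _load_data(self):
        # Placeholder for the HuggingFace download: fabricated toy records.
        return {
            "train": [{"id": i, "label": i % 3} for i in range(10)],
            "test": [{"id": 100 + i, "label": i % 3} for i in range(4)],
        }

    def _preprocess_data(self, data):
        # Placeholder preprocessing: mark every record as processed.
        return {name: [dict(r, processed=True) for r in rows] for name, rows in data.items()}

    def _create_splits(self, data):
        # Assumption: the validation set is carved out of the training data.
        rows = list(data["train"])
        random.Random(self.seed).shuffle(rows)
        n_val = int(len(rows) * self.val_split)
        return {"train": rows[n_val:], "valid": rows[:n_val], "test": data["test"]}

    def _save_dataset_to_disk(self, splits):
        self.data_dir.mkdir(parents=True, exist_ok=True)
        for name, rows in splits.items():
            (self.data_dir / f"{name}.json").write_text(json.dumps(rows))

    # setup: read the saved splits back and apply the transforms
    def setup(self, stage="fit"):
        names = ["train", "valid"] if stage == "fit" else ["test"]
        self.datasets = {name: self._get_dataset(name) for name in names}

    def _get_dataset(self, name):
        rows = json.loads((self.data_dir / f"{name}.json").read_text())
        return [self._transform(r) for r in rows]

    def _transform(self, row):
        # Hypothetical transform hook: tag the record as transformed.
        return dict(row, transformed=True)


with tempfile.TemporaryDirectory() as tmp:
    toy_dm = ToyDataModule(tmp, val_split=0.2)
    toy_dm.prepare_data()
    toy_dm.setup(stage="fit")
    print(toy_dm.len_trainset, len(toy_dm.datasets["valid"]))  # 8 2
```

Separating the two phases means the expensive work (download, preprocessing, splitting) happens once and is cached on disk, while setup stays cheap enough to repeat for each run.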

Transformations

Data transformations are applied to the data during training and include, e.g., augmentations. They are attached to the HuggingFace dataset with set_transform.
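
A transform attached with set_transform is applied on the fly when items are accessed, so the stored data is left unchanged. The toy class below imitates that behavior without depending on the HuggingFace library. `LazyDataset` and `apply_gain` are hypothetical names for illustration, and the toy works on single rows for brevity, whereas the real library passes a batch dictionary to the transform.

```python
class LazyDataset:
    """Toy stand-in for a dataset with an on-the-fly transform (not the HuggingFace class)."""

    def __init__(self, rows):
        self._rows = rows
        self._transform = None

    def set_transform(self, transform):
        # Only store the function; nothing is computed until an item is read.
        self._transform = transform

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        row = dict(self._rows[index])  # shallow copy so stored rows stay untouched
        return self._transform(row) if self._transform else row


def apply_gain(row, gain=0.5):
    """Hypothetical augmentation: scale the audio samples by a fixed gain."""
    return {**row, "audio": [gain * x for x in row["audio"]]}


ds = LazyDataset([{"audio": [1.0, -1.0], "label": 3}])
ds.set_transform(apply_gain)
print(ds[0]["audio"])        # [0.5, -0.5], computed at access time
print(ds._rows[0]["audio"])  # [1.0, -1.0], the stored data is unchanged
```

Because the transform runs at access time, a random augmentation would produce a different result each time the same item is read, which is the behavior wanted during training.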
