This repository holds all code produced by the VIP dolphin acoustics group that are relevant to the generation of datasets for machine learning. As of April 9th 2022, the group's main focus is to use spectrograms of waveform clips to train their models. But in the future this repository can be expanded to contain preprocessing methods on waveform data files.
We used jupyter notebooks as our working environments for the code. If you type githubtocolab in place of github in the url of the jupyter notebook you are currently reading, you can automatically jump to google colab.
The easiest way to set up and run the jupyter notebook files in this folder is by creating a virtual environment for this project. This way you can avoid the so-called "dependency-hell". For a tutorial head to this link. But you can obviously choose your own preferred way of getting everything set up, just know that it will be more work.
The Python version used in this project is Python 3.9.10
Each file is self contained, and they don't require any extra modules to import or download apart from those pippable in the requirements.txt file.
The code here is fully os-independent, so you don't have to worry whether you are running on a windows os or IOS or linux, they will all work. But do make sure you set up a virtual environment correctly before continuing.
Quick overview:
dataset_generator.ipynb: This jupyter notebook is for everything related to generating spectrograms:
- finding clips in the given input path
- generating and saving spectrograms to output path
- splitting of the generated spectrograms dataset into training and testing folders
demo_gen_spectrograms.ipynb: This jupyter notebook is a sort of playground for testing and playing with different librosa functions and normalisation techniques on a sample waveform clip. This is not production ready code. A sample.wav has been provided in this repo for testing and experimentation, but the user can play with their own waveform data as well.
Normalized Bottlenose Wavclip
Normalized Common Wavclip
Normalized Melon-headed Wavclip