iguanas-from-above-zooniverse

Process to cluster marks set by volunteers on Zooniverse.

Installation

Python 3.9, 3.10, 3.11 are tested. To install the required packages, run the following command:

pip install -r requirements.txt

If the installation doesn't work, try installing the packages as they are installed by GitHub:

pip install -r requirements-dev.txt

Installing panoptes aggregation

# https://aggregation-caesar.zooniverse.org/README.html
# pip install panoptes_aggregation # this fails because the hdbscan cannot be built.

pip install -U git+https://github.com/zooniverse/aggregation-for-caesar.git

Usage

The process is split into two steps. The first extracts a flat data structure using the panoptes aggregation package from Zooniverse. This data preparation is bundled in the notebook Panoptes Data Prep. It requires the classification report "iguanas-from-above-classifications.csv" and the subjects export "iguanas-from-above-subjects.csv". An alternative was developed using a custom iterator, 010_zooniverse_data_prep.

The notebook Zooniverse_Clustering illustrates the second step: clustering the marks set by volunteers on Zooniverse. The necessary data from the previous step is in the data folder. The files "flat_panoptes_points_[phase]" contain the point marks in a flat table structure, and the files "panoptes_questions_[phase]" contain the Yes/No answers given by the volunteers.

Start JupyterLab first with:

jupyter lab

The notebook requires some files defined in the config.py file. Their paths are relative to input_path: if the file "iguanas-from-above-classifications.csv" is located at "/User/ABC/IguanasFromAbove/2023-10-15/iguanas-from-above-classifications.csv", then input_path needs to be "/User/ABC", and the remainder of the path is set in the config:

from pathlib import Path

def get_config(phase_tag, input_path, output_path=None):
    configs = {}
    if output_path is None:
        # default output location: <input_path>/current_analysis/<phase_tag>
        output_path = input_path / Path("current_analysis").joinpath(phase_tag)
    configs["Iguanas 1st launch"] = {
        # classifications downloaded from zooniverse
        "annotations_source": input_path.joinpath("IguanasFromAbove/2023-10-15/iguanas-from-above-classifications.csv"),
    
        # gold standard datatable with the expert count, used for filtering the dataset
        "goldstandard_data": input_path / Path(
            "Images/Zooniverse_Goldstandard_images/expert-GS-1stphase.csv"),
    
        # which images/subject ids to consider. filters the data.
        "gold_standard_image_subset":
            input_path.joinpath("Images/Zooniverse_Goldstandard_images/1-T2-GS-results-5th-0s.csv"),
    
        # images for plot on them
        "image_source": input_path.joinpath("Images/Zooniverse_Goldstandard_images/1st launch"),
    
        }
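
The path handling described above can be checked in isolation. The following is a minimal sketch, not part of the repository: it uses the example base directory "/User/ABC" from the text and only shows how pathlib combines input_path with the relative locations stored in the config. The variable names annotations_source, same_path, and output_path are chosen here for illustration.

```python
from pathlib import Path

# Base directory from the example in the text (an assumption, adjust to your setup).
input_path = Path("/User/ABC")
phase_tag = "Iguanas 1st launch"

# Relative location as it appears in the "annotations_source" config entry.
annotations_source = input_path.joinpath(
    "IguanasFromAbove/2023-10-15/iguanas-from-above-classifications.csv"
)

# joinpath() and the "/" operator build the same path.
same_path = input_path / "IguanasFromAbove/2023-10-15/iguanas-from-above-classifications.csv"

# Default output location used when output_path is None.
output_path = input_path / Path("current_analysis").joinpath(phase_tag)

print(annotations_source.as_posix())
print(output_path.as_posix())
```

If the printed annotations path does not point to an existing file, input_path is most likely one directory level too deep or too shallow.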

"annotations_source" is the Zooniverse classification export. goldstandard_data and gold_standard_image_subset are used for filtering and need to contain the subject_id of the images.

1-T2-GS-results-5th-0s.csv

subject_id	Median0s	Mean0s	Max0s	Std0s	Median.r	Mean.r	Mode0s
47967876	1	1.444444444	3	0.726483157	1	1	1
47967959	1	1.181818182	2	0.404519917	1	1	1
47967961	9	9	12	2.581988897	9	9	12

expert-GS-1stphase.csv

subspecies	island	site_name	subject_group	image_name	subject_id	presence_absence	count_male-no-lek	count_others	count_partial	count_total	quality	condition	comment
A. c. trillmichi	Santa Fe	El Miedo	SFM1	SFM01-2-2-2_282.jpg	47969795	Y	2	0	2	2	Good	Hard
A. c. trillmichi	Santa Fe	El Miedo	SFM1	SFM01-2-2-1_344.jpg	47969531	Y	2	2	1	4	Good	Hard	not consider number 4 marked in the image
A. c. trillmichi	Santa Fe	El Miedo	SFM1	SFM01-2-2-2_270.jpg	47969760	Y	0	0	1	0	Good	Hard
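
Filtering by subject_id can be sketched with the standard library alone. This is an illustration under assumptions, not the code used in the notebooks: the helper names read_subject_ids and filter_by_subject_id are hypothetical, the records are made-up placeholders, and the tab delimiter matches the excerpt shown above rather than a confirmed file format.

```python
import csv
import io

# Excerpt of 1-T2-GS-results-5th-0s.csv (first columns only, tab-separated as shown above).
subset_table = (
    "subject_id\tMedian0s\tMean0s\n"
    "47967876\t1\t1.444444444\n"
    "47967959\t1\t1.181818182\n"
    "47967961\t9\t9\n"
)

# Hypothetical mark records; only the subject_id field matters for the filter.
records = [
    {"subject_id": "47967876", "x": 10.0, "y": 20.0},
    {"subject_id": "47967961", "x": 33.5, "y": 41.0},
    {"subject_id": "99999999", "x": 5.0, "y": 7.5},  # not part of the subset
]


def read_subject_ids(text, delimiter="\t"):
    """Collect the subject_id column of a delimited table as a set of strings."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["subject_id"] for row in reader}


def filter_by_subject_id(rows, allowed_ids):
    """Keep only the rows whose subject_id is in allowed_ids."""
    return [row for row in rows if row["subject_id"] in allowed_ids]


allowed = read_subject_ids(subset_table)
filtered = filter_by_subject_id(records, allowed)
print([row["subject_id"] for row in filtered])
```

Reading subject_id as a string avoids mismatches between integer and text representations when the ids come from different exports.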

The notebook produces CSV files with the clustering results and images showing the marks and the clusters. The method_comparison.csv file contains the comparison between the clustering methods per image.

image_name	subject_id	count_total	median_count	mean_count	mode_count	users	sum_annotations_count	annotations_count	dbscan_count_sil	HDBSCAN_count
EGI08-2_78.jpg	72333835	1	1.0	1.00	1	12	12	[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]	1	1
FMO03-1_65.jpg	72338628	5	4.0	3.42	4	19	65	[1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, ...	4	4
FMO03-1_72.jpg	72338635	4	3.0	2.65	4	20	53	[1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, ...	3	4
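
The summary columns in this table can be reproduced from annotations_count. The sketch below assumes that each list entry is the number of marks set by one volunteer, which is consistent with the rows above (for example 53 / 20 = 2.65 for FMO03-1_72.jpg). It uses the complete list of the first row, EGI08-2_78.jpg, and is not taken from the notebook code.

```python
from statistics import mean, median, mode

# annotations_count for EGI08-2_78.jpg from the table above:
# twelve volunteers, each of whom set one mark.
annotations_count = [1] * 12

summary = {
    "users": len(annotations_count),                    # number of volunteers
    "sum_annotations_count": sum(annotations_count),    # total marks on the image
    "median_count": float(median(annotations_count)),
    "mean_count": round(mean(annotations_count), 2),
    "mode_count": mode(annotations_count),
}

for name, value in summary.items():
    print(name, value)
```

The columns dbscan_count_sil and HDBSCAN_count come from the clustering methods and cannot be derived from annotations_count alone.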

Examples

Running the notebook requires setting some variables:

from pathlib import Path

input_path = Path("/Users/christian/data/zooniverse")

reprocess = True # if True, the raw classification data is reprocessed. If False, the data is loaded from disk

# Phase Selection
phase_tag = "Iguanas 1st launch"
# phase_tag = "Iguanas 2nd launch"
# phase_tag = "Iguanas 3rd launch"

debug = False # debugging with a smaller dataset
plot_diagrams = False # plot the diagrams to disk for the clustering methods
show_plots = False # show the plots in the notebook
user_threshold = None # if a number, filter out records which have fewer than this many user interactions.
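
One possible reading of user_threshold is sketched below: an image is kept only if at least user_threshold distinct volunteers set marks on it. This interpretation, the function name filter_by_user_threshold, and the field name user_name are assumptions for illustration; the notebook may implement the filter differently.

```python
from collections import defaultdict


def filter_by_user_threshold(marks, user_threshold=None):
    """Drop marks on images with fewer than user_threshold distinct users.

    With user_threshold=None the marks are returned unchanged.
    """
    marks = list(marks)
    if user_threshold is None:
        return marks

    # Collect the distinct users per image.
    users_per_subject = defaultdict(set)
    for mark in marks:
        users_per_subject[mark["subject_id"]].add(mark["user_name"])

    return [
        mark for mark in marks
        if len(users_per_subject[mark["subject_id"]]) >= user_threshold
    ]


# Hypothetical marks: image 1 was marked by two users, image 2 by one user.
example_marks = [
    {"subject_id": 1, "user_name": "a"},
    {"subject_id": 1, "user_name": "b"},
    {"subject_id": 1, "user_name": "b"},
    {"subject_id": 2, "user_name": "a"},
]

kept = filter_by_user_threshold(example_marks, user_threshold=2)
print(sorted({mark["subject_id"] for mark in kept}))
```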
