Flagging and filtering out bad quality IFCB samples #4

Open
mdugenne opened this issue Jun 3, 2022 · 1 comment

mdugenne commented Jun 3, 2022

Known issues with the IFCB instrument (leaks, air bubbles, camera focus, and settings) may alter the detection of plankton and result in inaccurate counts and/or size estimates, affecting the observed size distribution and the predicted slope of the size spectrum.

Given that (1) IFCB projects uploaded to EcoTaxa have not been flagged, (2) only a small percentage of IFCB projects have been validated, and (3) some predicted artefacts are actually good-quality images of plankton, is there a reliable way to flag and filter out bad-quality IFCB samples?

A few ideas: among the possible artefact categories (e.g. bubbles, beads, badfocus), bubbles are likely the best predicted, followed by beads.
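
One way the bubble idea could be tried is to compute, per sample, the fraction of objects predicted as bubbles and flag samples above a cutoff. This is only a rough sketch: the `sample_id` column, the substring match on "bubble", the function name `flag_samples_by_bubbles`, and the 0.1 threshold are all assumptions for illustration and would need to be checked against the actual exports and tuned on validated projects.

```python
import pandas as pd


def flag_samples_by_bubbles(df: pd.DataFrame, max_bubble_fraction: float = 0.1) -> pd.DataFrame:
    """Return one row per sample with its bubble fraction and a boolean flag.

    Assumes (hypothetically) a 'sample_id' column and that bubble predictions
    contain the substring 'bubble' in 'object_annotation_category'.
    """
    # True where the predicted/annotated category looks like a bubble;
    # missing annotations are counted as "not a bubble".
    is_bubble = df["object_annotation_category"].str.contains("bubble", case=False, na=False)

    summary = (
        df.assign(is_bubble=is_bubble)
        .groupby("sample_id")["is_bubble"]
        .agg(n_objects="size", bubble_fraction="mean")
        .reset_index()
    )
    # The threshold is an arbitrary placeholder, not a validated value.
    summary["flag_bubbles"] = summary["bubble_fraction"] > max_bubble_fraction
    return summary


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "sample_id": ["A", "A", "A", "A", "B", "B", "B"],
            "object_annotation_category": [
                "bubble", "bubble", "Ceratium", "detritus",
                "Ceratium", "Ceratium", None,
            ],
        }
    )
    print(flag_samples_by_bubbles(demo))
```

Working at the sample level rather than the object level means a few misclassified plankton images would not by themselves remove a sample; only samples dominated by bubble predictions would be flagged.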

@MarCorralesU

Considering that we don't want to drop data (rows) from the "cleaned" version of the TSV files, this part of the file-reading function:

# FILTERING DATA: drop rows whose taxonomic category is an artefact.
# na=True also drops rows with a missing category, matching the original "== False" comparison.
is_artefact = df['object_annotation_category'].str.contains("artefact", na=True)
data_clean = df[~is_artefact]
# remove data points where the size metric is 0
data_clean = data_clean[data_clean['object_equivdiameter'] != 0]
# finally, remove any row with missing (NaN) coordinates
data_clean = data_clean.dropna(subset=['object_lat', 'object_lon'])
# note: reset_index() keeps the old index as a new 'index' column
data_clean = data_clean.reset_index()

needs to be applied after the data are read and standardized, but before they are used for PSS analysis.
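
A possible way to keep every row in the cleaned files while still excluding bad records from the analysis is to store the filter as a flag column and apply it only at analysis time. The sketch below is an assumption about how this could be organised, not the project's actual code: the function names `add_quality_flag` and `select_for_analysis` and the `flag_bad` column are hypothetical, while the three criteria mirror the snippet above. Unlike the snippet, rows with a missing category are not flagged here, which is a choice that may need revisiting.

```python
import pandas as pd


def add_quality_flag(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a boolean 'flag_bad' column; no rows are removed."""
    flagged = df.copy()
    # Artefact categories (missing categories are NOT flagged in this sketch).
    is_artefact = flagged["object_annotation_category"].str.contains("artefact", na=False)
    # Size metric equal to zero.
    zero_size = flagged["object_equivdiameter"] == 0
    # Missing latitude or longitude.
    no_position = flagged[["object_lat", "object_lon"]].isna().any(axis=1)
    flagged["flag_bad"] = is_artefact | zero_size | no_position
    return flagged


def select_for_analysis(flagged: pd.DataFrame) -> pd.DataFrame:
    """Apply the flag just before the size-spectrum analysis."""
    return flagged[~flagged["flag_bad"]].reset_index(drop=True)


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "object_annotation_category": ["Ceratium", "artefact", "Ceratium", "detritus"],
            "object_equivdiameter": [12.0, 8.0, 0.0, 5.0],
            "object_lat": [43.1, 43.1, 43.1, None],
            "object_lon": [7.3, 7.3, 7.3, 7.3],
        }
    )
    flagged = add_quality_flag(demo)  # all 4 rows kept, with a flag column
    print(flagged)
    print(select_for_analysis(flagged))  # only the unflagged rows
```

With this layout the standardized TSV can be written from the flagged table, and the filtering step stays a one-line call in front of the PSS computation.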
