Releases: barthoekstra/brc-data-preprocessor
Preprocessor for #BRC15 (2023)
Changelog
Changes since 2021.1.
- An extra copy of the checked data is now added to an `inprogress-backup` folder on Dropbox, so data technicians can keep track of how data cleaning steps are taken.
- The function was first deployed as a .zip archive, but this has now been replaced with a Dockerized approach.
Preprocessor for #BRC13 (2021)
Changelog
Changes since 2019.1.
- `Non-Juv SteppeE` and `Non-Juv ImperialE` will now be flagged as unexpected records.
- `WhitePel` and `DalPel` will be renamed to `WhiteP` and `DalmatianP` in line with GBIF dataset and data paper.
- Extraction of count times from Trektellen is made more robust by removing whitespace in regex search.
- A ‘single station count’ mode is added for the spring counts, which can be activated by setting environment variable `SINGLE_STATION_COUNT=yes`.
- Removed package versions from `requirements.txt`.
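The single station count switch could be read along the following lines. This is a minimal sketch, not the preprocessor's actual code: only the variable name `SINGLE_STATION_COUNT` and the value `yes` come from the changelog above, while the helper name `single_station_mode` and the case-insensitive comparison are assumptions made for illustration.

```python
import os


def single_station_mode(environ=None):
    """Return True when SINGLE_STATION_COUNT is set to 'yes'.

    Hypothetical helper: the name and the tolerant comparison
    (surrounding whitespace stripped, case ignored) are assumptions.
    """
    env = os.environ if environ is None else environ
    value = env.get("SINGLE_STATION_COUNT", "")
    return value.strip().lower() == "yes"
```

Passing a plain dictionary instead of relying on `os.environ` makes the switch easy to exercise in tests, for example `single_station_mode({"SINGLE_STATION_COUNT": "yes"})`.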
First preprocessor version for #BRC12 (2019)
The first version of the preprocessor, prepared for the #BRC12 2019 season. Code leading up to this release has been improved based on feedback by previous coordinators and data technicians. Changes in new releases of the preprocessor will be documented in a `CHANGELOG` file.
General workflow
The preprocessor runs on AWS Lambda and regularly checks the Trektellen site for newly uploaded BRC counts. If both stations have uploaded data for the day, the fetcher will download the data and store a raw version of the data in Dropbox (in `2019/data/raw`). The preprocessor subsequently checks a copy of the raw data for all kinds of possible errors and flags them by adding a description of the potential problem to a `check` column in the file stored in `2019/data/inprogress`. It is then up to coordinators to use their experience and knowledge of the migration during a given day to determine the validity of the flags added by the preprocessor and act accordingly. Once they have dealt with these issues and emptied the `check` column of flags, the file can be moved to `2019/data/clean`.
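The flagging mechanism described above can be illustrated with a small sketch. It assumes records are held as plain dictionaries with a `check` key; the helper names `add_flag` and `ready_for_clean` and the `"; "` separator between multiple flags are hypothetical and not taken from the preprocessor itself.

```python
def add_flag(record, message):
    """Append a description of a potential problem to the 'check' column."""
    existing = record.get("check", "")
    # Assumption: multiple flags on one record are joined with "; ".
    record["check"] = f"{existing}; {message}" if existing else message


def ready_for_clean(records):
    """Return True once every record has an empty 'check' column."""
    return all(not record.get("check") for record in records)


records = [
    {"speciesname": "HB", "count": 12, "check": ""},
    {"speciesname": "BK", "count": 3, "check": ""},
]
add_flag(records[1], "example flag for the coordinator to review")
```

In this sketch `ready_for_clean(records)` stays `False` until a coordinator has resolved the flag and emptied the `check` value of the second record, which mirrors the point at which the file can be moved out of the in-progress folder.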
Flagged records
The following records will be flagged by the preprocessor:
- Records with invalid doublecount entries (e.g. not within 10 minutes or with the wrong distance code).
- Records containing >1 bird that is injured and/or killed (rare occurrence).
- Records lacking critical information in `datetime`, `telpost`, `speciesname`, `count` or `location` columns (very unlikely, but the possible result of a bug).
- Records of birds in >E3 (rare occurrence).
- Records with registered morphs for all species other than Booted Eagles (and Eleonora's Falcons).
- Records of `HB_NONJUV`, `HB_JUV`, `BK_NONJUV` and `BK_JUV` if the number of aged birds is higher than the number of counted birds (`HB` and `BK`) within a 10-minute window around the age record.
- Records of Honey Buzzards that should probably be single-counted (at Station 2 during the HB focus period).
- Records of aged Honey Buzzards and Black Kites outside of expected distance codes (i.e. outside of W1-O-E1).
- Records containing unexpected combinations of sex and/or age information.
- Records with no timestamps, which are set to 00:00:00 during processing.
- Records containing non-protocol species.
- Records with age details in `W3`, `E3` and `>E3`, excluding non-juvenile harriers with a sex, juvenile `MonPalHen` and juvenile/non-juvenile eagles.
- Records of female Pallid Harriers with `I` or `A` age (legal per protocol, though very difficult to age in the field).
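As an illustration of one of the rules above, the comparison between aged and counted Honey Buzzards and Black Kites might look roughly like this. It is a sketch under several assumptions that the release notes do not confirm: age records are stored as rows whose `speciesname` is one of the age codes, the 10-minute window is centred on the age record (5 minutes either side), each age record's own `count` is compared against the summed counts of the matching species, and stations are not distinguished. The function name `flag_excess_aged` is hypothetical.

```python
from datetime import datetime, timedelta

# Age codes mapped to the species code whose counts they are compared against.
AGE_TO_SPECIES = {
    "HB_NONJUV": "HB",
    "HB_JUV": "HB",
    "BK_NONJUV": "BK",
    "BK_JUV": "BK",
}


def flag_excess_aged(records, window=timedelta(minutes=10)):
    """Return indices of age records with more aged than counted birds."""
    half = window / 2  # assumption: window is centred on the age record
    flagged = []
    for i, rec in enumerate(records):
        species = AGE_TO_SPECIES.get(rec["speciesname"])
        if species is None:
            continue  # not an age record
        # Sum counted birds of the matching species inside the window.
        counted = sum(
            other["count"]
            for other in records
            if other["speciesname"] == species
            and abs(other["datetime"] - rec["datetime"]) <= half
        )
        if rec["count"] > counted:
            flagged.append(i)
    return flagged


records = [
    {"datetime": datetime(2019, 9, 1, 10, 0), "speciesname": "HB", "count": 5},
    {"datetime": datetime(2019, 9, 1, 10, 3), "speciesname": "HB_JUV", "count": 4},
    {"datetime": datetime(2019, 9, 1, 11, 0), "speciesname": "HB_NONJUV", "count": 2},
]
```

With this sample data only the third record is returned by `flag_excess_aged(records)`: the 4 juveniles at 10:03 are covered by the 5 Honey Buzzards counted at 10:00, whereas the 2 non-juveniles at 11:00 have no counted birds nearby.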