Assessing Caitlin's feedback #9

Open
9 of 15 tasks
jackVanish opened this issue Feb 9, 2024 · 17 comments

@jackVanish

jackVanish commented Feb 9, 2024

Caitlin helpfully checked a set of Surimi output: an OTN detection extract converted to IMOS tripartite format, then converted from that same format to OTN tripartite format. She helped identify some of the places where the process loses data, so I've taken her feedback and assembled it into this checklist to keep track of the changes I need to make or have made. Her messages are first, followed by my notes in parentheses. A note: anything that can go from a detExtract to a tripartite file will ALSO have to be accounted for if someone HAS the tripartite OTN file, i.e., anything I fix in the 'derive' functions must be accounted for in the main function as well.

OTN DetExtract -> IMOS detections

  • WoRMS aphia ID can probably be found via the WoRMS API? (Will have to write new functionality to grab the WoRMS aphia ID.)
  • receiver name and receiver project name can likely be filled in - wondering if these are receiver SN (or station?) and receiver project code (i.e. detectedby); would have to look at a proper IMOS example file to check (Got to get one of those to Caitlin)

OTN DetExtract -> IMOS receivers

  • rows where receiver = release are not detections from receivers; these are release records from when we tagged each shark. not sure if these should be in the receiver metadata file? (Should they be? Must find out.)
  • receiver project name should be the collectioncode (same as receiver_group) (Fairly straightforward)

OTN DetExtract -> IMOS tags

  • tagging_project_name should be "NSBS" (or in general the collection code)
  • transmitter_deployment_locality, lat, long and datetime can all use the "release" record for the animal to determine its tagging location (Right on right on)

IMOS detections -> OTN detections

  • generally, this is a very different format than IMOS detections. do u remember why we picked this one? to match OTN's raw detection tables? (Good question, must formulate a complete answer.)
  • lost info: receiver (this should be the SN, called receiver_id in imos format)
  • lost info: sensor information. these are in the sensorXXX columns in imos format
  • lost info: station name. this is in station_name in imos format, if we want to keep this info

IMOS receivers -> OTN receivers

  • this is the same format as the IMOS stuff - not otn database format, but likely what remora wants? (Will have to follow up about this).
  • same deal as IMOS receivers - do we want the releases in there?

IMOS tags -> OTN tags

  • this is the same format as the IMOS stuff - not otn database format, but likely what remora wants?
  • same issues w missing info as the IMOS tags

jackVanish self-assigned this Feb 9, 2024

@jackVanish

Regarding the worms aphiaID- I wrote a quick helper function to get the aphiaID from the sciname before Jon correctly pointed out that the worrms library does exactly that, so that's what I'm using now. I still have to write a couple of little helper functions around it, because plugging it into the mutate functions means we're doing a lot of queries that can bloat the code's runtime. So I'm making a little lookup table out of the unique scinames/aphiaids and then we can just do each scientific name query once, building the columns in Surimi out of the lookup table.
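
The lookup-table idea above can be sketched roughly as follows. This is an illustration in Python, not Surimi's actual R code: `build_aphia_lookup`, `add_aphia_column`, the `fetch_aphia_id` callable, and the column names `scientificname` and `aphia_id` are all hypothetical stand-ins. In Surimi the fetch step would be whatever wraps the worrms call.

```python
from typing import Callable, Dict, Iterable, List, Optional


def build_aphia_lookup(
    names: Iterable[Optional[str]],
    fetch_aphia_id: Callable[[str], Optional[int]],
) -> Dict[str, Optional[int]]:
    """Query each distinct scientific name once and cache the result."""
    lookup: Dict[str, Optional[int]] = {}
    for name in names:
        # Skip missing names and names we have already resolved.
        if name is None or name in lookup:
            continue
        lookup[name] = fetch_aphia_id(name)
    return lookup


def add_aphia_column(
    rows: List[dict],
    lookup: Dict[str, Optional[int]],
    name_key: str = "scientificname",  # assumed column name
    id_key: str = "aphia_id",          # assumed column name
) -> List[dict]:
    """Return copies of the rows with the aphia ID filled from the lookup."""
    return [{**row, id_key: lookup.get(row.get(name_key))} for row in rows]
```

The point of the design is that the number of remote queries scales with the number of distinct species rather than the number of detection rows.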

@jackVanish

jackVanish commented Feb 14, 2024

Code to get and add the aphia ID added; going to write in the comments and packagify it before I merge it.

@jackVanish

Added code to handle assigning receiver_project_name and tagging_project_name. The code runs and seems to me to act correctly; we will verify correctness when we run the second set of tests after these issues are resolved.

@jackVanish

Took the releases out of the detections dataframe before deriving the receivers from it.
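
A minimal sketch of that filtering step, written in Python for illustration (Surimi's own code is R). The function name `split_releases` is hypothetical; the `receiver` column and the `release` marker value come from Caitlin's description above, and the case-insensitive match is an assumption.

```python
def split_releases(detections, receiver_key="receiver", release_value="release"):
    """Separate release records from true receiver detections.

    Returns (real_detections, releases). Rows are plain dicts here;
    matching on the receiver column is case-insensitive by assumption.
    """
    real, releases = [], []
    for row in detections:
        value = str(row.get(receiver_key, "")).strip().lower()
        if value == release_value:
            releases.append(row)
        else:
            real.append(row)
    return real, releases
```

Keeping the release rows as a second return value, rather than discarding them, leaves them available for deriving tag deployment fields later.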

@jackVanish

Just checked and it seems like we are already using release lat/lon/and datetime to fill in the appropriate columns, so we just need to handle locality.
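
For reference, the release-based fill could look something like the sketch below (Python for illustration only; Surimi is R). Every name here is an assumption: the helper names, the `catalognumber` animal key, and the column pairs in `RELEASE_FIELD_MAP`. Locality is deliberately left out because its source column is still undecided.

```python
# Hypothetical mapping: IMOS tag metadata column -> OTN detection extract column.
RELEASE_FIELD_MAP = {
    "transmitter_deployment_latitude": "latitude",
    "transmitter_deployment_longitude": "longitude",
    "transmitter_deployment_datetime": "datecollected",
}


def first_release_per_animal(releases, animal_key="catalognumber"):
    """Index release rows by animal, keeping the first one seen for each."""
    by_animal = {}
    for row in releases:
        by_animal.setdefault(row[animal_key], row)
    return by_animal


def fill_from_release(tag_row, release_row, field_map=RELEASE_FIELD_MAP):
    """Return a copy of tag_row with deployment fields taken from the release."""
    filled = dict(tag_row)
    if release_row is None:
        return filled  # no release record: leave the tag row unchanged
    for imos_col, otn_col in field_map.items():
        filled[imos_col] = release_row.get(otn_col)
    return filled
```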

@jackVanish

While closing Remora tickets I found this:
IMOS.metadata.mapping (2).xlsx

Looks like we decided on the mapping some time ago- receiver_name and receiver_project_name can be mapped to receiver and otn_array respectively.
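
As a rough illustration of applying that mapping (Python sketch, not the Surimi R code; `IMOS_TO_OTN` and `rename_columns` are hypothetical names, and only the two column pairs named above are included):

```python
# Column pairs taken from the mapping described above: IMOS name -> OTN name.
IMOS_TO_OTN = {
    "receiver_name": "receiver",
    "receiver_project_name": "otn_array",
}


def rename_columns(rows, mapping):
    """Return copies of the rows with mapped column names replaced.

    Columns that are not in the mapping pass through unchanged.
    """
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]
```

Unmapped columns are passed through rather than dropped, so any information loss would have to be an explicit later step.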

@jackVanish

[Screenshot: Screen Shot 2024-03-13 at 9 21 53 AM]

While following up I found this, with the attached comment, which is why receiver was blank. @CaitlinBate can you weigh in on this one?

@naomitress

@jackVanish do you have examples of each of the receiver_id and receiver_name values? once these are provided, I should be able to provide some guidance

@jackVanish

In checking out the IMOS -> OTN pipeline I realized it's referring to the OTN->IMOS receiver/tag derivation functions. This needs to be fixed!

@jackVanish

@naomitress Here are the IMOS data test files included in Remora, so these are what I would've been using to build towards Surimi's OTN -> IMOS pipeline.

IMOS_animal_measurements.csv
IMOS_detections.csv
IMOS_receiver_deployment_metadata.csv
IMOS_transmitter_deployment_metadata.csv

@naomitress

naomitress commented Mar 18, 2024

IMOS_receiver_deployment_metadata.csv has receiver_name, e.g. VR2W-109075

IMOS_detections.csv has receiver_name, e.g. VR2W-113955, but it also has receiver_id, e.g. 100577385

so, receiver_id should be ignored in favour of receiver_name, as this is analogous to receiver

@jackVanish were receiver_id or receiver_name found in any other file types?

@jackVanish

When I went back to build out the IMOS -> OTN piece I found that the main imos_otn column mapping function was underbuilt because I had started building out two separate functions to map receiver metadata and tag metadata. I finished those and then built a detections one based on the mapping files supplied in the now-closed Remora tickets. I think that'll get us most of the way through the IMOS -> OTN pipeline feedback; I've checked off the stuff that I know is solid. I will have some new test files to look at shortly.

@jackVanish

Also thanks for this Naomi, I will favour receiver_name!

@jackVanish

Also, in the IMOS test files above, we should be able to find the analogue to detectedby/project_code. I think we're handling this now with the coll_code parameter passed to otn_imos_column_map but we can always double-check.

@CaitlinBate

re: generally, this is a very different format than IMOS detections. do u remember why we picked this one? to match OTN's raw detection tables?

when looking at the otn_detections output file (made from the IMOS test data), is this format supposed to match schema.c_detections_YYYY table formats for OTN? there are very few columns included so i am wondering the reasoning behind this being our end-product (what are we going to use it for? it doesnt match otn detection extracts for example)

also, my point still stands that receiver (otn column) needs to be completed (receiver_id is the IMOS column)

@jackVanish

Right, the IMOS->OTN surimi output was erroneous, so I'm not surprised it's weird and bonkers. We won't be using that, we're shooting for OTN detection-extract-like. Shannon and I are working on building that out. That should also cap off the receiver_id piece as well, since that'll be handled in building out the new functions. Can you speak to the project_code/detectedby bit in the first group?

The long version regarding the IMOS -> OTN piece is that when I generated the IMOS->OTN output, I used a function that was a carbon copy of the OTN->IMOS one, because I had forgotten that I'd decided to break it up into three separate functions (one for tags, one for receivers, one for detections). So the output was basically garbage, and that's my fault. I spent some time last week building out the two functions that already existed (receivers and tags, IMOS -> OTN) and working in a new one (IMOS -> OTN detections) based on mappings given as part of building the code in Remora. The detections file is incomplete right now, which is what Shannon's helping out with, so the feedback here regarding columns in the IMOS -> OTN output will be addressed as part of building out those functions.

@CaitlinBate

project_code/detectedby bit -- when moving from OTN extracts to IMOS extracts we need to make sure the project code (OTN's detectedby column) is put into the receiver_project_name column. i dont think we had example data before to look at
