Convert IOPAN protist data from ICE2010 into Darwin Core #51

Open
cnrdh opened this issue Oct 15, 2021 · 5 comments
@cnrdh
Member

cnrdh commented Oct 15, 2021

$ wc -l data/deposit/iopan/protist-biodiversity/*ICE10*
  3444 data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv
   153 data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10-handnet.csv
  3597 total
@cnrdh
Member Author

cnrdh commented Oct 18, 2021

The transforms need a little help: either rename the column in the pipeline, like below, or add ["bottle_no","fieldNumber"] to iopanDwcOccurrenceTuples.

~/npolar/marine-db$ cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv \
  | ./bin/csv-transform --ndjson --transformers=csv/ascii-header,csv/number \
  | ndjson-map 'd.fieldNumber = d.bottle_no,delete d.bottle_no,delete d.Class_phylum, d' \
  | ./bin/ndjson-transform --tsv | ./bin/dwc-occurrence-csv-transform
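For reference, a minimal sketch of what the ndjson-map step does to each row, written as a plain function. The name toOccurrenceInput and the example row are made up for illustration; they are not part of marine-db or the data file.

```javascript
// Hypothetical helper mirroring the ndjson-map step above:
// rename bottle_no to fieldNumber and drop Class_phylum.
function toOccurrenceInput(row) {
  const { bottle_no, Class_phylum, ...rest } = row;
  return { ...rest, fieldNumber: bottle_no };
}

// Made-up row for illustration only, not taken from the data file.
const exampleRow = {
  bottle_no: "ICE10-000",
  Class_phylum: "Diatomeae",
  takson: "example taxon",
};
console.log(toOccurrenceInput(exampleRow));
```

Unlike the one-liner, this returns a new object instead of mutating the input row.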

log_no ?

   3348 ""
     14 159
     32 162
     16 163
     13 164
     20 439

size classes (µm):

    140 ""
      1 "10-20"
      1 20
      1 "20-30"
      1 "30-40"
      3 40
      1 60
      2 70
      1 "70;160"
      1 80
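The size class column mixes numbers, ranges and ";"-separated values. A rough sketch of one way to normalise it, assuming "-" means a min-max range and ";" separates independent values (that reading of "70;160" is a guess, not confirmed by IOPAN). parseSizeClass is a hypothetical name.

```javascript
// Hypothetical parser for the size class (µm) column.
// Assumptions: "10-20" is a min-max range, "70;160" lists two separate values,
// and an empty string means no size class was recorded.
function parseSizeClass(value) {
  const text = String(value ?? "").trim();
  if (text === "") return [];
  return text.split(";").map((part) => {
    // A single number becomes a range with min equal to max.
    const [min, max = min] = part.split("-").map((s) => Number(s.trim()));
    return { min, max };
  });
}

for (const raw of ["", "10-20", 20, "70;160"]) {
  console.log(JSON.stringify(raw), "=>", JSON.stringify(parseSizeClass(raw)));
}
```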

@cnrdh
Member Author

cnrdh commented Oct 18, 2021

Regular (non-handnet) file: 138 unique fieldNumbers, 12 of them not found in the sampling events

ndjson-join --left d.fieldNumber \
  <( cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv | ./bin/dwc-occurrence-csv-transform ) \
  <( cat data/deposit/2010/ICE2010/ice_2010_sampling-events.tsv | ./bin/dwc-sampling-event-csv-transform ) \
  | ndjson-filter 'd[1]===null' \
  | ndjson-map 'd=d[0],[d.expedition,d.locationID,d.maximumDepthInMeters,d.minimumDepthInMeters,d.fieldNumber]' \
  | sort | uniq -c
     27 ["ICE2010","ICE10-16",0,null,"ICE10-379"]
     15 ["ICE2010","ICE10-16",100,100,"ICE10-384"]
     38 ["ICE2010","ICE10-16",10,10,"ICE10-381"]
     46 ["ICE2010","ICE10-16",35,35,"ICE10-382"]
     36 ["ICE2010","ICE10-16",50,50,"ICE10-380"]
     21 ["ICE2010","ICE10-16",50,50,"ICE10-383"]
     34 ["ICE2010","R4",0,null,"ICE10-152"]
     13 ["ICE2010","R4",100,100,"ICE10-157"]
     32 ["ICE2010","R4",25,25,"ICE10-155"]
     13 ["ICE2010","R4",38,38,"ICE10-158"]
     16 ["ICE2010","R4",50,50,"ICE10-156"]
     18 ["ICE2010","R6b",5,5,"ICE10-253"]
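The left join plus null filter is effectively an anti-join on fieldNumber. A small in-memory sketch of the same idea, assuming both sides are already parsed into arrays of objects with a fieldNumber property; the function names and sample rows are invented for illustration.

```javascript
// Return occurrence rows whose fieldNumber has no counterpart among the events
// (what --left followed by the d[1]===null filter keeps).
function unmatchedByFieldNumber(occurrences, events) {
  const known = new Set(events.map((e) => e.fieldNumber));
  return occurrences.filter((o) => !known.has(o.fieldNumber));
}

// Tally rows per key, similar in spirit to `sort | uniq -c`.
function countBy(rows, key) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[key], (counts.get(row[key]) ?? 0) + 1);
  }
  return counts;
}

// Made-up rows, not real data.
const occurrences = [
  { fieldNumber: "F-1" },
  { fieldNumber: "F-2" },
  { fieldNumber: "F-2" },
];
const events = [{ fieldNumber: "F-1" }];
console.log(countBy(unmatchedByFieldNumber(occurrences, events), "fieldNumber"));
```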

Investigate: why incalculable?
=> Bottle volume is missing in the input. There is a "cells in 250 ml" column, but it is only used for the 614 microplankton rows (32 L initial volume, filtered)

cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.csv  | ./bin/csv-transform --ndjson --transformers=csv/ascii-header,csv/number | ndjson-map '[d.Vth_filtered_L,d.cells_in_250_ml]' | ndjson-filter 'd[0]!=1 && d[1]!=0' | ndjson-map d[0] | sort | uniq -c

    614 32

[Only 570 rows have 32 L in the "total database", with 595 marked as micro :/]

@cnrdh
Member Author

cnrdh commented Oct 18, 2021

The handnet file fails JSON schema validation for:

"scientificName":"cysta chrysophyta"
"scientificName":"cysta chaetoceros (simplex)"
{"name":"ICE10","depth":"20-0","station":"R7b","no":"321","data":"21.08.2010","Class/phylum":"Chrysophyceae","takson":"cysta chrysophyta","size class (μm)":""}
{"name":"ICE10","depth":"20-0","station":"R7b","no":"321","data":"21.08.2010","Class/phylum":"Diatomeae","takson":"cysta chaetoceros (simplex)","size class (μm)":""}
{"name":"ICE10","depth":"20-0","station":"R9b","no":"475","data":"22.08.2010","Class/phylum":"Chrysophyceae","takson":"cysta chrysophyta","size class (μm)":""}

@cnrdh cnrdh self-assigned this Oct 27, 2021
@cnrdh
Member Author

cnrdh commented Oct 28, 2021

The sample log contains:
112 phytoplankton
18 microplankton

The data has no microplankton marker, other than Vth_filtered_L:
42 unique bottle_no values have a filtered volume > 1 L (microplankton?)
96 have Vth_filtered_L === 1

~/npolar/marine-db$ cat data/deposit/iopan/protist-biodiversity/konghau_database_completeICE10.tsv   | ./bin/csv-transform --ndjson --transformers=csv/ascii-header,csv/number | ndjson-filter 'd.Vth_filtered_L>1' | ndjson-map d.bottle_no | sort | uniq -c | wc -l
42
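The same split can be sketched in one pass over parsed rows, counting unique bottle_no values on each side of the 1 L threshold. This assumes rows carry numeric Vth_filtered_L (as after the csv/number transformer); countBottlesByVolume and the sample rows are hypothetical.

```javascript
// Count unique bottle_no values by filtered volume.
// Treating > 1 L as "microplankton" is only the working guess from the note above.
function countBottlesByVolume(rows) {
  const moreThanOne = new Set();
  const exactlyOne = new Set();
  for (const row of rows) {
    if (row.Vth_filtered_L > 1) moreThanOne.add(row.bottle_no);
    else if (row.Vth_filtered_L === 1) exactlyOne.add(row.bottle_no);
  }
  return { moreThanOneLitre: moreThanOne.size, exactlyOneLitre: exactlyOne.size };
}

// Made-up rows, not real data.
console.log(
  countBottlesByVolume([
    { bottle_no: 1, Vth_filtered_L: 32 },
    { bottle_no: 1, Vth_filtered_L: 32 },
    { bottle_no: 2, Vth_filtered_L: 1 },
  ])
);
```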

@cnrdh
Member Author

cnrdh commented Oct 28, 2021

About the non-matches: the samples have fieldNumbers with A/B/C suffixes,

$ cat $samples | grep 2010 | grep -E 'ICE10-15[56789][A]' | ndjson-map '["eventID", "materialSampleID","recordedBy", "gear", "samplingProtocol", "sampletype"].map(k=>delete d[k]),d.fieldNumber=d.fieldNumber.replace(/[ABC]$/,""),d' | sort | uniq 
$ cat $samples | grep 2010 | grep -E "ICE10-38[234][A]" | ndjson-map '["eventID", "materialSampleID","recordedBy", "gear", "samplingProtocol", "sampletype"].map(k=>delete d[k]),d.fieldNumber=d.fieldNumber.replace(/[ABC]$/,""),d'
$ cat $samples | grep 2010 | grep -E "ICE10-(379|38[01])[A]" | ndjson-map '["eventID", "materialSampleID","recordedBy", "gear", "samplingProtocol", "sampletype"].map(k=>delete d[k]),d.fieldNumber=d.fieldNumber.replace(/[ABC]$/,""),d'
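The suffix handling in these commands boils down to stripping a trailing A, B or C before comparing fieldNumbers. A minimal sketch, assuming samples are parsed objects with a string fieldNumber; the helper names and sample values are made up.

```javascript
// Strip a trailing A, B or C, using the same regular expression as the ndjson-map steps.
function baseFieldNumber(fieldNumber) {
  return fieldNumber.replace(/[ABC]$/, "");
}

// Collapse A/B/C replicates into a sorted list of unique base fieldNumbers.
function uniqueBaseFieldNumbers(samples) {
  return [...new Set(samples.map((s) => baseFieldNumber(s.fieldNumber)))].sort();
}

// Made-up samples, not real data.
console.log(
  uniqueBaseFieldNumbers([
    { fieldNumber: "X-1A" },
    { fieldNumber: "X-1B" },
    { fieldNumber: "X-2" },
  ])
);
```

With the suffix removed, the base values could be joined against the occurrence fieldNumbers in the same way as the sampling events.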
