Non-unique sample identifiers #3

Open
cnrdh opened this issue Jun 4, 2018 · 9 comments

@cnrdh
Member

cnrdh commented Jun 4, 2018

About 130 non-unique sample names (2014-2017)
See also issues #4, #5 (2014, 2016, 2017) and #6 (2015).

@cnrdh
Member Author

cnrdh commented Jun 4, 2018

$ cat data/master/sample/*2017*.ndjson | ndjson-map 'd.sample' | sort | uniq -cd | sort -rn
     37 "Sal_Temp_pH"
      8 "undefined"
      4 "KpN4_R1_10M; KpN4_R2_10M"
      2 "R6_R1_S; R6_R2_S"
      2 "R6_R1_M; R6_R2_M"
      2 "R6_R1_B; R6_R2_B"
      2 "MOSJ2017/DOX-009"
      2 "MOSJ2017/DOX-008"
      2 "MOSJ2017/DOX-007"
      2 "MOSJ2017/DIC-091"
      2 "GlacierFront2017/SAL-386"
      2 "GlacierFront2017/PHT-033"
      2 "GlacierFront2017/OXY-161"
      2 "GlacierFront2017/OXY-160"
      2 "GlacierFront2017/OXY-021"
      2 "GlacierFront2017/DIC-079"
      2 "GlacierFront2017/DIC-078"
      2 "GlacierFront2017/DIC-077"
      2 "GlacierFront2017/DIC-076"
      2 "GlacierFront2017/DIC-033"
      2 "GlacierFront2017/DIC-031"
$ cat data/master/sample/*2016*.ndjson | ndjson-map 'd.sample' | sort | uniq -cd | sort -rn
      6 "undefined"
      2 "MOSJ2016/ZOT-063"
      2 "MOSJ2016/ZOT-062"
      2 "MOSJ2016/ZOT-061"
      2 "MOSJ2016/ZOT-018"
      2 "MOSJ2016/ZOT-017"
      2 "MOSJ2016/ZOT-016"
      2 "MOSJ2016/ZOT-015"
      2 "MOSJ2016/ZOT-014"
      2 "MOSJ2016/PAB-070"
      2 "MOSJ2016/MAA-045"
      2 "MOSJ2016/MIT-015"
      2 "MOSJ2016/CDO-054"
      2 "GlacierFront2016/NUT-243"
      2 "GlacierFront2016/NUT-242"
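
For anyone without ndjson-cli installed, the same duplicate count can be sketched in Python. This is a minimal sketch, not the pipeline used above: `duplicate_samples` is a hypothetical helper, the demo records are made up apart from the sample names, and grouping records that lack a `sample` key under "undefined" is an assumption (the thread does not say where the "undefined" entries come from).

```python
import json
from collections import Counter

MISSING = "undefined"  # assumption: label used for records without a "sample" key


def duplicate_samples(lines):
    """Return (count, name) pairs for sample names seen more than once,
    most frequent first, ties broken alphabetically."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        counts[record.get("sample", MISSING)] += 1
    dupes = [(n, name) for name, n in counts.items() if n > 1]
    return sorted(dupes, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical NDJSON lines; real input would come from data/master/sample/*.ndjson
demo = [
    '{"sample": "MOSJ2017/DOX-009"}',
    '{"sample": "MOSJ2017/DOX-009"}',
    '{"sample": "MOSJ2017/DOX-010"}',
    '{"station": "Kb3"}',
    '{"station": "Kb5"}',
    '{"station": "Kb5"}',
]

for count, name in duplicate_samples(demo):
    print(f"{count:7d} {json.dumps(name)}")
```

Reading the files with `open(path)` and passing the file object as `lines` should work the same way, since the helper only iterates over lines.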

@cnrdh
Member Author

cnrdh commented Jun 4, 2018

$ cat data/master/sample/2014.ndjson | ndjson-map 'd.sample' | sort | uniq -cd | sort -rn
5 "undefined"
2 "MOSJ2014/FCM-068"
2 "MOSJ2014/FCM-067"
2 "MOSJ2014/FCM-066"
2 "MOSJ2014/FCM-065"
2 "MOSJ2014/FCM-064"
2 "MOSJ2014/FCM-063"
2 "MOSJ2014/FCM-062"
2 "ICE2014/ZOT-076"

@cnrdh
Member Author

cnrdh commented Jun 4, 2018

$ cat data/master/sample/*2015*.ndjson | ndjson-filter '!d.sample.match(/\/(DIC|BAR|GAS|FCM|SAL|OX[IY])/)' | ndjson-map 'd.sample' | sort | uniq -cd | sort -rn 
      2 "On-ice CTD-047"
      2 "N-ICE2015/SWN-068"
      2 "N-ICE2015/POC-530"
      2 "N-ICE2015/NUT-915"
      2 "N-ICE2015/NUT-914"
      2 "N-ICE2015/NUT-913"
      2 "N-ICE2015/NUT-912"
      2 "N-ICE2015/NUT-699"
      2 "N-ICE2015/NUT-698"
      2 "N-ICE2015/NUT-697"
      2 "N-ICE2015/NUT-670"
      2 "N-ICE2015/NUT-289"
      2 "N-ICE2015/NUT-288"
      2 "N-ICE2015/NUT-286"
      2 "N-ICE2015/IAT-255"
      2 "N-ICE2015/IAT-230"
      2 "N-ICE2015/IAT-229"
      2 "N-ICE2015/FCF-109"
      2 "N-ICE2015/DOX-232"
      2 "N-ICE2015/DOC-200"
      2 "N-ICE2015/DOC-004"
      2 "N-ICE2015/CHL-872"
      2 "N-ICE2015/CHL-414"
      2 "N-ICE2015/CHL-173"
      2 "N-ICE2015/CHL-105"
      2 "N-ICE2015/BSI-058"
      2 "MOSJ2015/CHL-43"
      2 "MOSJ2015/CHL-42"
      2 "MOSJ2015/CHL-41"
      2 "MOSJ2015/CHL-40"
      2 "MOSJ2015/CHL-39"
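
The `ndjson-filter` expression above drops every sample whose name contains a slash followed by one of the listed type codes. A rough Python equivalent of that exclusion, as a sketch only: `keep` is a hypothetical helper, and "N-ICE2015/OXI-001" is an invented name used to show the `OX[IY]` alternative.

```python
import re

# Same pattern as in the ndjson-filter call; search() matches anywhere in the
# string, like JavaScript's String.prototype.match without anchors.
EXCLUDED = re.compile(r"/(DIC|BAR|GAS|FCM|SAL|OX[IY])")


def keep(sample):
    """True when the sample name does not contain an excluded type code."""
    return EXCLUDED.search(sample) is None


names = [
    "N-ICE2015/NUT-915",
    "N-ICE2015/DIC-632",
    "N-ICE2015/OXY-024",
    "N-ICE2015/OXI-001",  # hypothetical name, not from the data
    "On-ice CTD-047",
    "MOSJ2015/CHL-43",
]

kept = [name for name in names if keep(name)]
print(kept)
```

Names without a slash, such as "On-ice CTD-047", pass the filter because the pattern requires the leading `/`.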

@cnrdh
Member Author

cnrdh commented Jun 5, 2018

Only 38 left now
19 "undefined"
2 "R6_R1_S; R6_R2_S"
2 "R6_R1_M; R6_R2_M"
2 "R6_R1_B; R6_R2_B"
2 "N-ICE2015/OXY-024"
2 "N-ICE2015/IAT-230"
2 "N-ICE2015/IAT-229"
2 "N-ICE2015/FCM-534"
2 "N-ICE2015/FCM-437"
2 "N-ICE2015/FCF-109"
2 "N-ICE2015/DOX-232"
2 "N-ICE2015/DIC-632"
2 "MOSJ2017/DIC-091"
2 "MOSJ2016/ZOT-063"
2 "MOSJ2016/ZOT-062"
2 "MOSJ2016/ZOT-061"
2 "MOSJ2016/ZOT-018"
2 "MOSJ2016/ZOT-017"
2 "MOSJ2016/ZOT-016"
2 "MOSJ2016/ZOT-015"
2 "MOSJ2016/ZOT-014"
2 "MOSJ2016/PAB-070"
2 "MOSJ2016/MAA-045"
2 "MOSJ2016/MIT-015"
2 "MOSJ2016/CDO-054"
2 "MOSJ2014/FCM-068"
2 "MOSJ2014/FCM-067"
2 "MOSJ2014/FCM-066"
2 "MOSJ2014/FCM-065"
2 "MOSJ2014/FCM-064"
2 "MOSJ2014/FCM-063"
2 "MOSJ2014/FCM-062"
2 "ICE2014/ZOT-076"
2 "GlacierFront2016/NUT-243"
2 "GlacierFront2016/NUT-242"
2 "GAS-502""
2 "GAS-501""
2 "DOC-200""

@cnrdh
Member Author

cnrdh commented Jun 8, 2018

Also these from 2001:
2 "01 V15 WP3 8"
2 "01M V10 WP3"
2 "01M Kb52 WP3"
2 "01M Kb28 WP3"
2 "01 Kb52 WP3 1"
2 "01 Kb28 WP3 1"

@cnrdh
Member Author

cnrdh commented Jun 12, 2018

Deleted the records with expedition 01M; kept the otherwise identical records with expedition OAERRE-2001.

2001-05-22T14:00:00Z	10.9	79.0305	V15	Lance	01M	"WP3 1000 µm"	"01 V15 WP3 8|01M V10 WP3"	300-0|100-0	taxonomy|lipids	mesozooplankton|	319
2001-05-22T01:05:00Z	12.181667	78.913333	Kb28	Lance	01M	"WP3 1000 µm"	"01 Kb28 WP3 1|01M Kb28 WP3"	90-0|80-0	taxonomy|lipids	mesozooplankton|	101	
2001-05-21T12:45:00Z	11.421667	79.041667	Kb52	Lance	01M	"WP3 1000 µm"	"01 Kb52 WP3 1|01M Kb52 WP3"	200-0|200-0	taxonomy|lipids	mesozooplankton|	240

cnrdh self-assigned this Jun 13, 2018
@cnrdh
Member Author

cnrdh commented Jun 13, 2018

As expected, unwinding samples led to new duplicates, but these were removed again by the fix for #16.
Status now for 27384 samples:

$ cat data/master/sample/*.ndjson | ndjson-map 'd.sample' | sort | uniq -cd | sort -rn
      2 "01_V15_WP3_8"
      2 "01M_V10_WP3"
      2 "01M_Kb52_WP3"
      2 "01M_Kb28_WP3"
      2 "01_Kb52_WP3_1"
      2 "01_Kb28_WP3_1"

cnrdh closed this as completed Jun 13, 2018
cnrdh reopened this Jun 13, 2018
@cnrdh
Member Author

cnrdh commented Jun 13, 2018

Why aren't all 2014-2017 samples prefixed? Because they break the expected XXX-NNN pattern...
See #16
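
To illustrate the point, here is a sketch of a prefixing rule that only touches names matching the XXX-NNN pattern. The actual rule lives in #16 and may differ: `prefixed` is a hypothetical helper, and reading "XXX-NNN" as three uppercase letters, a hyphen and one or more digits is an assumption (names such as "CHL-43" above have only two digits).

```python
import re

# Assumed reading of "XXX-NNN": three uppercase letters, hyphen, digits
CODE = re.compile(r"[A-Z]{3}-\d+")


def prefixed(sample, expedition):
    """Prefix the sample with its expedition only when it matches the pattern."""
    if CODE.fullmatch(sample):
        return f"{expedition}/{sample}"
    return sample  # names like "R6_R1_B" are left untouched


print(prefixed("DOX-009", "MOSJ2017"))
print(prefixed("R6_R1_B", "MOSJ2017"))
print(prefixed("Sal_Temp_pH", "GlacierFront2017"))
```

Under this rule, names that do not follow the pattern stay unprefixed and can still collide across expeditions.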

@cnrdh
Member Author

cnrdh commented Oct 25, 2018

MOSJ2017:

"R6_R1_B"
"R6_R1_M"
"R6_R1_S"
"R6_R2_B"
"R6_R2_M"
"R6_R2_S"
"V12_R1_25m"
"V12_R1_B"
"V12_R1_M"
"V12_R1_S"
