PacFIN BDS data have duplicated records #1

Open
kellijohnson-NOAA opened this issue Jun 14, 2023 · 2 comments

Comments

@kellijohnson-NOAA
Contributor

PacFIN.Utilities::PullBDS.PacFIN("SABL") leads to

Warning...
Warning: The downloaded data contains duplicated entries that will be
removed prior to returning the data. Please notify the agency that
provided the following duplicated samples:
# A tibble: 108 × 4
# Groups:   AGENCY_CODE, SAMPLE_YEAR, SAMPLE_NUMBER [108]
    AGENCY_CODE SAMPLE_YEAR SAMPLE_NUMBER     n
    <chr>             <int> <chr>         <int>
  1 C                  2007 2007550320063    26
  2 C                  2021 2021220270163    23
  3 C                  2021 2021220270185    22
  4 C                  2021 2021220270186    16
  5 C                  2021 2021220370227    14
  6 C                  2021 2021220370228    16
  7 C                  2021 2021220370239    24
  8 C                  2021 2021220370257    14
  9 C                  2021 2021220370268    18
 10 C                  2021 2021220370284    13
 11 C                  2021 2021220370285    17
 12 C                  2021 2021220370310    14
 13 C                  2021 2021220470323    14
 14 C                  2021 2021440240088    30
 15 C                  2021 2021440240099    20
 16 C                  2021 2021440340132    32
 17 C                  2021 2021452240073    42
 18 C                  2021 2021452240078    19
 19 C                  2021 2021452240106    21
 20 C                  2021 2021550220049    21
 21 C                  2021 2021550220054    22
 22 C                  2021 2021550220056    23
 23 C                  2021 2021550320065    15
 24 C                  2021 2021550320076    24
 25 C                  2021 2021550320078    27
 26 C                  2021 2021550420095    32
 27 C                  2021 2021550420096     4
 28 C                  2021 2021592320068    22
 29 C                  2021 2021592320084    25
 30 C                  2021 2021592320085    23
 31 C                  2021 2021592420088    23
 32 C                  2021 202161142043     34
 33 C                  2021 202161142044     29
 34 C                  2021 202161142062     32
 35 C                  2022 2022220170040    25
 36 C                  2022 2022220170075    30
 37 C                  2022 2022220270098    19
 38 C                  2022 2022220270105    16
 39 C                  2022 2022220270112    19
 40 C                  2022 2022220270169    27
 41 C                  2022 2022220270210    15
 42 C                  2022 2022220370225    15
 43 C                  2022 2022220370238    15
 44 C                  2022 2022220370249    17
 45 C                  2022 2022220370252    26
 46 C                  2022 2022220370269    25
 47 C                  2022 2022220370272    16
 48 C                  2022 2022220370277    15
 49 C                  2022 2022220370280    11
 50 C                  2022 2022220370281    18
 51 C                  2022 2022220470288    12
 52 C                  2022 2022220470310    11
 53 C                  2022 2022220470313    22
 54 C                  2022 2022220470331    20
 55 C                  2022 2022223360029    22
 56 C                  2022 2022223460056    22
 57 C                  2022 2022400140006    38
 58 C                  2022 2022440140007    14
 59 C                  2022 2022440140008    24
 60 C                  2022 2022440140011     6
 61 C                  2022 2022440140012     9
 62 C                  2022 2022440240013    14
 63 C                  2022 2022440240016    17
 64 C                  2022 2022440240017    21
 65 C                  2022 2022440240018     9
 66 C                  2022 2022440240019     6
 67 C                  2022 2022452440036    68
 68 C                  2022 2022452440045    34
 69 C                  2022 2022550120005     3
 70 C                  2022 2022550120010    36
 71 C                  2022 2022550120020    13
 72 C                  2022 2022550120029    21
 73 C                  2022 2022550120035    23
 74 C                  2022 2022550120037    30
 75 C                  2022 2022550220041    22
 76 C                  2022 2022550220044    22
 77 C                  2022 2022550220045    31
 78 C                  2022 2022550220050    30
 79 C                  2022 2022550420071    22
 80 C                  2022 2022550420073     7
 81 C                  2022 2022550420075    23
 82 C                  2022 2022606110040    23
 83 C                  2022 2022606110041    23
 84 C                  2022 2022606210056    25
 85 C                  2022 2022606210057     2
 86 C                  2022 2022606210058     4
 87 C                  2022 2022606210059    24
 88 C                  2022 2022606210070    25
 89 C                  2022 2022606210071    14
 90 C                  2022 2022606210072    15
 91 C                  2022 2022606410223    52
 92 C                  2022 2022606410224    28
 93 C                  2022 2022606410225    19
 94 C                  2022 2022606410226    11
 95 C                  2022 2022606410229     1
 96 C                  2022 2022606410230     3
 97 C                  2022 202261112009     29
 98 C                  2022 202261112022     26
 99 C                  2022 202261112023     20
100 C                  2022 202261122077     35
101 C                  2022 202270011002     16
102 C                  2022 202270011003     15
103 C                  2022 202270021008     34
104 C                  2022 202270021010     33
105 C                  2022 202270021013     15
106 C                  2022 202270031024     15
107 C                  2022 202270031028     23
108 C                  2022 202274821020     16
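
A per-sample count like the `n` column above can be approximated in base R. This is a minimal sketch under stated assumptions, not the code that `PullBDS.PacFIN()` runs: the data frame `bds` is a hand-typed toy subset (three fish from sample 2007550320063, each entered twice), and treating `n` as the number of exactly repeated rows is an interpretation of the warning rather than something the warning states.

```r
# Toy stand-in for a downloaded BDS pull (assumption: hand-typed subset,
# three fish from one sample, each row entered twice).
bds <- data.frame(
  AGENCY_CODE   = "C",
  SAMPLE_YEAR   = 2007L,
  SAMPLE_NUMBER = "2007550320063",
  FISH_ID       = c(1889912963, 1889912963, 1890925570,
                    1890925570, 1890439527, 1890439527),
  FISH_LENGTH   = c(515, 515, 584, 584, 557, 557),
  stringsAsFactors = FALSE
)

# duplicated() flags every row that exactly repeats an earlier row.
is_dup <- duplicated(bds)

# Sum the flags within each agency, year, and sample combination.
dup_counts <- aggregate(
  list(n = is_dup),
  by  = bds[c("AGENCY_CODE", "SAMPLE_YEAR", "SAMPLE_NUMBER")],
  FUN = sum
)
dup_counts <- dup_counts[dup_counts$n > 0, ]

# Keep only the first copy of each row.
bds_clean <- bds[!is_dup, ]

print(dup_counts)
```

Because `duplicated()` compares whole rows, it only catches exact copies; records that differ in any column would need an explicit key instead.
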
@John-R-Wallace-NOAA

John-R-Wallace-NOAA commented Jun 15, 2023

This appears to be a PacFIN issue, at least in how the legacy BDS data that originally came from the states was migrated, unless PacFIN had the states reload all of their data into the new comprehensive BDS table.

The legacy data in table 'pacfin.bds_sample' does not have the duplicates:
   SPID     SAMPLE_NO SAMPLE_YEAR SOURCE_AGID CLUSTER_NO FISH_NO FISH_LENGTH
1  SABL 2007550320063        2007           C          1       1         515
2  SABL 2007550320063        2007           C          2       1         584
3  SABL 2007550320063        2007           C          1       2         557
4  SABL 2007550320063        2007           C          2       2         556
5  SABL 2007550320063        2007           C          1       3         473
6  SABL 2007550320063        2007           C          2       3         682
7  SABL 2007550320063        2007           C          1       4         540
8  SABL 2007550320063        2007           C          2       4         557
9  SABL 2007550320063        2007           C          1       5         558
10 SABL 2007550320063        2007           C          2       5         624
11 SABL 2007550320063        2007           C          1       6         480
12 SABL 2007550320063        2007           C          2       6         493
13 SABL 2007550320063        2007           C          1       7         485
14 SABL 2007550320063        2007           C          2       7         628
15 SABL 2007550320063        2007           C          1       8         566
16 SABL 2007550320063        2007           C          2       8         560
17 SABL 2007550320063        2007           C          1       9         550
18 SABL 2007550320063        2007           C          2       9         512
19 SABL 2007550320063        2007           C          1      10         515
20 SABL 2007550320063        2007           C          2      10         520
21 SABL 2007550320063        2007           C          1      11         619
22 SABL 2007550320063        2007           C          2      11         602
23 SABL 2007550320063        2007           C          1      12         535
24 SABL 2007550320063        2007           C          1      13         523
25 SABL 2007550320063        2007           C          1      14         475
26 SABL 2007550320063        2007           C          1      15         650

but the COMPREHENSIVE_BDS_COMM does:

   PACFIN_SPECIES_CODE SAMPLE_NUMBER SAMPLE_YEAR AGENCY_CODE CLUSTER_ID CLUSTER_SEQUENCE_NUMBER    FISH_ID FISH_SEQUENCE_NUMBER AGE_SEQUENCE_NUMBER FISH_LENGTH
1                 SABL 2007550320063        2007           C   97242947                       1 1889912963                    1                  NA         515
2                 SABL 2007550320063        2007           C   97242947                       1 1889912963                    1                  NA         515
3                 SABL 2007550320063        2007           C   97215262                       2 1890925570                    1                  NA         584
4                 SABL 2007550320063        2007           C   97215262                       2 1890925570                    1                  NA         584
5                 SABL 2007550320063        2007           C   97242947                       1 1890439527                    2                  NA         557
6                 SABL 2007550320063        2007           C   97242947                       1 1890439527                    2                  NA         557
7                 SABL 2007550320063        2007           C   97215262                       2 1889699473                    2                  NA         556
8                 SABL 2007550320063        2007           C   97215262                       2 1889699473                    2                  NA         556
9                 SABL 2007550320063        2007           C   97242947                       1 1889738264                    3                  NA         473
10                SABL 2007550320063        2007           C   97242947                       1 1889738264                    3                  NA         473
11                SABL 2007550320063        2007           C   97215262                       2 1890575634                    3                  NA         682
12                SABL 2007550320063        2007           C   97215262                       2 1890575634                    3                  NA         682
13                SABL 2007550320063        2007           C   97215262                       2 1890751118                    4                  NA         557
14                SABL 2007550320063        2007           C   97215262                       2 1890751118                    4                  NA         557
15                SABL 2007550320063        2007           C   97242947                       1 1890264527                    4                  NA         540
16                SABL 2007550320063        2007           C   97242947                       1 1890264527                    4                  NA         540
17                SABL 2007550320063        2007           C   97215262                       2 1889699474                    5                  NA         624
18                SABL 2007550320063        2007           C   97215262                       2 1889699474                    5                  NA         624
19                SABL 2007550320063        2007           C   97242947                       1 1890264528                    5                  NA         558
20                SABL 2007550320063        2007           C   97242947                       1 1890264528                    5                  NA         558
21                SABL 2007550320063        2007           C   97215262                       2 1889699475                    6                  NA         493
22                SABL 2007550320063        2007           C   97215262                       2 1889699475                    6                  NA         493
23                SABL 2007550320063        2007           C   97242947                       1 1889738265                    6                  NA         480
24                SABL 2007550320063        2007           C   97242947                       1 1889738265                    6                  NA         480
25                SABL 2007550320063        2007           C   97242947                       1 1890439528                    7                  NA         485
26                SABL 2007550320063        2007           C   97242947                       1 1890439528                    7                  NA         485
27                SABL 2007550320063        2007           C   97215262                       2 1890925571                    7                  NA         628
28                SABL 2007550320063        2007           C   97215262                       2 1890925571                    7                  NA         628
29                SABL 2007550320063        2007           C   97215262                       2 1890049399                    8                  NA         560
30                SABL 2007550320063        2007           C   97215262                       2 1890049399                    8                  NA         560
31                SABL 2007550320063        2007           C   97242947                       1 1889738266                    8                  NA         566
32                SABL 2007550320063        2007           C   97242947                       1 1889738266                    8                  NA         566
33                SABL 2007550320063        2007           C   97242947                       1 1889738267                    9                  NA         550
34                SABL 2007550320063        2007           C   97242947                       1 1889738267                    9                  NA         550
35                SABL 2007550320063        2007           C   97215262                       2 1889874638                    9                  NA         512
36                SABL 2007550320063        2007           C   97215262                       2 1889874638                    9                  NA         512
37                SABL 2007550320063        2007           C   97215262                       2 1890049398                   10                  NA         520
38                SABL 2007550320063        2007           C   97215262                       2 1890049398                   10                  NA         520
39                SABL 2007550320063        2007           C   97242947                       1 1889738262                   10                  NA         515
40                SABL 2007550320063        2007           C   97242947                       1 1889738262                   10                  NA         515
41                SABL 2007550320063        2007           C   97215262                       2 1889874637                   11                  NA         602
42                SABL 2007550320063        2007           C   97215262                       2 1889874637                   11                  NA         602
43                SABL 2007550320063        2007           C   97242947                       1 1890789631                   11                  NA         619
44                SABL 2007550320063        2007           C   97242947                       1 1890789631                   11                  NA         619
45                SABL 2007550320063        2007           C   97242947                       1 1889738263                   12                  NA         535
46                SABL 2007550320063        2007           C   97242947                       1 1889738263                   12                  NA         535
47                SABL 2007550320063        2007           C   97242947                       1 1890789632                   13                  NA         523
48                SABL 2007550320063        2007           C   97242947                       1 1890789632                   13                  NA         523
49                SABL 2007550320063        2007           C   97242947                       1 1890439526                   14                  NA         475
50                SABL 2007550320063        2007           C   97242947                       1 1890439526                   14                  NA         475
51                SABL 2007550320063        2007           C   97242947                       1 1890264526                   15                  NA         650
52                SABL 2007550320063        2007           C   97242947                       1 1890264526                   15                  NA         650
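
The comparison of the two tables can be checked in base R. This is a minimal sketch, assuming hand-typed subsets (`legacy` and `comp`, the first four fish of sample 2007550320063 from each table) rather than a database query, and assuming that `CLUSTER_NO` corresponds to `CLUSTER_SEQUENCE_NUMBER` and `FISH_NO` to `FISH_SEQUENCE_NUMBER`, which is inferred from the printed values and not confirmed by PacFIN.

```r
# First four fish from the legacy table pacfin.bds_sample (hand-typed).
legacy <- data.frame(
  CLUSTER_NO  = c(1, 2, 1, 2),
  FISH_NO     = c(1, 1, 2, 2),
  FISH_LENGTH = c(515, 584, 557, 556)
)

# The same fish from COMPREHENSIVE_BDS_COMM, where each row appears twice.
comp <- data.frame(
  CLUSTER_SEQUENCE_NUMBER = rep(c(1, 2, 1, 2), each = 2),
  FISH_ID                 = rep(c(1889912963, 1890925570,
                                  1890439527, 1889699473), each = 2),
  FISH_SEQUENCE_NUMBER    = rep(c(1, 1, 2, 2), each = 2),
  FISH_LENGTH             = rep(c(515, 584, 557, 556), each = 2)
)

# How many times does each FISH_ID occur in the comprehensive table?
copies_per_fish <- table(comp$FISH_ID)

# Drop exact repeats, then rename columns to the legacy names
# (assumed mapping, see the note above).
comp_unique <- unique(comp)
comp_keyed <- data.frame(
  CLUSTER_NO  = comp_unique$CLUSTER_SEQUENCE_NUMBER,
  FISH_NO     = comp_unique$FISH_SEQUENCE_NUMBER,
  FISH_LENGTH = comp_unique$FISH_LENGTH
)

# merge() joins on all shared column names; a full match means the
# de-duplicated comprehensive rows reproduce the legacy rows.
matched <- merge(legacy, comp_keyed)

print(copies_per_fish)
print(nrow(matched) == nrow(legacy))
```

If the join returns as many rows as the legacy table, the extra rows in the comprehensive table are pure repeats and not additional fish.
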

@kellijohnson-NOAA
Contributor Author

@John-R-Wallace-NOAA PacFIN and Brenda have already been notified.

@kellijohnson-NOAA kellijohnson-NOAA added this to the 2024_pre_assessment milestone Apr 17, 2024