Species names #1

Open
amael-ls opened this issue Sep 12, 2016 · 10 comments

Comments

@amael-ls

In the file final_ref_table.csv, there are two codes for Pinus banksiana (cf. lines 1150 and 2276):

  • 183319-PIN-BAN
  • NA-PIN-BAN
@ltalluto

I ran into this with pin ban and a few other spp. Looks like some records just didn't get TSNs associated with them. It's probably safe to just update the NAs to the correct species, no?

@MiraBryant

I agree, updating is the best idea.

Miranda


On Sep 12, 2016, at 1:30 PM, Matthew Talluto [email protected] wrote:

I ran into this with pin ban and a few other spp. Looks like some records just didn't get TSNs associated with them. It's probably safe to just update the NAs to the correct species, no?



@amael-ls
Author

The potentially problematic species:
"NA-ACA-ANE"
"NA-CAR-CAR"
"NA-CAR-OVA"
"NA-CHA-NA"
"NA-CYA-NA"
"NA-EUG-PAL"
"NA-EUG-STE"
"NA-HED-NA"
"NA-LIQ-STY"
"NA-MAL-NA"
"NA-MOR-NA"
"NA-PAR-NA"
"NA-PIN-BAN"
"NA-PLA-NA"
"NA-PRI-LAN"
"NA-PSY-MAR"
"NA-PTE-MAC"
"NA-QUE-MAR"
"NA-QUE-PRI"

Here is the small function I used to detect them (it is a quick and dirty solution, sorry): listProblem.R.zip
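
The attached script is not reproduced in this thread. As a minimal sketch of one way to do this kind of detection (written in Python rather than R, and assuming every species code has the form TSN-GENUS-EPITHET as in the list above), the codes whose TSN part is NA can be picked out like this. The function name `find_missing_tsn` is hypothetical.

```python
def find_missing_tsn(codes):
    """Return the species codes whose first field (the TSN) is 'NA'."""
    return [code for code in codes if code.split("-", 1)[0] == "NA"]


# Small sample built from codes quoted in this thread.
codes = ["183319-PIN-BAN", "NA-PIN-BAN", "18032-ABI-BAL", "NA-CHA-NA"]
missing = find_missing_tsn(codes)
print(missing)
```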

@amael-ls
Author

amael-ls commented Sep 13, 2016

OK, after comparing Latin names, I found that only two species are the same:

  1. PIN-BAN
  2. LIQ-STY

Therefore they can be merged.
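
A rough sketch of how such merge candidates could be listed automatically, assuming the TSN-GENUS-EPITHET code format seen above (the function name `merge_candidates` is hypothetical). It only pairs codes by their abbreviated suffix, and two different species can share the same abbreviation, so each candidate still has to be confirmed by comparing the Latin names.

```python
def merge_candidates(codes):
    """Map each NA-prefixed code to a code with a TSN that shares its suffix."""
    # Index the codes that already carry a TSN by their GENUS-EPITHET suffix.
    with_tsn = {}
    for code in codes:
        tsn, _, suffix = code.partition("-")
        if tsn != "NA":
            with_tsn[suffix] = code

    # Pair every NA code with the TSN-bearing code that has the same suffix.
    mapping = {}
    for code in codes:
        tsn, _, suffix = code.partition("-")
        if tsn == "NA" and suffix in with_tsn:
            mapping[code] = with_tsn[suffix]
    return mapping


candidates = merge_candidates(["183319-PIN-BAN", "NA-PIN-BAN", "NA-QUE-PRI"])
print(candidates)
```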

@ltalluto

I think there are a few issues going on here. For Pin ban and Liq Sty, there are TSNs for some records and not for others, so the NA records need to be updated to point to the right species key. For others, TSNs (and in some cases, specific epithets) are missing entirely.

For the missing epithets (records ending in -NA), we should verify from the raw data if possible that these records were only genus level observations.

For others, we should add TSNs when they are available. If the species is not listed in ITIS, we should check for synonyms and use the TSN for the synonym.
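
The three cases above can be told apart from the code string alone. This is only an illustrative sketch, assuming each code splits into exactly three dash-separated parts; the function name `classify` and the category labels are made up for this example.

```python
def classify(code):
    """Sort a species code into one of the cases discussed above."""
    tsn, genus, epithet = code.split("-")
    if tsn != "NA":
        return "has_tsn"        # nothing to fix
    if epithet == "NA":
        return "genus_only"     # verify against the raw data
    return "missing_tsn"        # look up the TSN (or a synonym) in ITIS


for code in ["183319-PIN-BAN", "NA-CHA-NA", "NA-QUE-PRI"]:
    print(code, classify(code))
```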

@amael-ls
Author

amael-ls commented Sep 14, 2016

nbColumns.c++.zip

Another issue (which might not be one...): there are semicolons in the English names of some species. As a result, read.table (and friends) in R cannot read those lines, because the separator is also a semicolon. Here is a C++ function that detects where the problems are. In the file "final_ref_table.csv", I found 78 problems (run the function to get the line numbers). Example, line 11:
18032;"Abies";"balsamea";"Balsam fir ;balsam fir";"Sapin baumier";"SAB";20;"Bf";12;5;"18032-ABI-BAL"

@ltalluto

read.table handles this fine on my machine. The quotes protect the extra semicolon. Depending on your version/localization of R, you may have to set sep=";", quote='"'
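
To illustrate the difference, here is a small sketch in Python (not the R call itself) using the example line quoted earlier in the thread. Splitting naively on every semicolon yields one field too many, while a quote-aware parser keeps the English name as a single field.

```python
import csv
import io

line = ('18032;"Abies";"balsamea";"Balsam fir ;balsam fir";'
        '"Sapin baumier";"SAB";20;"Bf";12;5;"18032-ABI-BAL"')

# Naive approach: every semicolon counts as a separator.
naive_fields = line.split(";")

# Quote-aware approach: semicolons inside double quotes are kept as data.
reader = csv.reader(io.StringIO(line), delimiter=";", quotechar='"')
quoted_fields = next(reader)

print(len(naive_fields), len(quoted_fields))
print(quoted_fields[3])
```

A check that counts raw semicolons per line would flag this line, which is consistent with the problems reported above, even though the line is valid once quoting is taken into account.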

@SteveViss
Member

SteveViss commented Sep 14, 2016

For the missing epithets (records ending in -NA), we should verify from the raw data if possible that these records were only genus level observations.
From @mtalluto

Yes, you're right. This is the decision we took. Those species have only a genus.

As you suggested, I have to replace the leading NA in the species code string with the right TSN (when possible). We still have to keep in mind that, of the ~2500 total species in the ref_speciestable, only ~200 are present in the QUICC-FOR database.

@amael-ls
Author

amael-ls commented Sep 19, 2016

It seems that some species have synonyms; maybe this is why you could not find a TSN code. Examples:
NA-CAR-ALB; Carya alba; Carya tomentosa
NA-CHA-NOO; Chamaecyparis nootkatensis; Cupressus nootkatensis (changed in 1993)
NA-QUE-PRI; Quercus prinus L.; Quercus montana
NA-TAX-ASC; Taxodium ascendens; Taxodium distichum var. imbricarium (or var. nutans??)

cf ITIS website:
http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=183433
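
One way to use such pairs when searching for a TSN is to try every known name for a species. The sketch below is only an illustration: the table holds the unambiguous pairs listed above (the Taxodium entry is left out because its synonym is uncertain), the names `SYNONYMS` and `candidate_names` are hypothetical, and which name ITIS treats as accepted still has to be checked on the ITIS website.

```python
# Name pairs taken from the examples above: name in the table -> synonym.
SYNONYMS = {
    "Carya alba": "Carya tomentosa",
    "Chamaecyparis nootkatensis": "Cupressus nootkatensis",
    "Quercus prinus": "Quercus montana",
}


def candidate_names(name):
    """Return the names worth searching for: the given one, then its synonym."""
    names = [name]
    if name in SYNONYMS:
        names.append(SYNONYMS[name])
    return names


print(candidate_names("Carya alba"))
print(candidate_names("Pinus banksiana"))
```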

@ltalluto

Yes, this is exactly it. I don't have access to the database from here (I think?), so I can't make the change. You'll have to buy Steve a beer and he can do it :)
