Species names #1
Comments
I ran into this with Pin ban and a few other spp. Looks like some records just didn't get TSNs associated with them. It's probably safe to just update the NAs to the correct species, no?
I agree, updating is the best idea. Miranda
The potentially problematic species:
Here is the small function I used to detect them (it is a quick and dirty solution, sorry).
OK, after comparing Latin names, I found that only two species are the same:
Therefore they can be merged.
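For reference, the Latin-name comparison described above could be sketched along these lines. This is not the function used in the thread, only a minimal Python illustration. The `(species_code, latin_name)` row layout, the function name `find_duplicate_latin_names`, and the sample rows are all assumptions made up for the example; nothing here is taken from final_ref_table.csv.

```python
from collections import defaultdict


def find_duplicate_latin_names(rows):
    """Group species codes by normalized Latin name.

    `rows` is an iterable of (species_code, latin_name) pairs (hypothetical layout).
    Returns only the names that are attached to more than one code.
    """
    codes_by_name = defaultdict(list)
    for code, latin_name in rows:
        # Normalize case and whitespace so spelling variants of the same name match.
        key = " ".join(latin_name.lower().split())
        codes_by_name[key].append(code)
    return {name: codes for name, codes in codes_by_name.items() if len(codes) > 1}


# Made-up rows for illustration only.
rows = [
    ("CODE-A", "Genus alpha"),
    ("CODE-B", "genus  alpha"),
    ("CODE-C", "Genus beta"),
]
print(find_duplicate_latin_names(rows))
```

Names that come back from this check are only merge candidates; each pair still needs a manual look before the records are merged.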
I think there are a few issues going on here. For Pin ban and Liq Sty, there are TSNs for some records and not for others, so the NA records need to be updated to point to the right species key. For others, TSNs (and in some cases, specific epithets) are missing entirely. For the missing epithets (records ending in -NA), we should verify from the raw data, if possible, that these records were only genus-level observations. For the rest, we should add TSNs when they are available. If the species is not listed in ITIS, we should check for synonyms and use the TSN for the synonym.
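The triage described above could look roughly like the sketch below. It assumes, purely for illustration, that a species code has the form `TSN-GENUS-EPITHET` with the literal string `NA` for a missing part; the actual code format in the database may differ. The function names, the sample codes, and the placeholder TSN `11111` are all hypothetical.

```python
def split_code(code):
    """Split a code of the ASSUMED form 'TSN-GENUS-EPITHET' into its three parts."""
    tsn, genus, epithet = code.split("-")
    return tsn, genus, epithet


def repair_codes(codes):
    """Fill a missing TSN from another record of the same species.

    Returns (repaired, genus_only, unresolved):
      - repaired:   all codes, with NA TSNs replaced where a sibling record has one
      - genus_only: codes whose epithet is NA (to verify against the raw data)
      - unresolved: codes with an epithet but no TSN on any record (to look up in ITIS)
    """
    # First pass: remember every TSN seen for a full (genus, epithet) pair.
    known_tsn = {}
    for code in codes:
        tsn, genus, epithet = split_code(code)
        if tsn != "NA" and epithet != "NA":
            known_tsn[(genus, epithet)] = tsn

    # Second pass: repair what can be repaired and sort the rest into buckets.
    repaired, genus_only, unresolved = [], [], []
    for code in codes:
        tsn, genus, epithet = split_code(code)
        if epithet == "NA":
            genus_only.append(code)
        elif tsn == "NA":
            if (genus, epithet) in known_tsn:
                code = f"{known_tsn[(genus, epithet)]}-{genus}-{epithet}"
            else:
                unresolved.append(code)
        repaired.append(code)
    return repaired, genus_only, unresolved


# Made-up codes; 11111 is a placeholder, not a real TSN.
codes = ["11111-AAA-BBB", "NA-AAA-BBB", "NA-CCC-DDD", "NA-EEE-NA"]
repaired, genus_only, unresolved = repair_codes(codes)
print(repaired)
print(genus_only)
print(unresolved)
```

Keeping the genus-only and unresolved records in separate lists mirrors the three cases in the comment: the first list goes back to the raw data, the second goes to an ITIS (or synonym) lookup.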
Another issue (which might not be one...): there are semicolons in the English names of some species. Therefore read.table (and friends) in R cannot read them, because the separator is also a semicolon. Here is a C++ function that detects where the problems are. In the file "final_ref_table.csv", I found 78 problems (run the function to get the lines). Example line 11:
read.table handles this fine on my machine. The quotes protect the extra semicolon. Depending on your version/localization of R, you may have to set
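To make the disagreement between the last two comments concrete, here is a small Python sketch (not the C++ function mentioned above, and not R) that flags lines where a naive split on `;` and a quote-aware parse give different field counts. The function name `find_suspect_lines`, the column names, and the sample text are invented for the example.

```python
import csv


def find_suspect_lines(text, delimiter=";"):
    """Return (line_number, naive_count, quoted_count) for every line where
    a plain split and a quote-aware parse disagree on the number of fields."""
    suspects = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        # Naive count: every semicolon is treated as a separator.
        naive_count = len(line.split(delimiter))
        # Quote-aware count: semicolons inside double quotes are kept as data.
        parsed = next(csv.reader([line], delimiter=delimiter, quotechar='"'))
        quoted_count = len(parsed)
        if naive_count != quoted_count:
            suspects.append((line_number, naive_count, quoted_count))
    return suspects


# Made-up sample; the second line has a semicolon inside a quoted name.
sample = (
    'code;english_name;latin_name\n'
    '1;"Name; with semicolon";"Genus alpha"\n'
    '2;"Plain name";"Genus beta"\n'
)
print(find_suspect_lines(sample))
```

A line flagged here is only a problem for readers that ignore quoting; a quote-aware reader sees the expected number of fields. The sketch parses line by line, so it does not handle quoted fields that span several lines.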
Yes, you're right. This is the decision we took: those species have only a genus. As you suggested, I have to update the first NA value in the species code string to the right TSN (when it's possible). We still have to keep in mind that out of ~2500 total species in the
It seems that some species have synonyms; maybe this is why you could not find a TSN code. Example: cf. ITIS website:
Yes, this is exactly it. I don't have access to the database from here (I think?), so I can't make the change. You'll have to buy Steve a beer and he can do it :)
In the file final_ref_table.csv, there are 2 names for Pinus banksiana (cf. lines 1150 and 2276):