Biomass and rarity values may be affected by the rounding/double precision issue #1358

Open
atcooper1 opened this issue May 1, 2024 · 7 comments

Comments

@atcooper1
Collaborator

atcooper1 commented May 1, 2024

The a and b values, plus trait values, may be affected by the same rounding/double precision issue that is currently affecting lat/long.
E.g. Zoramia leptacanthus.
Some a and b values also don't seem to reflect what is on Fishbase. It might be worth checking the FB file that was ingested in 2022?

@atcooper1 atcooper1 changed the title Biomass and rarity values may be affected by the rounding/double integer issue Biomass and rarity values may be affected by the rounding/double precision issue May 1, 2024
@utas-raymondng
Contributor

@bpasquer
Contributor

bpasquer commented Jun 4, 2024

Biomass coefficients:
The a and b values displayed in the database result from numbers being handled as doubles in the script that generates the SQL code for the update.
So again, because the values were handled as doubles in the script, what you see is not exactly what you expected:

| coefficient | script | expected |
|---|---|---|
| a | 0.00954992976039648 | 0.00955 |
| b | 3.049999952316284 | 3.05 |
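
One way a value such as 3.049999952316284 can arise is when a number held in single precision is widened to a double. Whether that is what happened in the update script is an assumption; the thread does not confirm it. A minimal Python sketch (the helper name `widen_float32` is hypothetical):

```python
import struct


def widen_float32(x: float) -> float:
    """Round x to the nearest single-precision value, then return it as a double.

    Hypothetical helper: it only illustrates one possible origin of the
    long decimal tails seen in the database, not the actual update script.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


b_stored = widen_float32(3.05)
print(b_stored)            # 3.049999952316284
print(round(b_stored, 2))  # 3.05
```

Rounding the widened value back to 2 decimals recovers the expected 3.05, which is why rounding at ingestion hides the artefact.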

Would you like the values of 'a' and 'b' to be rounded?
You might also recall that the values we ingested in 2022 were not the latest available on the Fishbase pages, as the most recent version was not publicly accessible on the website.

Rarity
Unfortunately, since rarity statistics are computed metrics, it is difficult to determine whether they have been affected by the rounding issue, as we have no reference for comparison.

@atcooper1
Collaborator Author

I think rounding would be helpful, especially when copying a's and b's for superseded species.

@bpasquer
Contributor

bpasquer commented Jun 4, 2024

Should a's be rounded to 5 decimals? Is that consistently the case, though?
And b's rounded to 2 decimals?

@atcooper1
Collaborator Author

Yes, a's to 5 decimals, b's to 2.
Thanks, Bene
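
The agreed rounding (a to 5 decimals, b to 2) could be applied before the values are written into the SQL update. The sketch below is only an illustration under that assumption: the function name `round_coefficients` is hypothetical, and the real update script may work differently. It uses `decimal` so the output is a fixed-point string rather than another binary double.

```python
from decimal import Decimal, ROUND_HALF_UP

A_PLACES = Decimal("0.00001")  # a: 5 decimal places, as agreed in this thread
B_PLACES = Decimal("0.01")     # b: 2 decimal places, as agreed in this thread


def round_coefficients(a: float, b: float) -> tuple[str, str]:
    """Return (a, b) as fixed-point strings usable as SQL numeric literals.

    Hypothetical helper, not the project's actual update script.
    """
    a_dec = Decimal(repr(a)).quantize(A_PLACES, rounding=ROUND_HALF_UP)
    b_dec = Decimal(repr(b)).quantize(B_PLACES, rounding=ROUND_HALF_UP)
    return str(a_dec), str(b_dec)


print(round_coefficients(0.00954992976039648, 3.049999952316284))
# ('0.00955', '3.05')
```

Because `quantize` keeps trailing zeros, a b value of 3.0 would be emitted as `3.00`, which keeps the number of decimals consistent across species.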

@bpasquer
Contributor

bpasquer commented Jun 6, 2024

From conversation 06/06/2024:

Bene: After examining the Rfishbase package more closely, it appears that an updated version of the database from May 2023 is available.
If this update is indeed available (I need to look at the data), and given that I plan to re-ingest rounded biomass coefficients into the DB, I assume you would prefer the latest version to be ingested.
Toni: Yes, that would be great if possible, please.

Decision: update the biomass coefficients to the latest Fishbase release and apply the rounding as agreed.

@bpasquer
Contributor

The SQL update was applied (ref #1374).
During testing, Toni identified discrepancies between the updated values and the Fishbase website.
The values in the update came from the fb_parquet_2023-05 release in this repo, https://github.com/cboettig/rfishbase_board/, the same repo as for the last update. This repo was thought to be the source of the Fishbase dataset.
However, a web search turned up another Fishbase data source with a more recent release (release 24.07) whose values agree with the FB website.
More specifically: https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb/v24.07/parquet

The update script will be regenerated.
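
The kind of check that surfaced these discrepancies, comparing updated values against a reference source after rounding, might be sketched as follows. This is an illustration only: `find_discrepancies` and the dictionary layout are hypothetical, and "example species" is a placeholder key paired with the a and b values quoted earlier in this thread.

```python
def find_discrepancies(updated, reference, a_places=5, b_places=2):
    """Return {species: (updated_pair, reference_pair)} where rounded values differ.

    Both inputs map a species name to an (a, b) tuple. Hypothetical helper,
    not part of the project's tooling.
    """
    mismatches = {}
    for species, (a_ref, b_ref) in reference.items():
        if species not in updated:
            continue  # species missing from the update are not compared here
        a_upd, b_upd = updated[species]
        rounded = (round(a_upd, a_places), round(b_upd, b_places))
        expected = (round(a_ref, a_places), round(b_ref, b_places))
        if rounded != expected:
            mismatches[species] = (rounded, expected)
    return mismatches


# Placeholder species name; the numbers are the pair quoted earlier in the thread.
updated = {"example species": (0.00954992976039648, 3.049999952316284)}
reference = {"example species": (0.00955, 3.05)}
print(find_discrepancies(updated, reference))  # {}
```

Comparing after rounding means precision artefacts are ignored and only genuine differences between data releases are reported.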

@bpasquer bpasquer added this to the Maintenance - Data milestone Oct 9, 2024