Data download restrictions are not properly respected in prod #2631
This issue came up whilst trying to test a new chart-based API for data downloads, where we would like to throw a clean exception for the non-redistributable case.
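A minimal Python sketch of what such a clean exception could look like. Everything here is hypothetical: the NonRedistributableError class, the Indicator record with its non_redistributable and source_url fields, and the check_downloadable helper are illustrative names, not the actual API. The idea is to refuse the download before any data is serialised and to point the caller at the original provider.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


class NonRedistributableError(Exception):
    """Hypothetical error raised when a download includes data the provider forbids redistributing."""

    def __init__(self, indicator_id: int, source_url: Optional[str] = None):
        self.indicator_id = indicator_id
        self.source_url = source_url
        message = f"Indicator {indicator_id} cannot be redistributed"
        if source_url:
            message += f"; please get it from the original provider: {source_url}"
        super().__init__(message)


@dataclass
class Indicator:
    # Hypothetical record; the field names are assumptions, not the real schema.
    id: int
    non_redistributable: bool
    source_url: Optional[str] = None


def check_downloadable(indicators: Iterable[Indicator]) -> None:
    """Raise on the first restricted indicator in the chart; return None if all are downloadable."""
    for indicator in indicators:
        if indicator.non_redistributable:
            raise NonRedistributableError(indicator.id, indicator.source_url)
```

Failing the whole request, rather than silently dropping the restricted columns, keeps the behaviour explicit for API consumers.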
@ikesau confirmed that Grapher respects it if the API indicates …
It turns out this was a false alarm, due to a misunderstanding of what it means for something to be "private". In the ETL, private means that the general public cannot access those files, except when they are published as indicators in the … In Grapher, datasets marked as … This means …
UPDATE: this was a false alarm, due to a misunderstanding of isPrivate on the Grapher datasets model.
Context
For some data sources, we are allowed to share charts of the data with the public, but not to let the public download the data directly from us. This is because the upstream data provider does not allow redistribution.
Whilst this might seem restrictive, the data provider is still doing a great public good in allowing us to visualise and write about the data for a broad audience, and we in turn are helping to bring their work to a broad audience.
Problem
It appears that data download restrictions are not honoured by the Grapher download overlay. For example:
After debugging, the issue comes from the ETL not writing the right metadata into our data API.
Expected behaviour
Instead of the CSV download option, we should show a note saying that the data provider disallows redistribution or, better, a link back to the original data provider so that users can get the data from them.
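The expected behaviour can be sketched as a small decision function in Python. The DownloadOption record, its "csv" and "note" kinds, and download_options are hypothetical names; the sketch assumes the overlay already receives a non-redistributable flag plus the provider's name and, optionally, its URL from the metadata.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DownloadOption:
    kind: str                   # hypothetical labels: "csv" or "note"
    text: str
    link: Optional[str] = None  # where to send the user instead, if known


def download_options(non_redistributable: bool, provider_name: str,
                     provider_url: Optional[str] = None) -> List[DownloadOption]:
    """Return what the download overlay should show for one chart."""
    if not non_redistributable:
        return [DownloadOption("csv", "Download the full data (CSV)")]
    note = f"{provider_name} does not allow redistribution of this data."
    if provider_url:
        # Preferred: point users back to the original provider.
        return [DownloadOption("note", note + " You can get it from the original provider.", provider_url)]
    # Fallback: explain why there is no CSV download.
    return [DownloadOption("note", note)]
```

Keeping the decision in one pure function makes it easy to test that a restricted chart never yields a CSV option.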
Technical notes
On owid@automation-1, in the file data-api.db:
select count(*) from metadata where json_extract(metadata, '$.nonRedistributable');
gives 307 non-redistributable variables, whereas
select count(*) from variables v inner join datasets d on (v.datasetId = d.id) where d.isPrivate;
on MySQL gives 101k variables that are meant to be non-redistributable.
Note on priority
If confirmed, this is high priority to fix, since it's essential that we respect the work of data providers and the restrictions they put on that.
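To make the first query in the technical notes concrete, here is a self-contained Python sketch that runs the same filter against an in-memory SQLite database. It assumes data-api.db is a SQLite file whose metadata table stores each variable's metadata as a JSON string; the table layout and the three sample rows are made up for illustration. json_extract returns 1 for JSON true, 0 for false, and NULL when the key is missing, so only explicitly flagged variables are counted. It needs an SQLite build with JSON support, which recent Python releases ship by default.

```python
import json
import sqlite3

# Hypothetical stand-in for data-api.db: one JSON metadata blob per variable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (id INTEGER PRIMARY KEY, metadata TEXT)")
sample = [
    (1, {"name": "restricted", "nonRedistributable": True}),
    (2, {"name": "open", "nonRedistributable": False}),
    (3, {"name": "unflagged"}),  # key absent, so json_extract yields NULL
]
conn.executemany(
    "INSERT INTO metadata (id, metadata) VALUES (?, ?)",
    [(i, json.dumps(m)) for i, m in sample],
)

# Same WHERE clause as the query in the technical notes.
FILTER = "WHERE json_extract(metadata, '$.nonRedistributable')"
count = conn.execute(f"SELECT count(*) FROM metadata {FILTER}").fetchone()[0]
flagged = [row[0] for row in conn.execute(f"SELECT id FROM metadata {FILTER} ORDER BY id")]
print(count, flagged)
```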