Data download restrictions are not properly respected in prod #2631

Closed
Tracked by #2629
larsyencken opened this issue May 8, 2024 · 3 comments
@larsyencken
Collaborator

larsyencken commented May 8, 2024

UPDATE: this was a false alarm, caused by a misunderstanding of what isPrivate means on the Grapher datasets model.

Context

For some data sources, we are allowed to share charts of the data with the public, but not to let the public download the data directly from us. This is because the upstream data provider does not allow redistribution.

Whilst this might seem restrictive, the data provider is still doing a great public good in allowing us to visualise and write about the data, and we in turn are helping to bring their work to a broad audience.

Problem

It appears that data download restrictions are not honoured by the Grapher download overlay. For example:

After debugging, the issue appears to come from the ETL not generating the right metadata for our data API.

Expected behaviour

Instead of the CSV download option, we should show a note saying that the data provider disallows redistribution or, better, a link back to the original data provider so that users can get the data from them.

Technical notes

  • Grapher gets the information from the data API
    • An API dump into SQLite is available at owid@automation-1, filename data-api.db
    • Querying select count(*) from metadata where json_extract(metadata, '$.nonRedistributable'); gives 307 non-redistributable variables
    • However, select count(*) from variables v inner join datasets d on (v.datasetId = d.id) where d.isPrivate; on MySQL gives 101k variables that are meant to be non-redistributable
    • So, regardless of what Grapher is doing, the API is not generating this metadata correctly.
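The SQLite count above can be reproduced on a toy database. This is a minimal sketch, not the real data-api.db: the table name metadata and its JSON column metadata are taken from the query above, while the variableId column and the sample rows are hypothetical. It assumes the SQLite build behind Python's sqlite3 module includes the JSON functions.

```python
import json
import sqlite3


def count_non_redistributable(conn: sqlite3.Connection) -> int:
    """Count rows whose JSON metadata has a truthy nonRedistributable flag."""
    row = conn.execute(
        "SELECT count(*) FROM metadata "
        "WHERE json_extract(metadata, '$.nonRedistributable')"
    ).fetchone()
    return row[0]


def build_toy_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the API dump (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata (variableId INTEGER PRIMARY KEY, metadata TEXT)"
    )
    sample = [
        (1, {"nonRedistributable": True}),   # counted
        (2, {"nonRedistributable": False}),  # flag present but false
        (3, {}),                             # flag missing, json_extract yields NULL
    ]
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?)",
        [(var_id, json.dumps(meta)) for var_id, meta in sample],
    )
    return conn
```

Rows where the flag is false or absent are excluded, so the query counts only variables explicitly marked nonRedistributable in the API metadata.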

Note on priority

If confirmed, this is high priority to fix, since it's essential that we respect the work of data providers and the restrictions they place on it.

@larsyencken larsyencken added the "bug" and "priority 1 - essential" labels May 8, 2024
@larsyencken larsyencken changed the title from "Data download restrictions are not respected by Grapher" to "Data download restrictions are not properly respected in prod" May 8, 2024
@larsyencken
Collaborator Author

This issue came up whilst trying to test a new chart-based API for data downloads, where we would like to throw a clean exception for the non-redistributable case.

@larsyencken larsyencken self-assigned this May 9, 2024
@larsyencken larsyencken transferred this issue from owid/owid-grapher May 9, 2024
@larsyencken
Collaborator Author

@ikesau confirmed that Grapher respects it if the API indicates nonRedistributable = true, making this an API/ETL issue.

@larsyencken
Collaborator Author

Turns out this was a false alarm due to a misunderstanding of the meaning of something being "private".

In the ETL, private means that the general public cannot access those files, except when they are published as indicators in the grapher:// step. At that stage, anything private should be marked as nonRedistributable in the metadata.

In Grapher, datasets marked as !isPrivate && !nonRedistributable are automatically re-published to GitHub. If something is !nonRedistributable, it means CSV download is available in Grapher.

This means !isPrivate should probably be renamed publishToGithub, and it should be false any time nonRedistributable is true.
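The flag semantics described in this comment can be written down as a small truth-table sketch. The DatasetFlags class and the three helper functions are hypothetical names for illustration only and are not Grapher's actual code; the rules themselves follow the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetFlags:
    """Hypothetical container for the two flags discussed in this issue."""
    is_private: bool
    non_redistributable: bool


def csv_download_available(flags: DatasetFlags) -> bool:
    # CSV download depends only on nonRedistributable, not on isPrivate.
    return not flags.non_redistributable


def republished_to_github(flags: DatasetFlags) -> bool:
    # Re-publication requires !isPrivate && !nonRedistributable.
    return not flags.is_private and not flags.non_redistributable


def is_consistent(flags: DatasetFlags) -> bool:
    # Proposed rule: publishToGithub (that is, !isPrivate) should be
    # false whenever nonRedistributable is true.
    publish_to_github = not flags.is_private
    return not (flags.non_redistributable and publish_to_github)
```

Under these rules a dataset with isPrivate set and nonRedistributable unset still offers CSV download, which is why counting isPrivate datasets overstated the number of restricted variables.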

@larsyencken larsyencken closed this as not planned May 9, 2024