fix #2976
lucasrodes committed Nov 19, 2024
1 parent ced15fe commit c0c7ec9
Showing 1 changed file with 11 additions and 6 deletions.
17 changes: 11 additions & 6 deletions docs/guides/private-import.md
@@ -3,11 +3,10 @@ tags:
- 👷 Staff
---

While most of the data at OWID is publicly available, some datasets are added to our catalogue with some restrictions. These include datasets that are not redistributable, or that are not meant to be shared with the public. This can happen due to a strict license by the data provider, or because the data is still in a draft stage and not ready for public consumption.
While most of the data at OWID is publicly available, some datasets are added to our catalog with some restrictions. These include datasets that are not redistributable, or that are not meant to be shared with the public. This can happen due to a strict license by the data provider, or because the data is still in a draft stage and not ready for public consumption.

Various privacy configurations are available:

- Skip re-publishing to GitHub.
- Disable data downloading options on Grapher.
- Disable public access to the original file (snapshot).
- Hide the dataset from our public catalog (accessible via `owid-catalog-py`).
@@ -16,6 +15,12 @@ In the following, we explain how to create private steps in the ETL pipeline and

## Create a private step


!!! tip "Make your dataset completely private"

- **Snapshot**: Set `meta.is_public` to `false` in the snapshot DVC file.
- **Meadow, Garden, Grapher**: Use `data-private://` prefix in the step name in the DAG. Set `dataset.non_redistributable` to `true` in the dataset garden metadata.

### Snapshot

To create a private snapshot step, set the `meta.is_public` property in the snapshot .dvc file to false:
@@ -34,7 +39,7 @@ This will prevent the file to be publicly accessible without the appropriate cre

### Meadow, Garden, Grapher

Creating a private data step means that the data will not be listed in the public catalog, and therefore will not be accessible via `owid-catalog-py`. In addition, private datasets will not be re-published to GitHub.
Creating a private data step means that the data will not be listed in the public catalog, and therefore will not be accessible via `owid-catalog-py`.

To create a private data step (meadow, garden or grapher) simply use `data-private` prefix in the step name in the DAG. For example, the step `grapher/ihme_gbd/2024-06-10/leading_causes_deaths` (this is from [health.yml](https://github.com/owid/etl/blob/master/dag/health.yml)) is private:

@@ -70,8 +75,8 @@ etl run run [step-name] --private

If you want to make a private step public simply follow the steps below:

- **In the DAG:** Replace `data-private/` prefix with `data/`.
- **In the snapshot DVC file**: Set `meta.is_public` to `true` (or simply remove `is_public` property).
- (Optional) **Allow for Grapher downloads**: Set `dataset.non_redistributable` to `false` in the dataset garden metadata (or simply remove the property from the metadata).
- **In the DAG:** Replace `data-private://` prefix with `data://`.
- **In the snapshot DVC file**: Set `meta.is_public` to `true` (or simply remove this property).
- (Optional) **Allow for Grapher downloads**: Set `dataset.non_redistributable` to `false` in the dataset garden metadata (or simply remove this property).

After this, re-run the snapshot step and commit your changes.
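
For reference, the three settings this change documents can be sketched roughly as below. This is a minimal illustration, not copied from the repository: the nesting is inferred from the dotted names `meta.is_public` and `dataset.non_redistributable`, the top-level `steps:` key is an assumption about the DAG layout, and the garden dependency listed under the grapher step is hypothetical.

```yaml
# Snapshot .dvc file: keep the original file private
meta:
  is_public: false
---
# DAG file (e.g. dag/health.yml): mark the data step as private
# Assumption: steps are listed under a top-level `steps:` key
steps:
  data-private://grapher/ihme_gbd/2024-06-10/leading_causes_deaths:
    # Hypothetical dependency, for illustration only
    - data-private://garden/ihme_gbd/2024-06-10/leading_causes_deaths
---
# Garden dataset metadata: disable downloads on Grapher
dataset:
  non_redistributable: true
```

Making the step public again would mean reversing each of these: `is_public: true` (or removing the property), `data://` instead of `data-private://`, and optionally `non_redistributable: false` (or removing the property).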
