diff --git a/docs/guides/private-import.md b/docs/guides/private-import.md
index 120ce5849e2..8da97007592 100644
--- a/docs/guides/private-import.md
+++ b/docs/guides/private-import.md
@@ -3,11 +3,10 @@ tags:
 - 👷 Staff
 ---
 
-While most of the data at OWID is publicly available, some datasets are added to our catalogue with some restrictions. These include datasets that are not redistributable, or that are not meant to be shared with the public. This can happen due to a strict license by the data provider, or because the data is still in a draft stage and not ready for public consumption.
+While most of the data at OWID is publicly available, some datasets are added to our catalog with some restrictions. These include datasets that are not redistributable, or that are not meant to be shared with the public. This can happen due to a strict license by the data provider, or because the data is still in a draft stage and not ready for public consumption.
 
 Various privacy configurations are available:
 
-- Skip re-publishing to GitHub.
 - Disable data downloading options on Grapher.
 - Disable public access to the original file (snapshot).
 - Hide the dataset from our public catalog (accessible via `owid-catalog-py`).
@@ -16,6 +15,12 @@ In the following, we explain how to create private steps in the ETL pipeline and
 
 ## Create a private step
 
+
+!!! tip "Make your dataset completely private"
+
+    - **Snapshot**: Set `meta.is_public` to `false` in the snapshot DVC file.
+    - **Meadow, Garden, Grapher**: Use `data-private://` prefix in the step name in the DAG. Set `dataset.non_redistributable` to `true` in the dataset garden metadata.
+
 ### Snapshot
 
 To create a private snapshot step, set the `meta.is_public` property in the snapshot .dvc file to false:
@@ -34,7 +39,7 @@ This will prevent the file to be publicly accessible without the appropriate cre
 
 ### Meadow, Garden, Grapher
 
-Creating a private data step means that the data will not be listed in the public catalog, and therefore will not be accessible via `owid-catalog-py`. In addition, private datasets will not be re-published to GitHub.
+Creating a private data step means that the data will not be listed in the public catalog, and therefore will not be accessible via `owid-catalog-py`.
 
 To create a private data step (meadow, garden or grapher) simply use `data-private` prefix in the step name in the DAG. For example, the step `grapher/ihme_gbd/2024-06-10/leading_causes_deaths` (this is from [health.yml](https://github.com/owid/etl/blob/master/dag/health.yml)) is private:
 
@@ -70,8 +75,8 @@ etl run run [step-name] --private
 
 If you want to make a private step public simply follow the steps below:
 
-- **In the DAG:** Replace `data-private/` prefix with `data/`.
-- **In the snapshot DVC file**: Set `meta.is_public` to `true` (or simply remove `is_public` property).
-- (Optional) **Allow for Grapher downloads**: Set `dataset.non_redistributable` to `false` in the dataset garden metadata (or simply remove the property from the metadata).
+- **In the DAG:** Replace `data-private://` prefix with `data://`.
+- **In the snapshot DVC file**: Set `meta.is_public` to `true` (or simply remove this property).
+- (Optional) **Allow for Grapher downloads**: Set `dataset.non_redistributable` to `false` in the dataset garden metadata (or simply remove this property).
 
 After this, re-run the snapshot step and commit your changes.