📜 private datasets #3561

Merged: 10 commits, Nov 19, 2024
246 changes: 25 additions & 221 deletions docs/architecture/workflow/index.md

Large diffs are not rendered by default.

31 changes: 26 additions & 5 deletions docs/architecture/workflow/other-steps.md
@@ -1,11 +1,35 @@
---
status: new
---

So far you have learned about the standard steps, which should cover most cases. However, some other steps are worth mentioning.

## Export steps

Sometimes we want to perform an action instead of creating a dataset. For instance, we might want to create a TSV file for an explorer, commit a CSV to a GitHub repository, or create a config for a multi-dimensional indicator. This is where the `Export` step comes in.

Export steps are used to perform an action on an already created dataset. This action typically implies making the data available to other parts of the system. There are different types of export steps:

- **Explorers**: Create a TSV file for a data explorer.
- **Multi-dimensional indicators**: Create a configuration for a multi-dimensional indicator.
- **Export to GitHub**: Commit a dataset to a GitHub repository.

Export steps should be used after the data has been processed and is ready to be used (post-Garden).
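Schematically, export steps sit downstream of Garden (a sketch; node names are illustrative):

```mermaid
flowchart LR
    garden[Garden dataset] --> export[Export step]
    export --> explorer[Explorer TSV]
    export --> mdim[Multi-dim config]
    export --> github[GitHub repository]
```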

!!! note "Learn more about [export steps](../../guides/data-work/export-data.md)"

### Explorers

Data explorers are Grapher charts expanded with additional functionalities to facilitate exploration, such as dynamic entity filters or customizable menus. They are usually powered by indicators from OWID's Grapher database.

!!! info "Learn more about creating Data explorers [on Notion :octicons-arrow-right-24:](https://www.notion.so/owid/Creating-Data-Explorers-cf47a5ef90f14c1fba8fc243aba79be7)."

!!! note "Legacy explorers"

    In the past, explorers were manually defined in our Admin, and their data was sourced from CSV files generated by ETL [served from S3](https://dash.cloudflare.com/078fcdfed9955087315dd86792e71a7e/r2/default/buckets/owid-catalog), or on GitHub.

    We have slowly transitioned to a new system where explorers are generated from the ETL pipeline, which is more scalable and maintainable.

## Backport

Datasets from our production grapher database can be backported to the ETL catalog.
@@ -42,9 +66,6 @@ flowchart LR
classDef node_ss fill:#002147,color:#fff
```

## Open Numbers

!!! warning "TO BE DONE"

## ETag

1 change: 0 additions & 1 deletion docs/guides/auto-regular-updates.md
@@ -1,7 +1,6 @@
---
tags:
  - 👷 Staff
---

!!! warning "This is a work in progress"
157 changes: 157 additions & 0 deletions docs/guides/data-work/export-data.md
@@ -0,0 +1,157 @@
---
status: new
---

!!! warning "Export steps are a work in progress"

Export steps are defined in the `etl/steps/export` directory and have a similar structure to regular steps. They are run with the `--export` flag:

```bash
etlr export://explorers/minerals/latest/minerals --export
```

The `def run(dest_dir):` function doesn't save a dataset; instead, it calls a method that performs the action, for instance `create_explorer(...)` or `gh.commit_file_to_github(...)`. Once the step executes successfully, it won't be run again unless its code or dependencies change (i.e. it won't be "dirty").
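To make that shape concrete, here is a minimal, self-contained sketch of an export step's `run` function; `load_dependencies` and `perform_action` are illustrative placeholders, not real ETL helpers:

```python
def load_dependencies() -> dict:
    # Placeholder: a real step would read the upstream (post-Garden) dataset.
    return {"table": "minerals"}


def perform_action(dest_dir: str, data: dict) -> str:
    # Placeholder: a real step would write a TSV, upsert a config, or push to GitHub.
    return f"exported {data['table']} to {dest_dir}"


def run(dest_dir: str) -> None:
    # No dataset is saved; the side effect *is* the output of the step.
    data = load_dependencies()
    print(perform_action(dest_dir, data))
```

Once `run` completes successfully, ETL records the step as clean and skips it on subsequent runs.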

## Creating explorers

TSV files for explorers are created using the `create_explorer` function, usually from a configuration YAML file:

```py
# Create a new explorers dataset and tsv file.
ds_explorer = create_explorer(dest_dir=dest_dir, config=config, df_graphers=df_graphers)
ds_explorer.save()
```

!!! info "Creating explorers on staging servers"

    Explorers can be created or edited on staging servers and then manually migrated to production. Each staging server creates a branch in the `owid-content` repository, and editing explorers in Admin or running the `create_explorer` function pushes changes to that branch. Once the ETL pull request is merged, the branch is pushed to the `owid-content` repository (as its own branch, not `master`); you then need to manually create a PR from that branch and merge it into `master`.


## Creating multi-dimensional indicators

Multi-dimensional indicators are powered by a configuration that is typically created from a YAML file. The structure of the YAML file looks like this:

```yaml title="etl/steps/export/multidim/covid/latest/covid.deaths.yaml"
definitions:
  table: {definitions.table}

title:
  title: COVID-19 deaths
  titleVariant: by interval
defaultSelection:
  - World
  - Europe
  - Asia
topicTags:
  - COVID-19

dimensions:
  - slug: interval
    name: Interval
    choices:
      - slug: weekly
        name: Weekly
        description: null
      - slug: biweekly
        name: Biweekly
        description: null

  - slug: metric
    name: Metric
    choices:
      - slug: absolute
        name: Absolute
        description: null
      - slug: per_capita
        name: Per million people
        description: null
      - slug: change
        name: Change from previous interval
        description: null

views:
  - dimensions:
      interval: weekly
      metric: absolute
    indicators:
      y: "{definitions.table}#weekly_deaths"
  - dimensions:
      interval: weekly
      metric: per_capita
    indicators:
      y: "{definitions.table}#weekly_deaths_per_million"
  - dimensions:
      interval: weekly
      metric: change
    indicators:
      y: "{definitions.table}#weekly_pct_growth_deaths"

  - dimensions:
      interval: biweekly
      metric: absolute
    indicators:
      y: "{definitions.table}#biweekly_deaths"
  - dimensions:
      interval: biweekly
      metric: per_capita
    indicators:
      y: "{definitions.table}#biweekly_deaths_per_million"
  - dimensions:
      interval: biweekly
      metric: change
    indicators:
      y: "{definitions.table}#biweekly_pct_growth_deaths"
```

The `dimensions` field specifies selectors, and the `views` field defines views for the selection. Since there are numerous possible configurations, `views` are usually generated programmatically. However, it's a good idea to create a few of them manually to start.

You can also combine manually defined views with generated ones. See the `etl.multidim` module for available helper functions or refer to examples from `etl/steps/export/multidim/`. Feel free to add or modify the helper functions as needed.
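As an illustration of what "generated programmatically" can mean, the following sketch expands every dimension combination into a view. The `INDICATORS` mapping and `generate_views` helper are hypothetical; the real helpers live in `etl.multidim`:

```python
from itertools import product

# Hypothetical mapping from (interval, metric) to an indicator short name;
# the real names come from the dataset's table.
INDICATORS = {
    ("weekly", "absolute"): "weekly_deaths",
    ("weekly", "per_capita"): "weekly_deaths_per_million",
    ("weekly", "change"): "weekly_pct_growth_deaths",
    ("biweekly", "absolute"): "biweekly_deaths",
    ("biweekly", "per_capita"): "biweekly_deaths_per_million",
    ("biweekly", "change"): "biweekly_pct_growth_deaths",
}


def generate_views(table: str) -> list[dict]:
    """Build one view per (interval, metric) combination."""
    views = []
    for interval, metric in product(["weekly", "biweekly"], ["absolute", "per_capita", "change"]):
        views.append(
            {
                "dimensions": {"interval": interval, "metric": metric},
                "indicators": {"y": f"{table}#{INDICATORS[(interval, metric)]}"},
            }
        )
    return views


views = generate_views("{definitions.table}")
print(len(views))  # 6 combinations: 2 intervals x 3 metrics
```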

The export step loads the YAML file, adds `views` to the config, and then calls `upsert_multidim_data_page`:

```python title="etl/steps/export/multidim/covid/latest/covid.py"
def run(dest_dir: str) -> None:
    engine = get_engine()

    # Load YAML file
    config = paths.load_mdim_config("covid.deaths.yaml")

    multidim.upsert_multidim_data_page("mdd-energy", config, engine)
```

To see the multi-dimensional indicator in Admin, run

```bash
etlr export://multidim/energy/latest/energy --export
```

and check out the preview at http://staging-site-my-branch/admin/grapher/mdd-name.


## Exporting data to GitHub

One common use case for the `export` step is to commit a dataset to a GitHub repository, which is useful when we want to make the data available to the public. The pattern looks like this:

```python
if os.environ.get("CO2_BRANCH"):
    dry_run = False
    branch = os.environ["CO2_BRANCH"]
else:
    dry_run = True
    branch = "master"

gh.commit_file_to_github(
    combined.to_csv(),
    repo_name="co2-data",
    file_path="owid-co2-data.csv",
    commit_message=":bar_chart: Automated update",
    branch=branch,
    dry_run=dry_run,
)
```

This code commits the dataset to the `co2-data` repository on GitHub when the `CO2_BRANCH` environment variable is set, e.g.

```bash
CO2_BRANCH=main etlr export://co2/latest/co2 --export
```
2 changes: 0 additions & 2 deletions docs/guides/data-work/index.md
@@ -3,8 +3,6 @@ tags:
- 👷 Staff
---

Adding and updating datasets in ETL is part of our routine work. To this end, we've simplified the process as much as possible. Below is the list of steps involved in the workflow; click on each step to learn more about it.

```mermaid
Expand Down
17 changes: 11 additions & 6 deletions docs/guides/private-import.md
@@ -3,11 +3,10 @@ tags:
- 👷 Staff
---

While most of the data at OWID is publicly available, some datasets are added to our catalog with some restrictions. These include datasets that are not redistributable, or that are not meant to be shared with the public. This can happen due to a strict license by the data provider, or because the data is still in a draft stage and not ready for public consumption.

Various privacy configurations are available:

- Skip re-publishing to GitHub.
- Disable data downloading options on Grapher.
- Disable public access to the original file (snapshot).
- Hide the dataset from our public catalog (accessible via `owid-catalog-py`).
@@ -16,6 +15,12 @@ In the following, we explain how to create private steps in the ETL pipeline and

## Create a private step


!!! tip "Make your dataset completely private"

    - **Snapshot**: Set `meta.is_public` to `false` in the snapshot DVC file.
    - **Meadow, Garden, Grapher**: Use `data-private://` prefix in the step name in the DAG. Set `dataset.non_redistributable` to `true` in the dataset garden metadata.

### Snapshot

To create a private snapshot step, set the `meta.is_public` property in the snapshot .dvc file to false:
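For example, a snapshot `.dvc` file might look like this (illustrative; the surrounding metadata keys vary per snapshot, and only `is_public` matters here):

```yaml
meta:
  # ... other snapshot metadata (origin, license, etc.) ...
  is_public: false
```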
@@ -34,7 +39,7 @@ This will prevent the file from being publicly accessible without the appropriate credentials.

### Meadow, Garden, Grapher

Creating a private data step means that the data will not be listed in the public catalog, and therefore will not be accessible via `owid-catalog-py`.

To create a private data step (meadow, garden or grapher) simply use `data-private` prefix in the step name in the DAG. For example, the step `grapher/ihme_gbd/2024-06-10/leading_causes_deaths` (this is from [health.yml](https://github.com/owid/etl/blob/master/dag/health.yml)) is private:
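In the DAG this might look like the following (the dependency listed is illustrative, not the step's actual dependency list):

```yaml
steps:
  data-private://grapher/ihme_gbd/2024-06-10/leading_causes_deaths:
    - data-private://garden/ihme_gbd/2024-06-10/leading_causes_deaths
```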

@@ -70,8 +75,8 @@ etl run [step-name] --private

If you want to make a private step public simply follow the steps below:

- **In the DAG:** Replace `data-private://` prefix with `data://`.
- **In the snapshot DVC file**: Set `meta.is_public` to `true` (or simply remove this property).
- (Optional) **Allow for Grapher downloads**: Set `dataset.non_redistributable` to `false` in the dataset garden metadata (or simply remove this property).

After this, re-run the snapshot step and commit your changes.
2 changes: 1 addition & 1 deletion docs/ignore/generate_dynamic_docs.py
@@ -15,7 +15,7 @@

- __[Indicator](#variable)__ (variable)
- __[Origin](#origin)__
- __[Table](#table)__
- __[Dataset](#dataset)__
</div>

13 changes: 13 additions & 0 deletions docs/overrides/main_aux.html
@@ -0,0 +1,13 @@
{% extends "base.html" %}

{% block content %}
{{ super() }}

{% if git_page_authors %}
  <div class="md-source-date">
    <small>
      Authors: {{ git_page_authors | default('enable mkdocs-git-authors-plugin') }}
    </small>
  </div>
{% endif %}
{% endblock %}
41 changes: 26 additions & 15 deletions mkdocs.yml
@@ -93,6 +93,8 @@ extra:
      link: https://ourworldindata.org
    - icon: fontawesome/brands/instagram
      link: https://instagram.com/ourworldindata
    - icon: fontawesome/brands/bluesky
      link: https://bsky.app/profile/ourworldindata.org
    - icon: fontawesome/brands/x-twitter
      link: https://twitter.com/ourworldindata

@@ -149,9 +151,12 @@ plugins:
  - git-authors:
      show_email_address: false
      # authorship_threshold_percent: 1
      show_contribution: true
      # show_line_count: true
      # count_empty_lines: true
      ignore_authors:
        - owidbot
      sort_authors_by: contribution
  - git-revision-date-localized
  - tags:
      tags_file: tags.md
@@ -205,23 +210,29 @@ nav:
  - Contributing: "contributing.md"
  - Guides:
      - "guides/index.md"
      - Adding data:
          - "guides/data-work/index.md"
          - New data: "guides/data-work/add-data.md"
          - Updating data: "guides/data-work/update-data.md"
          - Update charts: "guides/data-work/update-charts.md"
          - Export data: "guides/data-work/export-data.md"
      - Main tools:
          - Wizard: "guides/wizard.md"
          - CLI: "guides/etl-cli.md"
          - Harmonize country names: "guides/harmonize-countries.md"
          - Backport from database: "guides/backport.md"
          - Regular updates: "guides/auto-regular-updates.md"
      - Servers & settings:
          - Environments: "guides/environment.md"
          - Staging servers: "guides/staging-servers.md"
          - Public servers: "guides/sharing-external.md"
          - Private datasets: "guides/private-import.md"
          - OpenAI setup: "guides/openai.md"
      - Others:
          - Edit the documentation: "dev/docs.md"
          - Metadata in data pages: "guides/metadata-play.md"

  - Design principles:
      - Design principles & workflow: architecture/index.md
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -92,10 +92,11 @@ dev-dependencies = [
    "boto3-stubs[s3]>=1.34.154",
    "gspread>=5.12.4",
    "jsonref>=1.1.0",
    "mkdocs-material>=9.5.34",
    "mkdocs-jupyter>=0.24.8",
    "mkdocs-exclude>=1.0.2",
    "mkdocs-gen-files>=0.5.0",
    "mkdocs-git-authors-plugin>=0.9.2",
    "mkdocs-git-revision-date-localized-plugin>=1.2.6",
    "mkdocs-click>=0.8.1",
    "mkdocs-glightbox>=0.3.7",