✨ Jinja whitespaces and newlines #3657

Merged
merged 3 commits into from
Dec 2, 2024
39 changes: 38 additions & 1 deletion docs/architecture/metadata/structuring-yaml.md
@@ -233,4 +233,41 @@ tables:
{definitions.conflict_type_estimate}
```

Be cautious with line breaks and trailing whitespace when utilizing templates. Despite using good defaults, you might end up experimenting a lot to get the desired result.
Line breaks and whitespace can be tricky when using Jinja templates. We use reasonable defaults and strip whitespace, so in most cases plain `<%` and `%>` tags work fine. In more complex cases, you might have to experiment with more fine-grained [whitespace control](https://jinja.palletsprojects.com/en/stable/templates/#whitespace-control) using the tags `<%-` and `-%>`. This is most often needed in if-else blocks like this:

```yaml
age: |-
  <% if age_group == "ALLAges" %>
  ...
  <%- elif age_group == "Age-standardized" %>
  ...
  <%- else %>
  ...
  <%- endif %>
```
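
To see what the `-` modifier changes, here is a minimal sketch that assumes only that `jinja2` is installed. It builds an environment with the same delimiters and `trim_blocks`/`lstrip_blocks` settings used in this PR and renders the same if-else block with and without `<%-`. The names `env`, `PLAIN` and `TIGHT` are illustrative and not part of the ETL code.

```python
import jinja2

# Same delimiters and whitespace defaults as the environment in this PR.
env = jinja2.Environment(
    block_start_string="<%",
    block_end_string="%>",
    variable_start_string="<<",
    variable_end_string=">>",
    trim_blocks=True,
    lstrip_blocks=True,
)

# Plain tags: the newline before each block tag is kept.
PLAIN = """<% if age_group == "ALLAges" %>
all ages
<% elif age_group == "Age-standardized" %>
age-standardized
<% else %>
ages << age_group >>
<% endif %>"""

# `<%-` strips all whitespace (including newlines) before the tag.
TIGHT = """<% if age_group == "ALLAges" %>
all ages
<%- elif age_group == "Age-standardized" %>
age-standardized
<%- else %>
ages << age_group >>
<%- endif %>"""

plain = env.from_string(PLAIN).render(age_group="YEARS0-4")
tight = env.from_string(TIGHT).render(age_group="YEARS0-4")

print(repr(plain))  # 'ages YEARS0-4\n'
print(repr(tight))  # 'ages YEARS0-4'
```

With plain tags the branch text keeps its trailing newline, while `<%-` removes it. Calling `.strip()` on the rendered result hides this difference at the edges of a field, but not between branches in the middle of a longer text.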

The most straightforward way to check your metadata is in Admin, although that means waiting for your step to finish. There's a faster way to check your YAML file directly. Create a `playground.ipynb` notebook in the same folder as your YAML file and copy this to the first cell:

```python
from etl import grapher_helpers as gh
dim_dict = {
    "age_group": "YEARS0-4", "sex": "Male", "cause": "Drug use disorders"
}
d = gh.render_yaml_file("ghe.meta.yml", dim_dict=dim_dict)
d["tables"]["ghe"]["variables"]["death_count"]
```

An alternative is to examine `VariableMeta`:

```python
from owid.catalog import Dataset

from etl import grapher_helpers as gh
from etl import paths

tb = Dataset(paths.DATA_DIR / "garden/who/2024-07-30/ghe")['ghe']

# Sample a random row to get the dimension values
dim_dict = dict(zip(tb.index.names, tb.sample(1).index[0]))

gh.render_variable_meta(tb.death_count.m, dim_dict=dim_dict)
```
51 changes: 45 additions & 6 deletions etl/grapher_helpers.py
@@ -2,18 +2,17 @@
from dataclasses import dataclass, field, is_dataclass
from functools import lru_cache
from pathlib import Path
from typing import Any, Dict, Iterable, List, Literal, Optional, Set, cast
from typing import Any, Dict, Iterable, List, Literal, Optional, Set, Union, cast

import jinja2
import numpy as np
import pandas as pd
import pymysql
import sqlalchemy
import structlog
from jinja2 import Environment
from owid import catalog
from owid.catalog import warnings
from owid.catalog.utils import underscore
from owid.catalog.utils import dynamic_yaml_load, dynamic_yaml_to_dict, underscore
from sqlalchemy import text
from sqlalchemy.engine import Engine
from sqlalchemy.orm import Session
@@ -23,7 +22,7 @@

log = structlog.get_logger()

jinja_env = Environment(
jinja_env = jinja2.Environment(
    block_start_string="<%",
    block_end_string="%>",
    variable_start_string="<<",
@@ -32,8 +31,17 @@
    comment_end_string="#>",
    trim_blocks=True,
    lstrip_blocks=True,
    undefined=jinja2.StrictUndefined,
)


# Helper function to raise an error with << raise("uh oh...") >>
def raise_helper(msg):
    raise Exception(msg)


jinja_env.globals["raise"] = raise_helper
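
A minimal sketch of how a helper like this can be used from a template, assuming only that `jinja2` is installed. The environment, the `TEMPLATE` string and the `render` function are illustrative stand-ins rather than the ETL code. The idea is to fail loudly on a dimension value the template does not handle.

```python
import jinja2

# Illustrative environment with the same delimiters as in this PR.
env = jinja2.Environment(
    block_start_string="<%",
    block_end_string="%>",
    variable_start_string="<<",
    variable_end_string=">>",
    trim_blocks=True,
    lstrip_blocks=True,
)


def raise_helper(msg):
    raise Exception(msg)


# Expose the helper to templates under the name `raise`.
env.globals["raise"] = raise_helper

TEMPLATE = """<% if sex == "male" %>
Men
<% elif sex == "female" %>
Women
<% else %>
<< raise("Unknown sex: " ~ sex) >>
<% endif %>"""


def render(sex):
    return env.from_string(TEMPLATE).render(sex=sex).strip()


label = render("female")

# An unhandled value reaches the `else` branch and raises.
try:
    render("both")
    error = None
except Exception as e:
    error = str(e)

print(label)  # Women
print(error)  # Unknown sex: both
```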

# this might work too pd.api.types.is_integer_dtype(col)
INT_TYPES = tuple(
    {f"{n}{b}{p}" for n in ("int", "Int", "uint", "UInt") for b in ("8", "16", "32", "64") for p in ("", "[pyarrow]")}
@@ -209,14 +217,18 @@ def _expand_jinja_text(text: str, dim_dict: Dict[str, str]) -> str:
        return text

    try:
        return _cached_jinja_template(text).render(dim_dict)
        # NOTE: we're stripping the result to avoid trailing newlines
        return _cached_jinja_template(text).render(dim_dict).strip()
    except jinja2.exceptions.TemplateSyntaxError as e:
        new_message = f"{e.message}\n\nDimensions:\n{dim_dict}\n\nTemplate:\n{text}\n"
        raise e.__class__(new_message, e.lineno, e.name, e.filename) from e
    except jinja2.exceptions.UndefinedError as e:
        new_message = f"{e.message}\n\nDimensions:\n{dim_dict}\n\nTemplate:\n{text}\n"
        raise e.__class__(new_message) from e
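
The effect of `StrictUndefined` combined with the enriched error message can be sketched as follows, assuming only that `jinja2` is installed. `render_with_context` is a hypothetical stand-in for `_expand_jinja_text` that leaves out the template cache and the syntax-error branch.

```python
import jinja2

# StrictUndefined makes a missing dimension raise instead of rendering as "".
env = jinja2.Environment(
    variable_start_string="<<",
    variable_end_string=">>",
    undefined=jinja2.StrictUndefined,
)


def render_with_context(text, dim_dict):
    """Render `text`; on a missing variable, re-raise with dimensions and template attached."""
    try:
        return env.from_string(text).render(dim_dict).strip()
    except jinja2.exceptions.UndefinedError as e:
        new_message = f"{e.message}\n\nDimensions:\n{dim_dict}\n\nTemplate:\n{text}\n"
        raise e.__class__(new_message) from e


ok = render_with_context("Deaths among << sex >>", {"sex": "male"})

# `sex` is missing from the dimensions, so rendering fails with context.
try:
    render_with_context("Deaths among << sex >>", {"age_group": "YEARS0-4"})
    message = None
except jinja2.exceptions.UndefinedError as e:
    message = e.message

print(ok)
print(message)
```

The re-raised error carries both the dimension values and the template text, which makes it easier to tell which indicator and which dimension combination triggered the failure.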


def _expand_jinja(obj: Any, dim_dict: Dict[str, str]) -> Any:
    """Expand Jinja in all metadata fields."""
    """Expand Jinja in all metadata fields. This modifies the original object in place."""
    if obj is None:
        return None
    elif isinstance(obj, str):
@@ -233,6 +245,33 @@ def _expand_jinja(obj: Any, dim_dict: Dict[str, str]) -> Any:
return obj


def render_yaml_file(path: Union[str, Path], dim_dict: Dict[str, str]) -> Dict[str, Any]:
    """Load YAML file and render Jinja in all fields. Return a dictionary.

    Usage:
        # Create a playground.ipynb next to YAML file and run this in notebook
        from etl import grapher_helpers as gh
        m = gh.render_yaml_file("ghe.meta.yml", dim_dict={"sex": "male"})
        m['tables']['ghe']['variables']['death_count']
    """
    meta = dynamic_yaml_to_dict(dynamic_yaml_load(path))
    return _expand_jinja(meta, dim_dict)


def render_variable_meta(meta: catalog.VariableMeta, dim_dict: Dict[str, str]) -> catalog.VariableMeta:
    """Render Jinja in all fields of VariableMeta. Return a new VariableMeta object.

    Usage:
        from owid.catalog import Dataset

        from etl import grapher_helpers as gh
        from etl import paths

        tb = Dataset(paths.DATA_DIR / "garden/who/2024-07-30/ghe")['ghe']
        gh.render_variable_meta(tb.my_col.m, dim_dict={"sex": "male"})
    """
    # TODO: move this as a method to VariableMeta class
    return _expand_jinja(meta.copy(), dim_dict)
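
The recursive rendering idea behind these helpers can be sketched on plain Python containers, assuming only that `jinja2` is installed. `expand` below is a hypothetical simplification: unlike `_expand_jinja`, it does not handle dataclasses and it returns new containers instead of modifying the input in place.

```python
from typing import Any, Dict

import jinja2

# Illustrative environment with the same delimiters as in this PR.
env = jinja2.Environment(
    block_start_string="<%",
    block_end_string="%>",
    variable_start_string="<<",
    variable_end_string=">>",
    trim_blocks=True,
    lstrip_blocks=True,
    undefined=jinja2.StrictUndefined,
)


def expand(obj: Any, dim_dict: Dict[str, str]) -> Any:
    """Render Jinja in every string found in nested dicts and lists."""
    if isinstance(obj, str):
        return env.from_string(obj).render(dim_dict).strip()
    if isinstance(obj, dict):
        return {key: expand(value, dim_dict) for key, value in obj.items()}
    if isinstance(obj, list):
        return [expand(value, dim_dict) for value in obj]
    # None, numbers, booleans and anything else pass through unchanged.
    return obj


meta = {
    "title": "Deaths - << sex >>",
    "description_short": "<% if sex == 'male' %>\nDeaths among men.\n<% else %>\nDeaths among women.\n<% endif %>",
    "display": {"tolerance": 28, "name": "<< sex >>"},
    "keys": ["<< sex >>", None],
}

rendered = expand(meta, {"sex": "male"})
print(rendered)
```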


def _title_column_and_dimensions(title: str, dim_dict: Dict[str, Any]) -> str:
    """Create new title from column title and dimensions.
    For instance `Deaths`, ["age", "sex"], ["10-18", "male"] will be converted into
14 changes: 7 additions & 7 deletions etl/steps/data/garden/covid/latest/sequence.meta.yml
@@ -21,13 +21,13 @@ tables:
      num_sequences:
        title: "Number of sequenced COVID-19 genomes - Variant: << variant >>"
        description_short: |-
          <% if variant == 'non_who' %>
          The number of analyzed sequences in the preceding two weeks that correspond to non-relevant variant groups. This number may not reflect the complete breakdown of cases since only a fraction of all cases are sequenced.
          <% elif variant == 'other' %>
          The number of analyzed sequences in the preceding two weeks that correspond to non-categorised variant groups. This number may not reflect the complete breakdown of cases since only a fraction of all cases are sequenced.
          <% else %>
          The number of analyzed sequences in the preceding two weeks that correspond to variant group '<< variant >>'. This number may not reflect the complete breakdown of cases since only a fraction of all cases are sequenced.
          <%- endif -%>
          <% set mapping = dict(
              non_who="The number of analyzed sequences in the preceding two weeks that correspond to non-relevant variant groups. This number may not reflect the complete breakdown of cases since only a fraction of all cases are sequenced.",
              other="The number of analyzed sequences in the preceding two weeks that correspond to non-categorised variant groups. This number may not reflect the complete breakdown of cases since only a fraction of all cases are sequenced.",
              else="The number of analyzed sequences in the preceding two weeks that correspond to variant group '<< variant >>'. This number may not reflect the complete breakdown of cases since only a fraction of all cases are sequenced."
          ) %>

          << mapping.get(variant, mapping['else']) >>
        unit: "sequenced genomes"
        display:
          tolerance: 28
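
The mapping-with-fallback pattern can be tried in isolation with a small sketch, assuming only that `jinja2` is installed. This is a variation rather than the template from the hunk above: it uses a dict literal with shortened texts, and it builds the fallback with the `~` concatenation operator, because a string literal inside a `set` block is not itself rendered as a template.

```python
import jinja2

# Illustrative environment with the same delimiters as in this PR.
env = jinja2.Environment(
    block_start_string="<%",
    block_end_string="%>",
    variable_start_string="<<",
    variable_end_string=">>",
    trim_blocks=True,
    lstrip_blocks=True,
    undefined=jinja2.StrictUndefined,
)

# Known variants get a fixed text; anything else falls back to a generated one.
TEMPLATE = """<% set mapping = {
    "non_who": "non-relevant variant groups",
    "other": "non-categorised variant groups",
} %>
Sequences that correspond to << mapping.get(variant, "variant group '" ~ variant ~ "'") >>."""


def describe(variant):
    return env.from_string(TEMPLATE).render(variant=variant).strip()


known = describe("other")
fallback = describe("Omicron")

print(known)     # Sequences that correspond to non-categorised variant groups.
print(fallback)  # Sequences that correspond to variant group 'Omicron'.
```

Because the `set` block produces no output and every branch is a single expression, this form avoids most of the whitespace handling that an if-else chain needs.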