Skip to content

Commit

Permalink
🐛 birth rate in HMD (#3690)
Browse files Browse the repository at this point in the history
* 📊 closer look at population estimates by HMD

* wip

* fix birth rate estimation

* drop column
  • Loading branch information
lucasrodes authored Dec 4, 2024
1 parent 815f5d6 commit f25bf83
Show file tree
Hide file tree
Showing 4 changed files with 26 additions and 5 deletions.
14 changes: 11 additions & 3 deletions etl/steps/data/garden/hmd/2024-12-01/hmd.meta.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ definitions:
display_name_dim: |-
at << 'birth' if (age == '0') else age >><< ', ' + sex + 's' if (sex != 'total') >>, << type >>
title_public_dim: |-
at << age if age != '0' else 'birth'>>
<% if age != 'total' %>at << age if age != '0' else 'birth'>><% endif %>
global:
life_expectancy:
point_1: |-
Expand Down Expand Up @@ -251,28 +251,36 @@ tables:
presentation:
topic_tags:
- Population Growth
title_variant: << sex + 's, ' if sex != 'total' >>

variables:
population:
title: Population
unit: people
description_short: |-
<% if age == 'total' %>
<%- if sex == 'total' %>
The total number of people living in a country.
<%- else %>
The total number of << sex + 's' >> living in a country.
<%- endif %>
<%- else %>
<% if sex == 'total' %>
The total number of people aged << age >> living in a country.
<%- else %>
The total number of << sex + 's' >> aged << age >> living in a country.
<%- endif %>
<%- endif %>
description_processing: |-
From HMD Notes: For populations with territorial changes, two sets of population estimates are given for years in which a territorial change occurred. The first set of estimates (identified as year "19xx-") refers to the population just before the territorial change, whereas the second set (identified as year "19xx+") refers to the population just after the change. For example, in France, the data for "1914-" cover the previous territory (i.e., as of December 31, 1913), whereas the data for "1914+" reflect the territorial boundaries as of January 1, 1914.
We have used the "19xx+" population estimates for the year of the territorial change.
display:
name: |-
{tables.population.variables.population.title} aged << age >><< ', ' + sex + 's' if (sex != 'total') >>
{tables.population.variables.population.title}<< 'aged ' + age if (age != 'total') >><< ', ' + sex + 's' if (sex != 'total') >>
presentation:
title_public: |-
{tables.population.variables.population.title} {definitions.others.title_public_dim}
title_variant: << sex + 's, ' if sex != 'total' >>

births:
common:
Expand Down
13 changes: 12 additions & 1 deletion etl/steps/data/garden/hmd/2024-12-01/hmd.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

import numpy as np
from owid.catalog import Table
from owid.catalog import processing as pr

from etl.data_helpers import geo
from etl.helpers import PathFinder, create_dataset
Expand Down Expand Up @@ -84,6 +85,7 @@ def _sanity_check_lt(tb):
tb=tb_pop,
col_index=["country", "year", "sex", "age"],
)
tb_pop = add_total_population(tb_pop)

# 5/ Births
tb_births = process_table(
Expand All @@ -92,7 +94,7 @@ def _sanity_check_lt(tb):
)

def add_birth_rate(tb_pop, tb_births):
tb_pop_agg = tb_pop.groupby(["country", "year", "sex"], as_index=False)["population"].sum()
tb_pop_agg = tb_pop[tb_pop["age"] == "total"].drop(columns="age")
tb_births = tb_births.merge(tb_pop_agg, on=["country", "year", "sex"], how="left")
tb_births["birth_rate"] = tb_births["births"] / tb_births["population"] * 1_000
tb_births["birth_rate"] = tb_births["birth_rate"].replace([np.inf, -np.inf], np.nan)
Expand Down Expand Up @@ -188,6 +190,15 @@ def standardize_sex_cat_names(tb, sex_expected):
return tb


def add_total_population(tb_pop):
flag = tb_pop["age"].str.match(r"^(\d{1,3}|\d{3}\+)$")
tb_pop_total = tb_pop[flag]
tb_pop_total = tb_pop_total.groupby(["country", "year", "sex"], as_index=False)["population"].sum()
tb_pop_total["age"] = "total"
tb_pop = pr.concat([tb_pop, tb_pop_total], ignore_index=True)
return tb_pop


def make_table_diffs_ratios(tb: Table) -> Table:
"""Create table with metric differences and ratios.
Expand Down
3 changes: 2 additions & 1 deletion etl/steps/data/garden/hmd/2024-12-03/hmd_country.py
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,8 @@ def _prepare_population_table(tb):
Original table is given in years, but we need it in days! We use linear interpolation for that.
"""
tb_aux = tb.loc[(tb["sex"] == "total") & ~(tb["age"].str.contains("-")), ["country", "year", "population"]]
flag = tb["age"].str.match(r"^(\d{1,3}|\d{3}\+)$")
tb_aux = tb.loc[(tb["sex"] == "total") & flag, ["country", "year", "population"]]
tb_aux = tb_aux.groupby(["country", "year"], as_index=False)["population"].sum()
## Assign a day to population. TODO: Check if this is true
tb_aux["date"] = pd.to_datetime(tb_aux["year"].astype(str) + "-01-01")
Expand Down
1 change: 1 addition & 0 deletions etl/steps/data/grapher/hmd/2024-12-01/hmd.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@ def keep_only_relevant_dimensions(tb):
45,
65,
80,
"total",
]
AGES_SINGLE = list(map(str, AGES_SINGLE)) + ["110+"]
flag_1 = tb["age"].isin(AGES_SINGLE)
Expand Down

0 comments on commit f25bf83

Please sign in to comment.