📊 war: add conflict participants #1881

Merged
merged 58 commits into from
Nov 8, 2023
fc20cee
tag is required
lucasrodes Oct 27, 2023
32cc3f6
improve layout
lucasrodes Oct 27, 2023
fde1ccc
Merge branch 'master' of https://github.com/owid/etl
lucasrodes Oct 30, 2023
2225c11
Merge branch 'master' of https://github.com/owid/etl
lucasrodes Oct 31, 2023
9590633
improve gleditsch
lucasrodes Oct 31, 2023
0ffa611
add participants
lucasrodes Oct 31, 2023
7df3d38
lint
lucasrodes Oct 31, 2023
c10f262
lint
lucasrodes Oct 31, 2023
85a212b
minor tweaks
lucasrodes Oct 31, 2023
a2ebd05
prio
lucasrodes Oct 31, 2023
93aac42
tweak
lucasrodes Oct 31, 2023
9a13d7b
Merge branch 'master' of https://github.com/owid/etl
lucasrodes Nov 1, 2023
853b730
Merge branch 'master' into data/how-country-level
lucasrodes Nov 1, 2023
3d7d29f
ucdp_prio
lucasrodes Nov 1, 2023
3eeebdc
add old year filter
lucasrodes Nov 1, 2023
55013d9
lint
lucasrodes Nov 1, 2023
a2ea23b
minor tweak
lucasrodes Nov 1, 2023
12ff65f
country table
lucasrodes Nov 1, 2023
f011378
Merge branch 'master' of https://github.com/owid/etl
lucasrodes Nov 1, 2023
1c3fcc0
Merge branch 'master' into data/how-country-level
lucasrodes Nov 1, 2023
8c45f1a
add participants cow
lucasrodes Nov 1, 2023
8458a09
typo
lucasrodes Nov 1, 2023
1a48b3d
working on mars
lucasrodes Nov 2, 2023
48b6196
wip
lucasrodes Nov 2, 2023
d10b9c0
Merge branch 'master' of https://github.com/owid/etl
lucasrodes Nov 2, 2023
125d930
mie
lucasrodes Nov 2, 2023
94bed74
lint
lucasrodes Nov 2, 2023
83b06cf
harmonise argument name
lucasrodes Nov 2, 2023
ddd9a18
cow_mid
lucasrodes Nov 2, 2023
40d4844
wip
lucasrodes Nov 2, 2023
432deda
rollback
lucasrodes Nov 2, 2023
67cd647
grapher
lucasrodes Nov 2, 2023
6fb1a27
metadata fixes
lucasrodes Nov 3, 2023
aa7172f
typo
lucasrodes Nov 3, 2023
23bf56b
Merge branch 'master' of https://github.com/owid/etl
lucasrodes Nov 3, 2023
aa942a3
Merge branch 'master' into data/how-country-level
lucasrodes Nov 3, 2023
aecbc60
mars
lucasrodes Nov 3, 2023
7dba9d5
fix serbia
lucasrodes Nov 3, 2023
044bd9e
Merge branch 'master' of https://github.com/owid/etl
lucasrodes Nov 3, 2023
ec2703d
Merge branch 'master' into data/how-country-level
lucasrodes Nov 3, 2023
4b5ec64
hostility level
lucasrodes Nov 6, 2023
fc953aa
Merge branch 'master' of https://github.com/owid/etl
lucasrodes Nov 6, 2023
0e262d0
Merge branch 'master' into data/how-country-level
lucasrodes Nov 6, 2023
421da6b
utils
lucasrodes Nov 6, 2023
0e136f5
ucdp: number of participants
lucasrodes Nov 6, 2023
b12d36d
prio
lucasrodes Nov 6, 2023
289bc94
ucdp/prio
lucasrodes Nov 6, 2023
ffb0056
ucdp
lucasrodes Nov 6, 2023
9ff35cd
utils
lucasrodes Nov 6, 2023
495b6a5
lint
lucasrodes Nov 6, 2023
bfce156
ci/cd
lucasrodes Nov 6, 2023
5f00ac1
cow: number participants
lucasrodes Nov 6, 2023
61ed2a8
comments
lucasrodes Nov 6, 2023
73b8513
generalise function
lucasrodes Nov 6, 2023
bd5561d
refactor
lucasrodes Nov 6, 2023
ae6d8d7
cow_mid, mie
lucasrodes Nov 6, 2023
4bed68d
remove unnecessary file
lucasrodes Nov 6, 2023
5c04d28
update
lucasrodes Nov 6, 2023
30 changes: 27 additions & 3 deletions etl/steps/data/garden/countries/2023-09-25/gleditsch.py
@@ -3,8 +3,10 @@
 import owid.catalog.processing as pr
 from owid.catalog import Dataset, Table
 from shared import (
+    LAST_YEAR,
     add_latest_years_with_constant_num_countries,
     add_population_to_table,
+    fill_timeseries,
     init_table_countries_in_region,
 )

@@ -33,12 +35,13 @@ def run(dest_dir: str) -> None:
     #
     tb = geo.harmonize_countries(df=tb, countries_file=paths.country_mapping_path)

-    # Minor fix
-    tb.loc[tb["country"] == "German Federal Republic", "end"] = "02:10:1990"
-
     # Format table
     tb_formatted = format_table(tb)

+    # Minor fix
+    ## GW code 260 should be referred to as 'West Germany' until 1990, then as 'Germany'
+    tb_formatted.loc[(tb_formatted["id"] == 260) & (tb_formatted["year"] >= 1990), "country"] = "Germany"
+
     # Create new table
     tb_regions = create_table_countries_in_region(tb_formatted, ds_pop)
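The year-dependent rename in this hunk can be tried on a toy frame. This is only a sketch: it uses plain pandas instead of `owid.catalog.Table`, and the sample rows (including the code 220 row) are invented for illustration.

```python
import pandas as pd

# Toy stand-in for `tb_formatted`: one row per (id, year). Rows are illustrative only.
tb_formatted = pd.DataFrame(
    {
        "id": [260, 260, 260, 220],
        "year": [1988, 1990, 1991, 1990],
        "country": ["West Germany", "West Germany", "West Germany", "Other state"],
    }
)

# Same boolean-mask assignment as in the diff: code 260 is named 'Germany' from 1990 onwards.
mask = (tb_formatted["id"] == 260) & (tb_formatted["year"] >= 1990)
tb_formatted.loc[mask, "country"] = "Germany"

names_260 = tb_formatted.loc[tb_formatted["id"] == 260, "country"].tolist()
```

Because the fix is applied after `format_table`, it works on the per-year rows rather than on the `start`/`end` date strings of the raw table.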

@@ -48,9 +51,13 @@ def run(dest_dir: str) -> None:
     # Combine tables
     tb_regions = tb_regions.merge(tb_pop, how="left", on=["region", "year"])

+    # Get table with id, year, country (whenever that country was present)
+    tb_countries = create_table_country_years(tb_formatted)
+
     # Add to table list
     tables = [
         tb.set_index(["id", "start", "end"], verify_integrity=True).sort_index(),
+        tb_countries.set_index(["id", "year"], verify_integrity=True).sort_index(),
         tb_regions.set_index(["region", "year"], verify_integrity=True).sort_index(),
     ]
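Each table in this list is indexed with `verify_integrity=True`, so a duplicated `(id, year)` pair fails the step instead of passing silently. A minimal sketch of that behaviour with plain pandas; the rows and the helper `index_or_error` are hypothetical and not part of the PR.

```python
import pandas as pd

# Hypothetical country-year rows; the last two share the same (id, year) key.
tb_countries = pd.DataFrame(
    {"id": [2, 20, 20], "year": [2000, 2000, 2000], "country": ["A", "B", "B"]}
)


def index_or_error(df: pd.DataFrame) -> str:
    """Return 'ok' if (id, year) is unique, otherwise the error message."""
    try:
        df.set_index(["id", "year"], verify_integrity=True).sort_index()
        return "ok"
    except ValueError as err:
        return f"error: {err}"


result_dup = index_or_error(tb_countries)
result_ok = index_or_error(tb_countries.drop_duplicates(subset=["id", "year"]))
```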

@@ -87,6 +94,23 @@ def format_table(tb: Table) -> Table:
     return tb


+def create_table_country_years(tb: Table) -> Table:
+    """Create table with each country present in a year."""
+    tb_countries = tb[["id", "year", "country"]].copy()
+
+    # define mask for last year
+    mask = tb_countries["year"] == EXPECTED_LAST_YEAR
+
+    tb_last = fill_timeseries(
+        tb_countries[mask].drop(columns="year"),
+        EXPECTED_LAST_YEAR + 1,
+        LAST_YEAR,
+    )
+
+    tb = pr.concat([tb_countries, tb_last], ignore_index=True, short_name="gleditsch_countries")
+    return tb
+
+
 def create_table_countries_in_region(tb: Table, ds_pop: Dataset) -> Table:
     """Create table with number of countries in each region per year."""
     # Get number of countries per region per year
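`create_table_country_years` carries the countries present in the last covered year forward to `LAST_YEAR`. The sketch below reproduces that idea with plain pandas. The values of `EXPECTED_LAST_YEAR` and `LAST_YEAR` are assumptions chosen for the example (the real constants are defined in the step modules and are not shown in this diff), and the cross merge stands in for the call to `fill_timeseries`.

```python
import pandas as pd

# Assumed values for illustration only.
EXPECTED_LAST_YEAR = 2017
LAST_YEAR = 2020

tb_countries = pd.DataFrame(
    {
        "id": [1, 1, 2],
        "year": [2016, 2017, 2017],
        "country": ["A", "A", "B"],
    }
)

# Countries present in the last covered year, without the year column.
last = tb_countries[tb_countries["year"] == EXPECTED_LAST_YEAR].drop(columns="year")

# Cross merge with the missing years, as `fill_timeseries` does with its year range.
years = pd.DataFrame({"year": range(EXPECTED_LAST_YEAR + 1, LAST_YEAR + 1)})
tb_last = last.merge(years, how="cross")

tb = pd.concat([tb_countries, tb_last], ignore_index=True)
```

Starting the range at `EXPECTED_LAST_YEAR + 1` keeps `(id, year)` unique, which the later `set_index(..., verify_integrity=True)` call depends on.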
38 changes: 38 additions & 0 deletions etl/steps/data/garden/countries/2023-09-25/isd.py
@@ -3,8 +3,10 @@
 import owid.catalog.processing as pr
 from owid.catalog import Table
 from shared import (
+    LAST_YEAR,
     add_latest_years_with_constant_num_countries,
     add_population_to_table,
+    fill_timeseries,
     init_table_countries_in_region,
 )
 from structlog import get_logger
@@ -60,10 +62,14 @@ def run(dest_dir: str) -> None:
     # Combine tables
     tb_regions = tb_regions.merge(tb_pop, how="left", on=["region", "year"])

+    # Get table with id, year, country (whenever that country was present)
+    tb_countries = create_table_country_years(tb_formatted)
+
     # Add to tables list
     tables = [
         tb.set_index(["cownum", "start", "end"], verify_integrity=True).sort_index(),
         tb_regions.set_index(["region", "year"], verify_integrity=True).sort_index(),
+        tb_countries.set_index(["id", "year"], verify_integrity=True).sort_index(),
     ]

     # tb = tb.set_index(["country", "year"], verify_integrity=True)
@@ -237,3 +243,35 @@ def code_to_region_alt(cow_code: int) -> str:
             return "North Africa and the Middle East"
         case _:
             return "Rest"


+def create_table_country_years(tb: Table) -> Table:
+    """Create table with each country present in a year."""
+    tb_countries = (
+        tb[["cownum", "year", "statename"]]
+        .copy()
+        .rename(
+            columns={
+                "cownum": "id",
+                "statename": "country",
+            }
+        )
+    )
+
+    # define mask for last year
+    mask = tb_countries["year"] == EXPECTED_LAST_YEAR
+
+    tb_last = fill_timeseries(
+        tb_countries[mask].drop(columns="year"),
+        EXPECTED_LAST_YEAR + 1,
+        LAST_YEAR,
+    )
+
+    tb = pr.concat([tb_countries, tb_last], ignore_index=True, short_name="isd_countries")
+
+    # Fix country names
+    ## Serbia and Montenegro, Serbia
+    tb["country"] = tb["country"].astype(str)
+    tb.loc[(tb["id"] == 345) & (tb["year"] >= 1992) & (tb["year"] < 2006), "country"] = "Serbia and Montenegro"
+    tb.loc[(tb["id"] == 345) & (tb["year"] >= 2006), "country"] = "Serbia"
+    return tb
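The name fix at the end of `create_table_country_years` assigns a period-specific name to code 345. A small sketch with plain pandas shows which years fall into each window; the placeholder names in the input rows are invented, since the original `statename` values are not visible in this diff.

```python
import pandas as pd

# Placeholder input names: the real `statename` values are not shown in the diff.
tb = pd.DataFrame(
    {
        "id": [345, 345, 345, 345, 2],
        "year": [1991, 1992, 2005, 2006, 2006],
        "country": ["source name"] * 4 + ["other state"],
    }
)

tb["country"] = tb["country"].astype(str)
# 1992 <= year < 2006 -> 'Serbia and Montenegro'; year >= 2006 -> 'Serbia'
tb.loc[(tb["id"] == 345) & (tb["year"] >= 1992) & (tb["year"] < 2006), "country"] = "Serbia and Montenegro"
tb.loc[(tb["id"] == 345) & (tb["year"] >= 2006), "country"] = "Serbia"

names = tb["country"].tolist()
```

Rows before 1992 and rows for other codes keep whatever name the source provides.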
64 changes: 58 additions & 6 deletions etl/steps/data/garden/countries/2023-09-25/shared.py
@@ -1,4 +1,5 @@
 from datetime import datetime as dt
+from typing import Optional, cast

 import owid.catalog.processing as pr
 import pandas as pd
@@ -45,13 +46,17 @@ def init_table_countries_in_region(
     return tb_regions


-def add_latest_years_with_constant_num_countries(tb_regions: Table, column_year: str, expected_last_year: int) -> Table:
+def add_latest_years_with_constant_num_countries(
+    tb: Table,
+    column_year: str,
+    expected_last_year: int,
+) -> Table:
     """Extend data until LAST_YEAR with constant number of countries.

     Data stops at expected_last_year, extend it until LAST_YEAR with constant number of countries.
     """
     # Check latest year is as expected, drop year column
-    tb_last = tb_regions.sort_values(column_year).drop_duplicates(subset=["region"], keep="last")
+    tb_last = tb.sort_values(column_year).drop_duplicates(subset=["region"], keep="last")
     assert (tb_last.year.unique() == expected_last_year).all(), f"Last year is not {expected_last_year}!"
     tb_last = tb_last.drop(columns=[column_year])

@@ -60,9 +65,9 @@ def add_latest_years_with_constant_num_countries(tb_regions: Table, column_year:
     tb_last = tb_last[["region", "number_countries"]].merge(tb_all_years, how="cross")

     # Add to main table
-    tb_regions = pr.concat([tb_regions, tb_last], ignore_index=True).sort_values(["region", column_year])
+    tb = pr.concat([tb, tb_last], ignore_index=True).sort_values(["region", column_year])

-    return tb_regions
+    return tb


 def expand_observations(tb: Table, col_year_start: str, col_year_end: str) -> Table:
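The renamed `add_latest_years_with_constant_num_countries` can be approximated as follows. This is a sketch under stated assumptions, not the merged implementation: it uses plain pandas, takes `last_year` as an explicit argument instead of the module-level `LAST_YEAR`, rebuilds the year range that the diff elides between the two hunks, and checks the last year through `column_year` rather than the hard-coded `.year` attribute. The function name and sample counts are invented.

```python
import pandas as pd


def extend_with_constant_counts(
    tb: pd.DataFrame, column_year: str, expected_last_year: int, last_year: int
) -> pd.DataFrame:
    """Repeat each region's latest count for every year up to `last_year`."""
    # Latest row per region; it must be the expected last year.
    tb_last = tb.sort_values(column_year).drop_duplicates(subset=["region"], keep="last")
    assert (tb_last[column_year].unique() == expected_last_year).all(), (
        f"Last year is not {expected_last_year}!"
    )
    tb_last = tb_last.drop(columns=[column_year])

    # Assumed reconstruction of the elided lines: the years still to be filled.
    tb_all_years = pd.DataFrame({column_year: range(expected_last_year + 1, last_year + 1)})
    tb_last = tb_last[["region", "number_countries"]].merge(tb_all_years, how="cross")

    return pd.concat([tb, tb_last], ignore_index=True).sort_values(["region", column_year])


tb_regions = pd.DataFrame(
    {
        "region": ["Africa", "Africa", "Europe", "Europe"],
        "year": [2015, 2016, 2015, 2016],
        "number_countries": [50, 51, 40, 40],  # illustrative numbers only
    }
)
extended = extend_with_constant_counts(tb_regions, "year", expected_last_year=2016, last_year=2018)
```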
@@ -75,14 +80,61 @@ def expand_observations(tb: Table, col_year_start: str, col_year_end: str) -> Ta
     # Add missing years for each triplet ("warcode", "campcode", "ccode")
     YEAR_MIN = tb[col_year_start].min()
     YEAR_MAX = tb[col_year_end].max()
-    tb_all_years = Table(pd.RangeIndex(YEAR_MIN, YEAR_MAX + 1), columns=["year"])
-    tb = tb.merge(tb_all_years, how="cross")
+    if "year" in tb.columns:
+        raise ValueError("Column 'year' already in table!")
+    else:
+        tb = fill_timeseries(tb, YEAR_MIN, YEAR_MAX)
     # Filter only entries that actually existed
     tb = tb[(tb["year"] >= tb[col_year_start]) & (tb["year"] < tb[col_year_end])]

     return tb


+def fill_timeseries(
+    tb: Table,
+    year_min: Optional[int],
+    year_max: Optional[int],
+    default_min: bool = False,
+    default_max: bool = False,
+    col_year_start: Optional[str] = None,
+    col_year_end: Optional[str] = None,
+    filter_times: bool = False,
+) -> Table:
+    """Complement table with missing years."""
+    # Get starting year
+    if default_min:
+        if col_year_start in tb.columns:
+            year_min = tb[col_year_start].min()
+        else:
+            raise ValueError(f"{col_year_start} not in table columns!")
+    elif year_min is None:
+        raise ValueError("Either `year_min` must be a value or `default_min` must be True")
+    # Get ending year
+    if default_max:
+        if (col_year_end) and (col_year_end in tb.columns):
+            year_max = tb[col_year_end].max()
+        else:
+            raise ValueError(f"{col_year_end} not in table columns!")
+    elif year_max is None:
+        raise ValueError("Either `year_max` must be a value or `default_max` must be True")
+
+    # Cross merge with missing years
+    tb_all_years = Table(pd.RangeIndex(year_min, cast(int, year_max) + 1), columns=["year"])
+    if "year" in tb.columns:
+        raise ValueError("Column 'year' already in table! Please drop it from `tb`.")
+    tb = tb.merge(tb_all_years, how="cross")
+
+    # Only keep years that 'make sense'
+    if filter_times:
+        if (col_year_end and (col_year_end not in tb.columns)) or (
+            col_year_start and (col_year_start not in tb.columns)
+        ):
+            raise ValueError(f"Columns {col_year_start} and {col_year_end} must be in table columns!")
+        else:
+            tb = tb[(tb["year"] >= tb[col_year_start]) & (tb["year"] < tb[col_year_end])]
+    return tb
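The core of `fill_timeseries` is a cross merge with a year range, optionally followed by a half-open interval filter (`start <= year < end`). A minimal sketch with plain pandas; the column names `start_year` and `end_year` and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical spells: each row is valid for start_year <= year < end_year.
tb = pd.DataFrame({"id": [1, 2], "start_year": [2000, 2001], "end_year": [2002, 2004]})

year_min = int(tb["start_year"].min())
year_max = int(tb["end_year"].max())

# Cross merge: every row is paired with every year in [year_min, year_max].
years = pd.DataFrame({"year": range(year_min, year_max + 1)})
expanded = tb.merge(years, how="cross")

# Keep only the years in which each spell actually existed (end year excluded).
expanded = expanded[
    (expanded["year"] >= expanded["start_year"]) & (expanded["year"] < expanded["end_year"])
]
years_by_id = expanded.groupby("id")["year"].apply(list).to_dict()
```

The cross merge creates `len(tb) * number_of_years` rows before filtering, so for long year ranges the intermediate table can be much larger than the result.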


 def _get_start_year(date_str: str, date_format: str) -> int:
     date = dt.strptime(date_str, date_format)
     return date.year
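`_get_start_year` is a thin wrapper around `datetime.strptime`. The sketch below calls an equivalent function on a date string like the one in the gleditsch step. The format `"%d:%m:%Y"` is an assumption inferred from the string `"02:10:1990"`; the format actually passed by the callers is not visible in this diff.

```python
from datetime import datetime as dt


def get_start_year(date_str: str, date_format: str) -> int:
    """Parse a date string and return its year (same logic as `_get_start_year`)."""
    return dt.strptime(date_str, date_format).year


# Assumed format: day:month:year, matching strings such as "02:10:1990".
year = get_start_year("02:10:1990", "%d:%m:%Y")
```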
72 changes: 72 additions & 0 deletions etl/steps/data/garden/countries/2023-09-29/cow_ssm.py
@@ -1,5 +1,7 @@
 """Load a meadow dataset and create a garden dataset."""

+from typing import Optional, cast
+
 import owid.catalog.processing as pr
 import pandas as pd
 from owid.catalog import Dataset, Table
@@ -66,12 +68,16 @@ def run(dest_dir: str) -> None:
     # Combine tables
     tb_regions = tb_regions.merge(tb_pop, how="left", on=["region", "year"])

+    # Get table with id, year, country (whenever that country was present)
+    tb_countries = create_table_country_years(tb)
+
     # Group tables and format tables
     tables = [
         tb_system.set_index(["ccode", "year"], verify_integrity=True).sort_index(),
         tb_states.set_index(["ccode", "styear", "stmonth", "stday", "endyear", "endmonth", "endday"]).sort_index(),
         tb_majors.set_index(["ccode", "styear", "stmonth", "stday", "endyear", "endmonth", "endday"]).sort_index(),
         tb_regions.set_index(["region", "year"], verify_integrity=True).sort_index(),
+        tb_countries.set_index(["id", "year"], verify_integrity=True).sort_index(),
     ]

     #
@@ -170,3 +176,69 @@ def add_population_to_table(tb: Table, ds_pop: Dataset, country_col: str = "coun
     tb_pop = pr.concat([tb_pop_regions, tb_pop_world], ignore_index=True)

     return tb_pop
+
+
+def create_table_country_years(tb: Table) -> Table:
+    """Create table with each country present in a year."""
+    tb_countries = tb[["ccode", "year", "statenme"]].copy()
+
+    tb_countries = tb_countries.rename(columns={"ccode": "id", "statenme": "country"})
+
+    # define mask for last year
+    mask = tb_countries["year"] == EXPECTED_LAST_YEAR
+
+    tb_last = fill_timeseries(
+        tb_countries[mask].drop(columns="year"),
+        EXPECTED_LAST_YEAR + 1,
+        LAST_YEAR,
+    )
+
+    tb = pr.concat([tb_countries, tb_last], ignore_index=True, short_name="cow_ssm_countries")
+
+    tb["year"] = tb["year"].astype(int)
+    return tb
+
+
+def fill_timeseries(
+    tb: Table,
+    year_min: Optional[int],
+    year_max: Optional[int],
+    default_min: bool = False,
+    default_max: bool = False,
+    col_year_start: Optional[str] = None,
+    col_year_end: Optional[str] = None,
+    filter_times: bool = False,
+) -> Table:
+    """Complement table with missing years."""
+    # Get starting year
+    if default_min:
+        if col_year_start in tb.columns:
+            year_min = tb[col_year_start].min()
+        else:
+            raise ValueError(f"{col_year_start} not in table columns!")
+    elif year_min is None:
+        raise ValueError("Either `year_min` must be a value or `default_min` must be True")
+    # Get ending year
+    if default_max:
+        if (col_year_end) and (col_year_end in tb.columns):
+            year_max = tb[col_year_end].max()
+        else:
+            raise ValueError(f"{col_year_end} not in table columns!")
+    elif year_max is None:
+        raise ValueError("Either `year_max` must be a value or `default_max` must be True")
+
+    # Cross merge with missing years
+    tb_all_years = Table(pd.RangeIndex(year_min, cast(int, year_max) + 1), columns=["year"])
+    if "year" in tb.columns:
+        raise ValueError("Column 'year' already in table! Please drop it from `tb`.")
+    tb = tb.merge(tb_all_years, how="cross")
+
+    # Only keep years that 'make sense'
+    if filter_times:
+        if (col_year_end and (col_year_end not in tb.columns)) or (
+            col_year_start and (col_year_start not in tb.columns)
+        ):
+            raise ValueError(f"Columns {col_year_start} and {col_year_end} must be in table columns!")
+        else:
+            tb = tb[(tb["year"] >= tb[col_year_start]) & (tb["year"] < tb[col_year_end])]
+    return tb
9 changes: 6 additions & 3 deletions etl/steps/data/garden/war/2023-09-21/brecke.meta.yml
@@ -36,7 +36,8 @@ definitions:
 description_key: &description_key_deaths
 - |-
 {definitions.all.conflict_type_ongoing}
-- {definitions.all.interstate_conflicts}
+- |
+{definitions.all.interstate_conflicts}
 - Deaths of combatants and civilians due to fighting, disease, and starvation are included.
 - For conflicts without any deaths estimate, we conservatively coded the Conflict Catalog's lower bound for including a conflict, 32 deaths each year.

@@ -52,7 +53,8 @@ definitions:
 description_key: &description_key_ongoing
 - |-
 {definitions.all.conflict_type_ongoing}
-- {definitions.all.interstate_conflicts}
+- |
+{definitions.all.interstate_conflicts}
 - We count a conflict as ongoing in a region even if the conflict is also ongoing in other regions. The sum across all regions can therefore be higher than the total number of ongoing conflicts.

 number_new_conflicts:
@@ -73,7 +75,8 @@ definitions:
 <% elif conflict_type == "internal" %>
 A new internal conflict is a conflict between a state and a non-state armed group, between non-state armed groups, or between an armed group and civilians, that causes at least 32 deaths during a year for the first time.
 <% endif %>
-- {definitions.all.interstate_conflicts}
+- |
+{definitions.all.interstate_conflicts}
 - |-
 <% if (conflict_type == "interstate" or conflict_type == "internal") %>
 We count a conflict as new in a region even if the conflict started at the same time in another region. The sum across all regions can therefore be higher than the total number of new conflicts.
20 changes: 20 additions & 0 deletions etl/steps/data/garden/war/2023-09-21/cow.meta.yml
@@ -175,6 +175,26 @@ dataset:
You can find more information about the data in our article: [To be published]

tables:

# COUNTRY-LEVEL
cow_country:
variables:
participated_in_conflict:
title: Participated in conflict
unit: ""
display:
numDecimalPlaces: 0
description_short: |-
Whether the country participated in a conflict (of a specific kind) in a given year.

number_participants:
title: Number of countries in conflict
unit: "countries"
display:
numDecimalPlaces: 0
description_short: |-
The number of countries that participated in a conflict (of a specific kind) in a given year and region.

cow:
variables:
##################
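The two `cow_country` indicators are related: `number_participants` counts, per region and year, the countries whose `participated_in_conflict` flag is set. The sketch below shows one way that aggregation could look with plain pandas. The table layout, the 0/1 encoding of the flag, and the sample rows are assumptions for illustration; the PR's own aggregation code is not part of this excerpt.

```python
import pandas as pd

# Hypothetical country-level table: one row per country and year, flag encoded as 0/1.
tb_country = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "region": ["Europe", "Europe", "Asia", "Asia"],
        "year": [1950, 1950, 1950, 1950],
        "participated_in_conflict": [1, 0, 1, 1],
    }
)

# Number of participating countries per region and year.
tb_participants = (
    tb_country.groupby(["region", "year"], as_index=False)["participated_in_conflict"]
    .sum()
    .rename(columns={"participated_in_conflict": "number_participants"})
)

counts = dict(zip(tb_participants["region"], tb_participants["number_participants"]))
```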