Combine new metadata fields #1920

Merged
merged 5 commits into from
Nov 8, 2023
49 changes: 43 additions & 6 deletions etl/steps/data/garden/emdat/2023-09-20/natural_disasters.meta.yml
@@ -3,6 +3,43 @@ definitions:
presentation:
topic_tags:
- Natural Disasters
description_key:
- "EM-DAT defines the following variables:

- Affected: People requiring immediate assistance during a period of emergency, i.e. requiring basic survival needs such as food, water, shelter, sanitation and immediate medical assistance.

- Injured: People suffering from physical injuries, trauma or an illness requiring immediate medical assistance as a direct result of a disaster.

- Homeless: Number of people whose house is destroyed or heavily damaged and therefore need shelter after an event.

- Total affected: In EM-DAT, it is the sum of the injured, affected and left homeless after a disaster.

- Estimated economic damage: The amount of damage to property, crops, and livestock. In EM-DAT estimated damage are given in US$ ('000). For each disaster, the registered figure corresponds to the damage value at the moment of the event, i.e. the figures are shown true to the year of the event.

- Total deaths: In EM-DAT, it is the sum of deaths and missing."
- "EM-DAT defines the following types of disasters:

- Drought: An extended period of unusually low precipitation that produces a shortage of water for people, animals and plants. Drought is different from most other hazards in that it develops slowly, sometimes even over years, and its onset is generally difficult to detect. Drought is not solely a physical phenomenon because its impacts can be exacerbated by human activities and water supply demands. Drought is therefore often defined both conceptually and operationally. Operational definitions of drought, meaning the degree of precipitation reduction that constitutes a drought, vary by locality, climate and environmental sector.

- Earthquake: Sudden movement of a block of the Earth's crust along a geological fault and associated ground shaking.

- Extreme temperature: Extreme temperature.

- Flood: A general term for the overflow of water from a stream channel onto normally dry land in the floodplain (riverine flooding), higher-than-normal levels along the coast and in lakes or reservoirs (coastal flooding) as well as ponding of water at or near the point where the rain fell (flash floods).

- Fog: Water droplets that are suspended in the air near the Earth's surface. Fog is simply a cloud that is in contact with the ground.

- Glacial lake outburst: A flood that occurs when water dammed by a glacier or moraine is suddenly released. Glacial lakes can be at the front of the glacier (marginal lake) or below the ice sheet (sub-glacial lake).

- Landslide: Any kind of moderate to rapid soil movement incl. lahar, mudslide, debris flow. A landslide is the movement of soil or rock controlled by gravity and the speed of the movement usually ranges between slow and rapid, but not very slow. It can be superficial or deep, but the materials have to make up a mass that is a portion of the slope or the slope itself. The movement has to be downward and outward with a free face.

- Mass movement: Any type of downslope movement of earth materials.

- Extreme weather: Storm.

- Volcanic activity: A type of volcanic event near an opening/vent in the Earth's surface including volcanic eruptions of lava, ash, hot vapour, gas, and pyroclastic material.

- Wildfire: Any uncontrolled and non-prescribed combustion or burning of plants in a natural setting such as a forest, grassland, brush land or tundra, which consumes the natural fuels and spreads based on environmental conditions (e.g., wind, topography). Wildfires can be triggered by lightning or human actions."

dataset:
title: Natural disasters
@@ -66,24 +103,24 @@ tables:
- Uncategorized
total_dead_per_100k_people:
title: Total number of deaths per 100,000 people
- unit: 'cases per 100k people'
+ unit: 'deaths per 100k people'
description_processing: &description-processing-100k |
- Disaster-related impacts from EM-DAT have been normalized by Our World in Data to provide data in terms of cases per 100,000 people.
+ Disaster-related impacts from EM-DAT have been normalized by Our World in Data to provide data in terms of occurrences per 100,000 people.
injured_per_100k_people:
title: Number of injured persons per 100,000 people
- unit: 'cases per 100k people'
+ unit: 'injured per 100k people'
description_processing: *description-processing-100k
affected_per_100k_people:
title: Number of affected persons per 100,000 people
- unit: 'cases per 100k people'
+ unit: 'affected per 100k people'
description_processing: *description-processing-100k
homeless_per_100k_people:
title: Number of homeless persons per 100,000 people
- unit: 'cases per 100k people'
+ unit: 'homeless per 100k people'
description_processing: *description-processing-100k
total_affected_per_100k_people:
title: Total number of affected persons per 100,000 people
- unit: 'cases per 100k people'
+ unit: 'affected per 100k people'
description_processing: *description-processing-100k
n_events_per_100k_people:
title: Number of events per 100,000 people
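
The `description_processing` text in this hunk says impacts are normalized to occurrences per 100,000 people. The diff does not show how that is computed, so the following is only a minimal sketch of the usual arithmetic; the function name `per_100k` and the input figures are hypothetical and not taken from the ETL code or from EM-DAT.

```python
def per_100k(count: float, population: float) -> float:
    """Express a raw count as a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so whole-number inputs stay exact where possible.
    return count * 100_000 / population


# Hypothetical figures, for illustration only (not EM-DAT data).
deaths_rate = per_100k(count=50, population=1_000_000)
```

With these made-up inputs, 50 deaths in a population of one million corresponds to 5 deaths per 100,000 people, which is why the per-variable units in the hunk above ('deaths per 100k people', 'injured per 100k people', and so on) are more informative than the generic 'cases per 100k people'.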
9 changes: 9 additions & 0 deletions lib/catalog/owid/catalog/variables.py
@@ -392,6 +392,13 @@ def get_unique_licenses_from_variables(variables: List[Variable]) -> List[Licens
return pd.unique(licenses).tolist()


def get_unique_description_key_points_from_variables(variables: List[Variable]) -> List[str]:
# Make a list of all description key points of all variables.
description_key_points = sum([variable.metadata.description_key for variable in variables], [])

return pd.unique(description_key_points).tolist()


def combine_variables_processing_logs(variables: List[Variable]) -> List[Dict[str, Any]]:
# Make a list with all entries in the processing log of all variables.
processing_log = sum(
@@ -487,6 +494,8 @@ def combine_variables_metadata(
metadata.description_short = _get_metadata_value_from_variables_if_all_identical(
variables=variables_only, field="description_short", operation=operation
)
metadata.description_key = get_unique_description_key_points_from_variables(variables=variables_only)
# TODO: Combine description_processing: If not identical, append one after another.
metadata.description_from_producer = _get_metadata_value_from_variables_if_all_identical(
variables=variables_only, field="description_from_producer", operation=operation
)
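
The new `get_unique_description_key_points_from_variables` concatenates the `description_key` lists of all operands and keeps the first occurrence of each point, so the result depends on operand order. The sketch below reproduces that behaviour without pandas, using `dict.fromkeys` as a stand-in for `pd.unique`; the helper name `unique_key_points` is hypothetical, and the strings are the fixture values added to `conftest.py` in this PR.

```python
from itertools import chain
from typing import Iterable, List


def unique_key_points(key_point_lists: Iterable[List[str]]) -> List[str]:
    # Flatten in the order the variables are given, then drop repeats while
    # keeping the first occurrence (dicts preserve insertion order).
    return list(dict.fromkeys(chain.from_iterable(key_point_lists)))


a = ["Key description point 1 of Variable 1", "Common key description point"]
b = [
    "Key description point 1 of Variable 2",
    "Common key description point",
    "Key description point 2 of Variable 2",
]

combined_ab = unique_key_points([a, b])  # e.g. a + b
combined_ba = unique_key_points([b, a])  # e.g. b // a
```

Here `combined_ab` lists the points of variable a first and `combined_ba` lists those of variable b first, which matches the different expected orders asserted in the sum and floor-division tests of this PR.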
9 changes: 9 additions & 0 deletions lib/catalog/tests/conftest.py
@@ -99,6 +99,10 @@ def table_1(sources, licenses, origins):
title="Title of Table 1 Variable a",
description="Description of Table 1 Variable a",
description_short="Short description of Table 1 Variable a",
description_key=[
"Key description point 1 of Variable 1",
"Common key description point",
],
description_from_producer="Common description from producer",
sources=[sources[2], sources[1]],
origins=[origins[2], origins[1]],
@@ -111,6 +115,11 @@ def table_1(sources, licenses, origins):
title="Title of Table 1 Variable b",
description="Description of Table 1 Variable b",
description_short="Short description of Table 1 Variable b",
description_key=[
"Key description point 1 of Variable 2",
"Common key description point",
"Key description point 2 of Variable 2",
],
description_from_producer="Common description from producer",
sources=[sources[2], sources[3]],
origins=[origins[2], origins[3]],
51 changes: 51 additions & 0 deletions lib/catalog/tests/test_variables.py
@@ -81,6 +81,12 @@ def test_create_new_variable_as_sum_of_other_two(table_1, sources, origins, lice
# Since "a" and "b" have different title and description, "c" should have no title or description.
assert tb1["c"].metadata.title is None
assert tb1["c"].metadata.description is None
assert tb1["c"].metadata.description_key == [
"Key description point 1 of Variable 1",
"Common key description point",
"Key description point 1 of Variable 2",
"Key description point 2 of Variable 2",
]
assert tb1["c"].metadata.sources == [sources[2], sources[1], sources[3]]
assert tb1["c"].metadata.origins == [origins[2], origins[1], origins[3]]
assert tb1["c"].metadata.licenses == [licenses[1], licenses[2], licenses[3]]
@@ -99,6 +105,10 @@ def test_create_new_variable_as_sum_of_another_variable_plus_a_scalar(table_1) -
assert (tb1["d"] == pd.Series([2, 3, 4])).all()
assert tb1["d"].metadata.title == table_1["a"].metadata.title
assert tb1["d"].metadata.description == table_1["a"].metadata.description
assert tb1["d"].metadata.description_key == [
"Key description point 1 of Variable 1",
"Common key description point",
]
assert tb1["d"].metadata.sources == table_1["a"].metadata.sources
assert tb1["d"].metadata.origins == table_1["a"].metadata.origins
assert tb1["d"].metadata.licenses == table_1["a"].metadata.licenses
@@ -152,6 +162,12 @@ def test_create_new_variable_as_product_of_other_two(table_1, sources, origins,
assert (tb1["e"] == pd.Series([4, 10, 18])).all()
assert tb1["e"].metadata.title is None
assert tb1["e"].metadata.description is None
assert tb1["e"].metadata.description_key == [
"Key description point 1 of Variable 1",
"Common key description point",
"Key description point 1 of Variable 2",
"Key description point 2 of Variable 2",
]
assert tb1["e"].metadata.sources == [sources[2], sources[1], sources[3]]
assert tb1["e"].metadata.origins == [origins[2], origins[1], origins[3]]
assert tb1["e"].metadata.licenses == [licenses[1], licenses[2], licenses[3]]
@@ -172,6 +188,12 @@ def test_create_new_variable_as_product_of_other_three(table_1, sources, origins
assert (tb1["f"] == pd.Series([20, 70, 162])).all()
assert tb1["f"].metadata.title is None
assert tb1["f"].metadata.description is None
assert tb1["f"].metadata.description_key == [
"Key description point 1 of Variable 1",
"Common key description point",
"Key description point 1 of Variable 2",
"Key description point 2 of Variable 2",
]
assert tb1["f"].metadata.sources == [sources[2], sources[1], sources[3]]
assert tb1["f"].metadata.origins == [origins[2], origins[1], origins[3]]
assert tb1["f"].metadata.licenses == [licenses[1], licenses[2], licenses[3]]
@@ -190,6 +212,12 @@ def test_create_new_variable_as_division_of_other_two(table_1, sources, origins,
assert (tb1["g"] == pd.Series([0.25, 0.40, 0.50])).all()
assert tb1["g"].metadata.title is None
assert tb1["g"].metadata.description is None
assert tb1["g"].metadata.description_key == [
"Key description point 1 of Variable 1",
"Common key description point",
"Key description point 1 of Variable 2",
"Key description point 2 of Variable 2",
]
assert tb1["g"].metadata.sources == [sources[2], sources[1], sources[3]]
assert tb1["g"].metadata.origins == [origins[2], origins[1], origins[3]]
assert tb1["g"].metadata.licenses == [licenses[1], licenses[2], licenses[3]]
@@ -208,6 +236,13 @@ def test_create_new_variable_as_floor_division_of_other_two(table_1, sources, or
assert (tb1["h"] == pd.Series([4, 2, 2])).all()
assert tb1["h"].metadata.title is None
assert tb1["h"].metadata.description is None
# Note that the order of key description points should be first b and then a.
assert tb1["h"].metadata.description_key == [
"Key description point 1 of Variable 2",
"Common key description point",
"Key description point 2 of Variable 2",
"Key description point 1 of Variable 1",
]
assert tb1["h"].metadata.sources == [sources[2], sources[3], sources[1]]
assert tb1["h"].metadata.origins == [origins[2], origins[3], origins[1]]
assert tb1["h"].metadata.licenses == [licenses[2], licenses[3], licenses[1]]
@@ -226,6 +261,12 @@ def test_create_new_variable_as_module_division_of_other_two(table_1, sources, o
assert (tb1["i"] == pd.Series([1, 2, 3])).all()
assert tb1["i"].metadata.title is None
assert tb1["i"].metadata.description is None
assert tb1["i"].metadata.description_key == [
"Key description point 1 of Variable 1",
"Common key description point",
"Key description point 1 of Variable 2",
"Key description point 2 of Variable 2",
]
assert tb1["i"].metadata.sources == [sources[2], sources[1], sources[3]]
assert tb1["i"].metadata.origins == [origins[2], origins[1], origins[3]]
assert tb1["i"].metadata.licenses == [licenses[1], licenses[2], licenses[3]]
@@ -244,6 +285,10 @@ def test_create_new_variable_as_another_variable_to_the_power_of_a_scalar(table_
assert (tb1["j"] == pd.Series([1, 4, 9])).all()
assert tb1["j"].metadata.title == "Title of Table 1 Variable a"
assert tb1["j"].metadata.description == "Description of Table 1 Variable a"
assert tb1["j"].metadata.description_key == [
"Key description point 1 of Variable 1",
"Common key description point",
]
assert tb1["j"].metadata.sources == [sources[2], sources[1]]
assert tb1["j"].metadata.origins == [origins[2], origins[1]]
assert tb1["j"].metadata.licenses == [licenses[1]]
@@ -261,6 +306,12 @@ def test_create_new_variables_as_another_variable_to_the_power_of_another_variab
assert (tb1["k"] == pd.Series([1, 32, 729])).all()
assert tb1["k"].metadata.title is None
assert tb1["k"].metadata.description is None
assert tb1["k"].metadata.description_key == [
"Key description point 1 of Variable 1",
"Common key description point",
"Key description point 1 of Variable 2",
"Key description point 2 of Variable 2",
]
assert tb1["k"].metadata.sources == [sources[2], sources[1], sources[3]]
assert tb1["k"].metadata.origins == [origins[2], origins[1], origins[3]]
assert tb1["k"].metadata.licenses == [licenses[1], licenses[2], licenses[3]]
48 changes: 5 additions & 43 deletions snapshots/emdat/2023-09-20/natural_disasters.xlsx.dvc
@@ -2,53 +2,15 @@ meta:
origin:
title: Natural disasters
producer: EM-DAT, CRED / UCLouvain
- citation_full: EM-DAT, CRED / UCLouvain, Brussels, Belgium – [www.emdat.be](www.emdat.be).
+ citation_full: EM-DAT, CRED / UCLouvain, Brussels, Belgium - www.emdat.be (2023).
url_main: https://emdat.be/
date_published: 2023-09-20
date_accessed: 2023-09-20
license:
url: https://public.emdat.be/about
name: UCLouvain 2022
description: |
EM-DAT data includes all categories classified as "natural disasters" (distinguished from technological disasters, such as oil spills and industrial accidents). This includes those from drought, earthquakes, extreme temperatures, extreme weather, floods, fogs, glacial lake outbursts, landslide, dry mass movements, volcanic activity, and wildfires.

EM-DAT defines the following variables:

- Affected: People requiring immediate assistance during a period of emergency, i.e. requiring basic survival needs such as food, water, shelter, sanitation and immediate medical assistance.

- Injured: People suffering from physical injuries, trauma or an illness requiring immediate medical assistance as a direct result of a disaster.

- Homeless: Number of people whose house is destroyed or heavily damaged and therefore need shelter after an event.

- Total affected: In EM-DAT, it is the sum of the injured, affected and left homeless after a disaster.

- Estimated economic damage: The amount of damage to property, crops, and livestock. In EM-DAT estimated damage are given in US$ ('000). For each disaster, the registered figure corresponds to the damage value at the moment of the event, i.e. the figures are shown true to the year of the event.

- Total deaths: In EM-DAT, it is the sum of deaths and missing.

EM-DAT defines the following types of disasters:

- Drought: An extended period of unusually low precipitation that produces a shortage of water for people, animals and plants. Drought is different from most other hazards in that it develops slowly, sometimes even over years, and its onset is generally difficult to detect. Drought is not solely a physical phenomenon because its impacts can be exacerbated by human activities and water supply demands. Drought is therefore often defined both conceptually and operationally. Operational definitions of drought, meaning the degree of precipitation reduction that constitutes a drought, vary by locality, climate and environmental sector.

- Earthquake: Sudden movement of a block of the Earth's crust along a geological fault and associated ground shaking.

- Extreme temperature: Extreme temperature.

- Flood: A general term for the overflow of water from a stream channel onto normally dry land in the floodplain (riverine flooding), higher-than-normal levels along the coast and in lakes or reservoirs (coastal flooding) as well as ponding of water at or near the point where the rain fell (flash floods).

- Fog: Water droplets that are suspended in the air near the Earth's surface. Fog is simply a cloud that is in contact with the ground.

- Glacial lake outburst: A flood that occurs when water dammed by a glacier or moraine is suddenly released. Glacial lakes can be at the front of the glacier (marginal lake) or below the ice sheet (sub-glacial lake).

- Landslide: Any kind of moderate to rapid soil movement incl. lahar, mudslide, debris flow. A landslide is the movement of soil or rock controlled by gravity and the speed of the movement usually ranges between slow and rapid, but not very slow. It can be superficial or deep, but the materials have to make up a mass that is a portion of the slope or the slope itself. The movement has to be downward and outward with a free face.

- Mass movement: Any type of downslope movement of earth materials.

- Extreme weather: Storm.

- Volcanic activity: A type of volcanic event near an opening/vent in the Earth's surface including volcanic eruptions of lava, ash, hot vapour, gas, and pyroclastic material.

- Wildfire: Any uncontrolled and non-prescribed combustion or burning of plants in a natural setting such as a forest, grassland, brush land or tundra, which consumes the natural fuels and spreads based on environmental conditions (e.g., wind, topography). Wildfires can be triggered by lightning or human actions.
url: https://doc.emdat.be/docs/legal/terms-of-use/
name: UCLouvain 2023
description: |-
EM-DAT contains data on the occurrence and impacts of mass disasters worldwide from 1900 to the present day. EM-DAT data includes all categories classified as "natural disasters" (distinguished from technological disasters, such as oil spills and industrial accidents). This includes those from drought, earthquakes, extreme temperatures, extreme weather, floods, fogs, glacial lake outbursts, landslide, dry mass movements, volcanic activity, and wildfires.

license:
url: https://public.emdat.be/about