From ca02263850202949b4c0c6e734dae97c4ba6b054 Mon Sep 17 00:00:00 2001 From: Pablo Rosado Date: Tue, 7 Nov 2023 16:23:55 +0100 Subject: [PATCH 1/4] Improve natural disasters metadata --- .../2023-09-20/natural_disasters.meta.yml | 49 ++++++++++++++++--- 1 file changed, 43 insertions(+), 6 deletions(-) diff --git a/etl/steps/data/garden/emdat/2023-09-20/natural_disasters.meta.yml b/etl/steps/data/garden/emdat/2023-09-20/natural_disasters.meta.yml index 0c53b89a604..1346a6e5b26 100644 --- a/etl/steps/data/garden/emdat/2023-09-20/natural_disasters.meta.yml +++ b/etl/steps/data/garden/emdat/2023-09-20/natural_disasters.meta.yml @@ -3,6 +3,43 @@ definitions: presentation: topic_tags: - Natural Disasters + description_key: + - "EM-DAT defines the following variables: + + - Affected: People requiring immediate assistance during a period of emergency, i.e. requiring basic survival needs such as food, water, shelter, sanitation and immediate medical assistance. + + - Injured: People suffering from physical injuries, trauma or an illness requiring immediate medical assistance as a direct result of a disaster. + + - Homeless: Number of people whose house is destroyed or heavily damaged and therefore need shelter after an event. + + - Total affected: In EM-DAT, it is the sum of the injured, affected and left homeless after a disaster. + + - Estimated economic damage: The amount of damage to property, crops, and livestock. In EM-DAT estimated damage are given in US$ ('000). For each disaster, the registered figure corresponds to the damage value at the moment of the event, i.e. the figures are shown true to the year of the event. + + - Total deaths: In EM-DAT, it is the sum of deaths and missing." + - "EM-DAT defines the following types of disasters: + + - Drought: An extended period of unusually low precipitation that produces a shortage of water for people, animals and plants. 
Drought is different from most other hazards in that it develops slowly, sometimes even over years, and its onset is generally difficult to detect. Drought is not solely a physical phenomenon because its impacts can be exacerbated by human activities and water supply demands. Drought is therefore often defined both conceptually and operationally. Operational definitions of drought, meaning the degree of precipitation reduction that constitutes a drought, vary by locality, climate and environmental sector. + + - Earthquake: Sudden movement of a block of the Earth's crust along a geological fault and associated ground shaking. + + - Extreme temperature: Extreme temperature. + + - Flood: A general term for the overflow of water from a stream channel onto normally dry land in the floodplain (riverine flooding), higher-than-normal levels along the coast and in lakes or reservoirs (coastal flooding) as well as ponding of water at or near the point where the rain fell (flash floods). + + - Fog: Water droplets that are suspended in the air near the Earth's surface. Fog is simply a cloud that is in contact with the ground. + + - Glacial lake outburst: A flood that occurs when water dammed by a glacier or moraine is suddenly released. Glacial lakes can be at the front of the glacier (marginal lake) or below the ice sheet (sub-glacial lake). + + - Landslide: Any kind of moderate to rapid soil movement incl. lahar, mudslide, debris flow. A landslide is the movement of soil or rock controlled by gravity and the speed of the movement usually ranges between slow and rapid, but not very slow. It can be superficial or deep, but the materials have to make up a mass that is a portion of the slope or the slope itself. The movement has to be downward and outward with a free face. + + - Mass movement: Any type of downslope movement of earth materials. + + - Extreme weather: Storm. 
+ + - Volcanic activity: A type of volcanic event near an opening/vent in the Earth's surface including volcanic eruptions of lava, ash, hot vapour, gas, and pyroclastic material. + + - Wildfire: Any uncontrolled and non-prescribed combustion or burning of plants in a natural setting such as a forest, grassland, brush land or tundra, which consumes the natural fuels and spreads based on environmental conditions (e.g., wind, topography). Wildfires can be triggered by lightning or human actions." dataset: title: Natural disasters @@ -66,24 +103,24 @@ tables: - Uncategorized total_dead_per_100k_people: title: Total number of deaths per 100,000 people - unit: 'cases per 100k people' + unit: 'deaths per 100k people' description_processing: &description-processing-100k | - Disaster-related impacts from EM-DAT have been normalized by Our World in Data to provide data in terms of cases per 100,000 people. + Disaster-related impacts from EM-DAT have been normalized by Our World in Data to provide data in terms of occurrences per 100,000 people. 
injured_per_100k_people: title: Number of injured persons per 100,000 people - unit: 'cases per 100k people' + unit: 'injured per 100k people' description_processing: *description-processing-100k affected_per_100k_people: title: Number of affected persons per 100,000 people - unit: 'cases per 100k people' + unit: 'affected per 100k people' description_processing: *description-processing-100k homeless_per_100k_people: title: Number of homeless persons per 100,000 people - unit: 'cases per 100k people' + unit: 'homeless per 100k people' description_processing: *description-processing-100k total_affected_per_100k_people: title: Total number of affected persons per 100,000 people - unit: 'cases per 100k people' + unit: 'affected per 100k people' description_processing: *description-processing-100k n_events_per_100k_people: title: Number of events per 100,000 people From 201f28d2fdc1ef7745e8a0bbbdc6eda9b33b7f86 Mon Sep 17 00:00:00 2001 From: Pablo Rosado Date: Tue, 7 Nov 2023 16:24:25 +0100 Subject: [PATCH 2/4] Improve natural disasters metadata --- .../2023-09-20/natural_disasters.xlsx.dvc | 48 ++----------------- 1 file changed, 5 insertions(+), 43 deletions(-) diff --git a/snapshots/emdat/2023-09-20/natural_disasters.xlsx.dvc b/snapshots/emdat/2023-09-20/natural_disasters.xlsx.dvc index 9e07b9a54d5..b6ed1488f8e 100644 --- a/snapshots/emdat/2023-09-20/natural_disasters.xlsx.dvc +++ b/snapshots/emdat/2023-09-20/natural_disasters.xlsx.dvc @@ -2,53 +2,15 @@ meta: origin: title: Natural disasters producer: EM-DAT, CRED / UCLouvain - citation_full: EM-DAT, CRED / UCLouvain, Brussels, Belgium – [www.emdat.be](www.emdat.be). + citation_full: EM-DAT, CRED / UCLouvain, Brussels, Belgium - www.emdat.be (2023). 
url_main: https://emdat.be/ date_published: 2023-09-20 date_accessed: 2023-09-20 license: - url: https://public.emdat.be/about - name: UCLouvain 2022 - description: | - EM-DAT data includes all categories classified as "natural disasters" (distinguished from technological disasters, such as oil spills and industrial accidents). This includes those from drought, earthquakes, extreme temperatures, extreme weather, floods, fogs, glacial lake outbursts, landslide, dry mass movements, volcanic activity, and wildfires. - - EM-DAT defines the following variables: - - - Affected: People requiring immediate assistance during a period of emergency, i.e. requiring basic survival needs such as food, water, shelter, sanitation and immediate medical assistance. - - - Injured: People suffering from physical injuries, trauma or an illness requiring immediate medical assistance as a direct result of a disaster. - - - Homeless: Number of people whose house is destroyed or heavily damaged and therefore need shelter after an event. - - - Total affected: In EM-DAT, it is the sum of the injured, affected and left homeless after a disaster. - - - Estimated economic damage: The amount of damage to property, crops, and livestock. In EM-DAT estimated damage are given in US$ ('000). For each disaster, the registered figure corresponds to the damage value at the moment of the event, i.e. the figures are shown true to the year of the event. - - - Total deaths: In EM-DAT, it is the sum of deaths and missing. - - EM-DAT defines the following types of disasters: - - - Drought: An extended period of unusually low precipitation that produces a shortage of water for people, animals and plants. Drought is different from most other hazards in that it develops slowly, sometimes even over years, and its onset is generally difficult to detect. Drought is not solely a physical phenomenon because its impacts can be exacerbated by human activities and water supply demands. 
Drought is therefore often defined both conceptually and operationally. Operational definitions of drought, meaning the degree of precipitation reduction that constitutes a drought, vary by locality, climate and environmental sector. - - - Earthquake: Sudden movement of a block of the Earth's crust along a geological fault and associated ground shaking. - - - Extreme temperature: Extreme temperature. - - - Flood: A general term for the overflow of water from a stream channel onto normally dry land in the floodplain (riverine flooding), higher-than-normal levels along the coast and in lakes or reservoirs (coastal flooding) as well as ponding of water at or near the point where the rain fell (flash floods). - - - Fog: Water droplets that are suspended in the air near the Earth's surface. Fog is simply a cloud that is in contact with the ground. - - - Glacial lake outburst: A flood that occurs when water dammed by a glacier or moraine is suddenly released. Glacial lakes can be at the front of the glacier (marginal lake) or below the ice sheet (sub-glacial lake). - - - Landslide: Any kind of moderate to rapid soil movement incl. lahar, mudslide, debris flow. A landslide is the movement of soil or rock controlled by gravity and the speed of the movement usually ranges between slow and rapid, but not very slow. It can be superficial or deep, but the materials have to make up a mass that is a portion of the slope or the slope itself. The movement has to be downward and outward with a free face. - - - Mass movement: Any type of downslope movement of earth materials. - - - Extreme weather: Storm. - - - Volcanic activity: A type of volcanic event near an opening/vent in the Earth's surface including volcanic eruptions of lava, ash, hot vapour, gas, and pyroclastic material. 
- - - Wildfire: Any uncontrolled and non-prescribed combustion or burning of plants in a natural setting such as a forest, grassland, brush land or tundra, which consumes the natural fuels and spreads based on environmental conditions (e.g., wind, topography). Wildfires can be triggered by lightning or human actions. + url: https://doc.emdat.be/docs/legal/terms-of-use/ + name: UCLouvain 2023 + description: |- + EM-DAT contains data on the occurrence and impacts of mass disasters worldwide from 1900 to the present day. EM-DAT data includes all categories classified as "natural disasters" (distinguished from technological disasters, such as oil spills and industrial accidents). This includes those from drought, earthquakes, extreme temperatures, extreme weather, floods, fogs, glacial lake outbursts, landslides, dry mass movements, volcanic activity, and wildfires. license: url: https://public.emdat.be/about From d43426ecb3085e6df9379c49e6c1030b9137088e Mon Sep 17 00:00:00 2001 From: Pablo Rosado Date: Tue, 7 Nov 2023 16:25:13 +0100 Subject: [PATCH 3/4] Combine variables description key on operations --- lib/catalog/owid/catalog/variables.py | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/lib/catalog/owid/catalog/variables.py b/lib/catalog/owid/catalog/variables.py index a13052a459d..233f766a6ad 100644 --- a/lib/catalog/owid/catalog/variables.py +++ b/lib/catalog/owid/catalog/variables.py @@ -392,6 +392,13 @@ def get_unique_licenses_from_variables(variables: List[Variable]) -> List[Licens return pd.unique(licenses).tolist() +def get_unique_description_key_points_from_variables(variables: List[Variable]) -> List[str]: + # Make a list of all description key points of all variables.
+ description_key_points = sum([variable.metadata.description_key for variable in variables], []) + + return pd.unique(description_key_points).tolist() + + def combine_variables_processing_logs(variables: List[Variable]) -> List[Dict[str, Any]]: # Make a list with all entries in the processing log of all variables. processing_log = sum( @@ -487,6 +494,8 @@ def combine_variables_metadata( metadata.description_short = _get_metadata_value_from_variables_if_all_identical( variables=variables_only, field="description_short", operation=operation ) + metadata.description_key = get_unique_description_key_points_from_variables(variables=variables_only) + # TODO: Combine description_processing: If not identical, append one after another. metadata.description_from_producer = _get_metadata_value_from_variables_if_all_identical( variables=variables_only, field="description_from_producer", operation=operation ) From 674a42ff71f396e7282a2bf6a04132eba29ffa9c Mon Sep 17 00:00:00 2001 From: Pablo Rosado Date: Tue, 7 Nov 2023 16:25:21 +0100 Subject: [PATCH 4/4] Add tests --- lib/catalog/tests/conftest.py | 9 +++++ lib/catalog/tests/test_variables.py | 51 +++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+) diff --git a/lib/catalog/tests/conftest.py b/lib/catalog/tests/conftest.py index 61520ebf57e..e44ab8a326a 100644 --- a/lib/catalog/tests/conftest.py +++ b/lib/catalog/tests/conftest.py @@ -99,6 +99,10 @@ def table_1(sources, licenses, origins): title="Title of Table 1 Variable a", description="Description of Table 1 Variable a", description_short="Short description of Table 1 Variable a", + description_key=[ + "Key description point 1 of Variable 1", + "Common key description point", + ], description_from_producer="Common description from producer", sources=[sources[2], sources[1]], origins=[origins[2], origins[1]], @@ -111,6 +115,11 @@ def table_1(sources, licenses, origins): title="Title of Table 1 Variable b", description="Description of Table 1 Variable b", 
description_short="Short description of Table 1 Variable b", + description_key=[ + "Key description point 1 of Variable 2", + "Common key description point", + "Key description point 2 of Variable 2", + ], description_from_producer="Common description from producer", sources=[sources[2], sources[3]], origins=[origins[2], origins[3]], diff --git a/lib/catalog/tests/test_variables.py b/lib/catalog/tests/test_variables.py index a41a512a82c..05ef5e2c8ea 100644 --- a/lib/catalog/tests/test_variables.py +++ b/lib/catalog/tests/test_variables.py @@ -81,6 +81,12 @@ def test_create_new_variable_as_sum_of_other_two(table_1, sources, origins, lice # Since "a" and "b" have different title and description, "c" should have no title or description. assert tb1["c"].metadata.title is None assert tb1["c"].metadata.description is None + assert tb1["c"].metadata.description_key == [ + "Key description point 1 of Variable 1", + "Common key description point", + "Key description point 1 of Variable 2", + "Key description point 2 of Variable 2", + ] assert tb1["c"].metadata.sources == [sources[2], sources[1], sources[3]] assert tb1["c"].metadata.origins == [origins[2], origins[1], origins[3]] assert tb1["c"].metadata.licenses == [licenses[1], licenses[2], licenses[3]] @@ -99,6 +105,10 @@ def test_create_new_variable_as_sum_of_another_variable_plus_a_scalar(table_1) - assert (tb1["d"] == pd.Series([2, 3, 4])).all() assert tb1["d"].metadata.title == table_1["a"].metadata.title assert tb1["d"].metadata.description == table_1["a"].metadata.description + assert tb1["d"].metadata.description_key == [ + "Key description point 1 of Variable 1", + "Common key description point", + ] assert tb1["d"].metadata.sources == table_1["a"].metadata.sources assert tb1["d"].metadata.origins == table_1["a"].metadata.origins assert tb1["d"].metadata.licenses == table_1["a"].metadata.licenses @@ -152,6 +162,12 @@ def test_create_new_variable_as_product_of_other_two(table_1, sources, origins, assert (tb1["e"] 
== pd.Series([4, 10, 18])).all() assert tb1["e"].metadata.title is None assert tb1["e"].metadata.description is None + assert tb1["e"].metadata.description_key == [ + "Key description point 1 of Variable 1", + "Common key description point", + "Key description point 1 of Variable 2", + "Key description point 2 of Variable 2", + ] assert tb1["e"].metadata.sources == [sources[2], sources[1], sources[3]] assert tb1["e"].metadata.origins == [origins[2], origins[1], origins[3]] assert tb1["e"].metadata.licenses == [licenses[1], licenses[2], licenses[3]] @@ -172,6 +188,12 @@ def test_create_new_variable_as_product_of_other_three(table_1, sources, origins assert (tb1["f"] == pd.Series([20, 70, 162])).all() assert tb1["f"].metadata.title is None assert tb1["f"].metadata.description is None + assert tb1["f"].metadata.description_key == [ + "Key description point 1 of Variable 1", + "Common key description point", + "Key description point 1 of Variable 2", + "Key description point 2 of Variable 2", + ] assert tb1["f"].metadata.sources == [sources[2], sources[1], sources[3]] assert tb1["f"].metadata.origins == [origins[2], origins[1], origins[3]] assert tb1["f"].metadata.licenses == [licenses[1], licenses[2], licenses[3]] @@ -190,6 +212,12 @@ def test_create_new_variable_as_division_of_other_two(table_1, sources, origins, assert (tb1["g"] == pd.Series([0.25, 0.40, 0.50])).all() assert tb1["g"].metadata.title is None assert tb1["g"].metadata.description is None + assert tb1["g"].metadata.description_key == [ + "Key description point 1 of Variable 1", + "Common key description point", + "Key description point 1 of Variable 2", + "Key description point 2 of Variable 2", + ] assert tb1["g"].metadata.sources == [sources[2], sources[1], sources[3]] assert tb1["g"].metadata.origins == [origins[2], origins[1], origins[3]] assert tb1["g"].metadata.licenses == [licenses[1], licenses[2], licenses[3]] @@ -208,6 +236,13 @@ def 
test_create_new_variable_as_floor_division_of_other_two(table_1, sources, or assert (tb1["h"] == pd.Series([4, 2, 2])).all() assert tb1["h"].metadata.title is None assert tb1["h"].metadata.description is None + # Note that the order of key description points should be first b and then a. + assert tb1["h"].metadata.description_key == [ + "Key description point 1 of Variable 2", + "Common key description point", + "Key description point 2 of Variable 2", + "Key description point 1 of Variable 1", + ] assert tb1["h"].metadata.sources == [sources[2], sources[3], sources[1]] assert tb1["h"].metadata.origins == [origins[2], origins[3], origins[1]] assert tb1["h"].metadata.licenses == [licenses[2], licenses[3], licenses[1]] @@ -226,6 +261,12 @@ def test_create_new_variable_as_module_division_of_other_two(table_1, sources, o assert (tb1["i"] == pd.Series([1, 2, 3])).all() assert tb1["i"].metadata.title is None assert tb1["i"].metadata.description is None + assert tb1["i"].metadata.description_key == [ + "Key description point 1 of Variable 1", + "Common key description point", + "Key description point 1 of Variable 2", + "Key description point 2 of Variable 2", + ] assert tb1["i"].metadata.sources == [sources[2], sources[1], sources[3]] assert tb1["i"].metadata.origins == [origins[2], origins[1], origins[3]] assert tb1["i"].metadata.licenses == [licenses[1], licenses[2], licenses[3]] @@ -244,6 +285,10 @@ def test_create_new_variable_as_another_variable_to_the_power_of_a_scalar(table_ assert (tb1["j"] == pd.Series([1, 4, 9])).all() assert tb1["j"].metadata.title == "Title of Table 1 Variable a" assert tb1["j"].metadata.description == "Description of Table 1 Variable a" + assert tb1["j"].metadata.description_key == [ + "Key description point 1 of Variable 1", + "Common key description point", + ] assert tb1["j"].metadata.sources == [sources[2], sources[1]] assert tb1["j"].metadata.origins == [origins[2], origins[1]] assert tb1["j"].metadata.licenses == [licenses[1]] @@ 
-261,6 +306,12 @@ def test_create_new_variables_as_another_variable_to_the_power_of_another_variab assert (tb1["k"] == pd.Series([1, 32, 729])).all() assert tb1["k"].metadata.title is None assert tb1["k"].metadata.description is None + assert tb1["k"].metadata.description_key == [ + "Key description point 1 of Variable 1", + "Common key description point", + "Key description point 1 of Variable 2", + "Key description point 2 of Variable 2", + ] assert tb1["k"].metadata.sources == [sources[2], sources[1], sources[3]] assert tb1["k"].metadata.origins == [origins[2], origins[1], origins[3]] assert tb1["k"].metadata.licenses == [licenses[1], licenses[2], licenses[3]]
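The new helper `get_unique_description_key_points_from_variables` concatenates each operand's `description_key` list in operand order and then drops repeats, keeping the first occurrence. That is why the tests expect the shared "Common key description point" only once, and why the floor-division test expects `b`'s points before `a`'s. The sketch below illustrates that merge rule using only the standard library, under these assumptions: `merge_description_key_points` is a hypothetical name, it takes plain lists of strings rather than `Variable` objects, and it swaps in `dict.fromkeys` for the patch's `sum(..., [])` plus `pd.unique(...).tolist()`. Both approaches keep values in order of first appearance.

```python
from itertools import chain
from typing import List, Sequence


def merge_description_key_points(point_lists: Sequence[List[str]]) -> List[str]:
    """Concatenate key points in operand order, keeping only the first occurrence of each.

    Hypothetical stand-in for get_unique_description_key_points_from_variables:
    it works on plain lists of strings instead of Variable objects, and uses
    dict.fromkeys (insertion-ordered since Python 3.7) instead of pd.unique.
    """
    return list(dict.fromkeys(chain.from_iterable(point_lists)))


# Key points copied from the table_1 fixture in conftest.py.
points_a = [
    "Key description point 1 of Variable 1",
    "Common key description point",
]
points_b = [
    "Key description point 1 of Variable 2",
    "Common key description point",
    "Key description point 2 of Variable 2",
]

# a + b (likewise a * b, a / b, ...): a's points first, shared point kept once.
combined_ab = merge_description_key_points([points_a, points_b])
# Operand order flipped (the floor-division case): b's points come first.
combined_ba = merge_description_key_points([points_b, points_a])
# Variable combined with a scalar: only one operand contributes key points.
combined_a = merge_description_key_points([points_a])

print(combined_ab)
print(combined_ba)
print(combined_a)
```

One consequence of this rule is that the merged list depends on operand order. Any test expectation must therefore follow the order in which the variables enter the operation, as the comment on the `h` assertion points out.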