📊 antibiotics: adding aggregated antimicrobial use data by class #3629

Merged
31 commits merged on Nov 29, 2024
Changes from 20 commits

Commits (31):
88d3c41 adding an aggregate table (spoonerf, Nov 26, 2024)
ea75203 add grapher (spoonerf, Nov 26, 2024)
103e9d3 typ (spoonerf, Nov 26, 2024)
e5b3986 forgot to add table (spoonerf, Nov 26, 2024)
e1295c5 trying out description processing (spoonerf, Nov 26, 2024)
6d80645 combine anti tb into antibacs (spoonerf, Nov 26, 2024)
3a31c50 adding description processing to index (spoonerf, Nov 27, 2024)
0fcd451 one desc processing per index (spoonerf, Nov 27, 2024)
534aea8 sort out origins (spoonerf, Nov 27, 2024)
d492082 sorting out description processing (spoonerf, Nov 27, 2024)
195d1b0 Merge branch 'master' into amu-agg (spoonerf, Nov 27, 2024)
f72744c sorting out description processing (spoonerf, Nov 27, 2024)
7ac048f removing antitb drugs (spoonerf, Nov 27, 2024)
3813f8d deep copy of tb_class to prevent change of antimicrobial class (spoonerf, Nov 27, 2024)
5799850 add back in tb, separately (spoonerf, Nov 27, 2024)
b122242 add back missing countries (spoonerf, Nov 27, 2024)
a223bfa sorting out the description processing (spoonerf, Nov 27, 2024)
6d5f724 pivoting agg table (spoonerf, Nov 27, 2024)
b4daf55 fixing metadata (spoonerf, Nov 27, 2024)
e873676 try and improve the formatting of description processing (spoonerf, Nov 27, 2024)
ab8a6e3 pablo's suggestions (spoonerf, Nov 29, 2024)
09fa0fa description changes (spoonerf, Nov 29, 2024)
8d519f6 remove unused pandas (spoonerf, Nov 29, 2024)
a5d8eb5 change to description_key (spoonerf, Nov 29, 2024)
3fea12d back to dp (spoonerf, Nov 29, 2024)
e1f1449 try out desc key (spoonerf, Nov 29, 2024)
34d5252 try out desc key (spoonerf, Nov 29, 2024)
a45a2f0 try out desc key (spoonerf, Nov 29, 2024)
731ba81 try out desc key (spoonerf, Nov 29, 2024)
d05d93c tidy up the notes column (spoonerf, Nov 29, 2024)
1716e6f don't forget decimal places (spoonerf, Nov 29, 2024)
@@ -4,8 +4,7 @@ definitions:
presentation:
topic_tags:
- Global Health
aware_description:
<% if aware == "A" %>
aware_description: <% if aware == "A" %>
Access antibiotics have activity against a wide range of common pathogens and show lower resistance potential than antibiotics in the other groups.
<% elif aware == "W" %>
Watch antibiotics have higher resistance potential and include most of the highest priority agents among the Critically Important Antimicrobials for Human Medicine and/or antibiotics that are at relatively high risk of bacterial resistance.
@@ -14,8 +13,7 @@ definitions:
<% elif aware == "O" %>
The use of the Not classified/Not recommended antibiotics is not evidence-based, nor recommended in high-quality international guidelines. WHO does not recommend the use of these antibiotics in clinical practice.
<% endif %>
routeofadministration:
<% if routeofadministration == "O" %>
routeofadministration: <% if routeofadministration == "O" %>
orally administered
<% elif routeofadministration == "P" %>
parenterally administered
@@ -24,36 +22,84 @@ definitions:
<% elif routeofadministration == "I" %>
inhaled
<% endif %>

antimicrobialclass:
<% if antimicrobialclass == "Antibacterials (ATC J01, A07AA, P01AB, ATC J04A)" %>
antibiotics including antituberculosis drugs
<% elif antimicrobialclass == "Antimalarials (ATC P01B)" %>
antimalarials
<% elif antimicrobialclass == "Antimycotics and antifungals for systemic use (J02, D01B)" %>
antifungals
<% elif antimicrobialclass == "Antivirals for systemic use (ATC J05)" %>
antivirals
<% elif antimicrobialclass == "Drugs for the treatment of tuberculosis (ATC J04A)" %>
antituberculosis drugs
<% endif %>

# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/
dataset:
update_period_days: 308

update_period_days: 365

tables:
class:
variables:
ddd:
title: Defined daily doses of {definitions.routeofadministration} << antimicrobialclass>> - << atc4name.lower() >> used
description_short: Volume of antimicrobials used in a given year.
#description_processing: <<notes>>
title: Defined daily doses of {definitions.routeofadministration} << antimicrobialclass.lower()>> - << atc4name.lower() >> used
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antimicrobials used in a given year.
unit: defined daily doses
did:
title: Defined daily doses per 1,000 inhabitants per day of {definitions.routeofadministration} << antimicrobialclass>> - << atc4name.lower() >> used
description_short: Volume of antimicrobials used per 1000 inhabitants per day.
#description_processing: <<notes>>
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antimicrobials used per 1,000 inhabitants per day.
unit: defined daily doses per 1,000 inhabitants per day
class_aggregated:
variables:
ddd_anti_malarials:
title: Defined daily doses of antimalarials used
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antimalarials used in a given year.
unit: defined daily doses
ddd_antibacterials_and_antituberculosis:
title: Defined daily doses of antibiotics and antituberculosis drugs used
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antibiotics and antituberculosis drugs used in a given year.
unit: defined daily doses
ddd_antifungals:
title: Defined daily doses of antifungals used
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antifungals used in a given year.
unit: defined daily doses
ddd_antituberculosis:
title: Defined daily doses of antituberculosis drugs used
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antituberculosis drugs used in a given year.
unit: defined daily doses
ddd_antivirals:
title: Defined daily doses of antivirals used
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antivirals used in a given year.
unit: defined daily doses
did_anti_malarials:
title: Defined daily doses of antimalarials used per 1,000 inhabitants per day
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antimalarials used in a given year per 1,000 inhabitants per day.
unit: defined daily doses per 1,000 inhabitants per day
did_antibacterials_and_antituberculosis:
title: Defined daily doses of antibiotics and antituberculosis drugs used per 1,000 inhabitants per day
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antibiotics and antituberculosis drugs used in a given year per 1,000 inhabitants per day.
unit: defined daily doses per 1,000 inhabitants per day
did_antifungals:
title: Defined daily doses of antifungals used per 1,000 inhabitants per day
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antifungals used in a given year per 1,000 inhabitants per day.
unit: defined daily doses per 1,000 inhabitants per day
did_antituberculosis:
title: Defined daily doses of antituberculosis drugs used per 1,000 inhabitants per day
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antituberculosis drugs used in a given year per 1,000 inhabitants per day.
unit: defined daily doses per 1,000 inhabitants per day
did_antivirals:
title: Defined daily doses of antivirals used per 1,000 inhabitants per day
description_short: Total [defined daily doses](#dod:defined-daily-doses) of antivirals used in a given year per 1,000 inhabitants per day.
unit: defined daily doses per 1,000 inhabitants per day
aware:
variables:
ddd:
title: Defined daily doses of << awarelabel >> antibiotics used
description_short: "Volume of AWaRe category: << awarelabel >> antibiotics used in a given year. {definitions.aware_description}"
#description_processing: <<notes>>
description_short: "Total [defined daily doses](#dod:defined-daily-doses) of AWaRe category: << awarelabel >> antibiotics used in a given year. {definitions.aware_description}"
unit: defined daily doses
did:
title: Defined daily doses per 1,000 inhabitants per day of << awarelabel>> antibiotics used
description_short: "Volume of AWaRe category: <<awarelabel>> used per 1000 inhabitants per day. {definitions.aware_description}"
#description_processing: <<notes>>
description_short: "Total [defined daily doses](#dod:defined-daily-doses) of AWaRe category: <<awarelabel>> antibiotics used per 1,000 inhabitants per day. {definitions.aware_description}"
unit: defined daily doses per 1,000 inhabitants per day
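The `did` indicators above use the WHO convention of defined daily doses per 1,000 inhabitants per day, i.e. total DDD × 1,000 / (population × days in the period). A minimal sketch of that conversion, assuming annual data with 365 days and made-up population and DDD figures (not taken from the dataset), also shows why the aggregated table can sum `did` across classes: for a given country and year, every class shares the same denominator.

```python
DAYS_PER_YEAR = 365  # assumption: annual data, leap years ignored


def did_from_ddd(ddd: float, population: int, days: int = DAYS_PER_YEAR) -> float:
    """Convert total defined daily doses (DDD) into DDD per 1,000 inhabitants per day."""
    return ddd * 1_000 / (population * days)


# Hypothetical figures for a single country-year.
population = 2_000_000
ddd_by_class = {"antibacterials": 14_600_000, "antituberculosis": 730_000}

did_by_class = {cls: did_from_ddd(ddd, population) for cls, ddd in ddd_by_class.items()}

# Summing the per-class DIDs gives the DID of the summed DDDs, because
# population x days is the same denominator for every class.
did_combined = did_from_ddd(sum(ddd_by_class.values()), population)
assert abs(sum(did_by_class.values()) - did_combined) < 1e-9

print(did_by_class)  # {'antibacterials': 20.0, 'antituberculosis': 1.0}
print(did_combined)  # 21.0
```

The same argument would not hold for a rate with a class-specific denominator; here the denominator depends only on country and year.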
123 changes: 122 additions & 1 deletion etl/steps/data/garden/antibiotics/2024-11-12/antimicrobial_usage.py
@@ -1,5 +1,9 @@
"""Load a meadow dataset and create a garden dataset."""

import pandas as pd
from owid.catalog import Table
from owid.catalog import processing as pr

from etl.data_helpers import geo
from etl.helpers import PathFinder, create_dataset

@@ -23,6 +27,9 @@ def run(dest_dir: str) -> None:
tb_class = geo.harmonize_countries(df=tb_class, countries_file=paths.country_mapping_path)
tb_aware = geo.harmonize_countries(df=tb_aware, countries_file=paths.country_mapping_path)

# Aggregate by antimicrobial class
tb_class_agg, tb_notes = aggregate_antimicrobial_classes(tb_class)
# Save the origins of the aggregated table to insert back in later
# Drop columns that are not needed in the garden dataset.
tb_class = tb_class.drop(
columns=["whoregioncode", "whoregionname", "countryiso3", "incomeworldbankjune", "atc4", "notes"]
@@ -31,14 +38,128 @@ def run(dest_dir: str) -> None:

tb_class = tb_class.format(["country", "year", "antimicrobialclass", "atc4name", "routeofadministration"])
tb_aware = tb_aware.format(["country", "year", "awarelabel"])
tb_class_agg = pivot_aggregated_table(tb_class_agg, tb_notes)
tb_class_agg = tb_class_agg.format(["country", "year"], short_name="class_aggregated")

#
# Save outputs.
#
# Create a new garden dataset with the same metadata as the meadow dataset.
ds_garden = create_dataset(
dest_dir, tables=[tb_class, tb_aware], check_variables_metadata=True, default_metadata=ds_meadow.metadata
dest_dir,
tables=[tb_class, tb_aware, tb_class_agg],
check_variables_metadata=True,
default_metadata=ds_meadow.metadata,
)

# Save changes in the new garden dataset.
ds_garden.save()


def pivot_aggregated_table(tb_class_agg: Table, tb_notes: Table) -> Table:
    """
    Pivot the aggregated table so that each antimicrobial class gets its own ddd_<class> and did_<class> column,
    then attach each class's notes as description_processing metadata.
    """

    tb_notes_dict = {
        "Antibacterials (ATC J01, A07AA, P01AB)": "antibacterials",
        "Antimalarials (ATC P01B)": "anti_malarials",
        "Antimycotics and antifungals for systemic use (J02, D01B)": "antifungals",
        "Antivirals for systemic use (ATC J05)": "antivirals",
        "Drugs for the treatment of tuberculosis (ATC J04A)": "antituberculosis",
        "Antibacterials (ATC J01, A07AA, P01AB, ATC J04A)": "antibacterials_and_antituberculosis",
    }
    # Work on a copy so the caller's notes table is not modified.
    tb_notes = tb_notes.copy()
    tb_notes["category"] = tb_notes["antimicrobialclass"].map(tb_notes_dict)
    tb_class_agg = tb_class_agg.copy(deep=True)
    tb_class_agg["antimicrobialclass"] = tb_class_agg["antimicrobialclass"].replace(tb_notes_dict)
    tb_class_agg = tb_class_agg.pivot(index=["country", "year"], columns="antimicrobialclass", values=["ddd", "did"])
    # Flatten the (indicator, class) column MultiIndex into names such as "ddd_antivirals".
    tb_class_agg.columns = tb_class_agg.columns.to_flat_index()
    tb_class_agg.columns = [f"{col[0]}_{col[1]}" for col in tb_class_agg.columns]
    tb_class_agg = tb_class_agg.reset_index()
spoonerf marked this conversation as resolved.

    for key in tb_notes_dict.values():
        # tb_notes has one row per class, so this selects at most one note string.
        notes = tb_notes.loc[tb_notes["category"] == key, "description_processing"]
        if notes.empty:
            continue
        for indicator in ["ddd", "did"]:
            column = f"{indicator}_{key}"
            if column in tb_class_agg.columns:
                # Metadata fields expect a plain string, not a Series.
                tb_class_agg[column].metadata.description_processing = str(notes.iloc[0])
spoonerf marked this conversation as resolved.

    return tb_class_agg
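`pivot_aggregated_table` reshapes long rows (one per country, year and class) into one wide row per country and year, with columns named `<indicator>_<category>`. A pandas-free sketch of that reshaping, using hypothetical rows and a trimmed copy of the `tb_notes_dict` mapping, shows which column names come out; it is an illustration of the idea, not the step's code.

```python
from collections import defaultdict

# Trimmed copy of the label -> short-name mapping used in the step.
CLASS_SHORT_NAMES = {
    "Antimalarials (ATC P01B)": "anti_malarials",
    "Antivirals for systemic use (ATC J05)": "antivirals",
}

# Hypothetical long-format rows: (country, year, class label, ddd, did).
rows = [
    ("Country A", 2022, "Antimalarials (ATC P01B)", 100.0, 0.5),
    ("Country A", 2022, "Antivirals for systemic use (ATC J05)", 40.0, 0.2),
]


def pivot_wide(rows):
    """Return {(country, year): {"ddd_<class>": value, "did_<class>": value, ...}}."""
    wide = defaultdict(dict)
    for country, year, label, ddd, did in rows:
        short = CLASS_SHORT_NAMES[label]
        key = (country, year)
        for indicator, value in (("ddd", ddd), ("did", did)):
            column = f"{indicator}_{short}"
            # pandas' pivot refuses duplicate (index, column) pairs; mirror that check.
            if column in wide[key]:
                raise ValueError(f"duplicate entry for {key} / {column}")
            wide[key][column] = value
    return dict(wide)


print(pivot_wide(rows))
```

The duplicate check matters because the aggregation step must leave exactly one row per country, year and class before pivoting.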


def aggregate_antimicrobial_classes(tb: Table) -> tuple[Table, Table]:
    """
    Aggregate by antimicrobial class. Antibacterials and antituberculosis drugs are combined into one class,
    and antituberculosis drugs are also kept as a separate class.

    Returns the aggregated table and a table of formatted notes (one row per class).
    """
    tb = tb.copy(deep=True)
    # Convert the column to strings (if not already done)
    tb["antimicrobialclass"] = tb["antimicrobialclass"].astype(str)
spoonerf marked this conversation as resolved.

    # Create a completely independent copy of antituberculosis rows and reset its index
    msk = tb["antimicrobialclass"] == "Drugs for the treatment of tuberculosis (ATC J04A)"
    tb_anti_tb = tb[msk].reset_index(drop=True)
    assert len(tb_anti_tb["antimicrobialclass"].unique()) == 1

    # Merge antibacterials and antituberculosis drugs into a single combined class
    tb["antimicrobialclass"] = tb["antimicrobialclass"].replace(
        {
            "Drugs for the treatment of tuberculosis (ATC J04A)": "Antibacterials (ATC J01, A07AA, P01AB, ATC J04A)",
            "Antibacterials (ATC J01, A07AA, P01AB)": "Antibacterials (ATC J01, A07AA, P01AB, ATC J04A)",
        },
    )
    assert len(tb["antimicrobialclass"].unique()) == 4
Contributor:

Maybe it would be safer to assert the expected unique items in "antimicrobialclass" at the beginning of the function, and then rename them.

    # Format the notes before the aggregation below drops the notes column
    tb_notes = tb[["country", "year", "antimicrobialclass", "notes"]].dropna(subset=["notes"])
    tb_notes = format_notes(tb_notes)

    # Aggregate the data
    tb = tb.groupby(["country", "year", "antimicrobialclass"], dropna=False)[["ddd", "did"]].sum().reset_index()
    assert len(tb["antimicrobialclass"].unique()) == 4
    # Add the antituberculosis data back to tb
    tb_anti_tb = (
        tb_anti_tb.groupby(["country", "year", "antimicrobialclass"], dropna=False)[["ddd", "did"]].sum().reset_index()
    )
    tb_combined = pr.concat([tb, tb_anti_tb])

    # Sanity check only (the result is discarded): raises if any (country, year, class) row is duplicated
    tb_combined.set_index(["country", "year", "antimicrobialclass"], verify_integrity=True)

    return tb_combined, tb_notes
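A variant of the reviewer's suggestion can be sketched in plain Python: reject any unexpected class label before renaming, then sum antibacterials and antituberculosis drugs into one combined class while still keeping a separate antituberculosis total. The class labels are the ones used in the step; the rows and the `aggregate` helper are hypothetical, and only `ddd` is summed to keep the example short.

```python
from collections import defaultdict

TB = "Drugs for the treatment of tuberculosis (ATC J04A)"
ANTIBACTERIALS = "Antibacterials (ATC J01, A07AA, P01AB)"
COMBINED = "Antibacterials (ATC J01, A07AA, P01AB, ATC J04A)"
EXPECTED_CLASSES = {
    ANTIBACTERIALS,
    "Antimalarials (ATC P01B)",
    "Antimycotics and antifungals for systemic use (J02, D01B)",
    "Antivirals for systemic use (ATC J05)",
    TB,
}


def aggregate(rows):
    """Sum ddd by (country, year, class); TB counts towards COMBINED and on its own."""
    labels = {label for _, _, label, _ in rows}
    # Check the labels up front, before any renaming happens.
    unexpected = labels - EXPECTED_CLASSES
    assert not unexpected, f"Unexpected classes: {unexpected}"

    totals = defaultdict(float)
    for country, year, label, ddd in rows:
        merged = COMBINED if label in (ANTIBACTERIALS, TB) else label
        totals[(country, year, merged)] += ddd
        if label == TB:
            # Keep antituberculosis drugs as their own class as well.
            totals[(country, year, TB)] += ddd
    return dict(totals)


rows = [
    ("Country A", 2022, ANTIBACTERIALS, 10.0),
    ("Country A", 2022, TB, 2.0),
    ("Country A", 2022, "Antimalarials (ATC P01B)", 5.0),
]
totals = aggregate(rows)
print(totals[("Country A", 2022, COMBINED)])  # 12.0
print(totals[("Country A", 2022, TB)])  # 2.0
```

Checking against an explicit set fails with a message naming the offending label, whereas the count-based `assert len(...) == 4` only reports that the number changed.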


def format_notes(tb_notes: Table) -> Table:
    """
    Turn each distinct note into a bullet ("- In <countries>: <note>") and combine the bullets into one
    description_processing string per antimicrobial class.
    """
    # Work on a copy, since tb_notes is a column subset of another table.
    tb_notes = tb_notes.copy()
    for note in tb_notes["notes"].unique():
        if pd.notna(note):
spoonerf marked this conversation as resolved.
            msk = tb_notes["notes"] == note
            tb_note = tb_notes[msk]
            # Sort so the output does not depend on row order.
            countries = sorted(tb_note["country"].unique())
            countries_formatted = combine_countries(countries)
            description_processing_string = f"- In {countries_formatted}: {note}\n"
            tb_notes.loc[msk, "description_processing"] = description_processing_string
Contributor:

Interesting way to create description_processing! In principle, that field should be reserved for our own processing (i.e. things OWID has done to the data), not for how the data itself was processed. Note that on data pages we label it "Notes on our processing step for this indicator". I understand that the notes you are loading describe the original processing by the data provider, right?
Maybe they should go elsewhere (key description points, if they are important?). But if you think they make more sense as description_processing, you can keep them this way.

Contributor Author:

This is true; I guess they might make more sense as description_key?

    # Create one description_processing string per antimicrobial class (each class becomes a variable after pivoting)
    tb_desc = (
        tb_notes.dropna(subset=["description_processing"])  # Remove NaNs
        .groupby(["antimicrobialclass"])["description_processing"]
        # Combine unique bullets in a stable order; each bullet already ends with "\n"
        .apply(lambda x: "".join(sorted(set(x))))
        .reset_index()
    )
    # tb = pr.merge(tb, tb_desc, on=["antimicrobialclass"])

    return tb_desc


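def combine_countries(countries) -> str:
    """Combine country names into a readable string, e.g. "A", "A and B" or "A, B and C"."""
    # Convert to a list first: `not countries` raises on a NumPy array with more than one element.
    countries = list(countries)
    if not countries:
        return ""
    elif len(countries) == 1:
        return countries[0]
    elif len(countries) == 2:
        return " and ".join(countries)
    else:
        return ", ".join(countries[:-1]) + " and " + countries[-1]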
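Following the review thread on `format_notes`, the notes could be kept as a list of short entries (the shape of "key description points", assuming `description_key` takes a list of strings) instead of one joined `description_processing` string. A pandas-free sketch, with hypothetical helper names and note text, that groups the countries sharing a note and sorts everything so repeated runs produce identical metadata:

```python
from collections import defaultdict


def join_countries(countries: list[str]) -> str:
    """Hypothetical simplified helper: join names as "A", "A and B" or "A, B and C"."""
    if not countries:
        return ""
    if len(countries) == 1:
        return countries[0]
    return ", ".join(countries[:-1]) + " and " + countries[-1]


def notes_by_class(rows):
    """Map each class to a sorted list of "In <countries>: <note>" entries.

    `rows` holds (country, class label, note) tuples; rows without a note are skipped.
    """
    countries_per_note = defaultdict(set)
    for country, label, note in rows:
        if note:
            countries_per_note[(label, note)].add(country)

    entries = defaultdict(list)
    # Sorting makes the output identical across runs (iterating a set does not guarantee that).
    for (label, note), countries in sorted(countries_per_note.items()):
        entries[label].append(f"In {join_countries(sorted(countries))}: {note}")
    return dict(entries)


rows = [
    ("Country B", "Antimalarials (ATC P01B)", "Public sector data only."),
    ("Country A", "Antimalarials (ATC P01B)", "Public sector data only."),
    ("Country C", "Antimalarials (ATC P01B)", None),
]
key_points = notes_by_class(rows)
print(key_points)

# A single bullet-list string, as description_processing uses, can still be built from the list.
print("\n".join(f"- {entry}" for entry in key_points["Antimalarials (ATC P01B)"]))
```

Keeping the entries as a list defers the choice of field: the same output can feed either a list-valued field or a joined string.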
@@ -16,6 +16,7 @@ def run(dest_dir: str) -> None:
# Read table from garden dataset.
tb_class = ds_garden["class"]
tb_aware = ds_garden["aware"]
tb_class_agg = ds_garden["class_aggregated"]

#
# Process data.
@@ -26,7 +27,10 @@ def run(dest_dir: str) -> None:
#
# Create a new grapher dataset with the same metadata as the garden dataset.
ds_grapher = create_dataset(
dest_dir, tables=[tb_class, tb_aware], check_variables_metadata=True, default_metadata=ds_garden.metadata
dest_dir,
tables=[tb_class, tb_aware, tb_class_agg],
check_variables_metadata=True,
default_metadata=ds_garden.metadata,
)

# Save changes in the new grapher dataset.