📊 WHO/Global Polio Eradication Initiative polio datasets #2502

Merged: 39 commits, Apr 18, 2024

Conversation

@spoonerf spoonerf commented Apr 9, 2024

Hello!

This is a little bit of a monster, so I think it makes sense to split the review into two, by the namespaces I used.

@paarriagadap would you be able to review the following:

  • dag/health.yml
  • snapshots/health/2024-04-12/
  • meadow/health/2024-04-12/
  • garden/health/2024-04-12/
  • grapher/health/2024-04-12/

@pabloarosado would you be able to review:

  • snapshots/who/2024-04-08/
  • snapshots/who/2024-04-09/
  • meadow/who/2024-04-08/
  • meadow/who/2024-04-09/
  • garden/who/2024-04-08/
  • garden/who/2024-04-09/
  • grapher/who/2024-04-08/

@spoonerf spoonerf changed the title 📊 WHO Polio AFP dataset 📊 WHO/GPEI polio datasets Apr 9, 2024
@spoonerf spoonerf closed this Apr 10, 2024
@spoonerf spoonerf reopened this Apr 10, 2024
owidbot commented Apr 11, 2024

Staging server:
etl diff: ✅ No differences found
+ Dataset garden/health/2024-04-12/polio_free_countries
+ + Table polio_free_countries
+   + Column latest_year_wild_polio_case
+   + Column status
+ Dataset garden/who/2024-04-08/polio
+ + Table polio
+   + Column total_cases
+   + Column afp_cases
+   + Column non_polio_afp_rate
+   + Column pct_adequate_stool_collection
+   + Column pending
+   + Column wild_poliovirus_cases
+   + Column cvdpv_cases
+   + Column compatibles
+   + Column footnote
+   + Column cvdpv1
+   + Column cvdpv2
+   + Column cvdpv3
+   + Column correction_factor
+   + Column estimated_cases
+   + Column polio_surveillance_status
+   + Column afp_cases_per_million
+   + Column wild_poliovirus_cases_per_million
+   + Column cvdpv_cases_per_million
+   + Column total_cases_per_million
+   + Column estimated_cases_per_million
+   + Column cvdpv1_per_million
+   + Column cvdpv2_per_million
+   + Column cvdpv3_per_million
+ Dataset garden/who/2024-04-09/polio_historical
+ + Table polio_historical
+   + Column cases


Legend: +New  ~Modified  -Removed  =Identical  Details
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet

Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile are not included

Edited: 2024-04-17 13:42:27 UTC
Execution time: 38.77 seconds

@spoonerf spoonerf marked this pull request as ready for review April 17, 2024 13:32
@spoonerf spoonerf changed the title 📊 WHO/GPEI polio datasets 📊 WHO/Global Polio Eradication Initiative polio datasets Apr 17, 2024
@paarriagadap paarriagadap left a comment

Very good! I added several minor comments. Sorry for not sending the comments in one go, the pull request extension on VSCode is not working properly!

@spoonerf
Contributor Author

> Very good! I added several minor comments. Sorry for not sending the comments in one go, the pull request extension on VSCode is not working properly!

Thanks for the brilliant review! :)

@pabloarosado pabloarosado left a comment

Looks good! I just suggested a bunch of small things, feel free to ignore.

etl/steps/data/meadow/who/2024-04-09/polio_historical.py (outdated)
etl/steps/data/meadow/who/2024-04-09/polio_historical.py (outdated)
snapshots/health/2024-04-12/polio_free_countries.csv.dvc (outdated)
snapshots/health/2024-04-12/polio_free_countries.csv.dvc (outdated)
snapshots/who/2024-04-08/polio_afp.csv.dvc (outdated)
etl/steps/data/garden/who/2024-04-08/polio.py (outdated)
"""
Some values for "Adequate stool collection" are over 100%, we should set these to NA.
"""
tb.loc[tb["pct_adequate_stool_collection"] > 100, "pct_adequate_stool_collection"] = pd.NA
Contributor

It seems to happen only in three cases, with values 102, 104 and 113. I haven't looked into it, but I'm wondering if it would be better to set 102 and 104 to 100, and consider it a numerical issue with rounded numbers (and 113 to nan, as it possibly is an issue). Up to you.
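
A minimal sketch of the alternative suggested here, assuming a plain pandas DataFrame stands in for the table `tb` and that overshoots of up to 5 percentage points count as rounding noise. The function name `cap_stool_collection` and the `tolerance` parameter are hypothetical and not part of the PR:

```python
import pandas as pd


def cap_stool_collection(tb: pd.DataFrame, tolerance: float = 5.0) -> pd.DataFrame:
    """Cap small overshoots above 100% to 100 and set larger ones to NaN.

    `tolerance` is an assumed cutoff (in percentage points) separating
    likely rounding issues from implausible values.
    """
    col = "pct_adequate_stool_collection"
    tb = tb.copy()  # leave the caller's table untouched
    values = tb[col].astype("float64")

    # Values slightly above 100 are treated as rounding artefacts.
    rounding_issue = (values > 100) & (values <= 100 + tolerance)
    # Values well above 100 are treated as data errors.
    implausible = values > 100 + tolerance

    values = values.mask(rounding_issue, 100.0)
    values = values.mask(implausible, float("nan"))
    tb[col] = values
    return tb


# The three problematic values mentioned above, plus one ordinary value.
example = pd.DataFrame({"pct_adequate_stool_collection": [87, 102, 104, 113]})
result = cap_stool_collection(example)
```

With a tolerance of 5, the values 102 and 104 are capped at 100 and 113 becomes missing. Where exactly to draw the cutoff is a judgment call for whoever owns the data.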

Contributor Author

Will check with Saloni!

etl/steps/data/garden/who/2024-04-08/polio.py (outdated)
etl/steps/data/meadow/who/2024-04-08/polio_afp.py (outdated)
etl/steps/data/garden/who/2024-04-09/polio_historical.py (outdated)
@spoonerf
Contributor Author

> Looks good! I just suggested a bunch of small things, feel free to ignore.

Thank you very much!!

@spoonerf spoonerf merged commit 59ba9d7 into master Apr 18, 2024
8 of 9 checks passed
@spoonerf spoonerf deleted the polio branch April 18, 2024 11:41
4 participants