Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[New] Extending support for County level COVID-19 Impact Assessment #851

Closed
AnujTiwari opened this issue Jun 25, 2021 · 55 comments
Closed
Labels
brainstorming Discussion to get creative ideas enhancement New feature or request question Further information is requested

Comments

@AnujTiwari
Copy link

Hi All, Please help me to utilize the CovsirPhy for US County Level COVID-19 Impact Assessment.
My aim is to index all 3142 US counties as per the impact of COVID-19.

I can request the local health department for clinical datasets. Please help me to understand the following points:

  1. What are the minimum parameters required to perform the county-level COVID-19 impact assessment using CovsirPhy.
  2. If I can collect county-level vaccination and variant details, is it possible to utilize them with the current model.
  3. What is the best output parameter (like R0) we can use for comparative evaluation of COIVD-19 impact and indexing counties.

I am very much new to the domain of compartmental modeling so, please help me to understand if I am leaving some important details required to extend the model for the county level.

Thanks

@AnujTiwari AnujTiwari added the enhancement New feature or request label Jun 25, 2021
@lisphilar lisphilar changed the title Extending support for County level COVID-19 Impact Assessment [New] Extending support for County level COVID-19 Impact Assessment Jun 25, 2021
@lisphilar
Copy link
Owner

Thank you for creating issues!

My aim is to index all 3142 US counties as per the impact of COVID-19. I can request the local health department for clinical datasets.

"COVID-19 Hub" (our main data source for JHUData etc.) provides state-level records for US. You can confirm this as follows.

import covsirphy as cs
loader = cs.DataLoader()
jhu_data = loader.jhu()
jhu_data.layer(country="US").Province.unique()

If you have the county-level dataset as a CSV file in your local environment, you can create JHUData instance via CountryData class. Please refer to Use a local CSV file which has the number of cases and regard "county" as "province" when you are using covsirphy library.

  1. What are the minimum parameters required to perform the county-level COVID-19 impact assessment using CovsirPhy.

If I understand this question correctly, you are asking about the required columns of the county-level dataset. You need the following data.

  • observation dates
  • county names
  • the number of confirmed cases
  • the number of fatal cases
  • the number of recovered cases: if you do not have this data, please set 0s. Full complement will be done.
  • population values
  1. If I can collect county-level vaccination and variant details, is it possible to utilize them with the current model.

At the latest version 2.21.0, it is not implemented. We need some updates of classes.

  1. What is the best output parameter (like R0) we can use for comparative evaluation of COIVD-19 impact and indexing counties.

Yes, we can use R0 as the index, but R0 is calculated with parameter values of ODE models. Please refer to Usage: SIR-derived models and I recommend to use the parameter values directly as the index.

@lisphilar lisphilar added brainstorming Discussion to get creative ideas question Further information is requested labels Jun 25, 2021
@AnujTiwari
Copy link
Author

Thank you lisphilar for clearing this up for me. Regarding vaccination and variant details, whether you are working on the updates? If I can include at least the vaccination dataset that will be a great success for me.

@lisphilar
Copy link
Owner

Could you provide the details of the dataset you have? I'd like to write a new class as the handler, testing it with the dataset, if available.

  • column names
  • some rows
  • file extension (CSV?)
  • if opened, URL

I plan to create new class LocalDataLoader tentative name) to handle datasets saved in local environment. Usage may be similar to DataLoaderclass.

@lisphilar
Copy link
Owner

lisphilar commented Jun 27, 2021

Note for updating codes:
When we create LocalDataLoader class, we should add date_format argument because dates are sometimes parsed incorrectly as discussed in #856.

@AnujTiwari
Copy link
Author

Could you provide the details of the dataset you have? I'd like to write a new class as the handler, testing it with the dataset, if available.

  • column names
  • some rows
  • file extension (CSV?)
  • if opened, URL

I plan to create new class LocalDataLoader tentative name) to handle datasets saved in local environment. Usage may be similar to DataLoaderclass.

Hi lisphilar I am working on collecting all the required datasets. Soon update them here for further processing.. Thanks.

@AnujTiwari
Copy link
Author

AnujTiwari commented Jul 4, 2021

Hi lisphilar, I am in the process of getting the dataset from local health departments. I apologize for the long delay. Meanwhile, please let me know how to export the intermediate maps (snl.trend().summary()) in high resolution .tif image format. Thank you

@lisphilar
Copy link
Owner

@AnujTiwari ,
Thank you for letting me know. With some pull requests linked to this issue, I'm preparing to add new features to DataLoader class (LocalDataLoader will not be created).

Meanwhile, please let me know how to export the intermediate maps (snl.trend().summary()) in high resolution .tif image format. Thank you

When we use CovsirPhy with Jupyter Notebook, you can export TIFF files as follows.

snl.interactive = False
snl.trend(filename="image_filename.tiff", dpi=300)

When we use it with Python scripts, snl.interactive = False is un-necessary.

snl.trend() (and all methods which create figures) calls matplotlib.pyplot.savefig() internally. Keyword arguments of matplotlib.pyplot.savefig(), including dpi=300, can be used as the keyword arguments of snl.trend(). If not, that may be a bug.

@AnujTiwari
Copy link
Author

AnujTiwari commented Jul 10, 2021

Hi lisphilar,
As per your suggestions, I have implemented cook county datasets (confirmed, fatal, population and recovered=0).
Everything went smooth and I was capable of generating results for cook county with S-R trend analysis phases. But I have some doubts:

  1. My dataset is starting from 24Jan2020 but my S-R trend first phases was starting from 15Mar2020?
  2. When I am trying to replace S-R trend analysis phases with some new change points following instructions at usage:phases, I am getting an error: ValueError: @start must be the same as/over 15Mar2020, but 27Feb2020 was applied. Am I missing something here?

Again, thanks a lot for your help. I am still waiting for getting health data from local health departments. Keep you updated with the developments, But definitely, I am getting vaccination information (%population received 1st and 2nd dose) and looking forward to incorporating vaccination in the reproduction estimation.

Dataset:
date | province | confirmed | fatal | recovered
1/24/2020 | Cook | 1 | 0 | 0
1/25/2020 | Cook | 1 | 0 | 0
.
.
2/18/2021 | Cook | 467986 | 9775 | 0
2/19/2021 | Cook | 468690 | 9790 | 0
2/20/2021 | Cook | 469379 | 9800 | 0

Code:
country_data = cs.CountryData("cook county.csv", country="Cook")
country_data.set_variables(
date="date", confirmed="confirmed", recovered="recovered", fatal="fatal")

country_data.register_total()
country_data.cleaned()

jhu_data_cook = cs.JHUData.from_dataframe(country_data.cleaned())
population_data_cook = cs.PopulationData()
population_data_cook.update(5150233, country="Cook")

snl = cs.Scenario(country="Cook")
snl.register(jhu_data_cook, population_data_cook)
snl.records()

snl.trend(algo="Pelt-rbf").summary()
snl.estimate(cs.SIRF)

S-R Phases:
0th phase (15Mar2020 - 24Apr2020): finished 281 trials in 0 min 9 sec
1st phase (25Apr2020 - 18May2020): finished 320 trials in 0 min 10 sec
2nd phase (19May2020 - 08Jul2020): finished 385 trials in 0 min 13 sec
3rd phase (09Jul2020 - 17Aug2020): finished 437 trials in 0 min 15 sec
4th phase (18Aug2020 - 23Sep2020): finished 321 trials in 0 min 10 sec
5th phase (24Sep2020 - 20Oct2020): finished 180 trials in 0 min 5 sec
6th phase (21Oct2020 - 04Nov2020): finished 239 trials in 0 min 7 sec
7th phase (05Nov2020 - 14Nov2020): finished 270 trials in 0 min 8 sec
8th phase (15Nov2020 - 24Nov2020): finished 389 trials in 0 min 13 sec
9th phase (25Nov2020 - 05Dec2020): finished 322 trials in 0 min 10 sec
12th phase (05Jan2021 - 20Jan2021): finished 181 trials in 0 min 5 sec
10th phase (06Dec2020 - 17Dec2020): finished 564 trials in 0 min 22 sec
11th phase (18Dec2020 - 04Jan2021): finished 348 trials in 0 min 11 sec
13th phase (21Jan2021 - 20Feb2021): finished 831 trials in 0 min 28 sec

For New Change Points:
snl.clear(include_past=True).summary()
snl.add(start_date="28Apr2020")

Error:
ValueError: @start must be the same as/over 15Mar2020, but 27Feb2020 was applied.

@lisphilar
Copy link
Owner

  1. My dataset is starting from 24Jan2020 but my S-R trend first phases was starting from 15Mar2020?

Yes.
We cannot apply S-R trend analysis to records with R = 0. This is because we cannot calculate parameter "a" of log(S(R)) = -a*R + logN when R is always 0 e.g. (R, S) = (0, 1000), (0, 999), (0, 998)...(0, 980).

So, all records with R = 0 should be ignored when S-R trend analysis. However, zeros in your dataset are not actual values. Because you did not have actual values of R, zeros were registered as I proposed.

When R values appears not to be actual values, covsirphy.Scenario complements R values via covsirphy.JHUDataComplementHandler class internally. We can check complemented R values with snl.records(variables="R"). This returns a dataframe with dates and complemented R values except for zeros. I guess, minimum date of the returned dataframe was 15Mar2020.

  1. When I am trying to replace S-R trend analysis phases with some new change points following instructions at usage:phases, I am getting an error: ValueError: @start must be the same as/over 15Mar2020, but 27Feb2020 was applied. Am I missing something here?

Please replace snl.add(start_date="28Apr2020") with snl.add(end_date="28Apr2020") to specify the end date, not start date, of the new phase.

@AnujTiwari
Copy link
Author

AnujTiwari commented Jul 11, 2021

  1. Yes - Fatal dataset is actually starting from 17th Mar 2020. Thanks for clearing out this to me.
  2. I tried snl.add(end_date="28Apr2020") also but I am receiving the same error.

code:
country_data = cs.CountryData("cook county.csv", country="Cook")
country_data.set_variables(
date="date", confirmed="confirmed", recovered="recovered", fatal="fatal")

country_data.register_total()
country_data.cleaned()

jhu_data_cook = cs.JHUData.from_dataframe(country_data.cleaned())
population_data_cook = cs.PopulationData()
population_data_cook.update(5150233, country="Cook")

snl = cs.Scenario(country="Cook")
snl.register(jhu_data_cook, population_data_cook)
snl.records()

snl.trend(algo="Pelt-rbf").summary()
snl.estimate(cs.SIRF)

snl.clear(include_past=True).summary()
snl.add(end_date="28Apr2020")
ValueError: @start must be the same as/over 15Mar2020, but 27Feb2020 was applied.

@lisphilar
Copy link
Owner

Hmm...This seems an internal error. The first date might not be changed correctly when zeros were removed.

  1. What is the output of snl.first_date?
  2. Does snl.timepoint(first_date="15Mar2020"); snl.add(end_date="28Apr2020") solve the problem?

@AnujTiwari
Copy link
Author

AnujTiwari commented Jul 11, 2021

  1. Output of snl.first_date - 27Feb2020 - I am confused with this date as confirmed cases data is starting from 24th Jan 2020 and fatal cases data is starting from 17th Mar 2020..

  2. snl.timepoints(first_date="15Mar2020"); snl.add(end_date="28Apr2020") is showing the error:
    ValueError: @start must be the same as/over 01Apr2020, but 15Mar2020 was applied.

@AnujTiwari
Copy link
Author

Cook County Data.zip
Please find the data for your reference.

@lisphilar
Copy link
Owner

Thank you for providing the file and I will try to find the cause.

@lisphilar
Copy link
Owner

Could you share the all script via gist or GitHub repository?

@lisphilar
Copy link
Owner

I tried codes with records as-of 18Jul2021.
https://gist.github.com/lisphilar/a4396dc91a25af30ccda0155be2c0214

None value of Rt:
The records in the phases fit to SIR-F model in-sufficiently and kappa was 0 in the two phases and sigma was 0 in the 1st phase. Because Rt is equal to "rho * (1 - theta) / (kappa + sigma)", None was returned. None was converted to "-" with snl.summary() and snl.get().

Error with "snl.estimate_accuracy(): I also confirmed the error when we run it without snl.trend()`. When some parameter values are 0, this error may be raised. Could you create a new issue for this with "Request fixing a bug" template?

@AnujTiwari
Copy link
Author

AnujTiwari commented Jul 20, 2021

In case when the records in the phase fit to SIR-F model in-sufficient and R0 is None '-', what do you suggest for getting some significant R0 value for the county - whether I should increase the time period and chk for 21 days instead of 14 days or something else ?

Also I am a little concerned about the very high R0 values like 13.59 (for Barbour county) after adding a new record (19th Day records in case of Barbour county). It seems like a small increase in COVID-19 cases/fatal (cases increased by count 2 in Barbour county) is causing a major shift in R0 value.

@lisphilar
Copy link
Owner

Because the phase does not fit to the model, I recommend to shorten the phase e.g. 7 days.

However, actually I need the outputs of snl.estimate_accuracy() for confirmation and deciding the number of days. This method compares simulated number with estimated parameter values and actual number of cases.

Thank you for creating #895 and I will investigate it with high priority after work on 21Jul2021 JST.

@AnujTiwari
Copy link
Author

AnujTiwari commented Jul 20, 2021

thank you for the suggestion. Yes- it will be great if it is possible to compute 'minimum number of the days' phase (from the last day) for which records in the phase sufficiently fit to SIR-F model and provide a significant R0 value.. Thanks

@AnujTiwari
Copy link
Author

AnujTiwari commented Jul 20, 2021

there are many counties where the number of cases and deaths is comparatively less model throws an error. I request you to please test the model with the Wilcox county dataset. Similar to many other counties, I am not able to get the outcome for this county.

import pandas as pd
loader = cs.DataLoader(update_interval=None)
loader.read_csv("Wilcox1.csv", parse_dates=["date"], dayfirst=False)

print(loader.local)

loader.assign(country="US", province="Wilcox", population=10373)

print(loader.local)
loader.lock(date="date", country="country", province="province", confirmed="confirmed", fatal="fatal", population="population")
print(loader.locked)

jhu_data = loader.jhu()

snl = cs.Scenario(country="US", province="Wilcox")
snl.register(jhu_data)
snl.records()

snl.clear(include_past=True)
snl.add(end_date="12Jul2021")
snl.add(end_date=snl.today)
snl.summary()

snl.estimate(cs.SIRF)
snl.summary()

ValueError: When the targets have multiple columns, we cannot select RMSLE.

Wilcox1.zip

@lisphilar
Copy link
Owner

Thank you for the notice, but it is difficult to track all problems here. Could you create separate issues for Babour and Wilcox problems?

Here we will discuss adding new methods to DataLoader and documentation of them. I will release a new stable version after closing this issue.

@AnujTiwari
Copy link
Author

Sure, I will create seperate issues for these problems. Thanks

@lisphilar
Copy link
Owner

Dear @AnujTiwari ,
Could you review the updated documentation of the new DataLoader with following links? Version 2.22.0 is planned in Jul2021.
https://lisphilar.github.io/covid19-sir/markdown/LOADING.html

@AnujTiwari
Copy link
Author

Sure I am working on it... Thanks

@lisphilar
Copy link
Owner

Thank you.

FYI.
Some tables are broken there and they may be fixed with a new commit in about twenty minutes (time for deployment).
23b9317

lisphilar added a commit that referenced this issue Jul 28, 2021
lisphilar added a commit that referenced this issue Jul 28, 2021
lisphilar added a commit that referenced this issue Jul 28, 2021
lisphilar added a commit that referenced this issue Jul 28, 2021
lisphilar added a commit that referenced this issue Jul 28, 2021
lisphilar added a commit that referenced this issue Jul 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
brainstorming Discussion to get creative ideas enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants