Getting close to zero forecast values in prediction. #3

Open
sarang-kharpate opened this issue May 7, 2021 · 9 comments

@sarang-kharpate

Hello Jan-Diederik,

 
I am trying to use this model for my NCR region (total population roughly 13 million). I used your netherlands_april9_narrow.json as the reference configuration and corona_esmda.py as the modelling script.

For my NCR region I changed the following parameters:

"t_max" : 350,

"population": 13e6,

"alpha" : [[0.1,0.5],[0.1,0.5],[0.6,0.8],[0.1,0.5],[0.1,0.5],[0.1,0.5],[0.1,0.5],[0.1,0.5],[0.1,0.5],[0.1,0.5],[0.1,0.5],[0.8,1.0]],

"dayalpha" : [1, 14, 33, 48, 61, 91, 122, 152, 183, 214, 242, 270],

"xmaxalpha": 280,

All the other values were left unchanged. When I ran the model with the parameters above, the forecast values were mostly close to zero for P5, P25, P50, P75 and P90 in all the output files (e.g. posterior_prob_hospitalizedcum_calibrated_on_hospitalizedcum, ICU_calibrated_on_hospitalizedcum, infected_calibrated_on_hospitalizedcum, etc.).
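Before looking at the model itself, the changed values can be checked for structural consistency. The sketch below is only that: it assumes each `alpha` entry is a [min, max] range in [0, 1] that pairs with the same-position entry of `dayalpha`, and that every `dayalpha` day should fall within `t_max`. Those readings are assumptions, not something confirmed in this thread, and `check_alpha_schedule` is a hypothetical helper, not part of the repository. `xmaxalpha` is not checked because its meaning is not stated here.

```python
import json

# Values copied from the post above; how each key is interpreted is an assumption.
config = json.loads("""
{
  "t_max": 350,
  "population": 13e6,
  "alpha": [[0.1,0.5],[0.1,0.5],[0.6,0.8],[0.1,0.5],[0.1,0.5],[0.1,0.5],
            [0.1,0.5],[0.1,0.5],[0.1,0.5],[0.1,0.5],[0.1,0.5],[0.8,1.0]],
  "dayalpha": [1, 14, 33, 48, 61, 91, 122, 152, 183, 214, 242, 270],
  "xmaxalpha": 280
}
""")


def check_alpha_schedule(cfg):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    alpha, days = cfg["alpha"], cfg["dayalpha"]
    # Assumed pairing: one alpha range per dayalpha entry.
    if len(alpha) != len(days):
        problems.append(f"alpha has {len(alpha)} entries, dayalpha has {len(days)}")
    # Assumed convention: each range is ordered and lies in [0, 1].
    for i, (lo, hi) in enumerate(alpha):
        if not 0.0 <= lo <= hi <= 1.0:
            problems.append(f"alpha[{i}] = [{lo}, {hi}] is not an ordered range in [0, 1]")
    if any(b <= a for a, b in zip(days, days[1:])):
        problems.append("dayalpha is not strictly increasing")
    if days and days[-1] > cfg["t_max"]:
        problems.append("last dayalpha lies beyond t_max")
    return problems


problems = check_alpha_schedule(config)
print(problems or "no structural problems found")
```

The values in the post pass all four checks, so under these assumptions the near-zero forecasts are not explained by a mismatch between `alpha` and `dayalpha`.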

I have also attached the data and the changed parameter file that I have used for your reference.

I need your help and suggestions on this. I would also like to know, if we want to replicate this for another region, which parameters we need to change to get a good forecast, and what the rationale is behind changing them.
 
 
I have attached my configuration file and input file as well.

NCR_7_May.txt
input_file_7_May.txt
input_file_icufrac_7_May.txt

Thanks

@sarang-kharpate sarang-kharpate changed the title from "Getting negative forecast values in prediction." to "Getting close to zero forecast values in prediction." on May 7, 2021
@weesjdamv
Collaborator

weesjdamv commented May 7, 2021 via email

@weesjdamv
Collaborator

weesjdamv commented May 7, 2021 via email

@sarang-kharpate
Author

sarang-kharpate commented May 8, 2021

Hi Jan-Diederik,

First of all, thank you for going through our problem statement and helping us with it. We really appreciate it.

I have a few questions:

  1. Does calibration mode [5] indicate 'infected' or 'hospitalizedcum'?

  2. Can we keep the observation error constant across the different calibration modes, e.g. 100 in this case?

  3. We have infected, dead and recovered data starting from March 2020, but hospital numbers were only recorded from July 2020. Because of this we decided to start the input data in July, which in turn causes large numbers at the start of the data, since the counts have been accumulating from March to July. Can we put 0 for the hospital and ICU data for the period from March to July? This would keep the initial infected, dead and recovered numbers low for the first 90 days of the input period.

  4. The input file columns, as stated on the GitHub page, are:

  • column 0 - day (number starting from 1)
  • column 1 - cumulative registered infected (positive test cases)
  • column 2 - cum dead
  • column 3 - cum recovered
  • column 4 - cumulative hospitalized
  • column 5 - actual IC units used (may be estimated or 0)
  • column 6 - actual hospitalized (put all to 0 to overwrite from estimates calculated from the hospital flow model)

It would be really helpful if you could tell us exactly what our input columns should look like.
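The column layout listed above can be turned into a quick pre-flight check on the input rows. This is a minimal sketch, not the repository's own loader: the column labels, the `check_rows` helper and the three toy rows are invented for illustration, and the requirement that days are consecutive starting from 1 is an assumption based on "number starting from 1".

```python
# Column order as listed in the post; the labels themselves are invented for this sketch.
COLUMNS = ["day", "cum_infected", "cum_dead", "cum_recovered",
           "cum_hospitalized", "icu_in_use", "hospitalized_in_use"]
CUMULATIVE = ["cum_infected", "cum_dead", "cum_recovered", "cum_hospitalized"]


def check_rows(rows):
    """Return a list of problems found in rows of 7 numbers each."""
    problems = []
    for n, row in enumerate(rows, start=1):
        if len(row) != len(COLUMNS):
            problems.append(f"row {n}: expected {len(COLUMNS)} columns, got {len(row)}")
            return problems  # stop early: later checks index into the rows
        if row[0] != n:  # assumption: days are consecutive and start at 1
            problems.append(f"row {n}: day is {row[0]}, expected {n}")
    for name in CUMULATIVE:
        idx = COLUMNS.index(name)
        series = [row[idx] for row in rows]
        if any(b < a for a, b in zip(series, series[1:])):
            problems.append(f"{name} decreases somewhere; cumulative counts should not")
    return problems


# Toy rows with made-up numbers; hospital and ICU columns are 0 on the first days.
rows = [
    [1, 10, 0, 0, 0, 0, 0],
    [2, 25, 1, 2, 0, 0, 0],
    [3, 40, 1, 5, 3, 1, 0],
]
print(check_rows(rows) or "rows look structurally consistent")
```

Zero-padding the early hospital and ICU values keeps the cumulative hospitalized column non-decreasing, so it passes this check. Whether the calibration treats those zeros as real observations or as missing data is a separate question that this sketch does not answer.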

@weesjdamv
Collaborator

weesjdamv commented May 8, 2021 via email

@sarang-kharpate
Author

Hello Jan-Diederik,

Thanks for the guidance. With your help in tuning the parameter files, along with the input file changes, we are able to get a pretty good fit and projection for the hospitalized and ICU values. The model follows the trend closely. I have attached the new input file and the latest configuration files. I do have a few more questions for you:

  1. We are still getting very high values for the infected predictions (P0.095 and observed in the file NCR_10_May_best_posterior_prob_infected_calibrated_on_[5, 6].csv),
    and when we try to tune the infection-related parameters (e.g. m, R0, sigma, gamma), our hospital predictions also change.

  2. When we try to use calibration mode [1, 5, 6], the code fails with

(numpy.linalg.LinAlgError: Singular matrix error)

It fails even if we use [1] only, but it runs smoothly for calibration mode [5, 6]. Am I setting any of the parameters incorrectly?

  3. The code generates an .h5 file in the output folder. Exploring it, I can see model and posterior subsections. I think the columns are the model's forecast values for time, susceptible, exposed, infected, removed, hospital, hospital cumulative, ICU, ICU cumulative, recovered, dead, cumulative infected and alpha run.
    Is my interpretation of the 13 columns in the file correct?
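import h5py
import pandas as pd
hf = h5py.File('output.h5', 'r')  # placeholder name: use the .h5 file from the output folder
df = pd.DataFrame(hf['model']['posterior'][0]).T
cols = ['O_TIME', 'O_SUS', 'O_EXP', 'O_INF', 'O_REM', 'O_HOS', 'O_HOSCUM', 'O_ICU', 'O_ICUCUM', 'O_REC', 'O_DEAD', 'O_CUMINF', 'O_ALPHARUN']
df.columns = cols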

Thanks

NCR_10_May_best.txt
input_file_10_May_july_onwards_final.txt

@weesjdamv
Collaborator

weesjdamv commented May 11, 2021 via email

@weesjdamv
Collaborator

weesjdamv commented May 12, 2021 via email

@sarang-kharpate
Author

Hello Jan-Diederik,

Thanks a lot for your quick and prompt replies; because of them we have made a lot of progress in our work. I am attaching our final input and configuration files, which give us the best fits. We have settled on 3 configuration files (infections, hospitalized and ICU).

I just want to understand one piece of logic. When we compute the values written to the csv files (0.05, 0.25, ... 0.95), we sort the values, multiply the number of values by the percentile level, and write back the value at that index position in the sorted array.

Basically, I want to understand the logic below.

for post_day in posterior_curves[t_ind, :]:
    array_sorted = np.sort(post_day)
    p_array.append([array_sorted[int(posterior_length * p)] for p in p_values])

There must be some reasoning behind this step, but I am not able to follow it. Also, what if I take the last value of the array from the last iteration, i.e. use the last value of the posterior as my final output?
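The sort-then-index step in the snippet above is the usual way to read an empirical quantile off a set of samples: once the values for a day are sorted, the entry at position int(n * p) has roughly a fraction p of the values below it. The sketch below reproduces that step in plain Python on a toy ensemble. It assumes that each `post_day` holds one value per ensemble member for a single day and that `posterior_length` is the number of members; the `empirical_quantiles` helper and the 20 numbers are made up for illustration.

```python
def empirical_quantiles(samples, p_values):
    """Pick order statistics: sort the ensemble, then index at int(n * p)."""
    ordered = sorted(samples)
    n = len(ordered)
    return [ordered[int(n * p)] for p in p_values]


# Toy ensemble for one day: 20 made-up member values, deliberately unsorted.
ensemble = [17, 3, 12, 8, 20, 1, 15, 6, 10, 19, 4, 13, 7, 18, 2, 11, 16, 5, 14, 9]
p_values = [0.05, 0.25, 0.5, 0.75, 0.95]

print(empirical_quantiles(ensemble, p_values))
print(ensemble[-1])  # the last member is just one draw, not a summary of the spread
```

Under these assumptions, taking the last value of the posterior array would report a single ensemble member, which says nothing about the spread that the P5 to P95 band is meant to show. Note also that this indexing raises an IndexError for p = 1.0, so it only works for levels strictly below 1.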

Thanks and Regards,
Sarang Kharpate

input_file_13_May.txt
Best_Infected.txt
best_hospital.txt
Best_ICU.txt

@weesjdamv
Collaborator

weesjdamv commented May 14, 2021 via email
