SCNN training on AWS #10

pranathichunduru · 2019-06-10T18:19:21Z

Hi,
Is it possible to train SCNN on AWS? If so, what are the requirements? Also, how do we adapt the code to test on our own dataset, and what requirements must the data meet to be suitable for the model?

Thanks very much!

cooperlab · 2019-06-10T22:25:02Z

This is a Docker container and you could run it on AWS the same as any other container. The landing page has a complete description of how to format your data for training and validation. Any images will do.

pranathichunduru · 2019-06-13T02:43:51Z

Thanks for the info. I tried to run the model on AWS and test it on our data. This is the error I get on our test set. Since I cannot access the model_test.py script, I am unable to debug it.
`Testing model: 1
Test batch: 1
Traceback (most recent call last):
File "./model_test.py", line 1003, in <module>

File "./model_test.py", line 989, in Iiii1IiIi

File "./model_test.py", line 978, in oOo0OooOo

File "./model_test.py", line 960, in OOoOO0OO

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 386, in join
six.reraise(*self._exc_info_to_raise)
File "./model_test.py", line 947, in OOoOO0OO

File "./model_test.py", line 75, in I1I11I1I1I

ZeroDivisionError: float division by zero`

cooperlab · 2019-06-13T04:09:34Z

Divide by zero errors are almost always caused by having a non-orderable batch (containing all right-censored samples).

pranathichunduru · 2019-06-20T17:32:21Z

Hi, I do have a mix of right-censored and uncensored samples. I still get the same error.

cooperlab · 2019-06-20T17:36:19Z

We have no provisions for handling right-censored data. There is a single variable to indicate left-censored status, and you need to have at least one non left-censored sample in each batch.

What is the event frequency in your dataset? You may have uncensored samples in the dataset at large, but because the samples are batched you can still end up with non-orderable batches. This could be a problem if the event frequency is low (think binomial distribution). We never encountered this in our applications, but we may have to add some logic if that's the case.
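
A minimal sketch of how one might check for non-orderable batches before running the model. It is not part of SCNN: the function name `non_orderable_batches` is hypothetical, and it assumes event flags encoded as 1 = event observed (uncensored) and 0 = censored, with batches formed from consecutive samples. Check the SCNN examples for the actual encoding of the censoring variable.

```python
def non_orderable_batches(events, batch_size):
    """Return the indices of batches that contain no observed event.

    events: sequence of 0/1 flags, where 1 means the event was observed
            (uncensored) and 0 means censored. This encoding is an
            assumption made for the sketch.
    batch_size: number of consecutive samples per batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    bad = []
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        # A batch with no uncensored sample cannot be ordered.
        if not any(batch):
            bad.append(start // batch_size)
    return bad


# Example: the middle batch [0, 0, 0] has no observed event.
flags = [1, 0, 0, 0, 0, 0, 1]
print(non_orderable_batches(flags, 3))
```

If the returned list is non-empty, reshuffling the samples or using a larger batch size are the obvious things to try.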

pranathichunduru · 2019-06-20T17:45:15Z

We have very small test sets, about 7 samples each, with a mix of uncensored and censored patients. For example, one test set has 4 dead and 3 censored patients. The other test sets have different event frequencies. Is there a way to handle this?

cooperlab · 2019-06-20T17:51:59Z

I can't tell from that information.

You have a dataset with n samples and a given event frequency (p), and samples from this dataset are randomly assigned to batches of k samples during training. The probability of getting a batch with no uncensored samples is determined by n, k, and p (a binomial-type sampling problem). So depending on your batch size and event frequency, it is sometimes possible to get a batch where the loss function is not defined.
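
As a rough illustration of that calculation, the sketch below computes the chance that a single random batch contains no uncensored sample (the non-orderable case described earlier in the thread). The function names are hypothetical and nothing here comes from the SCNN code. The first function assumes the batch is drawn without replacement (hypergeometric); the second is the binomial approximation using only p and k. It needs Python 3.8+ for `math.comb`.

```python
from math import comb


def prob_batch_all_censored(n, n_events, k):
    """Exact probability that a batch of k samples, drawn without
    replacement from n samples of which n_events are uncensored,
    contains no uncensored sample."""
    if not 0 <= n_events <= n:
        raise ValueError("n_events must be between 0 and n")
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and n")
    # comb() returns 0 when k exceeds the number of censored samples.
    return comb(n - n_events, k) / comb(n, k)


def prob_batch_all_censored_binomial(p, k):
    """Binomial approximation: each sample is independently uncensored
    with probability p, so all k are censored with probability (1-p)^k."""
    if not 0.0 <= p <= 1.0 or k <= 0:
        raise ValueError("need 0 <= p <= 1 and k > 0")
    return (1.0 - p) ** k


# Example: 7 samples, 4 events, batches of 3.
print(prob_batch_all_censored(7, 4, 3))
print(prob_batch_all_censored_binomial(4 / 7, 3))
```

Both values shrink quickly as k grows, which is why a larger batch size makes a non-orderable batch less likely.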

pranathichunduru · 2019-06-20T18:34:23Z

That's so helpful! Is there a way to avoid this? Thanks very much for the prompt response.

cooperlab · 2019-06-20T19:11:25Z

Try increasing the batch size.

We will add a check to prevent this when we fix the Docker issue.

pranathichunduru · 2019-06-20T21:22:08Z

Hi Dr. Cooper, increasing the batch size didn't work either.
Here is the event frequency table for our test sets:
test set 1: Event(1) = 4, Censored(0) = 3
test set 2: Event(1) = 2, Censored(0) = 5
test set 3: Event(1) = 5, Censored(0) = 3
test set 4: Event(1) = 6, Censored(0) = 2
test set 5: Event(1) = 3, Censored(0) = 4
I would also like to add that I am getting this error during model testing, not model training.

cooperlab · 2019-06-21T18:43:59Z

This is the first time you mentioned testing. In order for us to help diagnose the issue we're going to need a very precise description of what you are trying to accomplish and what functions you are calling.

pranathichunduru · 2019-06-21T18:47:03Z

Thanks for the info. I tried to run the model on AWS and test it on our data. This is the error I get on our test set. Since I cannot access the model_test.py script, I am unable to debug it.
`Testing model: 1
Test batch: 1
Traceback (most recent call last):
File "./model_test.py", line 1003, in <module>

File "./model_test.py", line 989, in Iiii1IiIi

File "./model_test.py", line 978, in oOo0OooOo

File "./model_test.py", line 960, in OOoOO0OO

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 386, in join
six.reraise(*self._exc_info_to_raise)
File "./model_test.py", line 947, in OOoOO0OO

File "./model_test.py", line 75, in I1I11I1I1I

ZeroDivisionError: float division by zero`

Thanks very much for the prompt response! This is the error I posted above; I was using the model_test.py script to test on our data.

cooperlab · 2019-06-21T18:55:50Z

This is unrelated to orderability since it is happening during inference. We will resolve the issue when we update the Docker container.

cooperlab · 2019-06-21T19:52:50Z

We traced the error and it is still related to calculation of the c-index during inference. Can you be sure that you are calling this with at least one uncensored sample? Did you format your censoring variable as directed in the examples?
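
For readers following along, here is a minimal sketch of a concordance index with a guard for the undefined case. It is not the code in model_test.py (which is obfuscated in the container); it is a plain Harrell-style c-index written for illustration, and the function name and the encoding (1 = event observed, 0 = censored) are assumptions. A pair (i, j) counts as comparable only when sample i is uncensored and has a shorter time than sample j, so a test set with no such pair gives a zero denominator.

```python
def concordance_index(times, events, risks):
    """Harrell-style c-index; returns None when no pair is comparable.

    times:  observed follow-up times
    events: 1 if the event was observed (uncensored), 0 if censored
            (assumed encoding for this sketch)
    risks:  predicted risk scores, higher meaning shorter expected survival
    """
    n = len(times)
    if not (n == len(events) == len(risks)):
        raise ValueError("times, events and risks must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored sample cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        return None  # undefined; dividing here would raise ZeroDivisionError
    return concordant / comparable


print(concordance_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0]))
print(concordance_index([1, 2, 3], [0, 0, 0], [3.0, 2.0, 1.0]))
```

Under this definition, one uncensored sample is not always enough: if the only uncensored sample also has the longest time, there is still no comparable pair.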

pranathichunduru · 2019-08-13T22:05:30Z

Yes, I have tried that with the censoring variable formatted as in the examples, and it results in the same error.

cooperlab · 2019-08-13T22:17:30Z

I haven't forgotten about this. We're going to redo the Docker image to address the NVIDIA errors and at that time I will add some exception handling to avoid these conditions. It will be a while since I recently switched jobs and am dealing with a lot at the moment.
