SCNN training on AWS #10
This is a Docker container, and you can run it on AWS the same as any other container. The landing page has a complete description of how to format your data for training and validation. Any images will do.
Thanks for the info. I tried to run the model on AWS and test it on our data. This is the error I am getting on our test set. Since I cannot access the model_test.py script, I am unable to debug it:

```
File "./model_test.py", line 989, in Iiii1IiIi
File "./model_test.py", line 978, in oOo0OooOo
File "./model_test.py", line 960, in OOoOO0OO
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 386, in join
File "./model_test.py", line 75, in I1I11I1I1I
ZeroDivisionError: float division by zero
```
Divide by zero errors are almost always caused by having a non-orderable batch (containing all right-censored samples).
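As a side note for anyone hitting this, a batch can be checked for orderability before computing the loss. This is a minimal sketch, not code from the SCNN repo; it assumes the single censoring column described in the data-format examples, with 1 = censored and 0 = observed event:

```python
import numpy as np

def is_orderable(censored):
    """Return True if a batch can yield a defined partial-likelihood loss.

    `censored` is a 1-D array of censoring indicators (1 = censored,
    0 = observed event). A batch is non-orderable when every sample is
    censored, because no event defines a risk set for the loss.
    """
    censored = np.asarray(censored)
    return bool((censored == 0).any())
```

A non-orderable batch could then be skipped or re-drawn instead of producing a division by zero.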
Hi, I do have a mix of right-censored and uncensored samples, but I still get the same error.
We have no provisions for handling right-censored data. There is a single variable to indicate left-censored status, and you need to have at least one non-left-censored sample in each batch. What is the event frequency in your dataset? You may have uncensored samples in the dataset at large, but since samples are batched, you can end up with non-orderable batches. This can be a problem when the event frequency is low (think binomial distribution). We never encountered this in our applications, but we may have to add some logic if that's the case.
We have very small test sets, about 7 samples each, with a mix of uncensored and censored patients. For example, one test set has about 4 dead and 3 uncensored; other test sets have different event frequencies. Is there a way to handle this?
I can't tell from that information. You have a dataset with n samples and a given event frequency p, and samples from this dataset are randomly assigned to batches of k samples during training. Each sample in a batch is effectively a Bernoulli trial, so the probability of drawing a batch of all censored samples follows a binomial distribution in k and p. Depending on your batch size and event frequency, it is sometimes possible to get a batch where the loss function is not defined.
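The probability above can be sketched in a couple of lines. This is a back-of-the-envelope approximation that assumes batches are drawn with replacement (exact sampling without replacement would be hypergeometric); the function names are my own, not part of the SCNN code:

```python
def p_all_censored_batch(event_freq, batch_size):
    """Probability that a batch of `batch_size` i.i.d. draws contains no
    events, i.e. every sample is censored (a non-orderable batch)."""
    return (1.0 - event_freq) ** batch_size

def expected_bad_batches(event_freq, batch_size, n_batches):
    """Expected number of non-orderable batches over `n_batches` batches."""
    return n_batches * p_all_censored_batch(event_freq, batch_size)

# With an event frequency of 0.1 and a batch size of 14, roughly 23% of
# batches would contain no events:
print(round(p_all_censored_batch(0.1, 14), 3))
```

This also shows why increasing the batch size helps: the probability shrinks geometrically in k.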
That's so helpful! Is there a way to avoid this? Thanks for the prompt response.
Try increasing the batch size. We will add a check to prevent this when we fix the Docker issue.
Hi Dr. Cooper, increasing the batch size didn't work either.
This is the first time you mentioned testing. In order for us to help diagnose the issue we're going to need a very precise description of what you are trying to accomplish and what functions you are calling. |
Thanks for the prompt response! This was the error I posted above; I was working with the model_test.py script to test on our data.
This is unrelated to orderability since it is happening during inference. We will resolve the issue when we release the new Docker container.
We traced the error, and it is still related to the calculation of the c-index during inference. Can you confirm that you are calling this with at least one uncensored sample? Did you format your censoring variable as directed in the examples?
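For context, the c-index divides the count of concordant pairs by the count of comparable pairs, which is zero when a test set has no uncensored samples. A guard like the following would return a sentinel instead of raising. This is a sketch of the standard concordance index, not the (minified) code in model_test.py:

```python
import numpy as np

def c_index(times, censored, risk_scores):
    """Concordance index with a guard against zero comparable pairs.

    A pair (i, j) is comparable when sample i has an observed event
    (censored[i] == 0) and times[i] < times[j]. Returns None instead of
    dividing by zero when no comparable pair exists, e.g. a test set
    containing only censored samples.
    """
    times = np.asarray(times, dtype=float)
    censored = np.asarray(censored)
    risk = np.asarray(risk_scores, dtype=float)
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if censored[i]:            # i must be an observed event
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties count half
    if comparable == 0:
        return None                # instead of ZeroDivisionError
    return concordant / comparable
```

With only censored samples (`c_index([1, 2], [1, 1], [1, 2])`), this returns `None` rather than crashing, which matches the failure mode described in this thread.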
Yes, I have tried that with the censoring variable formatted as in the examples, and it results in the same error.
I haven't forgotten about this. We're going to redo the Docker image to address the NVIDIA errors and at that time I will add some exception handling to avoid these conditions. It will be a while since I recently switched jobs and am dealing with a lot at the moment. |
Hi,
Is it possible to train SCNN on AWS? If yes, what are the requirements for the process? Also, how do we adapt the code to test on our dataset, and what are the requirements for data suitable for the model?
Thanks much!