[Question] Understanding a poor training on the ADAM dataset #21
Hi @AaronSaam, thank you for trying nnDetection :) That sounds strange and didn't happen in our experiments.
Best,
Hi @AaronSaam, since the bad performance was specifically reported on the test set, maybe there is something off with the conversion from the bounding boxes to the center coordinates (which are used for evaluation on the public leaderboard). I added our code below (I'll add it to the repository, too). Note two things:
Maybe this helps.

Best,

```python
import argparse
from pathlib import Path

from nndet.io import load_pickle
from nndet.core.boxes.ops_np import box_center_np

# Only predictions scoring above this threshold are converted
THRESHOLD = 0.5

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('source', type=Path)
    args = parser.parse_args()
    source = args.source

    # Load the predicted boxes and scores for a single case
    predictions = load_pickle(source / "case_boxes.pkl")
    boxes = predictions["pred_boxes"]
    scores = predictions["pred_scores"]

    keep = scores > THRESHOLD
    boxes = boxes[keep]

    if boxes.size > 0:
        centers = box_center_np(boxes)
    else:
        centers = []

    # Write one center per line, coordinate order reversed relative to box_center_np's output
    with open(source / "result.txt", "a") as f:
        if len(centers) > 0:
            for c in centers[:-1]:
                f.write(f"{round(float(c[2]))}, {round(float(c[1]))}, {round(float(c[0]))}\n")
            c = centers[-1]
            f.write(f"{round(float(c[2]))}, {round(float(c[1]))}, {round(float(c[0]))}")
```
Hi @mibaumgartner, Will update with more details. Best,
Hi @AaronSaam, Any update? Best,
Hi @mibaumgartner, Currently I am again training nnDetection on the ADAM dataset, this time mapping the label for the treated aneurysms (value=2) to background (value=0) and using a patient-stratified split. The first fold of the network should be done training by tomorrow morning. We have not yet been able to find the root cause of the discrepancy, but it will be interesting to compare with the new performance. Best,
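For completeness, one way such a remapping could be done before the nnDetection conversion is sketched below; SimpleITK and the folder names are assumptions for illustration, not necessarily how it was done here:

```python
from pathlib import Path

import SimpleITK as sitk

# Hypothetical input/output folders for the original ADAM masks
src_dir = Path("masks_orig")
dst_dir = Path("masks_remapped")
dst_dir.mkdir(exist_ok=True)

for mask_path in sorted(src_dir.glob("*.nii.gz")):
    mask = sitk.ReadImage(str(mask_path))
    arr = sitk.GetArrayFromImage(mask)
    arr[arr == 2] = 0                      # treated aneurysm (2) -> background (0)
    out = sitk.GetImageFromArray(arr)
    out.CopyInformation(mask)              # preserve spacing/origin/direction
    sitk.WriteImage(out, str(dst_dir / mask_path.name))
```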
Since we ran your first setup in our experiments (both our MICCAI and nnDetection submissions), I don't think the setup will change the overall results drastically (the nnDetection metrics will be worse, since treated aneurysms are pretty easy to detect, but I wouldn't expect huge differences in the performance on untreated aneurysms). From your original run (random split, treated and untreated aneurysms as foreground):
Thank you for the reply!
Best,
Hi @AaronSaam,
Some other ideas (I'll only have sparse internet access during the next week, so I can only look into this in more detail from the 14th of August):
Best,
Hi @mibaumgartner,
The results of the second run should be ready soon. Best,
Hi @AaronSaam, the prediction histogram looks fine; some smaller deviations are expected due to the randomness inside the training process. The histogram for our paper submission is shown below. (The sweep resulted in slightly different probability thresholds, which leads to the different number of FPs; since the final evaluation is thresholded at 0.5 anyway, this does not have any influence on the test predictions.)

Since the validation results look fine and only the test results are bad / basically nonexistent (also via the nnDetection evaluation), I suspect something might be off there. This kinda excludes any problems with the conversion from bounding boxes to the center points; in case you are interested, I can still run the center point evaluation on the training (5-fold CV) runs.

Since the test and validation predictions use the same function to predict the data, I was wondering if there is something off with the input data. Could you double-check that the test data follows the same scheme as the training data? I uploaded the necessary files for our test set submission here in case you want to check anything there.

Best,
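A quick consistency check along those lines could look like the sketch below; SimpleITK and the folder names are assumptions and would need to be adapted to the actual data layout:

```python
from pathlib import Path

import SimpleITK as sitk

def summarize(folder: Path) -> None:
    """Print basic geometry information for every image in a folder."""
    for p in sorted(folder.glob("*.nii.gz")):
        img = sitk.ReadImage(str(p))
        print(p.name, img.GetSize(), img.GetSpacing(), img.GetDirection())

# Compare training and test images side by side (hypothetical paths)
summarize(Path("imagesTr"))
summarize(Path("imagesTs"))
```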
Hi @mibaumgartner, Thanks for the results! I agree that the validation seems fine, which makes the disappointing test results all the more interesting. I double-checked (triple-checked, even) the testing data, and the input follows the instructions. My results for the second training (treated aneurysms mapped to background; patient-stratified split) gave similarly nonexistent results.

However, I might have a lead: I ran a prediction using the 'best model' instead of the standard 'last model', and on a quick inspection the predictions seem to be very much improved. The predictions still need to be converted into numbers; I will do that first thing in the morning.

Best,
Quick update: Following the ADAM evaluation method, the best model checkpoint predictions give a sensitivity of
Code to plot the curves:
Hi @AaronSaam, those results look reasonable, a very good lead! (the version of the paper scored)

nnDetection uses MLFlow for logging, which provides additional metrics and per-loss logging, which might be more beneficial. Unfortunately, I forgot to save the MLFlow log from my run and only kept the models/configs/normal logs.

I also rechecked the models we submitted to the open leaderboard for our paper, and those were the last models from the training (each checkpoint contains the epoch when it was saved), which did not perform well in your run.

Best,
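In case it helps to verify which checkpoint was actually used, the stored epoch can be read directly from the checkpoint file; the file name below is a placeholder, and the presence of an `epoch` key is an assumption (it is present in standard PyTorch Lightning checkpoints):

```python
import torch

# Placeholder path; point this at the checkpoint used for prediction
ckpt = torch.load("model_best.ckpt", map_location="cpu")
print(ckpt.get("epoch"))
```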
Hello all, Thank you for your comments and help. I have also used nnDetection on the ADAM dataset. I ran convert.py to generate the coordinates. However, result.txt provides the coordinates as 6 numbers, while it should be 3 numbers. Could you please let me know what the problem is? Regards,
This issue is stale because it has been open for 30 days with no activity.
Hi! Awesome work :)
Recently we trained nnDetection on the ADAM challenge, i.e., Task019FG_ADAM.
However, the predictions on the test set are pretty bad: a lot of false positives and an overall sensitivity approaching 0. We are trying to understand where it went wrong; maybe you could be of help.
In your case, did the network generate a low resolution model for the ADAM challenge? Our network did end up generating a low resolution model, which we did not specifically use further on.
Do you have any suggestions on what could be different with your run?
The input data was unchanged apart from the omission of one patient due to having a T1 image, and we did not deviate from the instruction steps. We trained all five folds and performed a sweep for each. After that we ran the consolidation and prediction steps as instructed.
Thank you for your help!
Best,
Aaron
Environment Information
Currently using an NVIDIA GeForce RTX 2080 Ti; PyTorch 1.8.0; CUDA 11.2.
nnDetection was installed from [docker | source].