
Train code #12

Closed
luocmin opened this issue Oct 12, 2020 · 28 comments

Comments

@luocmin

luocmin commented Oct 12, 2020

Does this repository provide training code? I only see the test code.

@johnwlambert
Collaborator

Hi @luocmin, thanks for your interest in our work. I'm still cleaning up all of the training scripts, but all the code should be in this PR to train any model from our paper: #9

In which taxonomy/on which datasets/at which resolution would you like to train? You can find more details here.

@luocmin
Author

luocmin commented Oct 13, 2020

Hi @johnwlambert, thank you for helping me with this. Following the link you provided, I found that the mseg-semantic code I downloaded is different from the code in your link.
Since I am a novice, I am not clear about the training, so I also want to ask whether the training code is based on the code from the semseg project mentioned in the README.
me:
image
you:
image

@johnwlambert
Collaborator

Hi @luocmin, I didn't quite understand your question -- do you mind explaining a bit more about the issue you are facing?

The training script in the semseg repo is a starting point for our work, but we add several hundred additional lines of code to their training script to accommodate training on multiple datasets at once, and to incorporate MGDA and domain generalization.
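For background, MGDA (the multiple-gradient descent algorithm) seeks a convex combination of the per-task gradients that forms a common descent direction for all tasks at once. For two tasks there is a simple closed form, alpha* = clip(((g2 - g1) . g2) / ||g1 - g2||^2, 0, 1). A pure-Python sketch of that two-task case, illustrative only and not the repo's actual implementation:

```python
# Two-task MGDA: find the min-norm convex combination of two gradients.
# Closed form (Sener & Koltun, 2018): alpha* = clip((g2-g1).g2 / ||g1-g2||^2, 0, 1)

def mgda_two_task(g1, g2):
    """Return weight alpha for g1 and the combined gradient alpha*g1 + (1-alpha)*g2."""
    diff = [a - b for a, b in zip(g1, g2)]              # g1 - g2
    denom = sum(d * d for d in diff)                    # ||g1 - g2||^2
    if denom == 0.0:                                    # gradients identical: any alpha works
        alpha = 0.5
    else:
        num = sum((b - a) * b for a, b in zip(g1, g2))  # (g2 - g1) . g2
        alpha = min(1.0, max(0.0, num / denom))         # clip to [0, 1]
    combined = [alpha * a + (1 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, combined

alpha, g = mgda_two_task([1.0, 0.0], [0.0, 1.0])
print(alpha, g)  # 0.5 [0.5, 0.5] -- the min-norm point between the two gradients
```

With orthogonal gradients, the combined direction [0.5, 0.5] decreases both losses, which is the property MGDA exploits when training one network on several datasets.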

Can you pull the latest into your clone/fork of the repo? There shouldn't be any breaking changes.

@luocmin
Author

luocmin commented Oct 13, 2020

@johnwlambert, my question is why the code I downloaded does not include some files such as training.md.
Sorry, as a novice, if I want to train, besides referring to semseg and MGDA, what other issues should I pay attention to?

@luocmin
Author

luocmin commented Oct 13, 2020

@johnwlambert, about the different code: I found that there are different branches in the repository. I see ccsa_train.py under domain_generalization. Is this script used for training? The project description does not explain how to train.
Could you give a command to execute the training script? Thank you.
The dataset processing and testing are documented in detail, but training is not. As a novice, I have the data now, but I can't start the training. Can you give me some pointers?

@johnwlambert
Collaborator

johnwlambert commented Oct 14, 2020

train.py is used for training models on individual datasets and MSeg using traditional cross-entropy loss and MGDA. ccsa_train.py should be used to train CCSA (domain generalization) models.

Could you let me know more about which model you would like to train (on which datasets, using which taxonomy, using which resolution, and which training technique) and I can point you to the relevant config? There are admittedly a ton of experiment config files since we released dozens of models.

@luocmin
Author

luocmin commented Oct 15, 2020

I'm basically trying to solve lane-marking segmentation, hoping to follow your approach of training on multiple datasets to achieve generalization. At present, I have remapped and relabeled the relevant datasets according to the mseg-api you provided, but I have a question about the difference between dataset remap and relabel.
Since the code involved is a bit confusing, I would like to ask whether the train.py you are referring to is mseg_semantic/tool/train.py or semseg/tool/train.py.

@luocmin
Author

luocmin commented Oct 15, 2020

The elements framed in red are not present in the file path it points to.
image

@luocmin
Author

luocmin commented Oct 15, 2020

image
image

@johnwlambert
Collaborator

I see, thanks for the explanation. Adding lane markings to the universal taxonomy is an interesting experiment and could be quite valuable for self-driving applications. We excluded it from the universal taxonomy since it didn't adhere to the principles in the decision tree from our paper (lane markings are labeled as "road" in Cityscapes, BDD, IDD, COCO, ADE20K, etc.). I'm interested to hear what you discover.

The train.py from our repo is what you should use, since it merges multiple datasets at training time using our TaxonomyConverter class.
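For intuition, the job the TaxonomyConverter class performs is remapping each dataset's native label ids into the shared universal taxonomy at load time, so that batches from different datasets can be mixed. Here is a toy pure-Python sketch of that remapping; the class names and ids below are made up for illustration and are not the actual mseg-api mapping tables (those live in TSV files in mseg-api):

```python
# Toy illustration of merging two datasets' labels into one universal taxonomy.
# All names and ids here are hypothetical; the real mapping is defined in mseg-api.

UNIVERSAL = {"road": 0, "building": 1, "sky": 2, "unlabeled": 255}

# Per-dataset mapping: native label id -> universal class name
DATASET_TO_UNIVERSAL = {
    "cityscapes_toy": {0: "road", 1: "building", 2: "sky"},
    "ade20k_toy":     {5: "sky", 9: "road", 11: "building"},
}

def to_universal(dataset, native_mask):
    """Remap a flat list of native label ids into universal taxonomy ids.

    Any native id without a universal counterpart becomes 'unlabeled' (255),
    which is typically ignored by the cross-entropy loss.
    """
    mapping = DATASET_TO_UNIVERSAL[dataset]
    return [UNIVERSAL.get(mapping.get(px, "unlabeled"), 255) for px in native_mask]

print(to_universal("ade20k_toy", [5, 9, 11, 99]))  # [2, 0, 1, 255]
```

After this remapping, a single model head predicting universal ids can be supervised by every dataset simultaneously.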

The relevant configs are:
480p: config/train/480_release/mseg-3m.yaml
720p: config/train/720_release/mseg-3m.yaml
1080p (3 million crops): mseg_semantic/config/train/1080_release/mseg-lowres-3m.yaml
1080p (1 million crops): mseg_semantic/config/train/1080_release/mseg-lowres.yaml

Have you downloaded all the datasets as described here, and do the unit tests pass successfully at the end?

@luocmin
Author

luocmin commented Oct 15, 2020

  1. I didn't find anything. I studied your paper because my project is to identify vehicles violating traffic rules on the road, so I need to recognize the types of lane markings: double yellow solid lines, zebra crossings, bus lanes, stop lines, etc. Could you give me some suggestions? Will the weights obtained from training your code help me?

  2. I have downloaded the datasets step by step following mseg-api/download_scripts/README.md (unzip, remap, relabel, verify paths, verify relabeling). Only ScanNet in the test set has not been downloaded, because my application has not been approved yet. The 7 training datasets have all been processed successfully.

  3. Why does the following problem occur?
    image

@johnwlambert
Collaborator

johnwlambert commented Oct 15, 2020

I just updated https://github.com/mseg-dataset/mseg-semantic/blob/master/training.md.

Can you send me the exact commands you are running?

If the following script were named tool/train-qvga-mix-copy.sh, you would call it as:

tool/train-qvga-mix-copy.sh 1080_release/mseg-lowres-3m.yaml False exp ${WORK}/copies/final_train/1080_release/mseg-lowres-3m

#!/bin/sh
PYTHON=/home/anaconda3/envs/pth13/bin/python

config=config/final_train/$1
use_mgda=$2
exp_name=$3

new_folder=$4
mkdir -p ${new_folder}
cp -r config consistency dataset lib model multiobjective_opt pba_utils taxonomy tool util vis_utils ${new_folder}
cp taxonomy* ${new_folder} 
cd ${new_folder}
echo 'CD into the destination folder'

exp_dir=${exp_name}
model_dir=${exp_dir}/model
result_dir=${exp_dir}/result

now=$(date +"%Y%m%d_%H%M%S")

mkdir -p ${model_dir} ${result_dir}

export PYTHONPATH=./

$PYTHON -u tool/train.py \
  --config=${config} use_mgda ${use_mgda} save_path ${model_dir} auto_resume ${model_dir} \
  2>&1 | tee ${model_dir}/train-$now.log

@luocmin
Author

luocmin commented Oct 15, 2020

I am running it directly in PyCharm via right-click, not from the command line.

@luocmin
Author

luocmin commented Oct 15, 2020

I studied your paper because my project is to identify vehicles violating traffic rules on the road, so I need to recognize the types of lane markings: double yellow solid lines, zebra crossings, bus lanes, stop lines, etc. Could you give me some suggestions? Will the weights obtained from training your code help me?

Following your hint, I found that the specific lane types on the road are not distinguished; they are all labeled as road. My plan is to train with the method from your paper to obtain pre-trained weights, and then fine-tune them on my own dataset, annotated with the lane-marking categories, to train a lane recognition network. Is this approach feasible? I don't have much time right now, so I would appreciate your answer. Thank you.

@luocmin
Author

luocmin commented Oct 15, 2020

I have been reading your code, but I am very confused and unable to get started.

@luocmin
Author

luocmin commented Oct 15, 2020

What does tax_version: 4.0 refer to?
image

@johnwlambert
Collaborator

Hi @luocmin, I think you could use your mseg-3m model as a starting point, and replace the final few layers with an expanded taxonomy or just your classes of interest. Then you could fine-tune on Mapillary for your desired classes. Alternatively, you could train from scratch with the expanded taxonomy.

You will need to pass the arguments I mentioned above via the command line:

python -u tool/train.py  --config=${config} use_mgda ${use_mgda} save_path ${model_dir} auto_resume ${model_dir}

@luocmin
Author

luocmin commented Oct 16, 2020

1. I have looked at the configuration file. For multi-dataset training, each dataset uses one GPU, but I don't have enough hardware resources; I only have four GPUs at most. Can I still train?
2. The purpose of using MGDA is unclear to me.
3. Does save_path refer to the path where the weights are saved after training?
4. Does auto_resume refer to weights for resuming interrupted training, or to the mseg-3m.pth you provide?

@johnwlambert
Collaborator

Hi @luocmin , please find answers to your questions below:

  1. Sure, you can still train by setting the batch size to a smaller number, or by training at a lower input resolution (smaller input crops; see the 480p or 720p configs instead of the 1080p config). How many/which datasets are you training on?
  2. Please refer to the section "Algorithms for learning from multiple domains" from our paper. In our ablation experiments, we found that training with MGDA does not lead to the best model, so we set it to false when training our best models.
  3. save_path is the directory where the model checkpoints and results will be saved. See here.
  4. We use the auto_resume config parameter to allow one to continue training if training is interrupted due to a scheduler compute time limit or hardware error. You could also use it to fine-tune a model.
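For intuition, auto_resume boils down to scanning the save directory for the newest checkpoint and restarting from it if one exists. A stdlib-only sketch of that logic, using a hypothetical checkpoint naming scheme (the actual behavior lives in the repo's train.py):

```python
import os
import re

def find_resume_checkpoint(model_dir):
    """Return the checkpoint path with the highest epoch, or None to train from scratch.

    Assumes checkpoints are named like 'train_epoch_12.pth' -- a hypothetical
    convention for illustration; the real naming is defined by the training script.
    """
    best_epoch, best_path = -1, None
    if not os.path.isdir(model_dir):
        return None
    for name in os.listdir(model_dir):
        m = re.match(r"train_epoch_(\d+)\.pth$", name)
        if m and int(m.group(1)) > best_epoch:
            best_epoch = int(m.group(1))
            best_path = os.path.join(model_dir, name)
    return best_path
```

At startup, the trainer would load this checkpoint (model weights, optimizer state, epoch counter) when it is not None, which is also how one could bootstrap fine-tuning from a released .pth file.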

@luocmin
Author

luocmin commented Oct 16, 2020

1. My datasets are the same as yours, processed according to your method. I haven't started training yet. I consulted my advisor today: the purpose of studying your code is to learn the roadside environment, such as railings, buildings, etc. As for lane recognition, he said that because your model weights are too large and therefore inappropriate, I need to train a smaller model.

@luocmin
Author

luocmin commented Oct 16, 2020

1. Which resolution would you like to train at? (480p, 720p, or 1080p) -- 480p
2. Which datasets would you like to train on? -- all of relabeled MSeg
3. In which taxonomy (output space) would you like to train the model to make predictions?
I don't quite understand the several kinds of taxonomy, so I don't know how to choose.

@luocmin
Author

luocmin commented Oct 16, 2020

May I ask about the txt files in mseg-api/mseg/dataset_lists/ -- do you provide a script in the code to generate the txt files of dataset paths?

@luocmin
Author

luocmin commented Oct 16, 2020

My understanding of these lines of code is: 7 cards, with one card corresponding to one dataset. But I only have 4 cards, not 7 -- can I only use 4 datasets for mixed training?
image

@luocmin
Author

luocmin commented Oct 16, 2020

Where does this function come from?
image

@johnwlambert
Collaborator

Thanks for catching this, that was a deprecated name -- it should be ToUniversalLabel, not ToFlatLabel; see here. I've updated the training script to reflect this.

@johnwlambert
Collaborator

johnwlambert commented Oct 17, 2020

If you only have 4 cards, instead of 7, and you still want to train on all 7 datasets, you will need to re-write some training logic. We make the assumption that a user would have at least 7 cards.

Instead of running each iteration over samples from the 7 datasets (7 dataloaders, one in each process), you could run 1 training iteration with 4 datasets, then another training iteration with the other 3 datasets, etc. Alternatively, you could concat all training images into 1 dataloader, and then shard that across the 4 gpus in DistributedDataParallel. Either way, you will need to re-write some code.
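The second option above can be sketched in plain Python. In practice you would use torch.utils.data.ConcatDataset together with a DistributedSampler, but the partitioning logic those classes implement looks roughly like this:

```python
def shard_concatenated(datasets, rank, world_size):
    """Concatenate several datasets' samples and return the shard for one GPU rank.

    Mimics what ConcatDataset + DistributedSampler do together: each rank sees
    an interleaved 1/world_size slice of the combined sample list.
    """
    combined = [s for d in datasets for s in d]   # ConcatDataset: one flat index space
    return combined[rank::world_size]             # round-robin shard for this rank

# Toy example: 7 "datasets" of 2 samples each, spread over 4 GPUs.
datasets = [["d%d_s%d" % (i, j) for j in range(2)] for i in range(7)]
shards = [shard_concatenated(datasets, r, 4) for r in range(4)]
print([len(s) for s in shards])  # [4, 4, 3, 3]
```

Every sample ends up on exactly one rank, so all 7 datasets contribute to every training epoch even with only 4 GPUs; in real code the DistributedSampler would also shuffle and pad so all ranks get equal-length shards.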

@johnwlambert
Collaborator

Regarding the scripts to generate the txt files in mseg-api/mseg/dataset_lists/ -- these should all be generated already for every MSeg training and test dataset. Are you adding paths for a new different dataset?

@FuNian788

@luocmin I think I understand your problem: you downloaded the code from the 'master' branch, but the training scripts and configs are in the 'train' branch. Just switch branches and you can train the whole model.
