add MGDA, DG, and normal training code #9

Open · wants to merge 74 commits into base: master

74 commits
822e341
add MGDA, DG, and normal training code
mseg-dataset Aug 23, 2020
81360f0
clean up training script
mseg-dataset Aug 23, 2020
8498539
continue cleaning up the training script
mseg-dataset Aug 23, 2020
b1584ce
clean up imports
mseg-dataset Aug 23, 2020
8724d2e
rename StupidTaxonomyConverter to NaiveTaxonomyConverter
Sep 25, 2020
82ab6a0
remove commented lines
Sep 25, 2020
196d10c
remove commented out lines
Sep 25, 2020
535cec1
remove commented out lines
Sep 25, 2020
f3586f9
remove commented out lines
Sep 25, 2020
9710a55
remove commented out lines
Sep 25, 2020
2584bd7
remove commented out lines
Sep 25, 2020
5c15c5d
remove commented out lines
Sep 25, 2020
b30ac36
remove commented out lines
Sep 25, 2020
19a5d26
remove commented out lines
Sep 25, 2020
53b5d46
remove commented out lines
Sep 25, 2020
c774215
remove commented out lines
Sep 25, 2020
f0e5b64
remove commented-out lines
Sep 25, 2020
86b941d
remove commented out lines
Sep 25, 2020
db8390c
remove commented-out lines
Sep 25, 2020
6a29c5e
remove commented-out lines
Sep 25, 2020
fdb38a0
remove commented out lines
Sep 25, 2020
b7c7a09
Create training.md
Sep 25, 2020
7c2f909
remove commented-out lines
Sep 25, 2020
dd57119
Update training.md
Sep 25, 2020
0cc23ac
Update training.md
Sep 25, 2020
0deeac1
remove commented-out lines
Sep 25, 2020
ef89a9b
update instructions for training
johnwlambert Oct 15, 2020
20fb658
remove commented out lines
johnwlambert Oct 15, 2020
5ae9cac
remove commented out lines
johnwlambert Oct 15, 2020
81705c1
remove deprecated version ref in TaxonomyConverter
johnwlambert Oct 15, 2020
d7d88f9
remove tax version param
johnwlambert Oct 15, 2020
32be482
remove tax version param
johnwlambert Oct 15, 2020
1692391
remove tax version param
johnwlambert Oct 15, 2020
c2028d9
remove tax version param
johnwlambert Oct 15, 2020
5184faf
remove tax version param
johnwlambert Oct 15, 2020
c71d55c
remove tax version param
johnwlambert Oct 15, 2020
5fd9ed3
remove tax version param
johnwlambert Oct 15, 2020
fcfaecb
update ToFlatLabel to ToUniversalLabel
johnwlambert Oct 17, 2020
9b6a43f
clean up logic with naive taxonomy
johnwlambert Oct 17, 2020
96bec76
improve variable names
johnwlambert Oct 17, 2020
0f84e15
improve variable names
johnwlambert Oct 17, 2020
f3f9dbb
improve variable names
johnwlambert Oct 17, 2020
d1928bd
improve var names
johnwlambert Oct 17, 2020
fdbdec9
improve var names
johnwlambert Oct 17, 2020
cacd162
improve var names
johnwlambert Oct 17, 2020
b2b8d29
improve var names
johnwlambert Oct 17, 2020
41d48bf
improve var names
johnwlambert Oct 17, 2020
3726638
improve var names
johnwlambert Oct 17, 2020
f8afb3c
update args.tc.classes to args.tc.num_uclasses to reflect TaxononomyC…
johnwlambert Oct 22, 2020
b7ad193
remove outdated config
Dec 9, 2020
566a1ad
clean up old yaml files, just pass dataset name at command line
Dec 9, 2020
71e4cd7
remove unused config param
Dec 9, 2020
dbe7b06
remove outdated configs
Dec 9, 2020
12f6655
Delete unused configs
Dec 9, 2020
61fc77b
remove unused configs
Dec 9, 2020
dcda7e1
remove unused configs
Dec 9, 2020
dee5a37
remove old VGA configs
Dec 9, 2020
ea31c75
correct typo
Dec 9, 2020
0917abb
clean up train.py logic
Dec 9, 2020
ae063cc
clean up train.py logic
Dec 9, 2020
751000e
remove tensorboard, since not using writer anyways
Dec 9, 2020
d011fab
fix typos in train script
Dec 9, 2020
6711886
merge master into training branch
Dec 9, 2020
87cc9c8
remove old print statements
Dec 9, 2020
c987204
Merge branch 'training' of https://github.com/mseg-dataset/mseg-seman…
Dec 9, 2020
cb23214
reformat train script using Python black formatter
Dec 9, 2020
e741a3a
reformat more code with python black and remove finetune option (unused)
Dec 9, 2020
6592799
remove unused finetune option from configs
Dec 9, 2020
b121012
make a separate function to just compute number of iterations required
Dec 9, 2020
f8aac5c
reformat with black
Dec 9, 2020
fdf3b60
clarify docstring when determining number of iters to run
Dec 9, 2020
f45dfbb
edit docstring describing number of iters
Dec 9, 2020
f6d6a8b
fix type hint
Dec 10, 2020
e9e35bf
move apex docstring to training.md
Dec 10, 2020
67 changes: 67 additions & 0 deletions mseg_semantic/config/train/1080_release/mseg-mgda.yaml
@@ -0,0 +1,67 @@
# Differs from the standard mseg.yaml by setting "use_apex: False", since the apex-wrapped model does not support model.no_sync()
DATA:
dataset: [
ade20k-150-relabeled,
bdd-relabeled,
cityscapes-19-relabeled,
coco-panoptic-133-relabeled,
idd-39-relabeled,
mapillary-public65-relabeled,
sunrgbd-37-relabeled]
universal: True
use_mgda: False # may be overridden by a command-line argument

TRAIN:
use_naive_taxonomy: False
arch: hrnet
network_name:
layers:
sync_bn: True # whether to use synchronized batch norm
train_h: 713
train_w: 713
scale_min: 0.5 # minimum random scale
scale_max: 2.0 # maximum random scale
short_size: 1080
rotate_min: -10 # minimum random rotate
rotate_max: 10 # maximum random rotate
zoom_factor: 8 # zoom factor for the final prediction during training; must be one of [1, 2, 4, 8]
ignore_label: 255
aux_weight: 0.4
num_examples: 1000000
train_gpu: [0, 1, 2, 3, 4, 5, 6]
dataset_gpu_mapping: {
'ade20k-150-relabeled': [0],
'bdd-relabeled': [1],
'cityscapes-19-relabeled': [2],
'coco-panoptic-133-relabeled': [3],
'idd-39-relabeled': [4],
'mapillary-public65-relabeled': [5],
'sunrgbd-37-relabeled': [6],
}
workers: 64 # data loader workers
batch_size: 35 # batch size for training
batch_size_val: 1 # batch size for validation during training, memory and speed tradeoff
base_lr: 0.01
epochs: 10
start_epoch: 0
power: 0.9
momentum: 0.9
weight_decay: 0.0001
manual_seed:
print_freq: 10
save_freq: 1
save_path: default
weight: # path to initial weight (default: none)
resume: # path to latest checkpoint (default: none)
auto_resume: None
evaluate: False # evaluate on the validation set; needs extra GPU memory, and a small batch_size_val is recommended
Distributed:
dist_url: tcp://127.0.0.1:6795
dist_backend: 'nccl'
multiprocessing_distributed: True
world_size: 1
rank: 0
use_apex: False
opt_level: 'O0'
keep_batchnorm_fp32:
loss_scale:
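For context on the `use_apex: False` comment at the top of this file: MGDA-style training needs a separate gradient per dataset loss before the gradients are blended, which is why apex (whose wrapped model lacks `model.no_sync()`) is disabled here. The sketch below illustrates the min-norm blending idea for the two-task case only; it is not the train.py logic in this PR, and every helper name in it is invented for illustration.

```python
# Hypothetical sketch of MGDA-style gradient blending for two dataset losses.
# Not this PR's implementation; helper names are invented for illustration.
from typing import List

import torch
import torch.nn as nn


def _flat_grad(model: nn.Module) -> torch.Tensor:
    """Concatenate the current gradients of all trainable parameters into one vector."""
    parts = []
    for p in model.parameters():
        if p.requires_grad:
            g = p.grad if p.grad is not None else torch.zeros_like(p)
            parts.append(g.reshape(-1))
    return torch.cat(parts)


def _min_norm_weight(g1: torch.Tensor, g2: torch.Tensor) -> float:
    """Closed-form minimizer of ||gamma*g1 + (1-gamma)*g2||^2 over gamma in [0, 1]."""
    diff = g1 - g2
    gamma = torch.dot(g2 - g1, g2) / (torch.dot(diff, diff) + 1e-12)
    return float(gamma.clamp(0.0, 1.0))


def mgda_step(model: nn.Module, losses: List[torch.Tensor], optimizer: torch.optim.Optimizer) -> None:
    """One optimizer step using a min-norm blend of two per-dataset gradients."""
    assert len(losses) == 2, "closed-form weight shown for the two-task case only"
    per_task_grads = []
    for loss in losses:
        optimizer.zero_grad()
        loss.backward(retain_graph=True)  # each dataset's loss gets its own backward pass
        per_task_grads.append(_flat_grad(model).clone())
    gamma = _min_norm_weight(per_task_grads[0], per_task_grads[1])
    blended = gamma * per_task_grads[0] + (1.0 - gamma) * per_task_grads[1]
    # Write the blended gradient back into the parameters, then step.
    optimizer.zero_grad()
    offset = 0
    for p in model.parameters():
        if p.requires_grad:
            n = p.numel()
            p.grad = blended[offset:offset + n].view_as(p).clone()
            offset += n
    optimizer.step()
```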
66 changes: 66 additions & 0 deletions mseg_semantic/config/train/1080_release/mseg-naive-baseline.yaml
@@ -0,0 +1,66 @@
DATA:
dataset: [
ade20k-150,
bdd,
cityscapes-19,
coco-panoptic-133,
idd-39,
mapillary-public65,
sunrgbd-37]
universal: True
use_mgda: False # may be overridden by a command-line argument

TRAIN:
use_naive_taxonomy: True
arch: hrnet
network_name:
layers:
sync_bn: True # whether to use synchronized batch norm
train_h: 713
train_w: 713
scale_min: 0.5 # minimum random scale
scale_max: 2.0 # maximum random scale
short_size: 1080
rotate_min: -10 # minimum random rotate
rotate_max: 10 # maximum random rotate
zoom_factor: 8 # zoom factor for the final prediction during training; must be one of [1, 2, 4, 8]
ignore_label: 255
aux_weight: 0.4
num_examples: 1000000
train_gpu: [0, 1, 2, 3, 4, 5, 6]
dataset_gpu_mapping: {
'ade20k-150': [0],
'bdd': [1],
'cityscapes-19': [2],
'coco-panoptic-133': [3],
'idd-39': [4],
'mapillary-public65': [5],
'sunrgbd-37': [6],
}
workers: 64 # data loader workers
batch_size: 28 # batch size for training
batch_size_val: 1 # batch size for validation during training, memory and speed tradeoff
base_lr: 0.01
epochs: 10
start_epoch: 0
power: 0.9
momentum: 0.9
weight_decay: 0.0001
manual_seed:
print_freq: 10
save_freq: 1
save_path: default
weight: # path to initial weight (default: none)
resume: # path to latest checkpoint (default: none)
auto_resume: None
evaluate: False # evaluate on the validation set; needs extra GPU memory, and a small batch_size_val is recommended
Distributed:
dist_url: tcp://127.0.0.1:6795
dist_backend: 'nccl'
multiprocessing_distributed: True
world_size: 1
rank: 0
use_apex: True
opt_level: 'O0'
keep_batchnorm_fp32:
loss_scale:
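The configs in this directory all share the same DATA / TRAIN / Distributed layout, so a quick way to sanity-check one is to load it and print a few fields. A minimal sketch with PyYAML follows; the PR's train.py may use its own config loader, so the path and access pattern here are assumptions.

```python
# Minimal sketch: load one of the training YAML configs and inspect a few fields.
# The path below is an assumption about the repo layout; adjust as needed.
import yaml

cfg_path = "mseg_semantic/config/train/1080_release/mseg-naive-baseline.yaml"
with open(cfg_path, "r") as f:
    cfg = yaml.safe_load(f)

datasets = cfg["DATA"]["dataset"]              # list of training datasets
gpu_map = cfg["TRAIN"]["dataset_gpu_mapping"]  # dataset name -> GPU ids
print(f"{len(datasets)} datasets, batch_size={cfg['TRAIN']['batch_size']}")
for name in datasets:
    print(f"  {name} -> GPUs {gpu_map[name]}")
```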
66 changes: 66 additions & 0 deletions mseg_semantic/config/train/1080_release/mseg-relabeled-1m.yaml
@@ -0,0 +1,66 @@
DATA:
dataset: [
ade20k-150-relabeled,
bdd-relabeled,
cityscapes-19-relabeled,
coco-panoptic-133-relabeled,
idd-39-relabeled,
mapillary-public65-relabeled,
sunrgbd-37-relabeled]
universal: True
use_mgda: False # may be overridden by a command-line argument

TRAIN:
use_naive_taxonomy: False
arch: hrnet
network_name:
layers:
sync_bn: True # whether to use synchronized batch norm
train_h: 713
train_w: 713
scale_min: 0.5 # minimum random scale
scale_max: 2.0 # maximum random scale
short_size: 1080 # image resolution is 1080p at training
rotate_min: -10 # minimum random rotate
rotate_max: 10 # maximum random rotate
zoom_factor: 8 # zoom factor for the final prediction during training; must be one of [1, 2, 4, 8]
ignore_label: 255
aux_weight: 0.4
num_examples: 1000000 # 1 Million crops per dataset is default training duration
train_gpu: [0, 1, 2, 3, 4, 5, 6]
dataset_gpu_mapping: {
'ade20k-150-relabeled': [0],
'bdd-relabeled': [1],
'cityscapes-19-relabeled': [2],
'coco-panoptic-133-relabeled': [3],
'idd-39-relabeled': [4],
'mapillary-public65-relabeled': [5],
'sunrgbd-37-relabeled': [6]
}
workers: 64 # data loader workers
batch_size: 14 # batch size for training
batch_size_val: 1 # batch size for validation during training, memory and speed tradeoff
base_lr: 0.01
epochs: 10
start_epoch: 0
power: 0.9
momentum: 0.9
weight_decay: 0.0001
manual_seed:
print_freq: 10
save_freq: 1
save_path: default
weight: # path to initial weight (default: none)
resume: # path to latest checkpoint (default: none)
auto_resume: None
evaluate: False # evaluate on the validation set; needs extra GPU memory, and a small batch_size_val is recommended
Distributed:
dist_url: tcp://127.0.0.1:6795
dist_backend: 'nccl'
multiprocessing_distributed: True
world_size: 1
rank: 0
use_apex: True
opt_level: 'O0'
keep_batchnorm_fp32:
loss_scale:
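The 1m and 3m variants of this config differ mainly in `num_examples` (and batch size), and the commit "make a separate function to just compute number of iterations required" suggests the iteration count is derived from that value. Below is a hypothetical version of such a helper, assuming `num_examples` counts crops per dataset and each iteration consumes one `batch_size`-sized batch per dataset; the actual formula in this PR may differ.

```python
# Hypothetical helper: derive the number of training iterations from the config.
# Assumes `num_examples` is the number of crops to see per dataset, and each
# iteration draws `batch_size` crops per dataset.
import math


def compute_num_iterations(num_examples: int, batch_size: int) -> int:
    """Iterations needed so that roughly `num_examples` crops are drawn per dataset."""
    return math.ceil(num_examples / batch_size)


# e.g. mseg-relabeled-1m.yaml: 1,000,000 crops at batch_size 14 -> 71,429 iterations
print(compute_num_iterations(1_000_000, 14))
```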
66 changes: 66 additions & 0 deletions mseg_semantic/config/train/1080_release/mseg-relabeled-3m.yaml
@@ -0,0 +1,66 @@
DATA:
dataset: [
ade20k-150-relabeled,
bdd-relabeled,
cityscapes-19-relabeled,
coco-panoptic-133-relabeled,
idd-39-relabeled,
mapillary-public65-relabeled,
sunrgbd-37-relabeled]
universal: True
use_mgda: False # may be overridden by a command-line argument

TRAIN:
use_naive_taxonomy: False
arch: hrnet
network_name:
layers:
sync_bn: True # whether to use synchronized batch norm
train_h: 713
train_w: 713
scale_min: 0.5 # minimum random scale
scale_max: 2.0 # maximum random scale
short_size: 1080 # image resolution is 1080p for training
rotate_min: -10 # minimum random rotate
rotate_max: 10 # maximum random rotate
zoom_factor: 8 # zoom factor for the final prediction during training; must be one of [1, 2, 4, 8]
ignore_label: 255
aux_weight: 0.4
num_examples: 3000000
train_gpu: [0, 1, 2, 3, 4, 5, 6]
dataset_gpu_mapping: {
'ade20k-150-relabeled': [0],
'bdd-relabeled': [1],
'cityscapes-19-relabeled': [2],
'coco-panoptic-133-relabeled': [3],
'idd-39-relabeled': [4],
'mapillary-public65-relabeled': [5],
'sunrgbd-37-relabeled': [6],
}
workers: 64 # data loader workers
batch_size: 35 # batch size for training
batch_size_val: 1 # batch size for validation during training, memory and speed tradeoff
base_lr: 0.01
epochs: 10
start_epoch: 0
power: 0.9
momentum: 0.9
weight_decay: 0.0001
manual_seed:
print_freq: 10
save_freq: 1
save_path: default
weight: # path to initial weight (default: none)
resume: # path to latest checkpoint (default: none)
auto_resume: None
evaluate: False # evaluate on the validation set; needs extra GPU memory, and a small batch_size_val is recommended
Distributed:
dist_url: tcp://127.0.0.1:6795
dist_backend: 'nccl'
multiprocessing_distributed: True
world_size: 1
rank: 0
use_apex: True
opt_level: 'O0'
keep_batchnorm_fp32:
loss_scale:
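Every config above pins each dataset to its own GPU via `dataset_gpu_mapping`. The sketch below shows one way a distributed worker could look up its dataset from that mapping; the real process-to-dataset assignment in this PR's train.py may be organized differently.

```python
# Hypothetical sketch: given a process's GPU id, look up the dataset it trains on.
# The mapping mirrors the `dataset_gpu_mapping` entries in the configs above.
from typing import Dict, List

dataset_gpu_mapping: Dict[str, List[int]] = {
    "ade20k-150-relabeled": [0],
    "bdd-relabeled": [1],
    "cityscapes-19-relabeled": [2],
    "coco-panoptic-133-relabeled": [3],
    "idd-39-relabeled": [4],
    "mapillary-public65-relabeled": [5],
    "sunrgbd-37-relabeled": [6],
}


def dataset_for_gpu(gpu_id: int, mapping: Dict[str, List[int]]) -> str:
    """Return the dataset assigned to this GPU, or raise if the GPU is unmapped."""
    for name, gpus in mapping.items():
        if gpu_id in gpus:
            return name
    raise ValueError(f"GPU {gpu_id} is not assigned to any dataset")


print(dataset_for_gpu(3, dataset_gpu_mapping))  # -> 'coco-panoptic-133-relabeled'
```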
67 changes: 67 additions & 0 deletions mseg_semantic/config/train/1080_release/mseg-unrelabeled.yaml
@@ -0,0 +1,67 @@
DATA:
dataset: [
ade20k-150,
bdd,
cityscapes-19,
coco-panoptic-133,
idd-39,
mapillary-public65,
sunrgbd-37]
universal: True
use_multiple_datasets: True
use_mgda: False # may be overridden by a command-line argument

TRAIN:
use_naive_taxonomy: False
arch: hrnet
network_name:
layers:
sync_bn: True # whether to use synchronized batch norm
train_h: 713
train_w: 713
scale_min: 0.5 # minimum random scale
scale_max: 2.0 # maximum random scale
short_size: 1080
rotate_min: -10 # minimum random rotate
rotate_max: 10 # maximum random rotate
zoom_factor: 8 # zoom factor for the final prediction during training; must be one of [1, 2, 4, 8]
ignore_label: 255
aux_weight: 0.4
num_examples: 1000000
train_gpu: [0, 1, 2, 3, 4, 5, 6]
dataset_gpu_mapping: {
'ade20k-150': [0],
'bdd': [1],
'cityscapes-19': [2],
'coco-panoptic-133': [3],
'idd-39': [4],
'mapillary-public65': [5],
'sunrgbd-37': [6],
}
workers: 64 # data loader workers
batch_size: 35 # batch size for training
batch_size_val: 1 # batch size for validation during training, memory and speed tradeoff
base_lr: 0.01
epochs: 10
start_epoch: 0
power: 0.9
momentum: 0.9
weight_decay: 0.0001
manual_seed:
print_freq: 10
save_freq: 1
save_path: default
weight: # path to initial weight (default: none)
resume: # path to latest checkpoint (default: none)
auto_resume: None
evaluate: False # evaluate on the validation set; needs extra GPU memory, and a small batch_size_val is recommended
Distributed:
dist_url: tcp://127.0.0.1:6795
dist_backend: 'nccl'
multiprocessing_distributed: True
world_size: 1
rank: 0
use_apex: True
opt_level: 'O0'
keep_batchnorm_fp32:
loss_scale:
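These configs also carry `base_lr` and `power`; in segmentation training scripts of this style, `power` usually parameterizes a "poly" learning-rate decay. The small sketch below shows that schedule, offered as an assumption about how the field is used rather than a statement about this PR's train.py.

```python
# Poly learning-rate schedule sketch: lr = base_lr * (1 - iter / max_iter) ** power.
# Whether this PR's train.py uses exactly this form is an assumption.
def poly_lr(base_lr: float, curr_iter: int, max_iter: int, power: float = 0.9) -> float:
    return base_lr * (1.0 - curr_iter / max_iter) ** power


# e.g. base_lr 0.01 halfway through training:
print(poly_lr(0.01, curr_iter=35_000, max_iter=70_000))  # ~0.0054
```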
53 changes: 53 additions & 0 deletions mseg_semantic/config/train/1080_release/single_oracle.yaml
@@ -0,0 +1,53 @@
DATA:
dataset: single
universal: False
use_mgda: False # may be overridden by a command-line argument

TRAIN:
use_naive_taxonomy: False
arch: hrnet
network_name:
layers:
sync_bn: True # whether to use synchronized batch norm
train_h: 713
train_w: 713
scale_min: 0.5 # minimum random scale
scale_max: 2.0 # maximum random scale
short_size: 1080
rotate_min: -10 # minimum random rotate
rotate_max: 10 # maximum random rotate
zoom_factor: 8 # zoom factor for the final prediction during training; must be one of [1, 2, 4, 8]
ignore_label: 255
aux_weight: 0.4
num_examples: 1000000
train_gpu: [0, 1, 2, 3, 4, 5, 6, 7]
dataset_gpu_mapping: {
'single': [0, 1, 2, 3, 4, 5, 6, 7],
}
workers: 32 # data loader workers
batch_size: 32 # batch size for training
batch_size_val: 1 # batch size for validation during training, memory and speed tradeoff
base_lr: 0.01
epochs: 10
start_epoch: 0
power: 0.9
momentum: 0.9
weight_decay: 0.0001
manual_seed:
print_freq: 10
save_freq: 1
save_path: default
weight: # path to initial weight (default: none)
resume: # path to latest checkpoint (default: none)
auto_resume: None
evaluate: False # evaluate on the validation set; needs extra GPU memory, and a small batch_size_val is recommended
Distributed:
dist_url: tcp://127.0.0.1:6795
dist_backend: 'nccl'
multiprocessing_distributed: True
world_size: 1
rank: 0
use_apex: True
opt_level: 'O0'
keep_batchnorm_fp32:
loss_scale:
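single_oracle.yaml keeps `dataset: single` as a placeholder, and the commit "clean up old yaml files, just pass dataset name at command line" suggests the concrete dataset is supplied at launch time. A hedged sketch of that override pattern follows; the `--config` and `--dataset` flag names are assumptions, not this PR's actual CLI.

```python
# Hypothetical sketch: replace the `single` placeholder with a dataset name from the CLI.
# The --config / --dataset flag names are assumptions for illustration.
import argparse

import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--config", default="mseg_semantic/config/train/1080_release/single_oracle.yaml")
parser.add_argument("--dataset", required=True, help="name of the single dataset to train on")
args = parser.parse_args()

with open(args.config, "r") as f:
    cfg = yaml.safe_load(f)

if cfg["DATA"]["dataset"] == "single":
    cfg["DATA"]["dataset"] = args.dataset  # substitute the real dataset name
    cfg["TRAIN"]["dataset_gpu_mapping"] = {args.dataset: cfg["TRAIN"]["dataset_gpu_mapping"]["single"]}

print(cfg["DATA"]["dataset"], cfg["TRAIN"]["dataset_gpu_mapping"])
```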