Code Review 8/29/18 #46

Open
wants to merge 231 commits into
base: empty
f005132
Initial commit
justinsalamon Oct 11, 2017
1a10814
Ignore pycharm files
justinsalamon Oct 11, 2017
8a89d27
Initial structure
justinsalamon Oct 11, 2017
2d71992
Add basic functions to sample from audio and video files
Oct 15, 2017
752d2af
Add AudioSet ontology class
auroracramer Oct 17, 2017
c3a1b3b
Add preliminary training code (no eval yet)
auroracramer Oct 21, 2017
044c0f0
sample any 1 second contains video, move open files out
Oct 21, 2017
bcd53d4
Merge branch 'pescador' into model
auroracramer Oct 23, 2017
5b76885
open audio file outside
Nov 1, 2017
0ccc83e
Merge pull request #5 from marl/pescador
hohsiangwu Nov 1, 2017
5eddbd0
Merge branch 'master' into model
auroracramer Nov 1, 2017
8962b39
Add resize function
Nov 1, 2017
0449b1a
Merge branch 'master' into model
auroracramer Nov 1, 2017
c5bbd03
Change image sampling to follow paper
auroracramer Nov 1, 2017
0b94da4
Merge pull request #6 from marl/model
auroracramer Nov 1, 2017
7b9c518
s/min/max
Nov 2, 2017
38fcfa4
Custom read_video function
Nov 2, 2017
8e42cd3
Merge pull request #8 from marl/fix_video_read
auroracramer Nov 2, 2017
f43f01f
fix some configuration so the model can train
Nov 5, 2017
eb63af1
Merge pull request #9 from marl/fix_model
auroracramer Nov 6, 2017
0c3e38b
Update Spectrogram parameters to reflect changes in kapre fork
auroracramer Nov 6, 2017
d87bd43
Add data augmentation from paper
auroracramer Nov 7, 2017
cfd750d
Add docstrings and fix brightness adjustment
auroracramer Nov 7, 2017
db1315b
Add augmentation option to training functions and add use of validati…
auroracramer Nov 7, 2017
2908f07
Randomize order of saturation and brightness jittering
auroracramer Nov 7, 2017
a2aaf0b
Fix rescaling bug
auroracramer Nov 8, 2017
631d55a
Merge pull request #10 from marl/augment
hohsiangwu Nov 8, 2017
b68eae5
Add more changes to the model so that it is consistent with paper
auroracramer Nov 8, 2017
c18055b
Merge pull request #11 from marl/model-fixes
auroracramer Nov 8, 2017
f320653
Split L3 network into separate audio and vision models
auroracramer Nov 11, 2017
3ac6394
Merge pull request #13 from marl/modular_net_setup
hohsiangwu Nov 11, 2017
0e8254d
Add multi-gpus support
Nov 12, 2017
9be139e
default to int
Nov 12, 2017
802910d
Sometimes audio_data is less than a second
Nov 12, 2017
4924284
explicit handle len(audio_data) < sampling_frequency
Nov 12, 2017
d00dcb5
use default 10 to save model
Nov 12, 2017
8502805
Merge pull request #14 from marl/multi-gpus
auroracramer Nov 13, 2017
45f8895
Make training a bit more robust
auroracramer Nov 13, 2017
af38305
Zero pad data that is less than 1 second, and downmix channels rather…
auroracramer Nov 13, 2017
b1133af
Fix checkpoint interval CLI option; always ensure both audio and video
auroracramer Nov 13, 2017
3a179f5
Merge pull request #15 from marl/robust
hohsiangwu Nov 13, 2017
6dad75c
Include all video files for samples
Nov 14, 2017
da52b75
Merge pull request #16 from marl/sample_all
auroracramer Nov 14, 2017
1434d13
video does not need to be 1 second long
Nov 16, 2017
01a8a23
Handle case where video is less than 1 second, change order of sampli…
auroracramer Nov 16, 2017
d92cdcd
Merge branch 'master' into sample_robust
auroracramer Nov 16, 2017
f2b959f
Fix typos, randomly shuffle files, only use multi GPU model when usin…
auroracramer Nov 17, 2017
4ccfcbb
User start frame instead of first frame when sample window is only a …
auroracramer Nov 17, 2017
270ae16
Merge pull request #17 from marl/sample_robust
hohsiangwu Nov 17, 2017
7166e51
Close streamer if files cannot be opened
auroracramer Nov 22, 2017
a2ffb82
Change sampling scheme and add filtering
auroracramer Nov 23, 2017
81c4b9f
Fix typos and add missing arguments
auroracramer Nov 25, 2017
a7d5c54
Make sure different video file is chosen as a distractor
auroracramer Nov 25, 2017
9e8bfa6
Reduce memory used by streamers
auroracramer Nov 29, 2017
353dec5
Merge pull request #19 from marl/new_sample
auroracramer Dec 4, 2017
42bc845
Add model loading functions and add training procedure for urban soun…
auroracramer Dec 4, 2017
ed8c738
Compute metrics for both train and test sets and save results to disk
auroracramer Dec 5, 2017
1345150
Fix issues and add logging
auroracramer Dec 6, 2017
89cadf8
Make sure valid_filenames after filtering is a set so membership chec…
auroracramer Dec 7, 2017
7b4de29
Add missing newline in logging
auroracramer Dec 10, 2017
cd82a92
Change video sampling and fix loss type
auroracramer Dec 10, 2017
909de54
Change sampling to precompute samples
auroracramer Dec 11, 2017
9227665
Fix warnings and add ability to use a random subsample of videos
auroracramer Dec 12, 2017
7d1c3aa
Add various timing, efficiency, etc changes and fixes
auroracramer Dec 18, 2017
a10f213
Add slurm scripts
auroracramer Dec 18, 2017
9b68630
Add .gitignore
auroracramer Dec 18, 2017
e3db51a
Add docstrings, README, and fix some minor typos
auroracramer Dec 19, 2017
b4dbcb1
Add requirements
auroracramer Dec 19, 2017
f8c844b
Add script for pre sampling
Jan 17, 2018
2abdd73
make num of workers args
Jan 17, 2018
171f415
Rearrange args
Jan 17, 2018
f5eae53
output_dir
Jan 17, 2018
b3d41a0
Create code and scripts to create data subsets and fixed label filter…
auroracramer Jan 19, 2018
e6d9912
modify sampling script to take filter file
Jan 19, 2018
41b9662
Add filter_file in sample function
Jan 19, 2018
e58b0e3
Handle case where there are no accept filters and add utils
auroracramer Jan 22, 2018
aa6b796
Refactor sample script
auroracramer Jan 22, 2018
cd4ec73
Allow user to choose number of samples generated
auroracramer Jan 22, 2018
b39a724
Add example sbatch script for generating samples
auroracramer Jan 23, 2018
95fb2a0
Add random state to batch file name
auroracramer Jan 23, 2018
443c8c5
output exact random_state used
Jan 23, 2018
ac4e413
Make mux cycling True by default, fix unicode issue when writing batc…
auroracramer Jan 23, 2018
9f60e0d
Change generator to load from files, allow continuing previous traini…
auroracramer Jan 24, 2018
4a0bad0
Update generate sample sbatch script and add corresponding array script
auroracramer Jan 24, 2018
c2bd038
Add back missing cast to float, and make compatible with pypi version…
auroracramer Jan 24, 2018
4b0d1ff
Fix random state in sbatch array script
auroracramer Jan 24, 2018
f03473c
Fix batch filename
auroracramer Jan 25, 2018
70a78f6
Fix number of tasks in sample generation sbatch array script
auroracramer Jan 25, 2018
830ace9
Add sample generation sbatch script for training set
auroracramer Jan 25, 2018
83e8797
Merge branch 'master' into new_train_jtc
auroracramer Jan 26, 2018
49737b8
Fix bug where batch size was not given to validation data generator a…
auroracramer Jan 26, 2018
ac84fda
Remove dependency on kapre fork
auroracramer Jan 26, 2018
90d4481
Fix typos related to continuing training previous models
auroracramer Jan 28, 2018
0464021
Save images as uint8 to save 3/4 of image space, and remove unnecessa…
auroracramer Jan 29, 2018
70f4ae2
Change sampling to save audio as int16, trim filenames, and return sa…
auroracramer Jan 29, 2018
f423785
Merge branch 'master' into new_train_jtc
auroracramer Jan 29, 2018
1c5dbb6
Update train batch generation sbatch script
auroracramer Jan 30, 2018
e77b67a
Properly set random seed for pescador sampling
auroracramer Feb 1, 2018
92f4c13
Make array task id start at zero from random state computation
auroracramer Feb 1, 2018
6c30c87
Add second train generation sbatch script
auroracramer Feb 1, 2018
1312d6d
Fix array indexing in template sample generation sbatch script
auroracramer Feb 2, 2018
1ef0259
Add latest sample generation sbatch array script
auroracramer Feb 2, 2018
1fa53f7
Fix zero padding bug introduced when downmixing audio before sampling
auroracramer Feb 6, 2018
cbcf864
Merge branch 'master' into new_train_jtc
auroracramer Feb 6, 2018
002a544
Convert loaded batch data from int to float
auroracramer Feb 6, 2018
4d21d3f
Add spectrogram normalization from L3 authors
auroracramer Feb 6, 2018
e30eaf3
Revert weight decay factors
auroracramer Feb 6, 2018
35e1cc6
Merge pull request #22 from marl/new_train_jtc
hohsiangwu Feb 7, 2018
4f1f26a
Add latest train sbatch script
auroracramer Feb 9, 2018
901df58
Add script for plotting training history
auroracramer Feb 13, 2018
92c5e5d
Add He-Normal initialization to the model
auroracramer Feb 13, 2018
474748d
Log train arguments, and change loss to binary_crossentropy (which sh…
auroracramer Feb 13, 2018
614c96b
Add new variant model with Kapre decibel computation, and input stand…
auroracramer Feb 13, 2018
207ea28
Add Google Sheets hooks for consolidated bookkeeping of experiments
auroracramer Feb 16, 2018
9aa9cb7
Fix command line behavior for initializing google sheets credentials
auroracramer Feb 16, 2018
b52caad
fix vision_model input
Feb 18, 2018
3518bc3
Merge pull request #23 from marl/fix_vision
auroracramer Feb 19, 2018
9afc896
Add mel spectrogram models
auroracramer Feb 24, 2018
7f28370
Add support for framewise features for classifiers, reorganize classi…
auroracramer Feb 24, 2018
bf387f6
Update classifier training script
auroracramer Feb 24, 2018
8674797
Fix LOGGER issues in classifier modules
auroracramer Feb 24, 2018
3d5d5a4
Fix name overriding of "features"
auroracramer Feb 25, 2018
e47221a
Add support in model module to convert between GPUs (though this requ…
auroracramer Feb 25, 2018
7c0208b
Add retry to Google Sheets requests
auroracramer Feb 26, 2018
196337b
Pass GPU parameters from load_embedding to load_model
auroracramer Feb 26, 2018
448ae0f
Handle case where number of frames per file is consistent in classifi…
auroracramer Feb 26, 2018
f38fd6b
Merge branch 'framewise_cls'
auroracramer Feb 28, 2018
68d6bb0
notebook for converting multigpu models to single gpu
justinsalamon Mar 1, 2018
78a10ce
Update requirements file
auroracramer Mar 1, 2018
f15b1e0
Update README to refer to TF and Keras installation, remove TF and Ke…
auroracramer Mar 1, 2018
5788322
Add conda environment file
auroracramer Mar 1, 2018
b19a229
Refactor structure a bit
auroracramer Mar 1, 2018
3387a3c
Re-order args
justinsalamon Mar 1, 2018
bc47463
Ignore pyc
justinsalamon Mar 1, 2018
b9c473b
Add init to make life easier inside notebooks
justinsalamon Mar 1, 2018
acbc834
Test loading models after removing multigpu
justinsalamon Mar 1, 2018
c8257ac
Give audio and vision embedding model dicts unique names
justinsalamon Mar 1, 2018
39843b5
udpate notebook
justinsalamon Mar 1, 2018
27d2b56
Implement and use convert_audio_model_to_embedding
justinsalamon Mar 1, 2018
5109bce
Merge branch 'master' of https://github.com/marl/l3embedding
auroracramer Mar 1, 2018
f3a8800
Separate out data ganeration of embeddings for classifier model; add …
auroracramer Mar 2, 2018
94597f0
Create output folder before saving config file
auroracramer Mar 2, 2018
bcc5726
Improve logging, refactor feature extraction, fix various bugs, make …
auroracramer Mar 3, 2018
d1c04df
Change num_gpu to src_num_gpu when calling load_model
auroracramer Mar 6, 2018
b705735
Add sbatch script for generating urban train data for training embedding
auroracramer Mar 8, 2018
ae68727
Add ability to include children of nodes in AudioSet filtering and ad…
auroracramer Mar 8, 2018
0431e37
Add audioset filter files
auroracramer Mar 8, 2018
6615873
Save train configuration parameters to model dir
auroracramer Mar 9, 2018
cb330d9
Properly call super constructor in GSheetLogger constructor
auroracramer Mar 13, 2018
a037f4e
Update classifier training code to work with new feature format for a…
auroracramer Mar 13, 2018
0e4deaa
Set SGDClassifier to use all available CPUs
auroracramer Mar 13, 2018
c402e26
Ignore config file when loading folds, use older version of pescador …
auroracramer Mar 13, 2018
b5cfda2
close reader after use
Mar 13, 2018
121fa1e
Separate pescador parameters for train and validation sets
auroracramer Mar 13, 2018
33c25af
Implement VGGish feature extraction
auroracramer Mar 13, 2018
e5e07eb
Improve output directory convention for classifier data generation
auroracramer Mar 13, 2018
598f10f
Fix relative imports for vggish things
auroracramer Mar 13, 2018
91ddb6e
Adding missing keyword arguments to vggish stuff
auroracramer Mar 13, 2018
a0f3e87
Write each embedding generation config for each fold to a different file
auroracramer Mar 13, 2018
9b90d6a
Fix transpose issues with L3 feature generation, fix classifier test …
auroracramer Mar 14, 2018
e1c4a94
Fix VGGish feature extraction, fix SVM bugs, fix MLP bugs, and add SV…
auroracramer Mar 15, 2018
15e33a0
Fix obtaining classes from test predictions with MLP classifier and a…
auroracramer Mar 15, 2018
bacbd17
Make VGGish features quantized by default, remove some statistics fro…
auroracramer Mar 20, 2018
cb0e2d3
Print configuration in feature generation script
auroracramer Mar 20, 2018
2e6e55f
Add melspec2 env dataset training job
auroracramer Mar 20, 2018
b6f104d
Update embedding sampling script
auroracramer Mar 20, 2018
1cda5bb
Update feature summary statistics computation to be consistent with o…
auroracramer Mar 20, 2018
58946d0
Rename fold loading function, make feature file idxs a numpy array, a…
auroracramer Mar 21, 2018
ebfbf10
Load full dataset into memory for classifier training, do offline nor…
auroracramer Mar 21, 2018
f46ce3c
Remove label format for classification data and instead assume intege…
auroracramer Mar 21, 2018
fba70e0
Change embedding sampling to only output framewise features and updat…
auroracramer Mar 21, 2018
92e3f7b
Fix bug where hop size is not changed for computing framewise L3 feat…
auroracramer Mar 21, 2018
9d26665
Remove unnecessary imports from classifier training code
auroracramer Mar 21, 2018
87b9461
Remove reg_penalty from gsheets classifier update
auroracramer Mar 21, 2018
458bf7d
Fix classifier training typos and add logging for where classifier tr…
auroracramer Mar 21, 2018
894e54e
Remove now unnecessary streamer and mux rate arguments
auroracramer Mar 22, 2018
4d0a1e7
Fix typo not properly updating file_idxs when computing summary stats…
auroracramer Mar 22, 2018
26b1109
Remove num streamers and mux rate from classifier training sbatch script
auroracramer Mar 22, 2018
63aded4
Add parameters for use_min_max and non_overlap
Mar 27, 2018
7f77e4f
Move non-overlap sampling for classifier training outside of stats so…
auroracramer Mar 27, 2018
1f569d1
Merge pull request #31 from marl/parametrize
auroracramer Mar 27, 2018
83973eb
Remove valid batch size from classifier code
auroracramer Mar 27, 2018
11f8ddb
Add non-overlap to google sheets for classification results
auroracramer Mar 27, 2018
5de2eac
For classifier, remove train epoch size, change model id just be mode…
auroracramer Apr 3, 2018
8797329
Change expected input feature directory convention for classifier and…
auroracramer Apr 4, 2018
533f3d8
Change classifier model_type to be last part of model id
auroracramer Apr 4, 2018
39b566a
Remove lingering references to model_id_suffix
auroracramer Apr 4, 2018
d7a462a
Replace example features and output directories in classifier trainin…
auroracramer Apr 4, 2018
0b778d4
Remove "features" from classifier model id
auroracramer Apr 4, 2018
bb96353
Change embedding generation to fit new output directory convention, a…
auroracramer Apr 5, 2018
7d90d7e
Remove embedding model ID from embedding training code and derive fro…
auroracramer Apr 5, 2018
04fa4d7
Add code to process dcase2013 and esc50
Apr 5, 2018
f9b0f6d
wrong function name
Apr 5, 2018
80244d9
Merge pull request #39 from marl/dcase_esc
auroracramer Apr 6, 2018
9f0ae6a
Remove unnecessary model ID argument from feature generation code
auroracramer Apr 6, 2018
07fdc4d
Make US8k metadata path an optional argument and update sbatch script
auroracramer Apr 6, 2018
35c86d8
Add "features" to embedding generation output path
auroracramer Apr 6, 2018
db4ffaf
Add sbatch scripts for ESC-50 and DCASE2013 embedding generation
auroracramer Apr 6, 2018
29407de
Fix typo
auroracramer Apr 7, 2018
a3b5eba
Support training classifier on all datasets
auroracramer Apr 10, 2018
5515ecc
Add cross validation, sans gsheets update
auroracramer Apr 14, 2018
2556a7c
Remove full kfold validation for parameter search, instead use a sing…
auroracramer Apr 15, 2018
c7d66d1
Update search parameter in spreadsheet
auroracramer Apr 15, 2018
c4bf79c
Allow for multiple parameters to be used in paramter search, remove u…
auroracramer Apr 16, 2018
11bab98
Allow user to choose whether to use a fold or subset of training set …
auroracramer Apr 19, 2018
66941bc
Fix issues with shuffling data and splitting train set
auroracramer Apr 19, 2018
ed92691
Add some more fixes to parameter search code
auroracramer Apr 19, 2018
1d73c81
Merge pull request #40 from marl/crossvalidation
auroracramer Apr 20, 2018
4d85c2b
Update valid metrics instead of overwriting
auroracramer Apr 23, 2018
995aa4a
Merge branch 'crossvalidation'
auroracramer Apr 24, 2018
261fbbb
Add random forest and fix metrics for class-wise classification metrics
auroracramer Apr 24, 2018
f0fe9e7
Expose option for retraining with validation in command line arguments
auroracramer Apr 24, 2018
67b95a5
Negate command line option for retraining with validation fold
auroracramer Apr 24, 2018
141d192
Add param search train with valid to spreadsheet, fix spreadsheet iss…
auroracramer Apr 24, 2018
b04fed2
Use best results instead of retrain when not retraining with validati…
auroracramer Apr 24, 2018
ae2b517
Update classifier training script
auroracramer Apr 26, 2018
8d265f5
Fix file idxs for us8k
auroracramer May 1, 2018
154ef20
Change to verbose=2 for mlp.fit() to avoid giant slurm output files
justinsalamon May 4, 2018
f24fc84
DCASE 2013 results (data and notebook)
justinsalamon May 17, 2018
711ba4e
Add support for augmented data and remove setting of random seed
auroracramer Jun 3, 2018
53314df
Merge branch 'master' of https://github.com/marl/l3embedding
auroracramer Jun 3, 2018
373cfca
Add random time delay befor mkdirs to avoid parallel jobs colliding
justinsalamon Jun 5, 2018
7b1b0f1
Add scripts for creating plots and significance tests
auroracramer Jun 17, 2018
f63b6b8
Merge branch 'master' of https://github.com/marl/l3embedding
auroracramer Jun 20, 2018
3349911
Add plotting and significance test script
auroracramer Jul 12, 2018
92c5f71
Rename plotting and sig test script to be more helpful
auroracramer Jul 12, 2018
0f811d1
Add some fixes to the significance tests and plots
auroracramer Jul 17, 2018
eab2a8a
Add epoch experiment plots to plotting script
auroracramer Jul 18, 2018
2f3352c
Update plot generation script
auroracramer Aug 29, 2018
af9a5b5
Add missing sample rate argument to Melspectrogram constructor
auroracramer Aug 29, 2018
a3ce8c8
Merge branch 'master' into review
auroracramer Aug 29, 2018
126 changes: 126 additions & 0 deletions 01_create_subsets.py
@@ -0,0 +1,126 @@
import argparse
import logging
import os
from csv import DictWriter

from data.avc.subsets import get_subset_split
from log import init_console_logger

LOGGER = logging.getLogger('data')
LOGGER.setLevel(logging.DEBUG)


def write_subset_file(path, subset_list):
    with open(path, 'w') as f:
        field_names = list(subset_list[0].keys())
        writer = DictWriter(f, field_names)
        writer.writeheader()

        for item in subset_list:
            # Copy so the caller's entry is not mutated, then flatten labels
            item = dict(item)
            item['labels'] = ';'.join(item['labels'])
            writer.writerow(item)


def parse_arguments():
    parser = argparse.ArgumentParser(description='Creates CSVs containing a train-valid-test split for the given dataset')

    parser.add_argument('-vr',
                        '--valid-ratio',
                        dest='valid_ratio',
                        action='store',
                        type=float,
                        default=0.1,
                        help='Ratio of dataset used for validation set')

    parser.add_argument('-tr',
                        '--test-ratio',
                        dest='test_ratio',
                        action='store',
                        type=float,
                        default=0.1,
                        help='Ratio of dataset used for test set')

    # NOTE: random_seed is parsed but not used by the main block below;
    # the split is seeded with --random-state.
    parser.add_argument('-rs',
                        '--random-seed',
                        dest='random_seed',
                        action='store',
                        type=int,
                        default=12345678,
                        help='Random seed used for generating split')

    parser.add_argument('-o',
                        '--ontology-path',
                        dest='ontology_path',
                        action='store',
                        type=str,
                        default=os.path.join(os.path.dirname(__file__), 'resources/ontology.json'),
                        help='Path to AudioSet ontology')

    parser.add_argument('-mp',
                        '--metadata-path',
                        dest='metadata_path',
                        action='store',
                        type=str,
                        help='Path to metadata csv file(s). Accepts a glob string.')

    parser.add_argument('-fp',
                        '--filter-path',
                        dest='filter_path',
                        action='store',
                        type=str,
                        help='Path to filter csv file(s).')

    parser.add_argument('-r',
                        '--random-state',
                        dest='random_state',
                        action='store',
                        type=int,
                        default=20171021,
                        help='Random seed used to set the RNG state')

    parser.add_argument('data_dir',
                        action='store',
                        type=str,
                        help='Path to directory where data files are stored')

    parser.add_argument('output_dir',
                        action='store',
                        type=str,
                        help='Path to directory where output files will be stored')

    parser.add_argument('filename_prefix',
                        action='store',
                        type=str,
                        help='Prefix for the names of the output subset files')

    return parser.parse_args()


if __name__ == '__main__':
    init_console_logger(LOGGER, verbose=True)

    args = parse_arguments()

    train_list, valid_list, test_list \
        = get_subset_split(args.data_dir,
                           valid_ratio=args.valid_ratio,
                           test_ratio=args.test_ratio,
                           random_state=args.random_state,
                           metadata_path=args.metadata_path,
                           filter_path=args.filter_path,
                           ontology_path=args.ontology_path)

    output_dir = args.output_dir
    filename_prefix = args.filename_prefix
    train_subset_path = os.path.join(output_dir, filename_prefix + '_train.csv')
    valid_subset_path = os.path.join(output_dir, filename_prefix + '_valid.csv')
    test_subset_path = os.path.join(output_dir, filename_prefix + '_test.csv')

    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)

    write_subset_file(train_subset_path, train_list)
    write_subset_file(valid_subset_path, valid_list)
    write_subset_file(test_subset_path, test_list)
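Review note on `write_subset_file`: the labels are flattened into one `;`-joined CSV cell, so any consumer of the subset files has to split that cell again. The sketch below illustrates the round trip under stated assumptions: `write_subset` and `read_subset` are hypothetical stand-ins (not functions in this PR), the field names `ytid` and `labels` are illustrative, and an in-memory buffer replaces the real output path.

```python
import csv
import io


def write_subset(f, subset_list):
    """Write subset entries to an open text file, joining labels with ';'."""
    field_names = list(subset_list[0].keys())
    writer = csv.DictWriter(f, field_names)
    writer.writeheader()
    for item in subset_list:
        row = dict(item)  # copy so the caller's dict is not mutated
        row['labels'] = ';'.join(row['labels'])
        writer.writerow(row)


def read_subset(f):
    """Read entries back, splitting the ';'-joined labels into a list."""
    entries = []
    for row in csv.DictReader(f):
        row = dict(row)
        row['labels'] = row['labels'].split(';') if row['labels'] else []
        entries.append(row)
    return entries


# Hypothetical entries; the real field names come from get_subset_split.
subset = [
    {'ytid': 'abc123', 'labels': ['/m/09x0r', '/m/04rlf']},
    {'ytid': 'def456', 'labels': ['/m/0jbk']},
]

buf = io.StringIO()
write_subset(buf, subset)
buf.seek(0)
restored = read_subset(buf)
```

This only works as long as no label contains a `;`. It also shares a limitation with the original function: an empty `subset_list` raises `IndexError` on `subset_list[0]`.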
143 changes: 143 additions & 0 deletions 02_generate_samples.py
@@ -0,0 +1,143 @@
import argparse
import logging
import math
from functools import partial

import multiprocessing_logging

from data.avc.sample import sample_and_save
from data.utils import map_iterate_in_parallel
from log import init_console_logger

LOGGER = logging.getLogger('sampling')
LOGGER.setLevel(logging.DEBUG)

if __name__ == '__main__':

    parser = argparse.ArgumentParser(description='Pre-sample videos and audios for L3 model.')
    parser.add_argument('-bs',
                        '--batch-size',
                        dest='batch_size',
                        action='store',
                        type=int,
                        default=64,
                        help='Number of examples per training batch')

    parser.add_argument('-ns',
                        '--num-streamers',
                        dest='num_streamers',
                        action='store',
                        type=int,
                        default=64,
                        help='Number of training pescador streamers that can be open concurrently')

    parser.add_argument('-mr',
                        '--mux-rate',
                        dest='mux_rate',
                        action='store',
                        type=float,
                        default=2.0,
                        help='Poisson distribution parameter for determining number of training samples to take from a streamer')

    parser.add_argument('-a',
                        '--augment',
                        dest='augment',
                        action='store_true',
                        default=False,
                        help='If True, performs data augmentation on audio and images')

    parser.add_argument('-pc',
                        '--precompute',
                        dest='precompute',
                        action='store_true',
                        default=False,
                        help='If True, streamers precompute samples')

    parser.add_argument('-nd',
                        '--num-distractors',
                        dest='num_distractors',
                        action='store',
                        type=int,
                        default=1,
                        help='Number of distractors for generating examples')

    parser.add_argument('-im',
                        '--include-metadata',
                        dest='include_metadata',
                        action='store_true',
                        help='If True, includes additional metadata in h5 files')

    parser.add_argument('-mv',
                        '--max-videos',
                        dest='max_videos',
                        action='store',
                        type=int,
                        help='Maximum number of videos to use for generating examples. If not specified, all videos will be used')

    parser.add_argument('-r',
                        '--random-state',
                        dest='random_state',
                        action='store',
                        type=int,
                        default=20171021,
                        help='Random seed used to set the RNG state')

    parser.add_argument('-n',
                        '--num-workers',
                        dest='num_workers',
                        action='store',
                        type=int,
                        default=4,
                        help='Number of multiprocessing workers used to generate samples')

    parser.add_argument('-v',
                        '--verbose',
                        dest='verbose',
                        action='store_true',
                        default=False,
                        help='Logs verbose info')

    parser.add_argument('subset_path',
                        action='store',
                        type=str,
                        help='Path to subset file')

    parser.add_argument('num_samples',
                        action='store',
                        type=int,
                        help='(Minimum) number of samples to generate')

    parser.add_argument('output_dir',
                        action='store',
                        type=str,
                        help='Path to directory where output files will be stored')

    args = parser.parse_args()

    init_console_logger(LOGGER, verbose=args.verbose)
    multiprocessing_logging.install_mp_handler()

    # Just round up for now; float() keeps true division under Python 2,
    # where int / int would floor before math.ceil is applied
    num_workers = args.num_workers
    batch_size = args.batch_size
    batches_per_worker = int(math.ceil(args.num_samples / float(num_workers * batch_size)))

    worker_func = partial(sample_and_save,
                          subset_path=args.subset_path,
                          num_batches=batches_per_worker,
                          output_dir=args.output_dir,
                          num_streamers=args.num_streamers,
                          batch_size=batch_size,
                          random_state=args.random_state,
                          precompute=args.precompute,
                          num_distractors=args.num_distractors,
                          augment=args.augment,
                          rate=args.mux_rate,
                          max_videos=args.max_videos,
                          include_metadata=args.include_metadata)

    map_iterate_in_parallel(range(num_workers), worker_func,
                            processes=num_workers)

    LOGGER.info('Done!')
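Review note on the batch arithmetic: `num_samples` is documented as a minimum because the script rounds the per-worker batch count up. The sketch below reproduces only that arithmetic. The helper names `batches_per_worker` and `total_generated` are hypothetical, and the total assumes each worker writes exactly `num_batches` full batches, which this diff does not show since `sample_and_save` lives elsewhere.

```python
import math


def batches_per_worker(num_samples, num_workers, batch_size):
    """Batches each worker must produce so the total covers num_samples."""
    # float() keeps true division under Python 2 as well as Python 3
    return int(math.ceil(num_samples / float(num_workers * batch_size)))


def total_generated(num_samples, num_workers, batch_size):
    """Total samples written, assuming every worker emits full batches."""
    per_worker = batches_per_worker(num_samples, num_workers, batch_size)
    return per_worker * num_workers * batch_size


per_worker = batches_per_worker(1000, num_workers=4, batch_size=64)
total = total_generated(1000, num_workers=4, batch_size=64)
print(per_worker, total)  # 4 1024
```

With the defaults (4 workers, batch size 64), a request for 1000 samples yields 4 batches per worker and 1024 samples overall. The overshoot is always smaller than `num_workers * batch_size`.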