all.log
nohup: ignoring input
[2021-04-12 21:45:25] INFO >> Load arguments in /home/yanghe/Documents/naturalcc-dev/run/retrieval/selfattn/config/csn/all.yml (train.py:302, cli_main())
[2021-04-12 21:45:25] INFO >> {'criterion': 'retrieval_softmax', 'optimizer': 'torch_adam', 'lr_scheduler': 'fixed', 'tokenizer': None, 'bpe': None, 'common': {'no_progress_bar': 0, 'log_interval': 1000, 'log_format': 'simple', 'tensorboard_logdir': '', 'seed': 0, 'cpu': 0, 'fp16': 0, 'memory_efficient_fp16': 0, 'fp16_no_flatten_grads': 0, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'task': 'hybrid_retrieval'}, 'dataset': {'num_workers': 3, 'skip_invalid_size_inputs_valid_test': 0, 'max_tokens': None, 'max_sentences': 1000, 'code_max_tokens': 200, 'query_max_tokens': 30, 'required_batch_size_multiple': 1, 'dataset_impl': 'mmap', 'train_subset': 'train', 'valid_subset': 'valid', 'validate_interval': 1, 'fixed_validation_seed': None, 'disable_validation': None, 'max_tokens_valid': None, 'max_sentences_valid': 1000, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0, 'joined_dictionary': 0, 'langs': ['go', 'java', 'javascript', 'ruby', 'python', 'php']}, 'distributed_training': {'distributed_world_size': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': 0, 'ddp_backend': 'c10d', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': None, 'find_unused_parameters': 0, 'fast_stat_sync': 0, 'broadcast_buffers': 0, 'global_sync_iter': 50, 'warmup_iterations': 500, 'local_rank': -1, 'block_momentum': 0.875, 'block_lr': 1, 'use_nbm': 0, 'average_sync': 0}, 'task': {'data': '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all', 'sample_break_mode': 'complete', 'tokens_per_sample': 512, 'mask_prob': 0.15, 'leave_unmasked_prob': 0.1, 'random_token_prob': 0.1, 'freq_weighted_replacement': 0, 'mask_whole_words': 0, 'pooler_activation_fn': 'tanh', 'source_lang': 'code_tokens', 'target_lang': 
'docstring_tokens', 'source_aux_lang': 'code_tokens.wo_func', 'target_aux_lang': 'func_name', 'fraction_using_func_name': 0.3, 'load_alignments': 0, 'left_pad_source': 1, 'left_pad_target': 0, 'upsample_primary': 1, 'truncate_source': 0, 'eval_mrr': 1}, 'model': {'arch': 'self_attn', 'code_embed_dim': 128, 'code_token_types': 1, 'code_max_tokens': 200, 'code_position_encoding': 'learned', 'code_dropout': 0.1, 'code_pooling': 'weighted_mean', 'query_embed_dim': 128, 'query_token_types': 1, 'query_max_tokens': 30, 'query_self_attn_layers': 3, 'query_pooling': 'weighted_mean', 'query_position_encoding': 'learned', 'query_dropout': 0.1, 'self_attn_layers': 3, 'attention_heads': 8, 'ffn_embed_dim': 512, 'activation_fn': 'gelu'}, 'optimization': {'max_epoch': 300, 'max_update': 0, 'clip_norm': 1, 'sentence_avg': 0, 'update_freq': [1], 'lrs': [0.0005], 'min_lr': -1, 'use_bmuf': 0, 'lr_shrink': 1.0, 'force_anneal': 0, 'warmup_updates': 0, 'end_learning_rate': 0.0, 'power': 1.0, 'total_num_update': 1000000, 'adam': {'adam_betas': '(0.9, 0.999)', 'adam_eps': 1e-08, 'weight_decay': 0, 'use_old_adam': 1}, 'margin': 1, 'clip_norm_version': 'tf_clip_by_global_norm'}, 'checkpoint': {'save_dir': '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints', 'restore_file': 'checkpoint_last.pt', 'reset_dataloader': None, 'reset_lr_scheduler': None, 'reset_meters': None, 'reset_optimizer': None, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': 0, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': 0, 'no_epoch_checkpoints': 1, 'no_last_checkpoints': 0, 'no_save_optimizer_state': None, 'best_checkpoint_metric': 'mrr', 'maximize_best_checkpoint_metric': 1, 'patience': 10}, 'eval': {'path': '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_best.pt', 'quiet': 1, 'max_sentences': 1000, 'model_overrides': '{}', 'eval_size': 1000}} (train.py:304, cli_main())
[2021-04-12 21:45:25] INFO >> single GPU training... (train.py:333, cli_main())
[2021-04-12 21:45:26] INFO >> loaded 89154 examples from: ['/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.go.code_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.java.code_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.javascript.code_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.ruby.code_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.python.code_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.php.code_tokens'] (hybrid_retrieval.py:54, load_tokens_dataset())
[2021-04-12 21:45:26] INFO >> loaded 89154 examples from: ['/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.go.docstring_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.java.docstring_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.javascript.docstring_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.ruby.docstring_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.python.docstring_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/valid.php.docstring_tokens'] (hybrid_retrieval.py:55, load_tokens_dataset())
[2021-04-12 21:45:26] INFO >> SelfAttn(
  (src_encoders): ModuleDict(
    (go): SelfAttnEncoder(
      (embed): Embedding(10000, 128, padding_idx=0)
      (type_embed): Embedding(1, 128)
      (embed_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (1): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (2): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
      )
      (weight_layer): Linear(in_features=128, out_features=1, bias=False)
    )
    (java): SelfAttnEncoder(
      (embed): Embedding(10000, 128, padding_idx=0)
      (type_embed): Embedding(1, 128)
      (embed_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (1): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (2): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
      )
      (weight_layer): Linear(in_features=128, out_features=1, bias=False)
    )
    (javascript): SelfAttnEncoder(
      (embed): Embedding(10000, 128, padding_idx=0)
      (type_embed): Embedding(1, 128)
      (embed_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (1): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (2): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
      )
      (weight_layer): Linear(in_features=128, out_features=1, bias=False)
    )
    (ruby): SelfAttnEncoder(
      (embed): Embedding(10000, 128, padding_idx=0)
      (type_embed): Embedding(1, 128)
      (embed_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (1): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (2): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
      )
      (weight_layer): Linear(in_features=128, out_features=1, bias=False)
    )
    (python): SelfAttnEncoder(
      (embed): Embedding(10000, 128, padding_idx=0)
      (type_embed): Embedding(1, 128)
      (embed_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (1): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (2): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
      )
      (weight_layer): Linear(in_features=128, out_features=1, bias=False)
    )
    (php): SelfAttnEncoder(
      (embed): Embedding(10000, 128, padding_idx=0)
      (type_embed): Embedding(1, 128)
      (embed_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (1): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
        (2): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (k_proj): Linear(in_features=128, out_features=128, bias=True)
            (v_proj): Linear(in_features=128, out_features=128, bias=True)
            (q_proj): Linear(in_features=128, out_features=128, bias=True)
            (out_proj): Linear(in_features=128, out_features=128, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
          (fc1): Linear(in_features=128, out_features=512, bias=True)
          (fc2): Linear(in_features=512, out_features=128, bias=True)
          (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        )
      )
      (weight_layer): Linear(in_features=128, out_features=1, bias=False)
    )
  )
  (tgt_encoders): SelfAttnEncoder(
    (embed): Embedding(10000, 128, padding_idx=0)
    (type_embed): Embedding(1, 128)
    (embed_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=128, out_features=128, bias=True)
          (v_proj): Linear(in_features=128, out_features=128, bias=True)
          (q_proj): Linear(in_features=128, out_features=128, bias=True)
          (out_proj): Linear(in_features=128, out_features=128, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        (fc1): Linear(in_features=128, out_features=512, bias=True)
        (fc2): Linear(in_features=512, out_features=128, bias=True)
        (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
      )
      (1): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=128, out_features=128, bias=True)
          (v_proj): Linear(in_features=128, out_features=128, bias=True)
          (q_proj): Linear(in_features=128, out_features=128, bias=True)
          (out_proj): Linear(in_features=128, out_features=128, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        (fc1): Linear(in_features=128, out_features=512, bias=True)
        (fc2): Linear(in_features=512, out_features=128, bias=True)
        (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
      )
      (2): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=128, out_features=128, bias=True)
          (v_proj): Linear(in_features=128, out_features=128, bias=True)
          (q_proj): Linear(in_features=128, out_features=128, bias=True)
          (out_proj): Linear(in_features=128, out_features=128, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
        (fc1): Linear(in_features=128, out_features=512, bias=True)
        (fc2): Linear(in_features=512, out_features=128, bias=True)
        (ff_layer_norm): LayerNorm((128,), eps=1e-08, elementwise_affine=True)
      )
    )
    (weight_layer): Linear(in_features=128, out_features=1, bias=False)
  )
) (train.py:223, single_main())
[2021-04-12 21:45:26] INFO >> model self_attn, criterion SearchSoftmaxCriterion (train.py:224, single_main())
[2021-04-12 21:45:26] INFO >> num. model params: 13284736 (num. trained: 13284736) (train.py:227, single_main())
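The logged parameter count can be reproduced by hand from the config and the module dump above: six code encoders plus one query encoder, each with a 10000-token embedding, a type embedding, layer norms, three transformer layers, and a bias-free pooling weight layer. The only sizes not visible in the repr are the learned position embeddings (assumed here to be max_tokens × dim: 200 for code, 30 for queries). A minimal arithmetic sketch, under that assumption:

```python
# Back-of-the-envelope check of the logged "num. model params: 13284736".
# Sizes are read off the config and module dump; the learned position
# embeddings (200x128 for code, 30x128 for queries) are an assumption,
# since they do not appear in the printed module tree.
VOCAB, DIM, FFN, LAYERS = 10000, 128, 512, 3

def linear(n_in, n_out, bias=True):
    return n_in * n_out + (n_out if bias else 0)

def transformer_layer():
    attn = 4 * linear(DIM, DIM)             # k/v/q/out projections
    norms = 2 * (2 * DIM)                   # self_attn_layer_norm + ff_layer_norm
    ffn = linear(DIM, FFN) + linear(FFN, DIM)
    return attn + norms + ffn

def encoder(max_positions):
    return (VOCAB * DIM                     # token embedding
            + 1 * DIM                       # type embedding
            + max_positions * DIM           # learned position embedding (assumed)
            + 2 * DIM                       # embed_layer_norm
            + LAYERS * transformer_layer()
            + linear(DIM, 1, bias=False))   # weight_layer for weighted_mean pooling

total = 6 * encoder(200) + encoder(30)      # six code encoders + one query encoder
print(total)  # 13284736
```

The exact match suggests no weights are shared across the per-language code encoders.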
[2021-04-12 21:45:27] INFO >> training on 1 GPUs (train.py:233, single_main())
[2021-04-12 21:45:27] INFO >> max tokens per GPU = None and max sentences per GPU = 1000 (train.py:236, single_main())
[2021-04-12 21:45:27] INFO >> no existing checkpoint found /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (ncc_trainers.py:270, load_checkpoint())
[2021-04-12 21:45:27] INFO >> loading train data for epoch 1 (ncc_trainers.py:285, get_train_iterator())
[2021-04-12 21:45:28] INFO >> loaded 1880853 examples from: ['/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.go.code_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.java.code_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.javascript.code_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.ruby.code_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.python.code_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.php.code_tokens'] (hybrid_retrieval.py:54, load_tokens_dataset())
[2021-04-12 21:45:28] INFO >> loaded 1880853 examples from: ['/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.go.docstring_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.java.docstring_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.javascript.docstring_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.ruby.docstring_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.python.docstring_tokens', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.php.docstring_tokens'] (hybrid_retrieval.py:55, load_tokens_dataset())
[2021-04-12 21:45:28] INFO >> loaded 1880853 examples from: ['/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.go.code_tokens.wo_func', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.java.code_tokens.wo_func', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.javascript.code_tokens.wo_func', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.ruby.code_tokens.wo_func', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.python.code_tokens.wo_func', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.php.code_tokens.wo_func'] (hybrid_retrieval.py:67, load_tokens_dataset())
[2021-04-12 21:45:28] INFO >> loaded 1880853 examples from: ['/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.go.func_name', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.java.func_name', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.javascript.func_name', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.ruby.func_name', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.python.func_name', '/data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/train.php.func_name'] (hybrid_retrieval.py:81, load_tokens_dataset())
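Four aligned streams are loaded per training example (code_tokens, docstring_tokens, code_tokens.wo_func, func_name), which matches the config's `fraction_using_func_name: 0.3`: with that probability, the function name can stand in as the query and the body with its name stripped as the code side. A hypothetical sketch of that sampling, with illustrative names rather than the actual hybrid_retrieval.py API:

```python
import random

def make_pair(example, fraction_using_func_name=0.3, rng=random):
    """With probability fraction_using_func_name, pair func_name (as query)
    with code_tokens.wo_func; otherwise pair docstring with full code."""
    if rng.random() < fraction_using_func_name:
        return example["func_name"], example["code_tokens.wo_func"]
    return example["docstring_tokens"], example["code_tokens"]

example = {
    "func_name": "binary_search",
    "code_tokens.wo_func": "def <mask> ( arr , x ) : ...",
    "docstring_tokens": "find x in a sorted array",
    "code_tokens": "def binary_search ( arr , x ) : ...",
}
query, code = make_pair(example)
```

This augmentation trains the model to match a name-free body to its own name, a signal CodeSearchNet-style models commonly exploit.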
[2021-04-12 21:45:29] INFO >> NOTE: your device may support faster training with fp16 (ncc_trainers.py:155, _setup_optimizer())
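The configured criterion is `retrieval_softmax` (logged as SearchSoftmaxCriterion): within a batch of (query, code) pairs, each query is scored against every code snippet and the matching diagonal entry is treated as the positive class of a softmax. A minimal pure-Python sketch of that in-batch loss, not the actual criterion implementation:

```python
import math

def softmax_retrieval_loss(sim):
    """sim[i][j] = similarity of query i and code j; positives on the diagonal.
    Returns the mean negative log-softmax of the true pairs."""
    loss = 0.0
    for i, row in enumerate(sim):
        log_z = math.log(sum(math.exp(s) for s in row))  # log partition over the batch
        loss += log_z - row[i]                           # -log p(code_i | query_i)
    return loss / len(sim)
```

With bsz = 1000 as configured, every update contrasts each query against 999 in-batch negatives for free.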
[2021-04-12 21:54:25] INFO >> epoch 001: 1000 / 1881 loss=3.045, mrr=520.748, sample_size=1000, wps=1916.9, ups=1.92, wpb=1000, bsz=1000, num_updates=1000, lr=0.0005, gnorm=8.519, clip=100, train_wall=516, wall=538 (progress_bar.py:262, log())
[2021-04-12 22:02:04] INFO >> epoch 001 | loss 2.306 | mrr 0.630492 | sample_size 1000 | wps 1916.9 | ups 1.92 | wpb 1000 | bsz 1000 | num_updates 1880 | lr 0.0005 | gnorm 6.845 | clip 100 | train_wall 969 | wall 997 (progress_bar.py:269, print())
[2021-04-12 22:02:33] INFO >> epoch 001 | valid on 'valid' subset | loss 1.902 | mrr 0.694252 | sample_size 1000 | wps 6624.2 | wpb 1000 | bsz 1000 | num_updates 1880 (progress_bar.py:269, print())
[2021-04-12 22:02:34] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_best.pt (epoch 1 @ 1880 updates, score 0.694252) (writing took 0.851679 seconds) (checkpoint_utils.py:81, save_checkpoint())
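The `mrr` values above are mean reciprocal rank over the in-batch candidates (the per-interval progress-bar numbers, e.g. 520.748, appear to be the same quantity scaled by 1000 relative to the epoch summaries, e.g. 0.630492). A short sketch of the metric itself, not the project's exact meter code:

```python
def mrr(sim):
    """Mean reciprocal rank: sim[i][j] is the score of candidate j for
    query i, with the true match on the diagonal."""
    total = 0.0
    for i, row in enumerate(sim):
        rank = 1 + sum(1 for s in row if s > row[i])  # 1-based rank of the true match
        total += 1.0 / rank
    return total / len(sim)
```

An MRR of 0.69 over 1000 candidates means the correct snippet is typically ranked first or second.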
[2021-04-12 22:03:49] INFO >> epoch 002: 120 / 1881 loss=1.422, mrr=762.241, sample_size=1000, wps=1771.5, ups=1.77, wpb=1000, bsz=1000, num_updates=2000, lr=0.0005, gnorm=4.862, clip=100, train_wall=515, wall=1102 (progress_bar.py:262, log())
[2021-04-12 22:12:33] INFO >> epoch 002: 1121 / 1881 loss=1.091, mrr=815.017, sample_size=1000, wps=1910.5, ups=1.91, wpb=1000, bsz=1000, num_updates=3000, lr=0.0005, gnorm=3.992, clip=100, train_wall=517, wall=1625 (progress_bar.py:262, log())
[2021-04-12 22:19:09] INFO >> epoch 002 | loss 1.068 | mrr 0.818905 | sample_size 1000 | wps 1833.5 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 3760 | lr 0.0005 | gnorm 3.826 | clip 100 | train_wall 970 | wall 2022 (progress_bar.py:269, print())
[2021-04-12 22:19:38] INFO >> epoch 002 | valid on 'valid' subset | loss 1.669 | mrr 0.729664 | sample_size 1000 | wps 6617 | wpb 1000 | bsz 1000 | num_updates 3760 | best_mrr 729.664 (progress_bar.py:269, print())
[2021-04-12 22:19:39] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_best.pt (epoch 2 @ 3760 updates, score 0.729664) (writing took 0.835159 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-12 22:21:58] INFO >> epoch 003: 240 / 1881 loss=0.986, mrr=832.068, sample_size=1000, wps=1769.8, ups=1.77, wpb=1000, bsz=1000, num_updates=4000, lr=0.0005, gnorm=3.468, clip=100, train_wall=515, wall=2190 (progress_bar.py:262, log())
[2021-04-12 22:30:41] INFO >> epoch 003: 1240 / 1881 loss=0.855, mrr=852.867, sample_size=1000, wps=1911.9, ups=1.91, wpb=1000, bsz=1000, num_updates=5000, lr=0.0005, gnorm=3.157, clip=100, train_wall=517, wall=2713 (progress_bar.py:262, log())
[2021-04-12 22:36:14] INFO >> epoch 003 | loss 0.852 | mrr 0.853251 | sample_size 1000 | wps 1833.5 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 5640 | lr 0.0005 | gnorm 3.105 | clip 100 | train_wall 970 | wall 3047 (progress_bar.py:269, print())
[2021-04-12 22:36:44] INFO >> epoch 003 | valid on 'valid' subset | loss 1.601 | mrr 0.743867 | sample_size 1000 | wps 6616.6 | wpb 1000 | bsz 1000 | num_updates 5640 | best_mrr 743.867 (progress_bar.py:269, print())
[2021-04-12 22:36:45] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_best.pt (epoch 3 @ 5640 updates, score 0.743867) (writing took 1.079027 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-12 22:40:06] INFO >> epoch 004: 360 / 1881 loss=0.799, mrr=861.977, sample_size=1000, wps=1769.4, ups=1.77, wpb=1000, bsz=1000, num_updates=6000, lr=0.0005, gnorm=2.903, clip=100, train_wall=515, wall=3279 (progress_bar.py:262, log())
[2021-04-12 22:48:48] INFO >> epoch 004: 1361 / 1881 loss=0.736, mrr=871.731, sample_size=1000, wps=1914.1, ups=1.91, wpb=1000, bsz=1000, num_updates=7000, lr=0.0005, gnorm=2.736, clip=100, train_wall=516, wall=3801 (progress_bar.py:262, log())
[2021-04-12 22:53:19] INFO >> epoch 004 | loss 0.735 | mrr 0.872047 | sample_size 1000 | wps 1834.3 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 7520 | lr 0.0005 | gnorm 2.719 | clip 100 | train_wall 970 | wall 4072 (progress_bar.py:269, print())
[2021-04-12 22:53:48] INFO >> epoch 004 | valid on 'valid' subset | loss 1.583 | mrr 0.748769 | sample_size 1000 | wps 6614.8 | wpb 1000 | bsz 1000 | num_updates 7520 | best_mrr 748.769 (progress_bar.py:269, print())
[2021-04-12 22:53:49] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_best.pt (epoch 4 @ 7520 updates, score 0.748769) (writing took 0.810725 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-12 22:58:13] INFO >> epoch 005: 481 / 1881 loss=0.694, mrr=878.616, sample_size=1000, wps=1770.2, ups=1.77, wpb=1000, bsz=1000, num_updates=8000, lr=0.0005, gnorm=2.568, clip=100, train_wall=515, wall=4366 (progress_bar.py:262, log())
[2021-04-12 23:06:55] INFO >> epoch 005: 1481 / 1881 loss=0.661, mrr=883.756, sample_size=1000, wps=1915.8, ups=1.92, wpb=1000, bsz=1000, num_updates=9000, lr=0.0005, gnorm=2.471, clip=100, train_wall=516, wall=4888 (progress_bar.py:262, log())
[2021-04-12 23:10:24] INFO >> epoch 005 | loss 0.657 | mrr 0.884533 | sample_size 1000 | wps 1835.5 | ups 1.84 | wpb 1000 | bsz 1000 | num_updates 9400 | lr 0.0005 | gnorm 2.467 | clip 100 | train_wall 969 | wall 5096 (progress_bar.py:269, print())
[2021-04-12 23:10:53] INFO >> epoch 005 | valid on 'valid' subset | loss 1.587 | mrr 0.752088 | sample_size 1000 | wps 6626 | wpb 1000 | bsz 1000 | num_updates 9400 | best_mrr 752.088 (progress_bar.py:269, print())
[2021-04-12 23:10:54] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_best.pt (epoch 5 @ 9400 updates, score 0.752088) (writing took 0.877400 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-12 23:16:21] INFO >> epoch 006: 600 / 1881 loss=0.617, mrr=891.094, sample_size=1000, wps=1765.7, ups=1.77, wpb=1000, bsz=1000, num_updates=10000, lr=0.0005, gnorm=2.348, clip=100, train_wall=516, wall=5454 (progress_bar.py:262, log())
[2021-04-12 23:25:03] INFO >> epoch 006: 1601 / 1881 loss=0.608, mrr=892.155, sample_size=1000, wps=1917.6, ups=1.92, wpb=1000, bsz=1000, num_updates=11000, lr=0.0005, gnorm=2.297, clip=100, train_wall=515, wall=5976 (progress_bar.py:262, log())
[2021-04-12 23:27:29] INFO >> epoch 006 | loss 0.6 | mrr 0.893612 | sample_size 1000 | wps 1833.2 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 11280 | lr 0.0005 | gnorm 2.291 | clip 100 | train_wall 970 | wall 6122 (progress_bar.py:269, print())
[2021-04-12 23:27:58] INFO >> epoch 006 | valid on 'valid' subset | loss 1.62 | mrr 0.754326 | sample_size 1000 | wps 6626.5 | wpb 1000 | bsz 1000 | num_updates 11280 | best_mrr 754.326 (progress_bar.py:269, print())
[2021-04-12 23:27:59] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_best.pt (epoch 6 @ 11280 updates, score 0.754326) (writing took 0.847549 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-12 23:34:29] INFO >> epoch 007: 721 / 1881 loss=0.56, mrr=900.151, sample_size=1000, wps=1767.4, ups=1.77, wpb=1000, bsz=1000, num_updates=12000, lr=0.0005, gnorm=2.185, clip=100, train_wall=516, wall=6542 (progress_bar.py:262, log())
[2021-04-12 23:43:10] INFO >> epoch 007: 1721 / 1881 loss=0.57, mrr=898.41, sample_size=1000, wps=1918, ups=1.92, wpb=1000, bsz=1000, num_updates=13000, lr=0.0005, gnorm=2.163, clip=100, train_wall=515, wall=7063 (progress_bar.py:262, log())
[2021-04-12 23:44:34] INFO >> epoch 007 | loss 0.559 | mrr 0.900365 | sample_size 1000 | wps 1834.9 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 13160 | lr 0.0005 | gnorm 2.158 | clip 100 | train_wall 970 | wall 7147 (progress_bar.py:269, print())
[2021-04-12 23:45:03] INFO >> epoch 007 | valid on 'valid' subset | loss 1.644 | mrr 0.753921 | sample_size 1000 | wps 6596.5 | wpb 1000 | bsz 1000 | num_updates 13160 | best_mrr 753.921 (progress_bar.py:269, print())
[2021-04-12 23:45:03] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 7 @ 13160 updates, score 0.753921) (writing took 0.522516 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-12 23:52:36] INFO >> epoch 008: 840 / 1881 loss=0.516, mrr=907.522, sample_size=1000, wps=1766.8, ups=1.77, wpb=1000, bsz=1000, num_updates=14000, lr=0.0005, gnorm=2.058, clip=100, train_wall=517, wall=7629 (progress_bar.py:262, log())
[2021-04-13 00:01:18] INFO >> epoch 008: 1841 / 1881 loss=0.54, mrr=903.265, sample_size=1000, wps=1917.3, ups=1.92, wpb=1000, bsz=1000, num_updates=15000, lr=0.0005, gnorm=2.055, clip=100, train_wall=515, wall=8151 (progress_bar.py:262, log())
[2021-04-13 00:01:39] INFO >> epoch 008 | loss 0.523 | mrr 0.906159 | sample_size 1000 | wps 1834.1 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 15040 | lr 0.0005 | gnorm 2.049 | clip 100 | train_wall 970 | wall 8172 (progress_bar.py:269, print())
[2021-04-13 00:02:08] INFO >> epoch 008 | valid on 'valid' subset | loss 1.694 | mrr 0.752148 | sample_size 1000 | wps 6608.4 | wpb 1000 | bsz 1000 | num_updates 15040 | best_mrr 752.148 (progress_bar.py:269, print())
[2021-04-13 00:02:08] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 8 @ 15040 updates, score 0.752148) (writing took 0.491091 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 00:10:44] INFO >> epoch 009: 961 / 1881 loss=0.481, mrr=913.049, sample_size=1000, wps=1766.5, ups=1.77, wpb=1000, bsz=1000, num_updates=16000, lr=0.0005, gnorm=1.953, clip=100, train_wall=517, wall=8717 (progress_bar.py:262, log())
[2021-04-13 00:18:43] INFO >> epoch 009 | loss 0.494 | mrr 0.910625 | sample_size 1000 | wps 1834.6 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 16920 | lr 0.0005 | gnorm 1.963 | clip 100 | train_wall 970 | wall 9196 (progress_bar.py:269, print())
[2021-04-13 00:19:12] INFO >> epoch 009 | valid on 'valid' subset | loss 1.708 | mrr 0.755262 | sample_size 1000 | wps 6621.7 | wpb 1000 | bsz 1000 | num_updates 16920 | best_mrr 755.262 (progress_bar.py:269, print())
[2021-04-13 00:19:13] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_best.pt (epoch 9 @ 16920 updates, score 0.755262) (writing took 0.890049 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 00:20:08] INFO >> epoch 010: 80 / 1881 loss=0.504, mrr=908.708, sample_size=1000, wps=1773, ups=1.77, wpb=1000, bsz=1000, num_updates=17000, lr=0.0005, gnorm=1.965, clip=100, train_wall=514, wall=9281 (progress_bar.py:262, log())
[2021-04-13 00:28:51] INFO >> epoch 010: 1081 / 1881 loss=0.458, mrr=916.758, sample_size=1000, wps=1909.5, ups=1.91, wpb=1000, bsz=1000, num_updates=18000, lr=0.0005, gnorm=1.886, clip=100, train_wall=518, wall=9804 (progress_bar.py:262, log())
[2021-04-13 00:35:49] INFO >> epoch 010 | loss 0.471 | mrr 0.914488 | sample_size 1000 | wps 1833.7 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 18800 | lr 0.0005 | gnorm 1.893 | clip 100 | train_wall 970 | wall 10222 (progress_bar.py:269, print())
[2021-04-13 00:36:18] INFO >> epoch 010 | valid on 'valid' subset | loss 1.686 | mrr 0.755544 | sample_size 1000 | wps 6610.4 | wpb 1000 | bsz 1000 | num_updates 18800 | best_mrr 755.544 (progress_bar.py:269, print())
[2021-04-13 00:36:19] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_best.pt (epoch 10 @ 18800 updates, score 0.755544) (writing took 1.154205 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 00:38:17] INFO >> epoch 011: 200 / 1881 loss=0.476, mrr=913.511, sample_size=1000, wps=1768.8, ups=1.77, wpb=1000, bsz=1000, num_updates=19000, lr=0.0005, gnorm=1.882, clip=100, train_wall=515, wall=10370 (progress_bar.py:262, log())
[2021-04-13 00:47:00] INFO >> epoch 011: 1200 / 1881 loss=0.444, mrr=918.876, sample_size=1000, wps=1911.5, ups=1.91, wpb=1000, bsz=1000, num_updates=20000, lr=0.0005, gnorm=1.828, clip=100, train_wall=517, wall=10893 (progress_bar.py:262, log())
[2021-04-13 00:52:54] INFO >> epoch 011 | loss 0.451 | mrr 0.917625 | sample_size 1000 | wps 1832.8 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 20680 | lr 0.0005 | gnorm 1.83 | clip 100 | train_wall 970 | wall 11247 (progress_bar.py:269, print())
[2021-04-13 00:53:24] INFO >> epoch 011 | valid on 'valid' subset | loss 1.745 | mrr 0.75492 | sample_size 1000 | wps 6608.7 | wpb 1000 | bsz 1000 | num_updates 20680 | best_mrr 754.92 (progress_bar.py:269, print())
[2021-04-13 00:53:24] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 11 @ 20680 updates, score 0.75492) (writing took 0.534552 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 00:56:25] INFO >> epoch 012: 320 / 1881 loss=0.451, mrr=917.594, sample_size=1000, wps=1770.9, ups=1.77, wpb=1000, bsz=1000, num_updates=21000, lr=0.0005, gnorm=1.813, clip=100, train_wall=515, wall=11458 (progress_bar.py:262, log())
[2021-04-13 01:05:07] INFO >> epoch 012: 1321 / 1881 loss=0.431, mrr=920.952, sample_size=1000, wps=1913.6, ups=1.91, wpb=1000, bsz=1000, num_updates=22000, lr=0.0005, gnorm=1.781, clip=100, train_wall=516, wall=11980 (progress_bar.py:262, log())
[2021-04-13 01:09:59] INFO >> epoch 012 | loss 0.435 | mrr 0.920263 | sample_size 1000 | wps 1835 | ups 1.84 | wpb 1000 | bsz 1000 | num_updates 22560 | lr 0.0005 | gnorm 1.778 | clip 100 | train_wall 970 | wall 12272 (progress_bar.py:269, print())
[2021-04-13 01:10:28] INFO >> epoch 012 | valid on 'valid' subset | loss 1.759 | mrr 0.756046 | sample_size 1000 | wps 6595.1 | wpb 1000 | bsz 1000 | num_updates 22560 | best_mrr 756.046 (progress_bar.py:269, print())
[2021-04-13 01:10:29] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_best.pt (epoch 12 @ 22560 updates, score 0.756046) (writing took 0.844400 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 01:14:32] INFO >> epoch 013: 440 / 1881 loss=0.431, mrr=920.871, sample_size=1000, wps=1770.7, ups=1.77, wpb=1000, bsz=1000, num_updates=23000, lr=0.0005, gnorm=1.75, clip=100, train_wall=515, wall=12545 (progress_bar.py:262, log())
[2021-04-13 01:23:14] INFO >> epoch 013: 1441 / 1881 loss=0.421, mrr=922.321, sample_size=1000, wps=1915.2, ups=1.92, wpb=1000, bsz=1000, num_updates=24000, lr=0.0005, gnorm=1.737, clip=100, train_wall=516, wall=13067 (progress_bar.py:262, log())
[2021-04-13 01:27:04] INFO >> epoch 013 | loss 0.42 | mrr 0.922599 | sample_size 1000 | wps 1835 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 24440 | lr 0.0005 | gnorm 1.728 | clip 100 | train_wall 969 | wall 13296 (progress_bar.py:269, print())
[2021-04-13 01:27:33] INFO >> epoch 013 | valid on 'valid' subset | loss 1.78 | mrr 0.753756 | sample_size 1000 | wps 6613 | wpb 1000 | bsz 1000 | num_updates 24440 | best_mrr 753.756 (progress_bar.py:269, print())
[2021-04-13 01:27:33] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 13 @ 24440 updates, score 0.753756) (writing took 0.517350 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 01:32:39] INFO >> epoch 014: 561 / 1881 loss=0.41, mrr=924.349, sample_size=1000, wps=1769.9, ups=1.77, wpb=1000, bsz=1000, num_updates=25000, lr=0.0005, gnorm=1.69, clip=100, train_wall=516, wall=13632 (progress_bar.py:262, log())
[2021-04-13 01:41:21] INFO >> epoch 014: 1561 / 1881 loss=0.409, mrr=924.46, sample_size=1000, wps=1915.3, ups=1.92, wpb=1000, bsz=1000, num_updates=26000, lr=0.0005, gnorm=1.696, clip=100, train_wall=516, wall=14154 (progress_bar.py:262, log())
[2021-04-13 01:44:08] INFO >> epoch 014 | loss 0.405 | mrr 0.925108 | sample_size 1000 | wps 1834.5 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 26320 | lr 0.0005 | gnorm 1.682 | clip 100 | train_wall 970 | wall 14321 (progress_bar.py:269, print())
[2021-04-13 01:44:37] INFO >> epoch 014 | valid on 'valid' subset | loss 1.828 | mrr 0.752651 | sample_size 1000 | wps 6621.2 | wpb 1000 | bsz 1000 | num_updates 26320 | best_mrr 752.651 (progress_bar.py:269, print())
[2021-04-13 01:44:38] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 14 @ 26320 updates, score 0.752651) (writing took 0.481704 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 01:50:47] INFO >> epoch 015: 681 / 1881 loss=0.392, mrr=927.186, sample_size=1000, wps=1766.9, ups=1.77, wpb=1000, bsz=1000, num_updates=27000, lr=0.0005, gnorm=1.647, clip=100, train_wall=517, wall=14720 (progress_bar.py:262, log())
[2021-04-13 01:59:29] INFO >> epoch 015: 1681 / 1881 loss=0.401, mrr=925.625, sample_size=1000, wps=1917.7, ups=1.92, wpb=1000, bsz=1000, num_updates=28000, lr=0.0005, gnorm=1.663, clip=100, train_wall=515, wall=15241 (progress_bar.py:262, log())
[2021-04-13 02:01:13] INFO >> epoch 015 | loss 0.394 | mrr 0.926876 | sample_size 1000 | wps 1834.5 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 28200 | lr 0.0005 | gnorm 1.65 | clip 100 | train_wall 970 | wall 15346 (progress_bar.py:269, print())
[2021-04-13 02:01:42] INFO >> epoch 015 | valid on 'valid' subset | loss 1.856 | mrr 0.753305 | sample_size 1000 | wps 6624.3 | wpb 1000 | bsz 1000 | num_updates 28200 | best_mrr 753.305 (progress_bar.py:269, print())
[2021-04-13 02:01:43] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 15 @ 28200 updates, score 0.753305) (writing took 0.489298 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 02:08:55] INFO >> epoch 016: 801 / 1881 loss=0.379, mrr=929.45, sample_size=1000, wps=1766.5, ups=1.77, wpb=1000, bsz=1000, num_updates=29000, lr=0.0005, gnorm=1.609, clip=100, train_wall=517, wall=15808 (progress_bar.py:262, log())
[2021-04-13 02:17:36] INFO >> epoch 016: 1801 / 1881 loss=0.395, mrr=926.322, sample_size=1000, wps=1918.3, ups=1.92, wpb=1000, bsz=1000, num_updates=30000, lr=0.0005, gnorm=1.628, clip=100, train_wall=515, wall=16329 (progress_bar.py:262, log())
[2021-04-13 02:18:18] INFO >> epoch 016 | loss 0.384 | mrr 0.928315 | sample_size 1000 | wps 1834.4 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 30080 | lr 0.0005 | gnorm 1.613 | clip 100 | train_wall 970 | wall 16371 (progress_bar.py:269, print())
[2021-04-13 02:18:47] INFO >> epoch 016 | valid on 'valid' subset | loss 1.825 | mrr 0.754056 | sample_size 1000 | wps 6624.8 | wpb 1000 | bsz 1000 | num_updates 30080 | best_mrr 754.056 (progress_bar.py:269, print())
[2021-04-13 02:18:48] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 16 @ 30080 updates, score 0.754056) (writing took 0.490312 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 02:27:02] INFO >> epoch 017: 921 / 1881 loss=0.361, mrr=932.206, sample_size=1000, wps=1765.3, ups=1.77, wpb=1000, bsz=1000, num_updates=31000, lr=0.0005, gnorm=1.566, clip=100, train_wall=517, wall=16895 (progress_bar.py:262, log())
[2021-04-13 02:35:23] INFO >> epoch 017 | loss 0.372 | mrr 0.93029 | sample_size 1000 | wps 1834.7 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 31960 | lr 0.0005 | gnorm 1.58 | clip 100 | train_wall 970 | wall 17396 (progress_bar.py:269, print())
[2021-04-13 02:35:52] INFO >> epoch 017 | valid on 'valid' subset | loss 1.926 | mrr 0.751955 | sample_size 1000 | wps 6614.6 | wpb 1000 | bsz 1000 | num_updates 31960 | best_mrr 751.955 (progress_bar.py:269, print())
[2021-04-13 02:35:52] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 17 @ 31960 updates, score 0.751955) (writing took 0.496455 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 02:36:26] INFO >> epoch 018: 40 / 1881 loss=0.385, mrr=928.134, sample_size=1000, wps=1773.5, ups=1.77, wpb=1000, bsz=1000, num_updates=32000, lr=0.0005, gnorm=1.593, clip=100, train_wall=514, wall=17459 (progress_bar.py:262, log())
[2021-04-13 02:45:10] INFO >> epoch 018: 1041 / 1881 loss=0.352, mrr=933.74, sample_size=1000, wps=1910.8, ups=1.91, wpb=1000, bsz=1000, num_updates=33000, lr=0.0005, gnorm=1.537, clip=100, train_wall=517, wall=17983 (progress_bar.py:262, log())
[2021-04-13 02:52:28] INFO >> epoch 018 | loss 0.364 | mrr 0.931801 | sample_size 1000 | wps 1834.2 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 33840 | lr 0.0005 | gnorm 1.55 | clip 100 | train_wall 970 | wall 18420 (progress_bar.py:269, print())
[2021-04-13 02:52:57] INFO >> epoch 018 | valid on 'valid' subset | loss 1.907 | mrr 0.753193 | sample_size 1000 | wps 6615.9 | wpb 1000 | bsz 1000 | num_updates 33840 | best_mrr 753.193 (progress_bar.py:269, print())
[2021-04-13 02:52:57] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 18 @ 33840 updates, score 0.753193) (writing took 0.481202 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 02:54:33] INFO >> epoch 019: 161 / 1881 loss=0.372, mrr=930.542, sample_size=1000, wps=1773.8, ups=1.77, wpb=1000, bsz=1000, num_updates=34000, lr=0.0005, gnorm=1.553, clip=100, train_wall=515, wall=18546 (progress_bar.py:262, log())
[2021-04-13 03:03:17] INFO >> epoch 019: 1161 / 1881 loss=0.352, mrr=933.693, sample_size=1000, wps=1910.7, ups=1.91, wpb=1000, bsz=1000, num_updates=35000, lr=0.0005, gnorm=1.524, clip=100, train_wall=517, wall=19070 (progress_bar.py:262, log())
[2021-04-13 03:09:32] INFO >> epoch 019 | loss 0.357 | mrr 0.932778 | sample_size 1000 | wps 1834.6 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 35720 | lr 0.0005 | gnorm 1.526 | clip 100 | train_wall 970 | wall 19445 (progress_bar.py:269, print())
[2021-04-13 03:10:02] INFO >> epoch 019 | valid on 'valid' subset | loss 1.947 | mrr 0.753656 | sample_size 1000 | wps 6601 | wpb 1000 | bsz 1000 | num_updates 35720 | best_mrr 753.656 (progress_bar.py:269, print())
[2021-04-13 03:10:02] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 19 @ 35720 updates, score 0.753656) (writing took 0.498985 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 03:12:41] INFO >> epoch 020: 280 / 1881 loss=0.358, mrr=932.557, sample_size=1000, wps=1771.3, ups=1.77, wpb=1000, bsz=1000, num_updates=36000, lr=0.0005, gnorm=1.519, clip=100, train_wall=515, wall=19634 (progress_bar.py:262, log())
[2021-04-13 03:21:25] INFO >> epoch 020: 1281 / 1881 loss=0.347, mrr=934.508, sample_size=1000, wps=1911.6, ups=1.91, wpb=1000, bsz=1000, num_updates=37000, lr=0.0005, gnorm=1.505, clip=100, train_wall=517, wall=20157 (progress_bar.py:262, log())
[2021-04-13 03:26:37] INFO >> epoch 020 | loss 0.35 | mrr 0.933964 | sample_size 1000 | wps 1834.5 | ups 1.83 | wpb 1000 | bsz 1000 | num_updates 37600 | lr 0.0005 | gnorm 1.502 | clip 100 | train_wall 970 | wall 20470 (progress_bar.py:269, print())
[2021-04-13 03:27:06] INFO >> epoch 020 | valid on 'valid' subset | loss 1.935 | mrr 0.752192 | sample_size 1000 | wps 6628.8 | wpb 1000 | bsz 1000 | num_updates 37600 | best_mrr 752.192 (progress_bar.py:269, print())
[2021-04-13 03:27:07] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 20 @ 37600 updates, score 0.752192) (writing took 0.506341 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 03:30:49] INFO >> epoch 021: 400 / 1881 loss=0.349, mrr=933.999, sample_size=1000, wps=1772.4, ups=1.77, wpb=1000, bsz=1000, num_updates=38000, lr=0.0005, gnorm=1.487, clip=99.9, train_wall=515, wall=20722 (progress_bar.py:262, log())
[2021-04-13 03:39:31] INFO >> epoch 021: 1401 / 1881 loss=0.342, mrr=935.152, sample_size=1000, wps=1916.2, ups=1.92, wpb=1000, bsz=1000, num_updates=39000, lr=0.0005, gnorm=1.483, clip=100, train_wall=516, wall=21243 (progress_bar.py:262, log())
[2021-04-13 03:43:41] INFO >> epoch 021 | loss 0.343 | mrr 0.935074 | sample_size 1000 | wps 1836.2 | ups 1.84 | wpb 1000 | bsz 1000 | num_updates 39480 | lr 0.0005 | gnorm 1.48 | clip 99.9 | train_wall 969 | wall 21494 (progress_bar.py:269, print())
[2021-04-13 03:44:10] INFO >> epoch 021 | valid on 'valid' subset | loss 1.966 | mrr 0.75313 | sample_size 1000 | wps 6623.8 | wpb 1000 | bsz 1000 | num_updates 39480 | best_mrr 753.13 (progress_bar.py:269, print())
[2021-04-13 03:44:11] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 21 @ 39480 updates, score 0.75313) (writing took 0.507651 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 03:48:55] INFO >> epoch 022: 520 / 1881 loss=0.34, mrr=935.662, sample_size=1000, wps=1771.5, ups=1.77, wpb=1000, bsz=1000, num_updates=40000, lr=0.0005, gnorm=1.462, clip=100, train_wall=515, wall=21808 (progress_bar.py:262, log())
[2021-04-13 03:57:36] INFO >> epoch 022: 1521 / 1881 loss=0.34, mrr=935.661, sample_size=1000, wps=1919.2, ups=1.92, wpb=1000, bsz=1000, num_updates=41000, lr=0.0005, gnorm=1.465, clip=100, train_wall=515, wall=22329 (progress_bar.py:262, log())
[2021-04-13 04:00:44] INFO >> epoch 022 | loss 0.337 | mrr 0.93603 | sample_size 1000 | wps 1838.7 | ups 1.84 | wpb 1000 | bsz 1000 | num_updates 41360 | lr 0.0005 | gnorm 1.456 | clip 100 | train_wall 968 | wall 22516 (progress_bar.py:269, print())
[2021-04-13 04:01:13] INFO >> epoch 022 | valid on 'valid' subset | loss 1.99 | mrr 0.752699 | sample_size 1000 | wps 6606.3 | wpb 1000 | bsz 1000 | num_updates 41360 | best_mrr 752.699 (progress_bar.py:269, print())
[2021-04-13 04:01:13] INFO >> saved checkpoint /data/yanghe/ncc_data/codesearchnet/retrieval/data-mmap/all/self_attn/checkpoints/checkpoint_last.pt (epoch 22 @ 41360 updates, score 0.752699) (writing took 0.507595 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-04-13 04:01:13] INFO >> early stop since valid performance hasn't improved for last 10 runs (train.py:191, should_stop_early())
[2021-04-13 04:01:13] INFO >> early stop since valid performance hasn't improved for last 10 runs (train.py:272, single_main())
[2021-04-13 04:01:13] INFO >> done training in 22543.8 seconds (train.py:283, single_main())