How to train it on another dataset? How can I handle the checkpoint? #18
Comments
Have you figured out how it works? I trained on my own dataset as well, but the accuracy is very low.
Hello, @changlinzhang @kwotsin, could you tell me how to use the files in the checkpoint folder as a pretrained model to train on my own dataset?
Hello everyone, how do we prepare our own dataset for training? Thank you.
I made my own dataset, but I got the errors below:
Traceback (most recent call last):
Caused by op u'mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert', defined at:
InvalidArgumentError (see above for traceback): assertion failed: [
Could you help me, please?
I faced the same problem.
@RobinHan24 I ran into the same problem. I have 10 classes, so I set the pixel values of my label images to 0 through 9, and that fixed the problem. I don't know whether this will help you.
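A minimal sketch of the check implied by the comment above, assuming the `mean_iou` assertion fires when a label pixel is not in the range 0 to `num_classes - 1`. The helper name `check_label_ids` and the sample array are hypothetical and not part of the repo; in practice the array would come from loading one of your annotation images.

```python
import numpy as np

def check_label_ids(label, num_classes):
    """Return the sorted label values that fall outside [0, num_classes)."""
    values = np.unique(label)  # np.unique returns the distinct values in sorted order
    return [int(v) for v in values if v < 0 or v >= num_classes]

# Hypothetical 2x3 annotation for a 10-class dataset; 255 is an out-of-range value.
label = np.array([[0, 3, 9], [255, 1, 9]], dtype=np.uint8)
bad = check_label_ids(label, num_classes=10)
print(bad)  # [255]
```

If the returned list is not empty, those values would need to be remapped into the valid range (or the class count adjusted) before training.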
Thanks for this useful repo.
InvalidArgumentError (see above for traceback): assertion failed: [
I am a beginner in this field, so I am hoping for suggestions on how to resolve this error.
Hi, kwotsin! Thanks for your work.
I want to train it on another dataset (with 30 classes instead of 12). I thought I had changed the related code, but I got this error:
2018-01-11 17:23:22.187077: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Input to reshape is a tensor with 172800 values, but the requested shape has 4320000
I think it may be caused by the checkpoint. How can I deal with this problem?
The complete output is as follows:
========= Median Frequency Balancing Class Weights =========
[6.397542327061094e-05, 6.7097626201794152e-05, 0.024400273767542283, 0.041269401614453756, 5.5506352412896832e-05, 0.076635711324892844, 0.069381256179271614, 3.472654196521944e-05, 0.00042760164428717635, 0.00012440287198120186, 0.090233329139976615, 0.12489918060211183, 0.0013708685331902757, 6.0827765291491662e-05, 0.073240128809290553, 0.35775514055273316, 0.64257341685305103, 0.90968868010977944, 0.37688909228806228, 0.44248634385452756, 0.00042529101230680852, 0.30566376891079095, 0.28941152643298945, 3.9464190165066867e-05, 0.26421036878629223, 0.42250536299160169, 0.5089356784417215, 0.00024742224929701886, 0.47265314480960613, 0.0]
2018-01-11 17:22:23.528595: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 17:22:23.528689: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 17:22:23.528720: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 17:22:29.254935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:02:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2018-01-11 17:22:29.503633: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x1e106f80 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-01-11 17:22:29.504523: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-11 17:22:29.505315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2018-01-11 17:22:29.505448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2018-01-11 17:22:29.505491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2018-01-11 17:22:29.505540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2018-01-11 17:22:29.505685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y N
2018-01-11 17:22:29.505705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N Y
2018-01-11 17:22:29.505740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0)
2018-01-11 17:22:29.505779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:84:00.0)
2018-01-11 17:22:34.391659: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1368 get requests, put_count=1100 evicted_count=1000 eviction_rate=0.909091 and unsatisfied allocation rate=1
2018-01-11 17:22:34.391731: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:Saving checkpoint to path ./log/original/model.ckpt
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Epoch 1/300
INFO:tensorflow:Current Learning Rate: [0.00050000002]
INFO:tensorflow:global step 1: loss: 0.3121 (4.79 sec/step) Current Streaming Accuracy: 0.0000 Current Mean IOU: 0.0000
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0000 Validation Mean IOU: 0.0000 (2.24 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0209 Validation Mean IOU: 0.0030 (1.10 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0207 Validation Mean IOU: 0.0028 (1.26 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0227 Validation Mean IOU: 0.0033 (1.23 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0220 Validation Mean IOU: 0.0035 (1.24 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0208 Validation Mean IOU: 0.0033 (1.28 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0201 Validation Mean IOU: 0.0033 (1.22 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0198 Validation Mean IOU: 0.0032 (1.25 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0197 Validation Mean IOU: 0.0032 (1.24 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0031 (1.21 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0197 Validation Mean IOU: 0.0031 (1.18 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0031 (1.21 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0031 (1.39 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0032 (1.23 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0196 Validation Mean IOU: 0.0032 (1.18 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0193 Validation Mean IOU: 0.0032 (1.16 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0191 Validation Mean IOU: 0.0031 (1.41 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0193 Validation Mean IOU: 0.0031 (1.26 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0195 Validation Mean IOU: 0.0032 (1.43 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0197 Validation Mean IOU: 0.0032 (1.32 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0202 Validation Mean IOU: 0.0033 (1.34 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0204 Validation Mean IOU: 0.0034 (1.33 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0203 Validation Mean IOU: 0.0034 (1.21 sec/step)
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0206 Validation Mean IOU: 0.0034 (1.36 sec/step)
2018-01-11 17:23:21.808311: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: assertion failed: [all dims of 'image.shape' must be > 0.]
[[Node: assert_positive_11/assert_less/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](assert_positive_11/assert_less/All/_5795, assert_positive_11/assert_less/Assert/Assert/data_0)]]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [all dims of 'image.shape' must be > 0.]
[[Node: assert_positive_11/assert_less/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](assert_positive_11/assert_less/All/_5795, assert_positive_11/assert_less/Assert/Assert/data_0)]]
INFO:tensorflow:---VALIDATION--- Validation Accuracy: 0.0207 Validation Mean IOU: 0.0035 (1.19 sec/step)
2018-01-11 17:23:22.187077: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Input to reshape is a tensor with 172800 values, but the requested shape has 4320000
[[Node: Reshape_5 = Reshape[T=DT_UINT8, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](batch_1/_5971, Reshape_5/shape)]]
2018-01-11 17:23:22.197319: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Input to reshape is a tensor with 172800 values, but the requested shape has 4320000
[[Node: Reshape_5 = Reshape[T=DT_UINT8, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](batch_1/_5971, Reshape_5/shape)]]
Traceback (most recent call last):
File "train_enet.py", line 340, in <module>
run()
File "train_enet.py", line 337, in run
plt.savefig(photo_dir+"/image" + str(i))
File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
enqueue_callable()
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1063, in _single_operation_run
target_list_as_strings, status, None)
File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [all dims of 'image.shape' must be > 0.]
[[Node: assert_positive_11/assert_less/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](assert_positive_11/assert_less/All/_5795, assert_positive_11/assert_less/Assert/Assert/data_0)]]
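The two numbers in the reshape error can be checked by hand, which may help narrow down the cause. The sketch below only does the arithmetic. The 360 x 480 image size is an assumption (it is not stated in this thread), and reading the factor of 25 as a batch size is a guess rather than something the log confirms.

```python
# Values copied from the "Input to reshape" error message in the log.
actual_values = 172800
requested_values = 4320000

# How many times larger is the requested shape than the tensor that arrived?
ratio, remainder = divmod(requested_values, actual_values)
print(ratio, remainder)  # 25 0

# Assumption: single-channel 360 x 480 label images (not confirmed in the thread).
height, width = 360, 480
print(height * width == actual_values)  # True
```

Under these assumptions, the reshape expected 25 images' worth of pixels but received only one. That would be consistent with a short final batch after the input queue was stopped by the earlier `image.shape` assertion, which points at the dataset files rather than the checkpoint. This is a hypothesis to test, not a diagnosis.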