Issue with executing conv_net_train.py #32

Open
CyraxSector opened this issue Mar 27, 2020 · 33 comments

Comments

@CyraxSector

CyraxSector commented Mar 27, 2020

When I run conv_net_train.py, the process hangs and the terminal output is as follows.

('loading data...',)
data loaded!
model architecture: CNN-static
using: word2vec vectors
[('image shape', 153, 300), ('filter shape', [(200, 1, 1, 300), (200, 1, 2, 300), (200, 1, 3, 300)]), ('hidden_units', [200, 200, 2]), ('dropout', [0.5, 0.5, 0.5]), ('batch_size', 50), ('non_static', False), ('learn_decay', 0.95), ('conv_non_linear', 'relu'), ('non_static', False), ('sqr_norm_lim', 9), ('shuffle_batch', True)]
... training

Your ideas are highly appreciated. @amirmohammadkz

Thanks.

@amirmohammadkz

Hmm... That's strange. Are you using my version or the original version?
Can you check the CPU/GPU usage with Task Manager (Windows) or the htop command (Linux)?
Can you run this Python file in debug mode using PyCharm to find the exact line that causes this issue?

@CyraxSector
Author

Thanks a lot for your response. Actually, because of this issue, I set up your version by installing all the required modules. When I run conv_net_train_keras.py in PyCharm, I get the following error. I have also set up CUDA on my GPU to run this.

2020-03-30 05:09:51.532398: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 2 Chunks of size 512 totalling 1.0KiB
2020-03-30 05:09:51.532769: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 11 Chunks of size 1024 totalling 11.0KiB
2020-03-30 05:09:51.533163: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 1280 totalling 1.3KiB
2020-03-30 05:09:51.533543: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 1536 totalling 1.5KiB
2020-03-30 05:09:51.533929: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 3 Chunks of size 1792 totalling 5.3KiB
2020-03-30 05:09:51.534307: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 2 Chunks of size 240128 totalling 469.0KiB
2020-03-30 05:09:51.534705: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 470784 totalling 459.8KiB
2020-03-30 05:09:51.535114: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 2 Chunks of size 480000 totalling 937.5KiB
2020-03-30 05:09:51.535516: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 3 Chunks of size 547328 totalling 1.57MiB
2020-03-30 05:09:51.535916: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 3 Chunks of size 720128 totalling 2.06MiB
2020-03-30 05:09:51.536345: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 960000 totalling 937.5KiB
2020-03-30 05:09:51.536771: I tensorflow/core/common_runtime/bfc_allocator.cc:962] Sum Total of in-use chunks: 6.39MiB
2020-03-30 05:09:51.537159: I tensorflow/core/common_runtime/bfc_allocator.cc:964] total_region_allocated_bytes_: 1436473600 memory_limit_: 1436473753 available bytes: 153 curr_region_allocation_bytes_: 2872947712
2020-03-30 05:09:51.537847: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Stats:
Limit: 1436473753
InUse: 6698752
MaxInUse: 6699264
NumAllocs: 178
MaxAllocSize: 960000
2020-03-30 05:09:51.538711: W tensorflow/core/common_runtime/bfc_allocator.cc:429] *___________________________________________________________________________________________________
2020-03-30 05:09:51.540175: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Dst tensor is not initialized.
[[{{node main_input/_8}}]]
Traceback (most recent call last):

Was it because I'm out of memory in the GPU? If so, please guide me to fix this.

Thanks.

@amirmohammadkz

Thanks a lot for your response

You're welcome :). You can make us happy by giving our project a star :D

Was it because I'm out of memory in the GPU? If so, please guide me to fix this.

Try reducing the batch_size. Start with 1 and, if that works, increase it.
You can also mention @saminfatehir for more specific questions about the Keras model.
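
One way to follow this advice systematically is to start at a batch size of 1 and keep doubling it until training fails. The following is only a minimal sketch: `try_fit` is a hypothetical callable (not part of the repository) that should run a short training step with the given batch size and raise an exception when the batch does not fit in memory.

```python
def largest_working_batch_size(try_fit, start=1, limit=64):
    """Double the batch size from `start` until `try_fit` fails or `limit` is passed.

    `try_fit` is a hypothetical callable: it should run a short training
    step with the given batch size and raise an exception if it fails.
    Returns the largest batch size that succeeded, or None if even
    `start` failed.
    """
    best = None
    size = start
    while size <= limit:
        try:
            try_fit(size)
        except Exception:
            break  # this size does not fit; keep the previous one
        best = size
        size *= 2
    return best


def fake_try_fit(batch_size):
    # Stand-in for a real training step: pretend anything above 8 runs out of memory.
    if batch_size > 8:
        raise MemoryError("batch of %d does not fit" % batch_size)


print(largest_working_batch_size(fake_try_fit))
```

In a real run, `try_fit` would wrap a single short call to the training function with the candidate batch size.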

@CyraxSector
Author

It worked :) Thanks a lot for helping.
But another issue has occurred, and the console log is as follows.

Traceback (most recent call last):
File "", line 1, in
File "C:\Program Files\JetBrains\PyCharm 2019.1.3\plugins\python\helpers\pydev_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2019.1.3\plugins\python\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "E:/Projects/personality-detection-Keras/conv_net_train_keras.py", line 316, in
results = train_conv_net(datasets, W, historyfile, i)
File "E:/Projects/personality-detection-Keras/conv_net_train_keras.py", line 176, in train_conv_net
callbacks=[my_logger, history])
File "C:\Python37\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Python37\lib\site-packages\keras\engine\training.py", line 1732, in fit_generator
initial_epoch=initial_epoch)
File "C:\Python37\lib\site-packages\keras\engine\training_generator.py", line 260, in fit_generator
callbacks.on_epoch_end(epoch, epoch_logs)
File "C:\Python37\lib\site-packages\keras\callbacks\callbacks.py", line 152, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "E:/Projects/personality-detection-Keras/conv_net_train_keras.py", line 52, in on_epoch_end
logging.info("epoch = %4d loss = %0.6f acc = %0.2f%%" % (epoch, curr_loss, curr_acc))
TypeError: must be real number, not NoneType

Was it because of the Keras version? I'm using the latest Keras version on Tensorflow 2.1.0.

@amirmohammadkz

Hmm, I think it may be because of the values of (epoch, curr_loss, curr_acc). Are you passing an appropriate dataset that contains instances of both classes (0.1 of your training data is used as your validation dataset)?
Can you trace it in debug mode and find out which value is None (there are some evaluation results just before the line that raises the error)?
It may also be because of the Keras or TF version, but I am not sure at the moment.

@CyraxSector
Author

It worked :) I changed the following:
curr_acc = logs.get('acc')
val_acc = logs.get('val_acc')

to
curr_acc = logs.get('accuracy')
val_acc = logs.get('val_accuracy')
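
Which key appears in `logs` can depend on the Keras version and on how the metric was named when the model was compiled, so a callback that tolerates both spellings avoids the `NoneType` error above. A minimal sketch, assuming a hypothetical helper `get_metric` (not part of the repository):

```python
def get_metric(logs, *names):
    """Return the first value in `logs` found under any of `names`, else None."""
    logs = logs or {}
    for name in names:
        value = logs.get(name)
        if value is not None:
            return value
    return None


# Example log dictionaries using the two spellings seen in this thread.
old_logs = {"loss": 0.69, "acc": 0.51, "val_acc": 0.50}
new_logs = {"loss": 0.69, "accuracy": 0.51, "val_accuracy": 0.50}

for logs in (old_logs, new_logs):
    curr_acc = get_metric(logs, "accuracy", "acc")
    val_acc = get_metric(logs, "val_accuracy", "val_acc")
    print("acc = %0.2f val_acc = %0.2f" % (curr_acc, val_acc))
```

If neither key is present, the helper returns None, so the caller should still check the value before formatting it with `%f`.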

As the next step, how should I test this model? For example, when a sample sentence is given as input, how does the trained model process it? Please explain how to use this model in a real-time context.

Thank you very much.

@amirmohammadkz

Actually, it needs some effort. First, you should save the final model instead of testing it on the test data.
Then you should prepare your test set. Note that you should feed the full test dataset to Mairesse and after that, you should run the process-data python file and use its extracted features as the input of the trained model.

@CyraxSector
Author

Could you please elaborate more on this? Thanks a lot.

@amirmohammadkz

The problem is that this code is designed for testing on the Essays dataset. If you want to feed another data, you have two options:

  1. Change the code so that it can read and process your data
  2. Change your data so that it can be read by the code

In both cases, you need to extract the Mairesse psycholinguistic features. Full instructions are available here. Then you need to feed your data into the model. First, you should save the trained model (I mentioned where you should do that). Second, since the code is designed for 10-fold cross-validation, change it so that it handles your test samples and preprocesses them without dividing them into 10 folds. Third, you need to feed the preprocessed data (embeddings and extracted Mairesse features) into the saved model to evaluate it and predict the labels.
Hope this helps.
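
The three steps above could be sketched roughly as follows. This is not the repository's code: `build_inputs`, `predict_with_saved_model`, the document ids and the two-input layout (text first, then Mairesse features) are all assumptions made for illustration. The second function assumes Keras and NumPy are installed and that the model was saved earlier with `model.save(model_path)`.

```python
def build_inputs(word_indices, mairesse_features):
    """Pair each document's word-index sequence with its Mairesse feature vector.

    Both arguments are hypothetical: dicts keyed by a document id.
    Documents missing from either dict are skipped. No fold assignment
    is made; every document is kept as test data.
    """
    doc_ids = sorted(set(word_indices) & set(mairesse_features))
    text_input = [word_indices[d] for d in doc_ids]
    feature_input = [mairesse_features[d] for d in doc_ids]
    return doc_ids, text_input, feature_input


def predict_with_saved_model(model_path, text_input, feature_input):
    """Load a model saved earlier with model.save(model_path) and predict.

    Assumes the model takes two inputs in this order (text, then Mairesse
    features); check this against the actual model before using it.
    """
    import numpy as np
    from keras.models import load_model

    model = load_model(model_path)
    return model.predict([np.asarray(text_input), np.asarray(feature_input)])


ids, text_in, feat_in = build_inputs(
    {"doc1": [4, 8, 15], "doc2": [16, 23, 42]},
    {"doc1": [0.1, 0.7], "doc2": [0.3, 0.2], "doc3": [0.9, 0.9]},
)
print(ids)
```

Only `build_inputs` is exercised here; the prediction step needs a real saved model and inputs shaped the way that model expects.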

@amirmohammadkz

amirmohammadkz commented Apr 14, 2020

@amirmohammadkz I am trying to run your code but I am having this error

File "/Volumes/SAMSUNG/personality-detection-master-amirmohamed/conv_net_train.py", line 508, in
activations=[Sigmoid])

File "/Volumes/SAMSUNG 1/personality-detection-master-amirmohamed/conv_net_train.py", line 143, in train_conv_net
dropout_rates=dropout_rate)

File "", line 106, in init

TypeError: 'zip' object is not subscriptable

any idea how to fix this

Thanks for using my forked repository. Are you using Python 2.7? Please use these requirements.
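
For context, this error is typical of Python 2 code run under Python 3: in Python 2 `zip` returns a list, while in Python 3 it returns an iterator that cannot be indexed. A minimal illustration follows; the variable names are made up and not taken from conv_net_train.py. Wrapping the call in `list()` is a common workaround when staying on Python 3, although the recommendation in this thread is to use the listed requirements.

```python
layer_sizes = [200, 200, 2]

# Pairs of consecutive sizes, e.g. (input units, output units) per layer.
pairs = zip(layer_sizes, layer_sizes[1:])

try:
    first = pairs[0]  # works on Python 2, fails on Python 3
except TypeError as err:
    print("TypeError:", err)

# Materialising the iterator makes it indexable again.
pairs = list(zip(layer_sizes, layer_sizes[1:]))
print(pairs[0])
```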

@MarwanMo7amed

@amirmohammadkz I have one more question
[Screenshot: Screen Shot 2020-04-15 at 7 51 18 PM]
The terminal is stuck on this. Should the process terminate automatically, or do I have to stop it myself?

Can you refer me to any resources (courses or videos) for understanding the code implementation better?

Thanks in advance and sorry to bother you.

@amirmohammadkz

Actually I do not remember the exact time required for each step of the code. In the default setting, it requires 50 epochs to complete, and you need to wait for it. You can configure the code as I mentioned in the readme to reduce the training time. Besides, using a GPU will reduce the training time significantly. If you need to understand the implementation, the only thing I can refer to is the Theano website and the paper.

@amirmohammadkz

I tried running conv_net_train_keras.py on a Google Colab GPU, but it kept crashing because of RAM usage, and using the CPU I got this error

batch size: #32 (comment)

@amirmohammadkz

but I tried running on mac I got this

the first epoch finished but then this error popped up

Have you tested this solution? #32 (comment)

@CyraxSector
Author

CyraxSector commented Apr 16, 2020 via email

@Souravcool1996

but I tried running on mac I got this
the first epoch finished but then this error popped up

Have you tested this solution? #32 (comment)

I have already trained the model, but what should I do after training? Is this project meant only for training, or is there any code for testing the trained model? Also, what is the purpose of conv_net_classes.py?

@amirmohammadkz

but I tried running on mac I got this
the first epoch finished but then this error popped up

Have you tested this solution? #32 (comment)

I have already trained the model, but what should I do after training? Is this project meant only for training, or is there any code for testing the trained model? Also, what is the purpose of conv_net_classes.py?

It will automatically test the trained model on the default test set. You can change the test set if you want. #32 (comment)

@Souravcool1996

but I tried running on mac I got this
the first epoch finished but then this error popped up

Have you tested this solution? #32 (comment)

I have already trained the model, but what should I do after training? Is this project meant only for training, or is there any code for testing the trained model? Also, what is the purpose of conv_net_classes.py?

It will automatically test the trained model on the default test set. You can change the test set if you want. #32 (comment)

Then what is the use of conv_net_classes.py?

@amirmohammadkz

As far as I can remember, it is about the implementation of layers.

@CyraxSector
Author

CyraxSector commented Apr 18, 2020 via email

@CyraxSector
Author

CyraxSector commented Apr 18, 2020 via email

@amirmohammadkz

Now it's training smoothly, isn't it? If not, yes, reduce the epochs as well.

Does this message mean that the output is correct?
"Buffered data was truncated after reaching the output size limit."

Also, the "history_main_model_attr_0_w2v.txt" and "perf_output_main_model_0_w2v.txt" files are empty; the only file that has data is the log file.

@CyraxSector and thanks for your response

That's because the files are still open. In the logging case, the code manages this and allows you to monitor the logs at any time, but the result files must be closed at the end of training before their contents become available.
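
The effect described here comes from output buffering: text written to an open file usually stays in memory until the file is flushed or closed. A small, self-contained illustration (the file name `history_demo.txt` is made up for the example):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "history_demo.txt")

f = open(path, "w")
f.write("epoch 1 done\n")
# The text is usually still in Python's buffer, so the file on disk can look empty.
size_while_open = os.path.getsize(path)

f.flush()  # push buffered text to the operating system
size_after_flush = os.path.getsize(path)

f.close()  # closing also flushes
print(size_while_open, size_after_flush)
```

Calling `flush()` after each epoch, or opening the file in a `with` block per write, is one way to see partial results before training finishes.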

@amirmohammadkz

Yes. I meant that you need to wait until the code completes its task successfully. After that, the history file will be closed.

@amirmohammadkz

Try this: https://stackoverflow.com/a/55507901/5661543

@CyraxSector
Author

@amirmohammadkz I've tested the model successfully by following all your steps. As an example, when a sample sentence has been given as input, it will be evaluated. Thanks a lot for all your guidance. I'll make the PR ASAP.

@Souravcool1996

@amirmohammadkz I've tested the model successfully by following all your steps. As an example, when a sample sentence has been given as input, it will be evaluated. Thanks a lot for all your guidance. I'll make the PR ASAP.

How have you tested the model? As described in the comments, training and testing are done by the same code.

@amirmohammadkz

@amirmohammadkz I've tested the model successfully by following all your steps. As an example, when a sample sentence has been given as input, it will be evaluated. Thanks a lot for all your guidance. I'll make the PR ASAP.

Happy to hear that! Thanks a lot!

@amirmohammadkz

Hello,
If you are looking for a faster version of personality detection, our new paper and its code are available here:
https://github.com/amirmohammadkz/personality_detection

This code is significantly faster and can be run on a regular computer/laptop. The accuracies are higher too.

@addy1997

addy1997 commented Feb 9, 2021

@amirmohammadkz

I get this error when trying to load 'GoogleNews-vectors-negative300-SLIM.bin' (code given below).

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-b567896a8aa6> in <module>
     14 print('vocab size: ' + str(len(vocab)))
     15 print('max sentence length: ' + str(max_l))
---> 16 w2v = load_bin_vec(wv_from_bin, vocab)
     17 print(w2v)
     18 print('word2vec loaded!')

<ipython-input-20-59822c213c28> in load_bin_vec(fname, vocab)
     49     """
     50     word_vecs = {}
---> 51     with open(fname, 'rb') as f:
     52         header = f.readline()
     53         vocab_size, layer1_size = map(int, header.split())

TypeError: expected str, bytes or os.PathLike object, not Word2VecKeyedVectors

Code is

w2v_file = 'GoogleNews-vectors-negative300-SLIM.bin'
revs, vocab = build_data_train_test(data_train, train_ratio=0.6, clean_string=True)
max_l = np.max(pd.DataFrame(revs)['num_words'])
print('data loaded!')
print('number of sentences: ' + str(len(revs)))
print('vocab size: ' + str(len(vocab)))
print('max sentence length: ' + str(max_l))
w2v = load_bin_vec(w2v_file, vocab)
print(w2v)
print('word2vec loaded!')
print('num words already in word2vec: ' + str(len(w2v)))

add_unknown_words(w2v, vocab)
W, word_idx_map = get_W(w2v)
cPickle.dump([revs, W, word_idx_map, vocab], open('imdb-train-val-testN.pickle', 'wb'))
print('dataset created successfully!')

Any help or guidance is highly appreciated.
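
Going by the traceback, `load_bin_vec` calls `open(fname, 'rb')`, so it needs a file path, while the failing call passed `wv_from_bin`, which appears to be an already loaded `Word2VecKeyedVectors` object. The sketch below is a standard-library stand-in for such a loader, not the repository's function: it checks the argument type up front and reads the word2vec binary layout (a header line with vocabulary size and dimension, then each word followed by a space and raw float32 values, assumed here to be little-endian). The tiny demo file is generated only for illustration.

```python
import os
import struct
import tempfile


def load_bin_vec(fname, vocab):
    """Read vectors for the words in `vocab` from a word2vec binary file.

    `fname` must be a path (str, bytes or os.PathLike), not an already
    loaded model object.
    """
    if not isinstance(fname, (str, bytes, os.PathLike)):
        raise TypeError("load_bin_vec expects a file path, got %s" % type(fname).__name__)
    word_vecs = {}
    with open(fname, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(vocab_size):
            chars = []
            while True:
                ch = f.read(1)
                if ch == b" " or ch == b"":
                    break
                if ch != b"\n":  # entries may be separated by a newline
                    chars.append(ch)
            word = b"".join(chars).decode("utf-8", errors="ignore")
            data = f.read(4 * dim)  # dim float32 values
            if word in vocab:
                word_vecs[word] = struct.unpack("<%df" % dim, data)
    return word_vecs


# Tiny demo file in the same layout: "<vocab_size> <dim>\n", then "word " + raw floats.
path = os.path.join(tempfile.mkdtemp(), "tiny.bin")
with open(path, "wb") as f:
    f.write(b"2 3\n")
    f.write(b"cat " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n")
    f.write(b"dog " + struct.pack("<3f", 4.0, 5.0, 6.0) + b"\n")

print(load_bin_vec(path, {"dog"}))
```

With the code shown in the comment above, the equivalent fix would be to pass the path string `w2v_file` rather than the loaded object.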

@amirmohammadkz

@addy1997 Have a look at my forked repo. I think it will be a better starting point.
https://github.com/amirmohammadkz/personality-detection

@addy1997

@addy1997 Have a look at my forked repo. I think it will be a better starting point.
https://github.com/amirmohammadkz/personality-detection

Thanks, it worked.

@rrrccre

rrrccre commented Nov 7, 2024

I want to know why I didn't get any result after 50 epochs and why training started over again instead. Thank you very much for your reply.

epoch: 47, training time: 9885.98 secs, train perf: 70.90 %, val perf: 55.60 %
epoch: 48, training time: 8464.05 secs, train perf: 82.95 %, val perf: 56.80 %
epoch: 49, training time: 8464.44 secs, train perf: 62.60 %, val perf: 53.20 %
epoch: 50, training time: 8462.59 secs, train perf: 93.75 %, val perf: 58.40 %
cv: 0, perf: 0.5301724137931034, macro_fscore: 0.5282001529879289
[('image shape', 153, 300), ('filter shape', [(200, 1, 1, 300), (200, 1, 2, 300), (200, 1, 3, 300)]), ('hidden_units', [200, 200, 2]), ('dropout', [0.5, 0.5, 0.5]), ('batch_size', 50), ('non_static', False), ('learn_decay', 0.95), ('conv_non_linear', 'relu'), ('non_static', False), ('sqr_norm_lim', 9), ('shuffle_batch', True)]
... training
epoch: 1, training time: 8432.78 secs, train perf: 52.90 %, val perf: 54.00 %
Test result: accu: 0.5483870967741935, macro_fscore: 0.3541666666666667
tp: 136.0 tn:0.0 fp: 112.0 fn: 0.0
@addy1997 @CyraxSector
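
A likely explanation, going by the `cv: 0` line in the log and the earlier comment that the code is designed for 10-fold cross-validation, is that after the last epoch of one fold the script reports that fold's score and starts training again on the next fold. A simplified sketch of that control flow, where the function and its arguments are illustrative and not the repository's code:

```python
def run_cross_validation(n_folds, n_epochs, train_one_epoch, evaluate):
    """Train from scratch on each fold and collect one test score per fold.

    `train_one_epoch(fold, epoch)` and `evaluate(fold)` are hypothetical
    callables standing in for the real training and test steps.
    """
    scores = []
    for fold in range(n_folds):
        for epoch in range(1, n_epochs + 1):
            train_one_epoch(fold, epoch)
        score = evaluate(fold)
        print("cv: %d, perf: %s" % (fold, score))
        scores.append(score)
    return scores


calls = []
scores = run_cross_validation(
    n_folds=3,
    n_epochs=2,
    train_one_epoch=lambda fold, epoch: calls.append((fold, epoch)),
    evaluate=lambda fold: 0.5 + fold / 10,
)
print(len(calls), len(scores))
```

Under this reading, the final results only appear once every fold has finished, so a restart after epoch 50 would be expected rather than a failure.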

@rrrccre

rrrccre commented Nov 10, 2024

I found four cPickle files after terminating the training, but I don't know how to use them for text prediction
cPickle.dump(svm_test, open("cvte" + str(attr) + str(cv) + ".p", "wb"))
cPickle.dump(svm_train, open("cvtr" + str(attr) + str(cv) + ".p", "wb"))
cvte20.p,cvte21.p,cvtr20.p,cvtr21.p
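
Judging from the two dump lines, these files hold the `svm_test` and `svm_train` objects for each attribute and fold, so they appear to contain data rather than a trained model. A minimal sketch for opening such a file and inspecting it under Python 3, where `pickle` replaces `cPickle`; the helper name `load_pickle` and the demo file are made up, and the real contents of cvte20.p are not known here:

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load a pickle file, falling back to latin1 decoding for Python 2 pickles."""
    with open(path, "rb") as f:
        try:
            return pickle.load(f)
        except UnicodeDecodeError:
            f.seek(0)
            return pickle.load(f, encoding="latin1")


# Demo with a stand-in file; the real contents of cvte20.p are not known here.
path = os.path.join(tempfile.mkdtemp(), "cvte_demo.p")
with open(path, "wb") as f:
    pickle.dump([[0.1, 0.2], [0.3, 0.4]], f)

data = load_pickle(path)
print(type(data).__name__, len(data))
```

Printing the type and length of the loaded object is a reasonable first step before deciding how, or whether, it can be used for prediction.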
