Make FCModel.py work on CUDA #2

Open: wants to merge 1 commit into base: master

pbelevich

I tried to run python eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_folder ./data (as described in the book on page 34) with several CUDA-enabled PyTorch builds, but every attempt failed with one of the following errors. With CPU-only PyTorch builds, everything works as expected.

(dl-with-pt) C:\Users\pavel\dev\ImageCaptioning.pytorch>python eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_folder ./data
DataLoaderRaw loading images from folder:  ./data
0
listing all images in directory ./data
DataLoaderRaw found  1  images
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py:149: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  logprobs = F.log_softmax(self.logit(output))
Traceback (most recent call last):
  File "eval.py", line 132, in <module>
    loss, split_predictions, lang_stats = eval_utils.eval_split(
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\eval_utils.py", line 106, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 162, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 144, in sample_beam
    xt = self.embed(Variable(it, requires_grad=False))
  File "C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\modules\sparse.py", line 124, in forward
    return F.embedding(
  File "C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py", line 1814, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select

or

(dl-with-pt) C:\Users\pavel\dev\ImageCaptioning.pytorch>python eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_folder ./data
DataLoaderRaw loading images from folder:  ./data
0
listing all images in directory ./data
DataLoaderRaw found  1  images
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py:149: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  logprobs = F.log_softmax(self.logit(output))
Traceback (most recent call last):
  File "eval.py", line 132, in <module>
    loss, split_predictions, lang_stats = eval_utils.eval_split(
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\eval_utils.py", line 106, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 162, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 143, in sample_beam
    it = fc_feats.data.new(beam_size).long().zero_()
RuntimeError: CUDA error: unknown error
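The deprecation warnings in both logs are unrelated to the crash, but they are cheap to silence. A minimal sketch of the non-deprecated calls, assuming a 2-D (batch, vocab) tensor like the output of self.logit(output); for 2-D input the implicit log_softmax dimension was 1, so passing dim=1 explicitly keeps the old behavior:

```python
import torch
import torch.nn.functional as F

# Stand-in for self.logit(output): shape (batch, vocab), random values for illustration
logits = torch.randn(3, 5)

gates = torch.sigmoid(logits)            # replaces deprecated F.sigmoid
cell = torch.tanh(logits)                # replaces deprecated F.tanh
logprobs = F.log_softmax(logits, dim=1)  # explicit vocabulary dimension
```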

I debugged the code, and it looks like the FCModel layers are never moved to the device.
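One way to check this kind of diagnosis is to list the device of every registered parameter after calling .to(device). The sketch below uses a hypothetical TinyCaptioner whose layer names echo FCModel but whose sizes are made up. It also shows one general PyTorch pitfall that produces this exact symptom (a layer held in a plain Python list is not registered as a submodule, so .to() skips it); that is not necessarily what happens in FCModel.py.

```python
import torch
import torch.nn as nn


def param_devices(module: nn.Module) -> dict:
    """Map each registered parameter name to the type of device it lives on."""
    return {name: p.device.type for name, p in module.named_parameters()}


class TinyCaptioner(nn.Module):
    """Hypothetical stand-in; layer names echo FCModel, sizes are made up."""

    def __init__(self, fc_feat_size=8, vocab_size=10, enc_size=4):
        super().__init__()
        self.img_embed = nn.Linear(fc_feat_size, enc_size)
        self.embed = nn.Embedding(vocab_size + 1, enc_size)
        # A layer kept in a plain Python list is NOT registered as a submodule,
        # so model.to(device) silently leaves it where it was created (CPU).
        self.unregistered = [nn.Linear(enc_size, enc_size)]


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyCaptioner().to(device)

print(param_devices(model))                 # only img_embed.* and embed.* appear
print(model.unregistered[0].weight.device)  # still on the CPU
```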

@ephraim71

Thank you :) I'm working through the same book.

@waszee

waszee commented Oct 2, 2020

I have just started working through the book too. I'm struggling a bit with the one-hot encoding concepts in chapter 4, but I have really enjoyed the book so far, and the notebook scripts have worked. My system doesn't have a CUDA-capable graphics card, and I'm sure I'll want one soon. I've started looking for one and wonder what others like you are using or recommend. The new 3000-series cards are out of my budget, but the GTX 1660 looks good to me. Should I avoid it?

@jhagege

jhagege commented Oct 3, 2020

Fixed the issue on my end. Thanks.

@pbelevich
Author

@waszee Just use https://colab.research.google.com/ and select GPU under Runtime -> Change runtime type.
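After switching the runtime type, a quick check in a notebook cell confirms that PyTorch actually sees the GPU. A sketch; describe_accelerator is a hypothetical helper name:

```python
import torch


def describe_accelerator() -> str:
    """Report which device PyTorch can use, e.g. from a Colab notebook cell."""
    if torch.cuda.is_available():
        return f"CUDA device: {torch.cuda.get_device_name(0)}"
    return "No CUDA device visible; check Runtime -> Change runtime type"


print(describe_accelerator())
```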

@waszee

waszee commented Oct 6, 2020

Thanks for the suggestion; I will check it out. I'm in chapter 5 now. At the end of chapter 4 there was some material on audio files, and I tried to build some tensors from Morse code I captured with my radio. I'm struggling with embedding concepts for handling patterns of different sizes. Makes a good brain teaser for an old guy :).
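For patterns of different lengths, one common approach (sketched here, not taken from the book) is to pad the sequences to a common length and reserve a padding index in nn.Embedding so padded positions map to a zero vector. The small Morse table and the token ids (0 = padding, 1 = dot, 2 = dash) are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

# Illustrative token ids: 0 is reserved for padding
MORSE = {"E": ".", "A": ".-", "S": "..."}
TOKEN = {".": 1, "-": 2}


def encode(letter: str) -> torch.Tensor:
    """Turn one Morse letter into a 1-D tensor of token ids."""
    return torch.tensor([TOKEN[s] for s in MORSE[letter]], dtype=torch.long)


seqs = [encode(c) for c in "EAS"]                                # lengths 1, 2, 3
padded = pad_sequence(seqs, batch_first=True, padding_value=0)  # shape (3, 3)
lengths = torch.tensor([len(s) for s in seqs])

embed = nn.Embedding(num_embeddings=3, embedding_dim=4, padding_idx=0)
vectors = embed(padded)  # shape (3, 3, 4); padded positions are zero vectors

# Average over real symbols only: padded positions add zeros to the sum
pooled = vectors.sum(dim=1) / lengths.unsqueeze(1)  # shape (3, 4)
```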

@pbelevich
Author

@elistevens, would you mind taking a look at this PR?

@waszee

waszee commented Oct 6, 2020

FYI, I posted a separate question about audio files and DL tensors; I think it is #9. I'm still learning how to navigate and reference things that have already been posted.

@kaiser-hamid-rabbi

Quoting @jhagege: "Fixed the issue on my end. Thanks."

Could you please tell me how you solved it while running the code on GPU?

@jhagege

jhagege commented Jan 13, 2022

Quoting @kaiser-hamid-rabbi: "Could you please tell me how you solved it while running the code on GPU?"

Sorry, I don't remember.
I'll try to document the solution the next time it happens.
Good luck!

@jsgoller1

In my case, I ran into a slightly different error when running the same command:

─ $ ▶ python3 eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_fold ./data/
DataLoaderRaw loading images from folder:  ./data/
0
listing all images in directory ./data/
DataLoaderRaw found  1  images
/home/joshua/Code/machine-learning/venv/lib/python3.9/site-packages/torch/nn/functional.py:780: UserWarning: Note that order of the arguments: ceil_mode and return_indices will changeto match the args list in nn.MaxPool2d in a future release.
  warnings.warn("Note that order of the arguments: ceil_mode and return_indices will change"
Traceback (most recent call last):
  File "/home/joshua/Code/ImageCaptioning.pytorch/eval.py", line 132, in <module>
    loss, split_predictions, lang_stats = eval_utils.eval_split(
  File "/home/joshua/Code/ImageCaptioning.pytorch/eval_utils.py", line 106, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "/home/joshua/Code/ImageCaptioning.pytorch/models/FCModel.py", line 160, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "/home/joshua/Code/ImageCaptioning.pytorch/models/FCModel.py", line 141, in sample_beam
    xt = self.img_embed(fc_feats[k:k+1]).expand(beam_size, self.input_encoding_size)
  File "/home/joshua/Code/machine-learning/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/joshua/Code/machine-learning/venv/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)

Rather than forcing the model onto the CPU, I made the following ugly hack, after which it worked correctly (@pbelevich, this is similar to your changes):

└─ $ ▶ git diff models/FCModel.py
diff --git a/models/FCModel.py b/models/FCModel.py
index c275b5b..8885e52 100644
--- a/models/FCModel.py
+++ b/models/FCModel.py
@@ -20,8 +20,8 @@ class LSTMCore(nn.Module):
         self.drop_prob_lm = opt.drop_prob_lm
         
         # Build a LSTM
-        self.i2h = nn.Linear(self.input_encoding_size, 5 * self.rnn_size)
-        self.h2h = nn.Linear(self.rnn_size, 5 * self.rnn_size)
+        self.i2h = nn.Linear(self.input_encoding_size, 5 * self.rnn_size, device='cuda:0')
+        self.h2h = nn.Linear(self.rnn_size, 5 * self.rnn_size, device='cuda:0')
         self.dropout = nn.Dropout(self.drop_prob_lm)
 
     def forward(self, xt, state):
@@ -59,10 +59,10 @@ class FCModel(CaptionModel):
 
         self.ss_prob = 0.0 # Schedule sampling probability
 
-        self.img_embed = nn.Linear(self.fc_feat_size, self.input_encoding_size)
+        self.img_embed = nn.Linear(self.fc_feat_size, self.input_encoding_size, device='cuda:0')
         self.core = LSTMCore(opt)
-        self.embed = nn.Embedding(self.vocab_size + 1, self.input_encoding_size)
-        self.logit = nn.Linear(self.rnn_size, self.vocab_size + 1)
+        self.embed = nn.Embedding(self.vocab_size + 1, self.input_encoding_size, device='cuda:0')
+        self.logit = nn.Linear(self.rnn_size, self.vocab_size + 1, device='cuda:0')

The script then produces the correct result:

└─ $ ▶ python3 eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_fold ./data/
DataLoaderRaw loading images from folder:  ./data/
0
listing all images in directory ./data/
DataLoaderRaw found  1  images
...
image 1: a person riding a horse on a dirt road
evaluating validation preformance... -1/1 (0.000000)
loss:  0.0
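Hard-coding device='cuda:0' in each constructor works on a GPU machine but fails on CPU-only setups. A more portable pattern, sketched below under the assumption that the calling script moves the model once after loading it, is to build the layers on the default device, call .to(device) on the whole module, and create any new tensors (such as the beam-search it indices) from an existing input so they inherit its device. LSTMCoreSketch is a simplified, hypothetical stand-in for LSTMCore, not its real forward pass:

```python
import torch
import torch.nn as nn


class LSTMCoreSketch(nn.Module):
    """Hypothetical, simplified stand-in: keeps only the i2h/h2h projections."""

    def __init__(self, input_encoding_size: int, rnn_size: int):
        super().__init__()
        # No device argument: parameters start on the default device (CPU)
        self.i2h = nn.Linear(input_encoding_size, 5 * rnn_size)
        self.h2h = nn.Linear(rnn_size, 5 * rnn_size)

    def forward(self, xt: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        return self.i2h(xt) + self.h2h(h)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# One call moves every registered layer to the chosen device
core = LSTMCoreSketch(input_encoding_size=4, rnn_size=3).to(device)

fc_feats = torch.randn(2, 4, device=device)
beam_size = 2
# Device-following replacement for fc_feats.data.new(beam_size).long().zero_()
it = fc_feats.new_zeros(beam_size, dtype=torch.long)
out = core(fc_feats, torch.zeros(2, 3, device=device))
```

With this pattern the same code runs unchanged on CPU-only and CUDA builds; only the device selection line decides where everything lives.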
