Bad mix with apply in DataParallelTable ? #262

fmassa · 2016-04-28T19:05:55Z

There seems to be a problem (or at least a behaviour which is not compatible with other modules) with apply in DataParallelTable.
The following snippet reproduces the problem:

require 'cunn'
model = nn.DataParallelTable(1)
model:add(nn.SpatialConvolution(3,96,7,7,3,3),1)
model:add(nn.SpatialConvolution(3,96,7,7,3,3),2)
model:cuda()
input = torch.randn(32,3,224,224):cuda()
function f(m)
  local ff = m.updateOutput
  m.updateOutput = function(self, i)
    return ff(self, i)
  end
end
model:apply(f)
model:forward(input);

The error is the following:

/opt/rocks/distro/install/share/lua/5.1/torch/File.lua:141: Unwritable object <function> at <?>.<?>.updateOutput.ff.THNN.getState.ffi.abi
stack traceback:
    [C]: in function 'error'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:141: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:200: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:200: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    ...
    /opt/rocks/distro/install/share/lua/5.1/nn/Module.lua:107: in function 'clone'
    .../distro/install/share/lua/5.1/cunn/DataParallelTable.lua:634: in function 'applyChanges'
    .../distro/install/share/lua/5.1/cunn/DataParallelTable.lua:644: in function 'setup'
    .../distro/install/share/lua/5.1/cunn/DataParallelTable.lua:651: in function 'exec'
    .../distro/install/share/lua/5.1/cunn/DataParallelTable.lua:194: in function 'forward'
    [string "model:forward(input);"]:1: in main chunk
    [C]: in function 'xpcall'
    /opt/rocks/distro/install/share/lua/5.1/trepl/init.lua:669: in function 'repl'
    ...cks/distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
    [C]: at 0x00406670

Here is a related issue fmassa/optimize-net#12

Is it an abusive use of apply (and should not be supported) ?

The text was updated successfully, but these errors were encountered:

apaszke · 2016-04-28T19:28:48Z

Can't reproduce here. The error looks as if you were passing that function to a thread, but you aren't using the threaded Impl in your example.

EDIT: it works for me even with threads enabled

fmassa · 2016-04-28T19:33:59Z

Ok , thanks for trying it out. I'll try this snippet on another machine to see if I manage to reproduce it.

szagoruyko · 2016-04-28T19:36:31Z

@apaszke are you using lua5.2? I can repro on several machines with luajit2.1 and freshly installed torch

apaszke · 2016-04-28T19:50:13Z

@szagoruyko It was LuaJIT. I reinstalled everything and I can confirm the issue now.

apaszke · 2016-04-28T20:14:48Z

I got it to work, but it's a bit of a hack 😜

require 'cunn'                                                      
model = nn.DataParallelTable(1)                                     
model:add(nn.SpatialConvolution(3,96,7,7,3,3),1)                    
model:add(nn.SpatialConvolution(3,96,7,7,3,3),2)                    
model:cuda()                                                        
input = torch.randn(32,3,224,224):cuda()                            
function f(m)                                                       
  m.updateOutput = function(self, i)                                
     return torch.getmetatable(torch.type(m)).updateOutput(self, i) 
  end                                                               
end                                                                 
model:apply(f)                                                      
print(model:forward(input):size())

I don't really know why it tries to serialise THNN along with the closure in your example. Also it seems to work with older Torch versions, which is even weirder.

apaszke · 2016-04-28T20:20:31Z

Still, this is a problem with serialization, not with DPT. Below is a snippet that reproduces it:

require 'nn'                                                                                   
model = nn.Sequential()                          
model:add(nn.SpatialConvolution(3,96,7,7,3,3),1) 
function f(m)                                    
  m._uO = m.updateOutput                         
  m.updateOutput = function(self, i)             
     return m:_uO(i)                             
  end                                            
end                                              
model:apply(f)                                   
model:clone()

colesbury · 2016-04-28T20:22:47Z

It serializes THNN because the reference chain:

m.updateOutput -> ff -> THNN.optionalTensor

i.e.

the original updateOutput is now an upvalue (ff) of the new updateOutput
function
the original updateOutput has THNN as an upvalue because of the call to
THNN.updateOutput

On Thu, Apr 28, 2016 at 4:14 PM, Adam Paszke [email protected]
wrote:

I got it to work, but it's a bit of a hack 😜

require 'cunn'
model = nn.DataParallelTable(1)
model:add(nn.SpatialConvolution(3,96,7,7,3,3),1)
model:add(nn.SpatialConvolution(3,96,7,7,3,3),2)
model:cuda()
input = torch.randn(32,3,224,224):cuda() function f(m)
m.updateOutput = function(self, i)
return torch.getmetatable(torch.type(m)).updateOutput(self, i)
end end
model:apply(f) print(model:forward(input):size())

I don't really know why it tries to serialise THNN along with the closure
in your example. Also it seems to work with older Torch versions, which is
even weirder.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#262 (comment)

apaszke · 2016-04-28T20:28:27Z

@colesbury But, if you don't do :clone(), then THNN is still an upvalue of the original updateOutput. It doesn't get serialized then, becase it's a metatable method, right?

apaszke · 2016-04-28T20:30:53Z

If so, this is probably a better workaround:

require 'nn'                                     
model = nn.Sequential()                          
model:add(nn.SpatialConvolution(3,96,7,7,3,3),1) 
function f(m)                                    
  local mt = getmetatable(m)                     
  mt._updateOutput = mt.updateOutput             
  m.updateOutput = function(self, i)             
     return self:_updateOutput(i)                
  end                                            
end                                              
model:apply(f)                                   
model:clone()

The only downside is that it will break if you save the model to a file and load it without _updateOutput set in all modules' metatables. You could override the write method as well, but it starts to be a chain of hacks 😕

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bad mix with apply in DataParallelTable ? #262

Bad mix with apply in DataParallelTable ? #262

fmassa commented Apr 28, 2016

apaszke commented Apr 28, 2016 •

edited

Loading

fmassa commented Apr 28, 2016

szagoruyko commented Apr 28, 2016

apaszke commented Apr 28, 2016

apaszke commented Apr 28, 2016

apaszke commented Apr 28, 2016

colesbury commented Apr 28, 2016

apaszke commented Apr 28, 2016

apaszke commented Apr 28, 2016 •

edited

Loading

Bad mix with apply in DataParallelTable ? #262

Bad mix with apply in DataParallelTable ? #262

Comments

fmassa commented Apr 28, 2016

apaszke commented Apr 28, 2016 • edited Loading

fmassa commented Apr 28, 2016

szagoruyko commented Apr 28, 2016

apaszke commented Apr 28, 2016

apaszke commented Apr 28, 2016

apaszke commented Apr 28, 2016

colesbury commented Apr 28, 2016

apaszke commented Apr 28, 2016

apaszke commented Apr 28, 2016 • edited Loading

apaszke commented Apr 28, 2016 •

edited

Loading

apaszke commented Apr 28, 2016 •

edited

Loading