
nn.GPU #835

Merged: 3 commits into torch:master, Jul 2, 2016
Conversation

@nicholas-leonard (Member) commented May 27, 2016

This PR adds nn.GPU, which can be used to distribute modules across multiple devices.

OMG why is it in nn and not in cunn? As discussed with @soumith, putting it in nn means that it can be used in CPU-only environments, which is common for production.

The unit tests are located in cunn to avoid a pcall(function() require 'cunn' end) in the nn unit tests (see PR torch/cunn#282).

nn.Sequential()
   :add(nn.GPU(nn.Linear(10000,10000), 1))
   :add(nn.GPU(nn.Linear(10000,10000), 2))
   :add(nn.GPU(nn.Linear(10000,10000), 3))
   :add(nn.GPU(nn.Linear(10000,10000), 4, cutorch.getDevice()))
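For readers new to the pattern, here is a minimal sketch of the decorator idea behind nn.GPU (not the actual merged source; nn.MyGPU is a hypothetical name, and the backward pass, type conversion, and output-device handling are omitted):

local MyGPU, parent = torch.class('nn.MyGPU', 'nn.Container')

function MyGPU:__init(module, device, outdevice)
   parent.__init(self)
   self.device = device
   self.outdevice = outdevice or device
   self.modules[1] = module
end

function MyGPU:updateOutput(input)
   if cutorch then
      -- run the wrapped module's forward on self.device; a full
      -- implementation would also move the output to self.outdevice
      cutorch.withDevice(self.device, function()
         self.output = self.modules[1]:updateOutput(input)
      end)
   else
      -- CPU-only environment: act as a transparent wrapper
      self.output = self.modules[1]:updateOutput(input)
   end
   return self.output
end

In the chained example above, each wrapped nn.Linear then computes its forward pass on its own device, spreading the model across GPUs 1 through 4.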
A Contributor commented on the line using cutorch.getDevice():

I am wondering if this line can run in CPU-only environments.

@nicholas-leonard (Member, Author) replied May 27, 2016:

The cutorch.getDevice() call will not. But it isn't mandatory to use cutorch.getDevice(); this is just an example.
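If one wanted the example itself to be CPU-safe, the optional lookup could be guarded (a sketch; model is a hypothetical container):

-- only query the current device when cutorch is actually loaded
local outdevice = cutorch and cutorch.getDevice() or nil
model:add(nn.GPU(nn.Linear(10000,10000), 4, outdevice))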

@nicholas-leonard force-pushed the GPU branch 2 times, most recently from b1d68d6 to dba06e5 on June 3, 2016
@nicholas-leonard (Member, Author)

I have tested the nn.GPU implementation on a language model task using 4 GPUs, and it works.

@iamalbert (Contributor)

This is cool, but the name and the order of the parameters are a little confusing.

:add(nn.OnGPU(1, nn.Linear(10000,10000)))
:add(nn.OnGPU(2, nn.Linear(10000,10000)))
:add(nn.OnGPU(3, nn.Linear(10000,10000)))
:add(nn.OnGPU(4, cutorch.getDevice(), nn.Linear(10000,10000)))

may be clearer than

:add(nn.GPU(nn.Linear(10000,10000), 1))
:add(nn.GPU(nn.Linear(10000,10000), 2))
:add(nn.GPU(nn.Linear(10000,10000), 3))
:add(nn.GPU(nn.Linear(10000,10000), 4, cutorch.getDevice()))

What do you think?

@nicholas-leonard (Member, Author) commented Jun 6, 2016

@iamalbert I prefer nn.GPU, but I like your order, except for the optional outdevice argument, which I think should still be last (as it is optional):

:add(nn.GPU(1, nn.Linear(10000,10000)))
:add(nn.GPU(2, nn.Linear(10000,10000)))
:add(nn.GPU(3, nn.Linear(10000,10000)))
:add(nn.GPU(4, nn.Linear(10000,10000), cutorch.getDevice()))

@szagoruyko (Member)

The order is already module, device in ModelParallel; it would be confusing to introduce another order.

@nicholas-leonard (Member, Author)

@szagoruyko Good point. The backwards-compatibility argument always wins (plus I am lazy). I will leave it as it is.

@szagoruyko (Member)

@nicholas-leonard there are a couple of places like self.output = output that break tensor sharing and optnet; can we avoid them?

@nicholas-leonard (Member, Author)

@szagoruyko How does it break tensor sharing? (I don't know anyone who shares outputs.) As for optnet, what does it do that cannot support self.output = output? I am pretty sure I do this in many other places.

@fmassa (Contributor) commented Jun 15, 2016

Hi,

Sorry for the delay in replying.

The current underlying requirement in optnet is that the tensor/storage objects corresponding to the output and gradInput of each module don't change across runs of forward/backward.
There are actually two cases:

  • When one wants to create graph visualizations, the tensor shouldn't change. Thus, doing something like the following will currently not work (although there is a pending PR which tries to fix it, I'm not 100% sure it doesn't have bad side effects):

function MySelectModule:updateOutput(input)
  self.output = input[1] -- creates a new tensor which shares the storage
  return self.output
end

  • When we care only about optimizing for memory, optnet only looks at the storages. So a module which allocates new memory for the output/gradInput during each forward/backward won't be able to reuse buffers, but I think it shouldn't cause any problems with respect to the correctness of the entire network's forward/backward.

About this PR: as it's a generic module that supports both tensors and tables, using set is not an option. But from a quick look I have the impression that this module could potentially work as is with optnet, though I haven't tried it to check.
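For contrast, here is the common nn idiom that keeps the same output tensor/storage across runs, which is what optnet's first case wants (a generic sketch; MyModule is a hypothetical module):

function MyModule:updateOutput(input)
   -- resize the existing buffer in place and copy into it, so the
   -- tensor/storage behind self.output stays stable across forwards
   self.output:resizeAs(input):copy(input)
   return self.output
end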

@nicholas-leonard (Member, Author)

So I guess this PR is ready to merge then :)

On these lines from @nicholas-leonard's diff:

self.modules[1] = module

if module:type() == 'torch.CudaTensor' then
   self:cuda()

a Member commented:

This :cuda() is not executing in the context of "device"; it needs a fix.
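A sketch of the kind of fix being suggested, wrapping the conversion in the device context (an assumption, not necessarily the exact patch that landed):

if module:type() == 'torch.CudaTensor' then
   -- perform the type conversion on the wrapped module's device,
   -- not on whichever device happens to be current
   cutorch.withDevice(self.device, function() self:cuda() end)
end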

@soumith merged commit 07d3bdd into torch:master on Jul 2, 2016
@soumith (Member) commented Jul 2, 2016

Thanks for your patience, Nicholas!

@szagoruyko (Member)

Just realized: we need to add support for CUDA data types other than float.

@soumith (Member) commented Jul 2, 2016

He seems to be checking for Cuda*Tensor; is that not sufficient?
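A sketch of what such a check could look like (an illustration, not the actual code in the PR; isCudaType is a hypothetical helper):

-- matches 'torch.CudaTensor', 'torch.CudaHalfTensor',
-- 'torch.CudaDoubleTensor', and so on
local function isCudaType(tensorType)
   return tensorType:find('torch%.Cuda.*Tensor') ~= nil
end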
