Hello,
I am having an issue trying to train an INN on multiple GPUs using nn.DataParallel().
I put together a minimal reproduction of the problem based on the example at the beginning of the README. When I initialize the INN with Ff.SequenceINN() (commented out in the code below), everything works as intended and I can see all of my GPUs being used. But when I use Ff.ReversibleGraphNet(), I get an error saying that several tensors are on mismatched devices (note: this initialization works fine on a single GPU).
Since I am planning to implement a convolutional conditional INN, I will need something more expressive than a plain SequenceINN(), in particular something that supports splitting (see the sketch after the code below for the kind of architecture I have in mind).
Has anyone had success training such a model on multiple GPUs?
Thanks.
import torch
import torch.nn as nn
from sklearn.datasets import make_moons
import FrEIA.framework as Ff
import FrEIA.modules as Fm
from tqdm import trange

BATCHSIZE = 1000
N_DIM = 2

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def subnet_fc(dims_in, dims_out):
    return nn.Sequential(nn.Linear(dims_in, 512), nn.ReLU(),
                         nn.Linear(512, dims_out))

# This variant works fine with nn.DataParallel:
# inn = Ff.SequenceINN(N_DIM)
# for k in range(8):
#     inn.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc, permute_soft=True)

# This variant raises a tensor-device mismatch under nn.DataParallel
# (it works fine on a single GPU):
nodes = []
nodes.append(Ff.InputNode(N_DIM))
for k in range(8):
    nodes.append(Ff.Node(nodes[-1],
                         Fm.AllInOneBlock,
                         {'subnet_constructor': subnet_fc, 'permute_soft': True}))
nodes.append(Ff.OutputNode(nodes[-1]))
inn = Ff.ReversibleGraphNet(nodes)

inn = nn.DataParallel(inn)
inn = inn.to(device)

optimizer = torch.optim.Adam(inn.parameters(), lr=0.001)

for i in trange(1000):
    optimizer.zero_grad()
    data, label = make_moons(n_samples=BATCHSIZE, noise=0.05)
    x = torch.Tensor(data).to(device)

    # Standard normalizing-flow NLL: 0.5*||z||^2 minus the log-Jacobian-determinant
    z, log_jac_det = inn(x)
    loss = 0.5 * torch.sum(z**2, 1) - log_jac_det
    loss = loss.mean() / N_DIM

    loss.backward()
    optimizer.step()

# Sample from the trained model by running the INN in reverse
z = torch.randn(100, N_DIM).to(device)
samples, _ = inn(z, rev=True)
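For reference, the kind of architecture I eventually want looks roughly like the sketch below. It is based on the conditioning and graph examples in the FrEIA README and is not tested code: the 8-dimensional input, the 10-dimensional condition, the reuse of subnet_fc, and the block choices are all placeholders, and I have not verified that this runs under nn.DataParallel either.

# Sketch only: hypothetical dimensions (8-D input, 10-D condition), placeholder blocks.
ndim = 8
cond = Ff.ConditionNode(10, name='condition')

nodes = [Ff.InputNode(ndim, name='input')]
for k in range(4):
    nodes.append(Ff.Node(nodes[-1],
                         Fm.AllInOneBlock,
                         {'subnet_constructor': subnet_fc, 'permute_soft': True},
                         conditions=cond,
                         name=f'block_{k}'))

# Split the features into two halves; keep transforming one of them...
split = Ff.Node(nodes[-1], Fm.Split, {}, name='split')
nodes.append(split)
branch = Ff.Node((split, 1),
                 Fm.AllInOneBlock,
                 {'subnet_constructor': subnet_fc, 'permute_soft': True},
                 conditions=cond,
                 name='branch_block')
nodes.append(branch)

# ...then merge both halves again before the output.
merge = Ff.Node([(split, 0), (branch, 0)], Fm.Concat, {}, name='concat')
nodes.append(merge)
nodes.append(Ff.OutputNode(nodes[-1], name='output'))

cinn = Ff.ReversibleGraphNet(nodes + [cond])

# Forward pass with a (random placeholder) condition:
x = torch.randn(BATCHSIZE, ndim)
c = torch.randn(BATCHSIZE, 10)
z, log_jac_det = cinn(x, c=[c])

In the real convolutional version the subnets would be small CNNs and the condition would be a feature map rather than a vector, but the graph wiring would be the same, which is why I need the graph-based API to cooperate with DataParallel.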