
CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 15.90 GiB total capacity; 14.69 GiB already allocated; 291.75 MiB free; 14.76 GiB reserved in total by PyTorch) #18

Open
abrh119 opened this issue Apr 16, 2022 · 2 comments

Comments

@abrh119

abrh119 commented Apr 16, 2022

Any fixes for this?

@Alegzandra

Try reducing the batch size.
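For example, a minimal sketch (assuming a standard PyTorch DataLoader built from a dataset object named train_dataset):

from torch.utils.data import DataLoader

# Halving the batch size roughly halves the activation memory needed per step
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)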

@Siddharth-Latthe-07

Here are some possible solutions for the above issue:

  1. Reduce Batch Size:
    The most straightforward fix is to reduce the batch size, which decreases the amount of memory needed for each forward and backward pass.
    Sample snippet:
batch_size = 16  # Try reducing this value
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
  2. Clear Cache:
    Call torch.cuda.empty_cache() to release memory held by PyTorch's caching allocator (note that this does not free memory still referenced by live tensors); see the snippet below.
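    Sample snippet (gc.collect() is added here to drop unreferenced Python objects before clearing the CUDA cache):
import gc
import torch

gc.collect()              # release unreferenced Python objects holding tensors
torch.cuda.empty_cache()  # return cached, unused GPU memory to the driver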
  3. Use Mixed Precision Training:
    Mixed precision training can significantly reduce memory usage by computing in half precision (float16) where safe instead of full precision (float32).
    Sample snippet:
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        with autocast():
            outputs = model(inputs.float())
            loss = criterion(outputs, labels.long())
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
  4. Gradient Accumulation:
    Accumulate gradients over several mini-batches to simulate a larger effective batch size.
    Sample snippet:
accumulation_steps = 4
for epoch in range(num_epochs):
    optimizer.zero_grad()
    for i, (inputs, labels) in enumerate(train_loader):
        outputs = model(inputs.float())
        loss = criterion(outputs, labels.long())
        loss = loss / accumulation_steps
        loss.backward()
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
  5. Optimize Data Loading:
    Ensure that data loading is efficient and not adding memory overhead, for example by using multiple worker processes and pinned memory.
    Sample snippet:
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True)
  6. Example with Reduced Batch Size and Mixed Precision:
    Here is how you can combine a reduced batch size with mixed precision training:
import torch
from torch.cuda.amp import autocast, GradScaler
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

# Assuming you have your dataset and model defined
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)  # Reduced batch size
model = CNN_LSTM(input_dim, hidden_dim, num_classes).cuda()
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
scaler = GradScaler()

for epoch in range(num_epochs):
    model.train()
    for inputs, labels in train_loader:
        inputs, labels = inputs.cuda(), labels.cuda()  # Ensure data is on GPU
        optimizer.zero_grad()
        with autocast():
            outputs = model(inputs.float())
            loss = criterion(outputs, labels.long())
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

    # Evaluate on validation set if needed
    # Calculate accuracy, etc.

# Example evaluation loop
model.eval()
total_correct = 0
total_samples = 0

for inputs, labels in test_loader:
    inputs, labels = inputs.cuda(), labels.cuda()
    with torch.no_grad():
        outputs = model(inputs.float())
        _, predicted = torch.max(outputs, 1)
        total_correct += (predicted == labels).sum().item()
        total_samples += labels.size(0)

accuracy = total_correct / total_samples
print(f'Accuracy on test set: {accuracy}')

Hope this helps.
Thanks
