
CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 15.90 GiB total capacity; 14.69 GiB already allocated; 291.75 MiB free; 14.76 GiB reserved in total by PyTorch) #18

Open
abrh119 opened this issue Apr 16, 2022 · 2 comments

Comments

@abrh119

abrh119 commented Apr 16, 2022

Any fixes for this?

@Alegzandra

Try reducing the batch size.
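For example, a minimal sketch (assuming a standard PyTorch DataLoader built from a dataset object named train_dataset):

from torch.utils.data import DataLoader

# Halving the batch size roughly halves the activation memory needed per step
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)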

@Siddharth-Latthe-07

Here are some possible solutions for the above issue:

  1. Reduce Batch Size:
    The most straightforward fix is to reduce the batch size, which decreases the amount of memory needed for each forward and backward pass.
    Sample snippet:
batch_size = 16  # Try reducing this value
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
  2. Clear Cache:
    Call torch.cuda.empty_cache() to release memory held by PyTorch's caching allocator (note that this does not free memory still referenced by live tensors); see the snippet below.
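    Sample snippet (gc.collect() is added here to drop unreferenced Python objects before clearing the CUDA cache):
import gc
import torch

gc.collect()              # release unreferenced Python objects holding tensors
torch.cuda.empty_cache()  # return cached, unused GPU memory to the driver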
  3. Use Mixed Precision Training:
    Mixed precision training can significantly reduce memory usage by computing in half precision (float16) where safe instead of full precision (float32).
    Sample snippet:
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        with autocast():
            outputs = model(inputs.float())
            loss = criterion(outputs, labels.long())
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
  4. Gradient Accumulation:
    Accumulate gradients over several mini-batches to simulate a larger effective batch size.
    Sample snippet:
accumulation_steps = 4
for epoch in range(num_epochs):
    optimizer.zero_grad()
    for i, (inputs, labels) in enumerate(train_loader):
        outputs = model(inputs.float())
        loss = criterion(outputs, labels.long())
        loss = loss / accumulation_steps
        loss.backward()
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
  5. Optimize Data Loading:
    Ensure that data loading is efficient and not adding memory overhead, for example by using multiple worker processes and pinned memory.
    Sample snippet:
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True)
  6. Example with Reduced Batch Size and Mixed Precision:
    Here is how you can combine a reduced batch size with mixed precision training:
import torch
from torch.cuda.amp import autocast, GradScaler
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

# Assuming you have your dataset and model defined
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)  # Reduced batch size
model = CNN_LSTM(input_dim, hidden_dim, num_classes).cuda()
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
scaler = GradScaler()

for epoch in range(num_epochs):
    model.train()
    for inputs, labels in train_loader:
        inputs, labels = inputs.cuda(), labels.cuda()  # Ensure data is on GPU
        optimizer.zero_grad()
        with autocast():
            outputs = model(inputs.float())
            loss = criterion(outputs, labels.long())
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

    # Evaluate on validation set if needed
    # Calculate accuracy, etc.

# Example evaluation loop
model.eval()
total_correct = 0
total_samples = 0

for inputs, labels in test_loader:
    inputs, labels = inputs.cuda(), labels.cuda()
    with torch.no_grad():
        outputs = model(inputs.float())
        _, predicted = torch.max(outputs, 1)
        total_correct += (predicted == labels).sum().item()
        total_samples += labels.size(0)

accuracy = total_correct / total_samples
print(f'Accuracy on test set: {accuracy}')

Hope this helps.
Thanks
