Commit fcd985d: fix typo
minhuanli committed May 9, 2023 · 1 parent 9447f4e
Showing 1 changed file with 7 additions and 3 deletions: _posts/2023-05-08-PytorchMemoryLeak.md

---
layout: post
title: An obscure reason of GPU memory leak in pytorch
description: A short debug note on why I kept getting "CUDA out of memory" errors in my code. The main takeaway is, don't use in-place operations in your computation graph unless necessary, and if you are applying one to a non-leaf tensor, change it even if it seems necessary. I tested on both PyTorch 1.13 and 2.0, with CUDA versions 11.6 and 11.7.
tag: tech
comments: true
---
Recently I have been transferring some of my previous TensorFlow and JAX code into PyTorch. We could spend another 10 blog posts arguing about the comparison between the three frameworks, but that is not what I want to share today.

While testing my torch code, I noticed the allocated CUDA memory kept increasing as the training loop went on. And apparently I hadn't made any obvious mistakes, like appending my loss term to the log before calling `.item()` on it.
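
(For reference, the classic version of that mistake is appending the loss tensor itself to a Python list, which keeps its whole computation graph alive. A minimal sketch of the safe pattern, with a toy setup and names of my own choosing rather than anything from my actual code:)

```python
import torch

# Hypothetical logging snippet (the setup and names are illustrative only).
pred = torch.randn(8, requires_grad=True)
loss_log = []
for _ in range(3):
    loss = (pred ** 2).mean()       # a tensor attached to the autograd graph
    loss_log.append(loss.item())    # .item() stores a plain Python float
    # loss_log.append(loss)         # <- this would retain each graph instead
```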

So, driven by curiosity and perfectionism, I decided to debug my code line by line, and finally found this largely unnoticed issue:

If `x` is a non-leaf tensor, e.g. `x` is the output of a linear layer, an in-place operation like

```python
x /= torch.norm(x, dim=-1, keepdim=True)
```

will cause a memory leak, with the allocated memory increasing every time this line is called.
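
For context, here is a minimal sketch of that pattern (a toy setup I am assuming, with a CUDA device and a small linear layer, not my actual training code), so you can watch the allocation with `torch.cuda.memory_allocated`:

```python
import torch

# Toy setup (assumed, not from the original code): a linear layer whose output
# `x` is a non-leaf tensor, normalized in place inside a loop.
device = "cuda"
layer = torch.nn.Linear(512, 512).to(device)
data = torch.randn(256, 512, device=device)

for step in range(10):
    x = layer(data)                            # non-leaf tensor (it has a grad_fn)
    x /= torch.norm(x, dim=-1, keepdim=True)   # in-place division on a non-leaf tensor
    # Per the observation above, the allocated CUDA memory keeps climbing here.
    print(step, torch.cuda.memory_allocated(device))
```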

### <i class='contrast'>How to solve?</i>

Changing the line to the out-of-place version below fixes the leak:

```python
x = x / torch.norm(x, dim=-1, keepdim=True)
```
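
`torch.nn.functional.normalize(x, dim=-1)` computes essentially the same thing out of place (with an extra epsilon guard against zero norms), in case you prefer a built-in. And to close the loop, here is the same toy loop as above with the out-of-place division, again under my assumed setup rather than the real code:

```python
import torch

# Same assumed toy setup as before, now with the out-of-place fix. The division
# binds `x` to a freshly allocated tensor and leaves the layer output untouched.
device = "cuda"
layer = torch.nn.Linear(512, 512).to(device)
data = torch.randn(256, 512, device=device)

for step in range(10):
    x = layer(data)
    x = x / torch.norm(x, dim=-1, keepdim=True)   # out-of-place division
    # With this version the allocated memory should stabilize instead of climbing.
    print(step, torch.cuda.memory_allocated(device))
```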
