From 9447f4e1a03fab8bb78e55aac35717efc1d73aeb Mon Sep 17 00:00:00 2001
From: minhuanli
Date: Mon, 8 May 2023 23:25:01 -0400
Subject: [PATCH] update pytorch memory leak blog

---
 _posts/2021-12-27-Attention1NMT.md     |  2 +-
 _posts/2023-05-08-PytorchMemoryLeak.md | 36 ++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 _posts/2023-05-08-PytorchMemoryLeak.md

diff --git a/_posts/2021-12-27-Attention1NMT.md b/_posts/2021-12-27-Attention1NMT.md
index 1d2bd42..b284f39 100644
--- a/_posts/2021-12-27-Attention1NMT.md
+++ b/_posts/2021-12-27-Attention1NMT.md
@@ -1,6 +1,6 @@
 ---
 layout: post
-title: Attention I, Early Implementation of Attention Mechanism
+title: Early Implementation of Attention Mechanism
 tags: AI&Physics Attention
 katex: True
 progress: 100%
diff --git a/_posts/2023-05-08-PytorchMemoryLeak.md b/_posts/2023-05-08-PytorchMemoryLeak.md
new file mode 100644
index 0000000..1ee4b6c
--- /dev/null
+++ b/_posts/2023-05-08-PytorchMemoryLeak.md
@@ -0,0 +1,36 @@
+---
+layout: post
+title: An obscure cause of GPU memory leaks in PyTorch
+description: A short debugging note on why I kept getting CUDA out-of-memory errors in my code. The main takeaway is, don't use in-place operations in your computation graph unless necessary, and if you are applying them to non-leaf tensors, change them even if they seem necessary. I tested on both PyTorch 1.13 and 2.0, with CUDA 11.6 and 11.7.
+tag: tech
+comments: true
+---
+Recently I have been porting some of my previous TensorFlow and JAX code to PyTorch. The comparison between the three frameworks could fill another ten blog posts, but that is not what I want to share today.
+
+While testing my torch code, I noticed that the allocated CUDA memory kept increasing as the training loop ran. And apparently I hadn't made any of the obvious mistakes, like appending my loss term to the log before calling `.item()` on it.
+
+So, driven by curiosity and perfectionism, I decided to debug my code line by line, and finally found this largely unnoticed issue:
+
+If `x` is a non-leaf tensor, e.g. the output of a linear layer, an in-place operation like
+
+```python
+x /= torch.norm(x, dim=-1, keepdim=True)
+```
+
+will cause a memory leak, increasing the allocated memory every time this line is called. Changing it to the out-of-place version
+
+```python
+x = x / torch.norm(x, dim=-1, keepdim=True)
+```
+
+completely solves the issue.
+
+The issue is very easy to reproduce in both `1.13` and `2.0`, as shown in the following picture:
+
+Figure 1
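+
+If you want to check it yourself, a loop along the lines of the sketch below is the kind of check I mean: print `torch.cuda.memory_allocated()` at every step and watch whether it keeps climbing with the in-place line. The layer, batch size, and step count here are arbitrary placeholders, not my actual training code:
+
+```python
+import torch
+
+# Placeholder setup: any module whose output is a non-leaf tensor will do.
+# Autograd is recording here (no torch.no_grad()), which is the setting discussed above.
+layer = torch.nn.Linear(1024, 1024).cuda()
+data = torch.randn(512, 1024, device="cuda")
+
+for step in range(10):
+    x = layer(data)                                # x is a non-leaf tensor with a grad_fn
+    x /= torch.norm(x, dim=-1, keepdim=True)       # the in-place version discussed above
+    # x = x / torch.norm(x, dim=-1, keepdim=True)  # the out-of-place fix, for comparison
+    print(step, torch.cuda.memory_allocated())     # bytes currently occupied by live tensors
+```
+
+Swapping the in-place line for the commented-out one is the only change needed to compare the two runs.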
+
+So, the takeaway is: avoid in-place operations in your PyTorch computation graph.
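+
+As a side note (a general tip, not specific to this bug): if what you want is L2 normalization along the last dimension, `torch.nn.functional.normalize` gives you an out-of-place version in one call, with an `eps` guard against zero norms:
+
+```python
+import torch.nn.functional as F
+
+# equivalent to x = x / torch.norm(x, dim=-1, keepdim=True), with the norm clamped at eps
+x = F.normalize(x, p=2, dim=-1)
+```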