
Possible memory leak? #488

Closed
saul-jb opened this issue Apr 8, 2024 · 3 comments
Labels
need/author-input Needs input from the original author

Comments

@saul-jb
Contributor

saul-jb commented Apr 8, 2024

I have found that Helia seems to allocate far more memory than expected when persistent stores are used. The allocation is roughly equal to the import size, so I suspect buffer memory is not being freed somewhere.

I have found two cases where this happens: importing local data into Helia and fetching remote data into Helia. The following examples show the first case.

This allocates excess memory:

import { unixfs, globSource } from '@helia/unixfs'
import { FsBlockstore } from 'blockstore-fs'
import { FsDatastore } from 'datastore-fs'
import { createHelia } from 'helia'
import all from 'it-all'

const helia = await createHelia({
  blockstore: new FsBlockstore('...'),
  datastore: new FsDatastore('...')
})
const ufs = unixfs(helia)

await all(ufs.addAll(globSource('...', '*')))

This does not allocate excess memory:

const ufs = unixfs({ blockstore: new FsBlockstore('...') })

await all(ufs.addAll(globSource('...', '*')))
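
In both cases I was comparing process.memoryUsage() before and after the import, roughly like this (a simplified sketch of what the repo linked below does, not verbatim repo code):

// Continuing from either snippet above (ufs, all and globSource already in scope).
// Record RSS before and after the import; in the first case the delta stays
// roughly equal to the size of the imported data.
const before = process.memoryUsage().rss

await all(ufs.addAll(globSource('...', '*')))

const after = process.memoryUsage().rss
console.log('rss delta (MB):', (after - before) / 1024 / 1024)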

Is this a mistake on my part, an expected result or a memory leak?

To make it easy to replicate I have made a repo that illustrates these examples (at least on my machine): https://github.com/saul-jb/helia-memory-leak

I have tested it with both node v18.18.2 & v20.7.0.

As far as I can tell from my own debugging so far, the problem lies somewhere in Helia's BlockStorage.

@SgtPooki SgtPooki added the help wanted Seeking public contribution on this issue label Aug 8, 2024
@SgtPooki
Member

SgtPooki commented Aug 8, 2024

Thanks for submitting this issue @saul-jb. I will take a look here today.

@SgtPooki
Member

SgtPooki commented Aug 8, 2024

Hey Saul,

I looked into this a bit and can't seem to find any memory leaks. One thing that may be unexpected for this particular issue is that Helia starts up libp2p, which keeps running in the background, creating dial processes and other things that will show more memory use than just using unixfs directly.

I believe this increase in memory usage is coming from libp2p's processes. To confirm further, we could start up a separate libp2p node just to see what libp2p's baseline consumption is, but I'm pretty confident there's no issue and it's just a misunderstanding of the processes happening in the background.
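
For reference, one way to get that libp2p-only baseline would be something like the sketch below. It assumes the libp2pDefaults() helper exported by helia, and it isn't something I ran as part of the investigation further down:

import { libp2pDefaults } from 'helia'
import { createLibp2p } from 'libp2p'

// Start a libp2p node with (roughly) the same defaults Helia uses, without any
// unixfs imports, then watch how much memory it uses on its own over time.
const libp2p = await createLibp2p(libp2pDefaults())

console.log('after libp2p start:', process.memoryUsage())

// Give the background dialling/DHT activity some time to run, then sample again.
await new Promise(resolve => setTimeout(resolve, 30_000))

console.log('after 30s idle:', process.memoryUsage())

await libp2p.stop()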

Some investigation below

Investigating not starting helia & libp2p:

With the change below, all I do is set start: false in the Helia options.
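
The createHelia call in the repro then looks like this (a sketch matching the diff further down in this comment):

const helia = await createHelia({
  blockstore: new FsBlockstore(Path.join(testDir, 'helia-blockstore')),
  datastore: new FsDatastore(Path.join(testDir, 'helia-datastore')),
  // don't start libp2p or the other background services
  start: false
})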

╰─ ✔ ❯ node --expose-gc dist/src/import.js
garbage collection is enabled
memory usage (baseline): {
  rss: 170459136,
  heapTotal: 57229312,
  heapUsed: 19491576,
  external: 29625187,
  arrayBuffers: 569255
}
memory usage (after import): {
  rss: 264224768,
  heapTotal: 92880896,
  heapUsed: 43740680,
  external: 13133570,
  arrayBuffers: 4198156
}
memory usage (after waiting): {
  rss: 260915200,
  heapTotal: 89473024,
  heapUsed: 40324376,
  external: 2788880,
  arrayBuffers: 202258
}

╰─ ✘ INT ❯ node dist/src/import.js
garbage collection is disabled
memory usage (baseline): {
  rss: 151683072,
  heapTotal: 61423616,
  heapUsed: 22632904,
  external: 9751374,
  arrayBuffers: 7175485
}
memory usage (after import): {
  rss: 242008064,
  heapTotal: 88948736,
  heapUsed: 49396536,
  external: 18110285,
  arrayBuffers: 15534372
}
memory usage (after waiting): {
  rss: 261799936,
  heapTotal: 91045888,
  heapUsed: 57275568,
  external: 2780752,
  arrayBuffers: 204370
}

diff

diff --git a/src/import.ts b/src/import.ts
index b314212..d9081ee 100644
--- a/src/import.ts
+++ b/src/import.ts
@@ -33,23 +33,36 @@ await createFile(filePath, 10 ** 9)

 const helia = await createHelia({
   blockstore: new FsBlockstore(Path.join(testDir, 'helia-blockstore')),
-  datastore: new FsDatastore(Path.join(testDir, 'helia-datastore'))
+  datastore: new FsDatastore(Path.join(testDir, 'helia-datastore')),
+  start: false
 })

 const ufs = unixfs(helia)

+if (global.gc != null) {
+  console.log('garbage collection is enabled')
+} else {
+  console.log('garbage collection is disabled')
+}
+
+global.gc?.()
+
 console.log('memory usage (baseline):', process.memoryUsage())

 await all(ufs.addAll(globSource(testDir, '*'), { chunker: fixedSize() }))

+global.gc?.()
+
 console.log('memory usage (after import):', process.memoryUsage())

 await new Promise(resolve => setTimeout(resolve, 30000))

+global.gc?.()
+
 console.log('memory usage (after waiting):', process.memoryUsage())

 await helia.stop()

 await new Promise(resolve => setTimeout(resolve, 2000))

-await fs.rm(testDir, { recursive: true })
+await fs.rm(testDir, { recursive: true, force: true })

use @helia/http instead of helia
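
Here the repro swaps createHelia for createHeliaHTTP, which doesn't start libp2p at all (a sketch matching the diff below):

import { createHeliaHTTP } from '@helia/http'

const helia = await createHeliaHTTP({
  blockstore: new FsBlockstore(Path.join(testDir, 'helia-blockstore')),
  datastore: new FsDatastore(Path.join(testDir, 'helia-datastore'))
})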

╰─ ✔ ❯ node dist/src/import.js
garbage collection is disabled
memory usage (baseline): {
  rss: 136757248,
  heapTotal: 32292864,
  heapUsed: 14724056,
  external: 28700210,
  arrayBuffers: 27572982
}
memory usage (after import): {
  rss: 203046912,
  heapTotal: 50954240,
  heapUsed: 13058656,
  external: 1895861,
  arrayBuffers: 768593
}
memory usage (after waiting): {
  rss: 183255040,
  heapTotal: 14254080,
  heapUsed: 11759136,
  external: 1155752,
  arrayBuffers: 28484
}

diff for this output

diff --git a/src/import.ts b/src/import.ts
index b314212..0cabb8f 100644
--- a/src/import.ts
+++ b/src/import.ts
@@ -7,7 +7,7 @@ import { fileURLToPath } from 'url'
 import { unixfs, globSource } from '@helia/unixfs'
 import { FsBlockstore } from 'blockstore-fs'
 import { FsDatastore } from 'datastore-fs'
-import { createHelia } from 'helia'
+import { createHeliaHTTP } from '@helia/http'
 import { fixedSize } from 'ipfs-unixfs-importer/chunker'
 import all from 'it-all'
 import createFile from './generate-file.js'
@@ -31,25 +31,37 @@ const filePath = Path.join(testDir, 'file.data')

 await createFile(filePath, 10 ** 9)

-const helia = await createHelia({
+const helia = await createHeliaHTTP({
   blockstore: new FsBlockstore(Path.join(testDir, 'helia-blockstore')),
-  datastore: new FsDatastore(Path.join(testDir, 'helia-datastore'))
+  datastore: new FsDatastore(Path.join(testDir, 'helia-datastore')),
 })

 const ufs = unixfs(helia)

+if (global.gc != null) {
+  console.log('garbage collection is enabled')
+} else {
+  console.log('garbage collection is disabled')
+}
+
+global.gc?.()
+
 console.log('memory usage (baseline):', process.memoryUsage())

 await all(ufs.addAll(globSource(testDir, '*'), { chunker: fixedSize() }))

+global.gc?.()
+
 console.log('memory usage (after import):', process.memoryUsage())

 await new Promise(resolve => setTimeout(resolve, 30000))

+global.gc?.()
+
 console.log('memory usage (after waiting):', process.memoryUsage())

 await helia.stop()

 await new Promise(resolve => setTimeout(resolve, 2000))

-await fs.rm(testDir, { recursive: true })
+await fs.rm(testDir, { recursive: true, force: true })

Memory investigation without libp2p changes

I ended up creating some heapsnapshots and ran them through https://github.com/facebook/memlab
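
In short, a heap snapshot is written at each phase with Node's built-in v8 module (the full diff is at the end of this comment), and memlab then compares the baseline/target/final snapshots:

import { writeHeapSnapshot } from 'v8'

// One snapshot per phase so memlab can diff them as baseline -> target -> final.
writeHeapSnapshot(`${heapsnapshotDir}/after-creating-unixfs.heapsnapshot`)
// ... the import runs here ...
writeHeapSnapshot(`${heapsnapshotDir}/after-import.heapsnapshot`)
// ... 30 second wait ...
writeHeapSnapshot(`${heapsnapshotDir}/after-waiting.heapsnapshot`)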

memlab analyze output

╰─ ✘ 1 ❯ memlab analyze unbound-object --snapshot-dir heapsnapshots
Top growing objects in sizes:
 (Use `memlab trace --node-id=@ID` to get trace)

· PQueue [object](@284121):  568 bytes > 231.9MB > 219.1MB
· PriorityQueue [object](@284203):  128 bytes > 231.9MB > 219.1MB
· Array [object](@284207):  32 bytes > 231.9MB > 219.1MB
· PQueue [object](@214103):  568 bytes > 58.1MB > 114.6MB
· PriorityQueue [object](@214111):  128 bytes > 58.1MB > 114.6MB
· Array [object](@214137):  32 bytes > 58.1MB > 114.6MB
· system / Context [object](@155445):  38.5KB > 3.3MB > 3.3MB
· Array [object](@233775):  256 bytes > 3.2MB > 3.2MB
· PeerQueue [object](@284025):  1.1KB > 1.3KB > 2.9MB
· Array [object](@284093):  32 bytes > 184 bytes > 2.9MB
· system / Context [object](@175491):  539.4KB > 539.9KB > 540.1KB
· ChaCha20Poly1305 [object](@299017):  529.3KB > 529.8KB > 529.8KB
· Queue [object](@214677):  1.8KB > 682.4KB > 263.2KB
· Array [object](@214745):  32 bytes > 680.5KB > 261.2KB
· BuiltinModule [closure](@71631):  173.2KB > 178.7KB > 178.7KB
· Map [object](@135421):  172.2KB > 177.8KB > 177.8KB
· system / Context [object](@93967):  2.2KB > 132.2KB > 132.2KB
· WeakMap [object](@94423):  1KB > 131.1KB > 131.1KB
· Providers [object](@283933):  5.2KB > 37.8KB > 127.9KB
· Object { clear, get, has, remove, set } [object](@284119):  872 bytes > 34KB > 124KB

memlab find-leaks output

╰─ ✘ 1 ❯ memlab find-leaks --baseline heapsnapshots/1after-creating-unixfs.heapsnapshot --target heapsnapshots/2after-import.heapsnapshot --final heapsnapshots/3after-waiting.heapsnapshot
Alive objects allocated in target page:
┌─────────┬─────────────────────┬───────────┬────────┬──────────────┐
│ (index) │        name         │   type    │ count  │ retainedSize │
├─────────┼─────────────────────┼───────────┼────────┼──────────────┤
│    0    │      'Promise'      │ 'object'  │ 47836  │  '674.3MB'   │
│    1    │       'Array'       │ 'object'  │ 781640 │  '422.6MB'   │
│    2    │     'Generator'     │ 'object'  │ 23229  │  '240.7MB'   │
│    3    │      'Object'       │ 'object'  │ 28527  │  '226.4MB'   │
│    4    │      'provide'      │ 'object'  │  7462  │  '225.6MB'   │
│    5    │         ''          │ 'closure' │ 106503 │   '220MB'    │
│    6    │     'Multiaddr'     │ 'object'  │ 78062  │  '156.4MB'   │
│    7    │    'ArrayBuffer'    │ 'object'  │ 182772 │   '74.4MB'   │
│    8    │    'Uint8Array'     │ 'object'  │ 376370 │   '54.5MB'   │
│    9    │        'Map'        │ 'object'  │  6004  │   '7.2MB'    │
│   10    │    'EventTarget'    │ 'object'  │  2255  │   '6.2MB'    │
│   11    │        'Job'        │ 'object'  │  321   │   '5.1MB'    │
│   12    │        'CID'        │ 'object'  │  3827  │   '2.9MB'    │
│   13    │   'JobRecipient'    │ 'object'  │  933   │   '2.9MB'    │
│   14    │      'Digest'       │ 'object'  │  9113  │   '2.6MB'    │
│   15    │  'AsyncGenerator'   │ 'object'  │  1426  │   '2.4MB'    │
│   16    │    'YamuxMuxer'     │ 'object'  │   57   │   '2.2MB'    │
│   17    │   'DOMException'    │ 'object'  │  584   │    '2MB'     │
│   18    │ 'Ed25519PeerIdImpl' │ 'object'  │  5015  │    '2MB'     │
│   19    │    'YamuxStream'    │ 'object'  │  147   │   '1.9MB'    │
└─────────┴─────────────────────┴───────────┴────────┴──────────────┘
No leaks found
MemLab found 0 leak(s)
Number of clusters loaded: 0

diff of changes to https://github.com/saul-jb/helia-memory-leak

diff --git a/src/import.ts b/src/import.ts
index b314212..0695728 100644
--- a/src/import.ts
+++ b/src/import.ts
@@ -11,20 +11,24 @@ import { createHelia } from 'helia'
 import { fixedSize } from 'ipfs-unixfs-importer/chunker'
 import all from 'it-all'
 import createFile from './generate-file.js'
+import { writeHeapSnapshot } from 'v8'
 
 export const packagePath = Path.join(Path.dirname(fileURLToPath(import.meta.url)), '../../')
 
 events.setMaxListeners(40)
 
 const testDir = Path.join(packagePath, 'test-out')
+const heapsnapshotDir = Path.join(packagePath, 'heapsnapshots')
 
 // Make sure the directory is empty before starting.
 try {
-  await fs.rm(testDir, { recursive: true })
+  await fs.rm(testDir, { recursive: true, force: true })
+  await fs.rm(heapsnapshotDir, { recursive: true, force: true })
 } catch (error) {
   // Ignore
 } finally {
   await fs.mkdir(testDir)
+  await fs.mkdir(heapsnapshotDir)
 }
 
 const filePath = Path.join(testDir, 'file.data')
@@ -38,14 +42,17 @@ const helia = await createHelia({
 
 const ufs = unixfs(helia)
 
+writeHeapSnapshot(`${heapsnapshotDir}/after-creating-unixfs.heapsnapshot`)
 console.log('memory usage (baseline):', process.memoryUsage())
 
 await all(ufs.addAll(globSource(testDir, '*'), { chunker: fixedSize() }))
 
+writeHeapSnapshot(`${heapsnapshotDir}/after-import.heapsnapshot`)
 console.log('memory usage (after import):', process.memoryUsage())
 
 await new Promise(resolve => setTimeout(resolve, 30000))
 
+writeHeapSnapshot(`${heapsnapshotDir}/after-waiting.heapsnapshot`)
 console.log('memory usage (after waiting):', process.memoryUsage())
 
 await helia.stop()

@SgtPooki SgtPooki added need/author-input Needs input from the original author and removed help wanted Seeking public contribution on this issue labels Aug 9, 2024
@saul-jb
Contributor Author

saul-jb commented Aug 14, 2024

I've done more testing and it seems that the issue is now fixed after the latest package releases, though I couldn't find which package/update was responsible for fixing it. (The package versions in the repo I linked still have the issue.)

I don't think that libp2p was to blame, because the increase in memory was much too large for libp2p and correlated with the file size.

Regardless, thanks for taking the time to look into this.

@saul-jb saul-jb closed this as completed Aug 14, 2024