Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

increase file limit when hit the ulimit #1363

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

karuboniru
Copy link

@karuboniru karuboniru commented Oct 21, 2024

Fixes #1362 #851

It seems that with llvm lto we don't have much way to fix the issue than increasing the ulimit. (I strace'd bfd and it is calling setprlimit when it gets any EMFILE)

@rui314
Copy link
Owner

rui314 commented Oct 21, 2024

Well I don't think any program should automagically increase the ulimit. Can you file a bug to LLVM instead? If GCC can handle the same thing without hitting the ulimit, LLVM should be able to do the same thing without hitting it, too.

@karuboniru
Copy link
Author

karuboniru commented Oct 21, 2024

If GCC can handle the same thing without hitting the ulimit, LLVM should be able to do the same thing without hitting it, too.

Stracing ld.bfd with llvm lto shows it is calling setprlimit. Maybe I can try to see what happens with lld.

$ strace -f ld.bfd @link.txt |& grep EMFILE -A5 -B5
lseek(6, 4096, SEEK_SET)                = 4096
lseek(6, 4096, SEEK_SET)                = 4096
lseek(6, 4096, SEEK_SET)                = 4096
lseek(6, 4096, SEEK_SET)                = 4096
lseek(6, 4096, SEEK_SET)                = 4096
openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4XNNTotal.cc.o", O_RDONLY) = -1 EMFILE (打开的文件过多)
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=512*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, {rlim_cur=512*1024, rlim_max=512*1024}, NULL) = 0
openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4XNNTotal.cc.o", O_RDONLY) = 1024
fstat(1024, {st_mode=S_IFREG|0644, st_size=161208, ...}) = 0
mmap(NULL, 161208, PROT_READ, MAP_PRIVATE, 1024, 0) = 0x7fc465ba7000

LLD seems to follow the pattern of [open] -> [mmap] -> [close] without keeping too mant fds.

openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4CollisionNNToNDelta1950.cc.o", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 225128, PROT_READ, MAP_PRIVATE|MAP_NORESERVE, 3, 0) = 0x7fd7caa50000
close(3)                                = 0
openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4CollisionNNToNDeltastar.cc.o", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 235624, PROT_READ, MAP_PRIVATE|MAP_NORESERVE, 3, 0) = 0x7fd7caa16000
close(3)                                = 0
openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4CollisionNNToNNstar.cc.o", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 583212, PROT_READ, MAP_PRIVATE|MAP_NORESERVE, 3, 0) = 0x7fd7ca987000
close(3)                                = 0

I could imagine the issue can't be easily fixed on llvm side, as changing the ownership of fd would be a breaking change?

@rui314
Copy link
Owner

rui314 commented Oct 22, 2024

I just tried LTO with Clang, and it seems it works just fine without closing the FDs. Can you try again with the following change?

diff --git a/src/lto-unix.cc b/src/lto-unix.cc
index 7d839c87..f009478e 100644
--- a/src/lto-unix.cc
+++ b/src/lto-unix.cc
@@ -637,13 +637,8 @@ ObjectFile<E> *read_lto_object(Context<E> &ctx, MappedFile *mf) {
                << " please make sure you are using the same compiler of the"
                << " same version for all object files";

-  // It looks like GCC doesn't need fd after claim_file_hook() while
-  // LLVM needs it and takes the ownership of fd. To prevent "too many
-  // open files" issue, we close fd only for GCC. This is ugly, though.
-  if (!is_llvm(ctx)) {
-    MappedFile *mf2 = mf->parent ? mf->parent : mf;
-    mf2->close_fd();
-  }
+  MappedFile *mf2 = mf->parent ? mf->parent : mf;
+  mf2->close_fd();

   // Create a symbol strtab
   i64 strtab_size = 1;

@karuboniru
Copy link
Author

karuboniru commented Oct 22, 2024

I tried the same and the link succeed, however the code for LLVMGOLD suggested that it is using the fd information to track the orgin of file if multiple "handle" is from the same file. I believe it will cause the backend treat all object file as they have same source.

Sorry but I know nearly nothing about how thinLTO works, so I don't know if this will case any potenial issue.

https://github.com/llvm/llvm-project/blob/6e0b0038cd65ce726ce404305a06e1cf33e36cca/llvm/tools/gold/gold-plugin.cpp#L580-L585

@rui314
Copy link
Owner

rui314 commented Oct 23, 2024

Do you mind if I ask you to try to fix the issue on the LLVM plugin side rather than on the linker side? If you don't have enough bandwidth, I can take a look later. But it seems you have already started reading the source code of the plugin.

@karuboniru
Copy link
Author

I've no idea how should I test a change in llvm, so I'd appreciate it if you can take time fix the issue in LLVM side.

But from my understanding of the code in llvmgold I could tell the leading handle thing only wants uniqueness and this could be done by using file path instead of FD. (since if we close the file the fd will no longer be unique)

With the change the FD could be closed after https://github.com/llvm/llvm-project/blob/6e0b0038cd65ce726ce404305a06e1cf33e36cca/llvm/tools/gold/gold-plugin.cpp#L546

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potenial fd leak with clang lto and cause too many open files
2 participants