openssf-compiler-options: -Wl,-z,now causes nvidia-device-plugin to fail to load #34568

Open
dannf opened this issue Nov 19, 2024 · 9 comments
Labels
openssf-compiler-options Track adding openssf-compiler-options upstream

Comments

Contributor

dannf commented Nov 19, 2024

I'm reporting this per https://github.com/orgs/wolfi-dev/discussions/33052.

I found that while a rebuild of nvidia-device-plugin with openssf-compiler-options succeeds, the tests fail:

2024/11/19 00:38:19 WARN + nvidia-device-plugin --version
2024/11/19 00:38:19 WARN nvidia-device-plugin: symbol lookup error: nvidia-device-plugin: undefined symbol: nvmlGpuInstanceGetComputeInstanceProfileInfoV

nvidia-device-plugin-build-and-test-fail.txt
nvidia-device-plugin-no-rebuild-test-ok.txt

@dannf dannf added the openssf-compiler-options Track adding openssf-compiler-options label Nov 19, 2024
dannf added a commit to dannf/os that referenced this issue Nov 19, 2024
Contributor

tuananh commented Nov 19, 2024

You could have just changed it to -Wl,-z,lazy and it should work.

I guess this package needs to link with libnvidia-ml on the host, which is why lazy binding is used.

dannf added a commit that referenced this issue Nov 19, 2024
Contributor Author

dannf commented Nov 19, 2024

> You could have just changed it to -Wl,-z,lazy and it should work.

Yeah, that did work when I hacked it onto the end of the options in openssf.spec - but I didn't identify a way to do it with build commands/environment variables. Neither LDFLAGS nor CGO_LDFLAGS did the trick.

> I guess this package needs to link with libnvidia-ml on the host, which is why lazy binding is used.

That's right.
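
One way to confirm which binding mode a built binary actually ended up with is to inspect its dynamic section. The following is a minimal sketch, assuming binutils' `readelf` is available and that immediate binding shows up as a `BIND_NOW` entry or as `NOW` in the `FLAGS_1` entry of `readelf -d` output; the function names are hypothetical helpers, not part of melange or the plugin.

```python
import re
import subprocess


def uses_immediate_binding(readelf_dynamic_output: str) -> bool:
    """Return True if `readelf -d` output indicates the binary was linked with -z now."""
    for line in readelf_dynamic_output.splitlines():
        # A dedicated BIND_NOW entry, or BIND_NOW listed in the FLAGS entry.
        if re.search(r"\bBIND_NOW\b", line):
            return True
        # The NOW bit listed in the FLAGS_1 entry.
        if "(FLAGS_1)" in line and re.search(r"\bNOW\b", line):
            return True
    return False


def binary_uses_immediate_binding(path: str) -> bool:
    """Run `readelf -d` on a binary (assumes binutils is installed) and classify it."""
    result = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return uses_immediate_binding(result.stdout)
```

Running `binary_uses_immediate_binding` on the rebuilt nvidia-device-plugin binary would show whether a `-Wl,-z,lazy` override actually took effect before running the package tests.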

Contributor

tuananh commented Nov 19, 2024

Ah, I think it's because we use make to build (https://github.com/NVIDIA/k8s-device-plugin/blob/main/Makefile). Maybe convert this to use go/build, and then we can use the ldflags override in the pipeline.

https://github.com/chainguard-dev/melange/blob/main/pkg/build/pipelines/go/build.yaml

Contributor Author

dannf commented Nov 19, 2024

> Ah, I think it's because we use make to build (https://github.com/NVIDIA/k8s-device-plugin/blob/main/Makefile). Maybe convert this to use go/build, and then we can use the ldflags override in the pipeline.
>
> https://github.com/chainguard-dev/melange/blob/main/pkg/build/pipelines/go/build.yaml

Thanks @tuananh. That would provide a hook for passing a clean -ldflags, but the problem persists:

[...]
  - uses: go/build
    with:
      packages: ./cmd/nvidia-device-plugin
      ldflags: -extldflags="-Wl,-z,lazy"
      output: test
  - runs: |
      exit 1
$ make debug/nvidia-device-plugin
[...]
2024/11/19 21:42:52 INFO running step "go/build"
2024/11/19 21:43:03 ERRO Step failed: exit status 1
/bin/sh -c set -e 
[ -d '/home/build' ] || mkdir -p '/home/build'
cd '/home/build'
exit 1

exit 0
2024/11/19 21:43:03 INFO Execing into pod "" to debug interactively. workdir=/home/build
2024/11/19 21:43:03 INFO Type 'exit 0' to continue the next pipeline step or 'exit 1' to abort.
~ $ ./melange-out/nvidia-device-plugin/usr/bin/test 
./melange-out/nvidia-device-plugin/usr/bin/test: symbol lookup error: ./melange-out/nvidia-device-plugin/usr/bin/test: undefined symbol: nvmlGpuInstanceGetComputeInstanceProfileInfoV

The spec-defined options just seem to be super sticky.
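
The "sticky" behaviour is consistent with the linker honouring whichever of `-z now` / `-z lazy` appears last on its command line. Below is a minimal sketch of that rule, assuming (this is not confirmed in the thread) that the options from openssf.spec land after the user-supplied flags, and that lazy binding is the linker default when neither keyword is given. `effective_binding` is a hypothetical helper for illustration only.

```python
def effective_binding(linker_args):
    """Return 'now' or 'lazy' for a list of linker arguments; the last -z keyword wins."""
    mode = "lazy"  # assumed linker default when neither keyword is present
    args = iter(linker_args)
    for arg in args:
        if arg == "-z":
            keyword = next(args, None)
            if keyword in ("now", "lazy"):
                mode = keyword
    return mode


user_flags = ["-z", "lazy"]  # what -extldflags="-Wl,-z,lazy" hands to the linker
spec_flags = ["-z", "now"]   # assumed to be appended afterwards by openssf.spec

# User flag first, spec flag last: immediate binding still wins.
print(effective_binding(user_flags + spec_flags))
# Lazy placed at the very end (as when hacked onto the end of the spec): lazy wins.
print(effective_binding(spec_flags + user_flags))
```

Under these assumptions, no flag passed through LDFLAGS, CGO_LDFLAGS, or -extldflags can win, because the spec file always gets the last word.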

Contributor

tuananh commented Nov 20, 2024

Yeah, I tried it too and it didn't work.

Member

xnox commented Nov 22, 2024

@dannf you can quickly escape all hardening by setting GCC_SPEC_FILE: /dev/zero under environment:environment: in nvidia-device-plugin.yaml.

To keep all hardening, but use lazy binding do this:

  1. Modify ./usr/lib/gcc/*/*/openssf.spec as follows:
  2. remove ,-z,now
  3. add %{!Wl,-z,now:%{!Wl,-z,lazy:-Wl,-z,now}}, similar to how the long O command looks
  4. bump / rebuild openssf-compiler-options.yaml
  5. in nvidia-device-plugin.yaml, under environment:environment:, set "LDFLAGS: -Wl,-z,lazy"

The expression added in step 3 hopefully means "if -z now was not specified, and if -z lazy was not specified, add -z now". That way, if either now or lazy is specified manually on the command line, it wins, and the spec file adds nothing, which makes lazy binding an opt-in.
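
The intended opt-in logic can be sketched as follows. This models only the hoped-for meaning of the spec expression as described above, not verified GCC spec-file behaviour; `spec_added_flags` is a hypothetical name used for illustration.

```python
def spec_added_flags(user_flags):
    """Model the intent of %{!Wl,-z,now:%{!Wl,-z,lazy:-Wl,-z,now}}.

    If the user already chose a binding mode, the spec adds nothing;
    otherwise it adds -Wl,-z,now as the hardened default.
    """
    if "-Wl,-z,now" in user_flags or "-Wl,-z,lazy" in user_flags:
        return []
    return ["-Wl,-z,now"]


# No explicit choice: the hardened default is added.
print(spec_added_flags(["-O2"]))
# Explicit lazy binding: the spec stays out of the way.
print(spec_added_flags(["-O2", "-Wl,-z,lazy"]))
```

With this behaviour, a package such as nvidia-device-plugin would keep every other hardening flag and only give up immediate binding.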

Contributor

tuananh commented Nov 22, 2024

Yeah, I think it's better to keep all hardening and only switch to lazy binding.

dannf added a commit to dannf/os that referenced this issue Nov 22, 2024
…ev#34569)

Until `abseil-cpp-compiler-options` is available (which is blocked
by a melange/apko issue), let's disable openssf-compiler-options
to unblock building this package.

Related: wolfi-dev#34568

Signed-off-by: dann frazier <[email protected]>
Member

xnox commented Nov 22, 2024

I wonder if this is a GCC bug!

Member

xnox commented Nov 22, 2024

Filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117739

Added a test case at https://github.com/wolfi-dev/os/pull/35016/files. It appears that it is not possible to keep using -fhardened and still turn off immediate binding.

@xnox xnox added the upstream label Nov 22, 2024
dannf added a commit to dannf/os that referenced this issue Nov 26, 2024
Until `abseil-cpp-compiler-options` is available (which is blocked
by a melange/apko issue), let's disable openssf-compiler-options
to unblock building this package.

Related: wolfi-dev#34568

Signed-off-by: dann frazier <[email protected]>
dannf added a commit that referenced this issue Nov 26, 2024
Until `abseil-cpp-compiler-options` is available (which is blocked
by a melange/apko issue), let's disable openssf-compiler-options
to unblock building this package.

Related: #34568

Signed-off-by: dann frazier <[email protected]>
@xnox xnox mentioned this issue Nov 27, 2024