Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RHOAIENG-9707: chore(tests/containers): check shared objects with ldd #871

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jiridanek
Copy link
Member

@jiridanek jiridanek commented Jan 29, 2025

This sanity check will report any ELF files with dynamic linking that have unsatisfied dependencies.

  • apply the "LD_LIBRARY_PATH": os.path.dirname(dlib) fix/workaround

cuda image already has some LD_LIBRARY_PATH, this is to be preserved

(app-root) sh-5.1$ echo $LD_LIBRARY_PATH
/usr/local/nvidia/lib:/usr/local/nvidia/lib64
  • use subtests to handle multiple libs not found

  • skip known issue and false positives

Description

Running ldd one by one on every file is quite lengthy on macOS, due to cross-architecture emulation

(app-root) bash-5.1$ time bash -c 'i=0; while [[ $i -lt 100 ]]; do ldd /bin/bash >/dev/null; i=$((i+1)); done'

real    0m5.684s
user    0m4.527s
sys     0m0.795s

compare that with native linux vm, without this cross-arch translation

[jdanek@lima-default workbenches]$ time bash -c 'i=0; while [[ $i -lt 100 ]]; do ldd /bin/bash >/dev/null; i=$((i+1)); done'

real    0m0.220s
user    0m0.130s
sys     0m0.061s

Since there's 1000 to 2000 dynamic libs to check in each of our images, the check on a macOS when using ldd can take 70 or more seconds.

When mentioned on a daily meeting, ppl weren't very concerned, so I'm not going to optimize this now.

For the future, already tried out one approach, using the debug/elf package in Go's standard library to simply read the DT_NEEDED field out of the libraries and check that. It's quite a lot of code, but it seems to work, and work fast.
check_elf_file_2.go.txt

Alternatively, it should be possible to just go around loading the dlls from the python script directly, to see if it loads or not

            from ctypes import CDLL
            try:
                CDLL(lib_path)
                print(f"Successfully loaded: {lib}")
            except Exception as e:
                print(f"Error loading {lib}: {e}")

Question is whether binaries can be casually dlopened like this. I did this in skupper-router (dlopening the current executable ;) so it is possible sometimes, just don't know if always; worry that binary has to be compiled with some special options.

How Has This Been Tested?

Checked all what it found by hand and populated an allowlist to suppress false positives.

Over all, the usefulness of this check will be best seen when we do version upgrades or some significant version upgrades for 2025.a images. Is this going to be full of false positives for trivialities, or is it going to save us from dll hell? So far it found a true positive https://issues.redhat.com/browse/RHOAIENG-18904, but I do worry that the maintenance of the allowlist is not going to be worth it.

Possibly running import xyz for every python package we have is more useful. Adding to that small helloworld script to check functionality would be even more powerful.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

This sanity check will report any ELF files with dynamic linking that have unsatisfied dependencies.

* apply the `"LD_LIBRARY_PATH": os.path.dirname(dlib)` fix/workaround

cuda image already has some LD_LIBRARY_PATH, this is to be preserved

```
(app-root) sh-5.1$ echo $LD_LIBRARY_PATH
/usr/local/nvidia/lib:/usr/local/nvidia/lib64
```

* use subtests to handle multiple libs not found

* skip known issue and false positives
@openshift-ci openshift-ci bot requested review from dibryant and jstourac January 29, 2025 07:46
Copy link
Contributor

openshift-ci bot commented Jan 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jiridanek. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added size/l and removed size/l labels Jan 29, 2025
isdirectory = stat.S_ISDIR(s.st_mode)
isfile = stat.S_ISREG(s.st_mode)
executable = bool(s.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
if isdirectory or not executable or not isfile:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we should log/print the skipped/ignored files here with the explanation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, but for /opt/app-root this will produce a lot of output that nobody in their right mind will have the energy to read through

cc @opendatahub-io/notebook-devs wdyt? do you like your pytest tests to be vvvverbose or not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue
with open(dlib, mode='rb') as fp:
magic = fp.read(4)
if magic != b'\x7fELF':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar as above - do we want to log/print files like these for the convenience?

@jstourac
Copy link
Member

/lgtm

Copy link
Contributor

openshift-ci bot commented Jan 29, 2025

@jiridanek: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/images 6c707cb link true /test images

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants