Kselftest build failures on armhf #732

Closed
crazoes opened this issue Jul 26, 2024 · 25 comments

@crazoes

crazoes commented Jul 26, 2024

Some kselftest builds are failing on various stable-rc branches:

  INSTALL ./usr/include
make[2]: Leaving directory '/tmp/kci/linux'
make[2]: Entering directory '/tmp/kci/linux/tools/testing/selftests/arm64'
make[2]: Leaving directory '/tmp/kci/linux/tools/testing/selftests/arm64'
make[2]: Entering directory '/tmp/kci/linux/tools/testing/selftests/breakpoints'
arm-linux-gnueabihf-gcc     step_after_suspend_test.c  -o /tmp/kci/linux/tools/testing/selftests/breakpoints/step_after_suspend_test
In file included from /usr/include/features.h:392,
                 from /usr/include/errno.h:25,
                 from step_after_suspend_test.c:8:
/usr/include/features-time64.h:20:10: fatal error: bits/wordsize.h: No such file or directory
   20 | #include <bits/wordsize.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: Leaving directory '/tmp/kci/linux/tools/testing/selftests/breakpoints'
make[2]: *** [../lib.mk:151: /tmp/kci/linux/tools/testing/selftests/breakpoints/step_after_suspend_test] Error 1
make[2]: Entering directory '/tmp/kci/linux/tools/testing/selftests/capabilities'
arm-linux-gnueabihf-gcc -O2 -g -std=gnu99 -Wall     test_execve.c -lcap-ng -lrt -ldl -o /tmp/kci/linux/tools/testing/selftests/capabilities/test_execve
In file included from /usr/lib/gcc-cross/arm-linux-gnueabihf/12/include/stdint.h:9,
                 from /usr/include/cap-ng.h:26,
                 from test_execve.c:4:
/usr/include/stdint.h:26:10: fatal error: bits/libc-header-start.h: No such file or directory
   26 | #include <bits/libc-header-start.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: Leaving directory '/tmp/kci/linux/tools/testing/selftests/capabilities'
make[2]: *** [../lib.mk:151: /tmp/kci/linux/tools/testing/selftests/capabilities/test_execve] Error 1
make[2]: Entering directory '/tmp/kci/linux/tools/testing/selftests/cgroup'
arm-linux-gnueabihf-gcc -Wall -pthread     test_memcontrol.c cgroup_util.c ../clone3/clone3_selftests.h  -o /tmp/kci/linux/tools/testing/selftests/cgroup/test_memcontrol
In file included from /usr/include/features.h:392,
                 from /usr/include/fcntl.h:25,
                 from test_memcontrol.c:6:
/usr/include/features-time64.h:20:10: fatal error: bits/wordsize.h: No such file or directory
   20 | #include <bits/wordsize.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.

The kernel build succeeds, but the kselftest build produces many errors. An example job:

https://fresh-kcidb-grafana-m6io3uhhiq-uc.a.run.app/d/build/build?orgId=1&var-datasource=default&var-origin=maestro&var-build_architecture=arm&var-build_config_name=vexpress_defconfig&var-id=maestro:66a223adbb1dfd36a921d5fd
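For triage, the missing headers in a log like the one above can be tallied with a short script. This is a minimal sketch, not part of any KernelCI tooling: the function name `missing_headers` and the `sample_log` excerpt are made up for illustration, and the regular expression only assumes GCC's usual "fatal error: ... No such file or directory" wording.

```python
import re
from collections import Counter

# Matches GCC lines such as:
#   foo.h:20:10: fatal error: bits/wordsize.h: No such file or directory
MISSING_HEADER = re.compile(r"fatal error: (\S+): No such file or directory")


def missing_headers(log_text):
    """Count how often each header is reported missing in a build log."""
    return Counter(MISSING_HEADER.findall(log_text))


# Three error lines copied from the log in this issue.
sample_log = """\
/usr/include/features-time64.h:20:10: fatal error: bits/wordsize.h: No such file or directory
/usr/include/stdint.h:26:10: fatal error: bits/libc-header-start.h: No such file or directory
/usr/include/features-time64.h:20:10: fatal error: bits/wordsize.h: No such file or directory
"""

for header, count in missing_headers(sample_log).most_common():
    print(f"{count:3d}  {header}")
```

If only a handful of distinct headers account for most failures, that points to a missing package in the build image rather than to per-test bugs.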

@crazoes crazoes converted this from a draft issue Jul 26, 2024
@crazoes crazoes moved this from Todo to In Progress in KernelCI results data evaluation Jul 29, 2024
@musamaanjum
Contributor

I tried to reproduce it locally, but it builds fine, which suggests that some package is missing from the build rootfs. I'll try to reproduce with the kernel rootfs and add the missing package there.

@musamaanjum
Contributor

The job builds the kernel with vexpress_defconfig only. If it is also going to build kselftests for this kernel, it should have included the kselftest fragment. Hence, possibly the wrong docker image, one without the required packages, is being used to build the kselftests.

To confirm, I've just launched a docker image that has packages needed for kselftest and there are no errors.
./kci docker build gcc-12 kselftest kernelci --arch=arm --prefix=my-stuff/ --verbose

Next, I need to debug why this job was wrongfully created to build the kselftests and which docker image was used.

@musamaanjum
Contributor

I've made fixes to the docker image name for arm builds. (1) Let's wait to get updated runs. Hopefully, the build issue will be solved.

@crazoes
Author

crazoes commented Aug 1, 2024

@musamaanjum any further progress on this? We have many builds in KernelCI which are failing because of this.

@musamaanjum
Contributor

I've been waiting for the results and am looking at the build results now. The same issue seems to have occurred on several other arm configs. The issue is on the KernelCI side. If you want to send a report, is it possible to ignore these failures until we fix them?

@musamaanjum
Contributor

Posting debugging discussion with @JenySadadia here.

usama.anjum
The build of kernel succeeded. But kselftests failed.
https://grafana.kernelci.org/d/build/build?orgId=1&var-datasource=default&var-origin=maestro&var-build_architecture=arm&var-build_config_name=omap2plus_defconfig&var-id=maestro:66aa0bc3bb1dfd36a92e0373
Can you find on which docker container this build is running?

jenysadadia
I don't see any kselftest job for build https://kernelci-api.westus3.cloudapp.azure.com/viewer?node_id=66aa0bc3bb1dfd36a92e0373

usama.anjum
The kselftests are being built after building kernel and modules.

From the logs: (https://kciapistagingstorage1.file.core.windows.net/early-access/kbuild-gcc-12-arm-omap2plus_defconfig-66aa0bc3bb1dfd36a92e0373/build.log.gz?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D)
-----log:build_kselftest-----

echo job:build_kselftest=running

So this is what I was thinking: the configuration is omap2plus_defconfig, which doesn't have the kselftest fragment. Why would it try to build kselftest?

@JenySadadia
Collaborator

Looking at the maestro DB data https://kernelci-api.westus3.cloudapp.azure.com/viewer?node_id=66aa0bc3bb1dfd36a92e0373, the build job actually passed.
I am not sure why it's been reported with a FAIL result.
Also, the node_timeout error message looks weird.

@musamaanjum
Contributor

Unsolved questions:

  1. The configuration is omap2plus_defconfig, which doesn't have a kselftest fragment. Why would it try to build kselftest, which produces errors? Also, I don't see any kselftest job for build https://kernelci-api.westus3.cloudapp.azure.com/viewer?node_id=66aa0bc3bb1dfd36a92e0373
  2. Why is a job reported as failed when the maestro DB data says it passed?

Let's look at it and solve it tomorrow, as it is a critical issue.

@JenySadadia
Collaborator

@spbnick Is it possible to get the submission timestamp for the above mentioned result?

@JenySadadia
Collaborator

@nuclearcat This result is from the production instance. I believe we don't have any log dump for it.
Is it possible to check k8s logs somehow?

@JenySadadia
Collaborator

@musamaanjum

The configuration is omap2plus_defconfig which doesn't have a kselftest fragment. Why would it try to build kselftest which produce errors? Also I don't see any kselftest job for build https://kernelci-api.westus3.cloudapp.azure.com/viewer?node_id=66aa0bc3bb1dfd36a92e0373

The job kbuild-gcc-12-arm-omap2plus_defconfig uses configs https://github.com/kernelci/kernelci-pipeline/blob/main/config/pipeline.yaml#L307 with docker image kernelci/staging-gcc-12:arm-kselftest-kernelci.
The image has kselftest fragment enabled. Maybe that's the reason here?

@nuclearcat
Member

According to my data, we had a lot of builds scheduled that day, and many of them stayed in the queue for up to 6-7 hours (as we had insufficient build capacity), so the node might have hit the timeout naturally. We might need to rework the whole node timeout concept, because it doesn't represent reality in some cases, like this one.
Maybe when a node reaches its timeout, we should poll whether the job submitted to the k8s cluster still exists and, if it does, just extend the timeout.
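The polling idea could look roughly like the sketch below. Everything in it is hypothetical: the `Node` class, `handle_timeout`, the one-hour extension, and the `job_still_exists` callback (a stand-in for whatever query against the k8s cluster would be used) are assumptions for illustration, not the pipeline's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical, simplified view of a node with a deadline."""
    name: str
    timeout: datetime
    result: Optional[str] = None


def handle_timeout(node: Node, now: datetime,
                   job_still_exists: Callable[[str], bool],
                   extension: timedelta = timedelta(hours=1)) -> Node:
    """Extend the deadline while the cluster job is alive, else mark incomplete."""
    if now < node.timeout:
        return node  # deadline not reached yet, nothing to do
    if job_still_exists(node.name):
        node.timeout = now + extension  # job is queued or running: wait longer
    else:
        node.result = "incomplete"  # job is gone: give up on this node
    return node


deadline = datetime(2024, 8, 1, 12, 0)
queued = handle_timeout(Node("queued-build", deadline),
                        deadline + timedelta(minutes=5),
                        job_still_exists=lambda name: True)
lost = handle_timeout(Node("lost-build", deadline),
                      deadline + timedelta(minutes=5),
                      job_still_exists=lambda name: False)
print(queued.timeout, queued.result)
print(lost.timeout, lost.result)
```

A real implementation would probably also cap the number of extensions so that a stuck job cannot keep a node alive forever.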

@JenySadadia
Collaborator

OK, but on a node timeout, we set the node result to incomplete, and this test would be sent to KCIDB with ERROR status.
The issue is that KCIDB received it with FAIL status instead. We need the KCIDB bridge service logs to verify the submission data.
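Once submission logs are available, a check along these lines could flag such cases. It is only a sketch under stated assumptions: the table `EXPECTED_STATUS` and the function `find_mismatches` are hypothetical, the only mapping stated in this thread is "incomplete" to "ERROR", and the "pass" and "fail" rows as well as the record layout are guesses.

```python
# Hypothetical result-to-status table. Only "incomplete" -> "ERROR" is stated
# in this thread; the "pass" and "fail" rows are assumptions.
EXPECTED_STATUS = {"pass": "PASS", "fail": "FAIL", "incomplete": "ERROR"}


def find_mismatches(records):
    """Return records whose submitted status differs from the expected one.

    Each record is a (node_id, node_result, submitted_status) tuple.
    Unknown node results have no expected status and are reported as well.
    """
    return [rec for rec in records if EXPECTED_STATUS.get(rec[1]) != rec[2]]


# Made-up records; "node-b" mirrors the case described above.
records = [
    ("node-a", "pass", "PASS"),
    ("node-b", "incomplete", "FAIL"),
    ("node-c", "incomplete", "ERROR"),
]

for node_id, result, status in find_mismatches(records):
    print(f"{node_id}: result={result} was submitted as {status}")
```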

@nuclearcat
Member

Next week I will work on enabling a logs volume for our API instance.

@musamaanjum
Contributor

musamaanjum commented Aug 2, 2024

The configuration is omap2plus_defconfig which doesn't have a kselftest fragment. Why would it try to build kselftest which produce errors? Also I don't see any kselftest job for build https://kernelci-api.westus3.cloudapp.azure.com/viewer?node_id=66aa0bc3bb1dfd36a92e0373

The job kbuild-gcc-12-arm-omap2plus_defconfig uses configs https://github.com/kernelci/kernelci-pipeline/blob/main/config/pipeline.yaml#L307 with docker image kernelci/staging-gcc-12:arm-kselftest-kernelci. The image has kselftest fragment enabled. Maybe that's the reason here?

Yes, that seems like the reason. I'll try this exact docker container to see if the build works fine. Previously I'd used the following to generate the container and then ran the kselftest build which had worked fine.

@musamaanjum
Contributor

I've reproduced the errors. There are a lot of inconsistencies in kselftest builds on arm. I've started fixing them one by one.

@crazoes
Author

crazoes commented Aug 5, 2024

Thanks @musamaanjum for working on this. Just FYI, I added this one as an example, but there are other similar failures as well. It would be great if you could look at the Grafana dashboard and try to fix them. Feel free to create more GitHub issues in the Data Evaluation dashboard.

@musamaanjum
Contributor

Yeah, we'll have to fix the bugs in linux-next first, and then the fixes will be ported back. We may have to port some fixes to stable manually as well. I'll keep you posted.

@musamaanjum
Contributor

musamaanjum commented Aug 7, 2024

Following is the list of errors on linux-next:

  1. I've sent a patch for it.
make[1]: Entering directory '/home/kernelci/tools/testing/selftests/kvm'
mkdir: missing operand
Try 'mkdir --help' for more information.
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/kernelci/tools/testing/selftests/kvm'
  2. rcrt1.o is missing. Some package is missing or toolchain doesn't have support for it. discussion
arm-linux-gnueabihf-gcc -Wall -Wno-nonnull -D_GNU_SOURCE=   -Wl,-z,max-page-size=0x1000 \
	-fPIE -static-pie load_address.c -o /home/kernelci/tools/testing/selftests/exec/load_address.static.0x1000
/usr/lib/gcc-cross/arm-linux-gnueabihf/12/../../../../arm-linux-gnueabihf/bin/ld: cannot find rcrt1.o: No such file or directory
collect2: error: ld returned 1 exit status
  3. The openssl package is missing for armhf. (Created PR: config: docker: net kselftest suite required openssl library kernelci-core#2627)
  CC       tcp_mmap
In file included from /usr/include/openssl/pem.h:14,
                 from tcp_mmap.c:69:
/usr/include/openssl/macros.h:14:10: fatal error: openssl/opensslconf.h: No such file or directory
   14 | #include <openssl/opensslconf.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~
  4. Sent fix
  CC       resctrl_tests
In file included from resctrl.h:24,
                 from cat_test.c:11:
In function 'arch_supports_noncont_cat',
    inlined from 'noncont_cat_run_test' at cat_test.c:323:6:
../kselftest.h:74:9: error: impossible constraint in 'asm'
   74 |         __asm__ __volatile__ ("cpuid\n\t"                               \
      |         ^~~~~~~
cat_test.c:301:17: note: in expansion of macro '__cpuid_count'
  301 |                 __cpuid_count(0x10, 1, eax, ebx, ecx, edx);
      |                 ^~~~~~~~~~~~~
../kselftest.h:74:9: error: impossible constraint in 'asm'
   74 |         __asm__ __volatile__ ("cpuid\n\t"                               \
      |         ^~~~~~~
cat_test.c:303:17: note: in expansion of macro '__cpuid_count'
  303 |                 __cpuid_count(0x10, 2, eax, ebx, ecx, edx);
      |                 ^~~~~~~~~~~~~
  5. sent fix v2
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function); did you mean 'memfd_secret'?
   42 |         return syscall(__NR_memfd_secret, flags);
      |                        ^~~~~~~~~~~~~~~~~
      |                        memfd_secret
  6. sent fix
mseal_test.c: In function 'sys_mmap':
mseal_test.c:90:33: error: '__NR_mmap' undeclared (first use in this function)
   90 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
      |                                 ^~~~~~~~~
mseal_test.c:90:33: note: each undeclared identifier is reported only once for each function it appears in
  CC       seal_elf
seal_elf.c: In function 'sys_mmap':
seal_elf.c:39:33: error: '__NR_mmap' undeclared (first use in this function)
   39 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
      |                                 ^~~~~~~~~

There are several warnings as well. Let's focus on errors first.

@musamaanjum musamaanjum changed the title Kselftest build failures Kselftest build failures on armhf Aug 7, 2024
@musamaanjum
Contributor

There is a long-standing issue about requiring clang in gcc build images. I've created a separate issue for that so that we don't miss it again: kernelci/kernelci-project#431

@musamaanjum
Contributor

  1. I've sent a patch for it.
  2. rcrt1.o is missing. Some package is missing or toolchain doesn't have support for it. discussion
  3. The openssl package is missing for armhf. (Created PR: config: docker: net kselftest suite required openssl library kernelci-core#2627)
  4. Sent fix
  5. sent fix
  6. sent fix

Here is a short summary of the patches and PRs sent so far. These fixes will go to linux-next first, and we'll port them to stable if they aren't applied automatically. Most of the fixes won't apply cleanly to the stable kernels because kselftests have evolved a lot, and backporting patches is hard because of their dependencies.

@musamaanjum
Contributor

musamaanjum commented Aug 16, 2024

  1. kvm: Maintainer will fix it
  2. rcrt1.o: on-going discussion
  3. openssl: Merged
  4. resctrl: Close to getting accepted
  5. memfd_secret: Accepted
  6. mseal: Accepted

@crazoes
Author

crazoes commented Sep 16, 2024

@musamaanjum did we manage to fix all the kselftest build failures?

@musamaanjum
Contributor

It seems like all the previous errors were fixed. But we need to check the individual stable kernels to see whether the fixes got ported back and whether there are any more errors.

@musamaanjum
Contributor

Closing for now. I'll open individual issues as needed.
