Mass ccache-related build errors for AT1 PR builds gnu-8.5.0-openmpi-4.1.6 and cuda-11.4.2-gnu-10.1.0-openmpi-4.1.6 on ascic0193 starting 2025-03-03 #13853

Open
bartlettroscoe opened this issue Mar 4, 2025 · 4 comments
Labels
type: bug The primary issue is a bug in Trilinos code or tests

Comments

@bartlettroscoe
Member

bartlettroscoe commented Mar 4, 2025

CC: @trilinos/framework, @sebrowne, @achauphan, @ccober6

Description

As shown in this CDash query:

[Image: CDash query results]

there are mass build errors for the following build configurations:

  • rhel8_sems-gnu-8.5.0-openmpi-4.1.6-openmp_release-debug_static_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
  • rhel8_sems-cuda-11.4.2-gnu-10.1.0-openmpi-4.1.6_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables

all on the machine ascic0193 starting 2025-03-03.

So far, this has impacted many PR build iterations across different PRs: #13847, #13849, #13850, #13851, and #13852.

The affected builds show errors like:

ccache: error: Failed to create directory /fgs/trilinos/ccache/cache/1/5: Permission denied
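
For anyone who wants to check a node for this failure mode, here is a minimal sketch that probes whether a new subdirectory can be created under a cache directory, which is the operation ccache failed on above. It is not part of the Trilinos CI scripts; the function name `can_create_subdir` is hypothetical and chosen only for illustration.

```python
import os
import tempfile


def can_create_subdir(cache_dir: str) -> bool:
    """Return True if a new subdirectory can be created under cache_dir.

    Hypothetical helper, not taken from the Trilinos framework. It mimics
    the step ccache failed on (creating a directory inside the cache).
    """
    try:
        # mkdtemp raises OSError (e.g. PermissionError, FileNotFoundError)
        # if the directory cannot be created.
        probe = tempfile.mkdtemp(prefix="ccache-probe-", dir=cache_dir)
    except OSError:
        return False
    # Clean up the empty probe directory so the cache is left untouched.
    os.rmdir(probe)
    return True
```

Calling `can_create_subdir("/fgs/trilinos/ccache/cache")` on the affected machine would be expected to return `False` while the permissions problem persists, assuming the check runs as the same user as the PR builds.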

bartlettroscoe added the "type: bug" label Mar 4, 2025
bartlettroscoe pinned this issue Mar 4, 2025

@bartlettroscoe
Member Author

FYI: I pinned this issue to warn other Trilinos developers trying to get their PRs to pass testing.

@bartlettroscoe
Member Author

bartlettroscoe commented Mar 4, 2025

Let's see how many people manually add AT: RETEST before they realize that doing so is futile until this issue is resolved.

bartlettroscoe changed the title from "Mass build errors for AT1 PR builds gnu-8.5.0-openmpi-4.1.6 and cuda-11.4.2-gnu-10.1.0-openmpi-4.1.6 on ascic0193 starting 2025-03-03" to "Mass ccache-related build errors for AT1 PR builds gnu-8.5.0-openmpi-4.1.6 and cuda-11.4.2-gnu-10.1.0-openmpi-4.1.6 on ascic0193 starting 2025-03-03" Mar 4, 2025

@sebrowne
Contributor

sebrowne commented Mar 4, 2025

Thanks for noticing this; I was in meetings most of the day and just saw it. A ticket has been submitted, and I removed ascic0193 from the pool of test resources. Any build starting from now on is safe.

@sebrowne
Contributor

sebrowne commented Mar 5, 2025

Admins have restored the bad filesystem, and I just re-added ascic0193 to our CI pool. Should be fixed now.
