use streak.llnl.gov as runner for GHA #2873
Changes from 67 commits
```diff
@@ -22,6 +22,9 @@ on:
       DOCKER_REPOSITORY:
         required: true
         type: string
+      DOCKER_RUN_ARGS:
+        required: false
+        type: string
       ENABLE_HYPRE:
         required: false
         type: string
```
```diff
@@ -52,22 +55,22 @@ jobs:
     runs-on: ${{ inputs.RUNS_ON }}
     steps:
       - name: Checkout Repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4.1.1
        with:
          submodules: true
          lfs: ${{ inputs.BUILD_TYPE == 'integrated_tests' }}
          fetch-depth: 1

      - id: 'auth'
        if: ${{ inputs.GCP_BUCKET || inputs.USE_SCCACHE }}
-        uses: 'google-github-actions/auth@v1'
+        uses: 'google-github-actions/auth@v2.1.0'
        with:
          credentials_json: '${{ secrets.GOOGLE_CLOUD_GCP }}'
          create_credentials_file: true

      - name: 'Set up Cloud SDK'
        if: inputs.GCP_BUCKET
-        uses: 'google-github-actions/setup-gcloud@v1'
+        uses: 'google-github-actions/setup-gcloud@v2.1.0'
        with:
          version: '>= 363.0.0'
```
|
```diff
@@ -81,6 +84,10 @@ jobs:
          docker_args=()
          script_args=()

+          docker_args+=(${{ inputs.DOCKER_RUN_ARGS }})
+
          COMMIT=${{ github.event.pull_request.head.sha }}
          SHORT_COMMIT=${COMMIT:0:7}
          script_args+=(--install-dir-basename GEOSX-${SHORT_COMMIT})
```
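As a side note on the hunk above: the seven-character short SHA comes from plain bash substring expansion, `${var:offset:length}`. A minimal sketch (the SHA below is hypothetical, not from this PR):

```shell
# Bash substring expansion: ${var:offset:length}.
# The workflow uses this to build a GEOSX-<short-sha> install dir basename.
COMMIT=0123456789abcdef0123456789abcdef01234567   # hypothetical SHA
SHORT_COMMIT=${COMMIT:0:7}
echo "GEOSX-${SHORT_COMMIT}"   # → GEOSX-0123456
```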
|
```diff
@@ -97,7 +104,9 @@ jobs:
          script_args+=(--data-basename ${DATA_BASENAME})

          DATA_EXCHANGE_DIR=/mnt/geos-exchange # Exchange folder outside of the container
-          sudo mkdir -p ${DATA_EXCHANGE_DIR}
+          if [ ! -d "${DATA_EXCHANGE_DIR}" ]; then
+            sudo mkdir -p ${DATA_EXCHANGE_DIR}
+          fi
          DATA_EXCHANGE_MOUNT_POINT=/tmp/exchange # Exchange folder inside of the container
          docker_args+=(--volume=${DATA_EXCHANGE_DIR}:${DATA_EXCHANGE_MOUNT_POINT})
          script_args+=(--exchange-dir ${DATA_EXCHANGE_MOUNT_POINT})
```
|
```diff
@@ -110,6 +119,11 @@ jobs:
            script_args+=(--sccache-credentials $(basename ${GOOGLE_GHA_CREDS_PATH}))
          fi

+          if [ ${{ inputs.RUNS_ON }} == 'self-hosted' ]; then
+            RUNNER_CERTIFICATES_DIR=/etc/pki/ca-trust/source/anchors/
+            mkdir -p ${GITHUB_WORKSPACE}/certificates
+            cp ${RUNNER_CERTIFICATES_DIR}/*.crt* ${GITHUB_WORKSPACE}/certificates
+          fi
          # We need to know where the code folder is mounted inside the container so we can run the script at the proper location!
          # Since this information is repeated twice, we use a variable.
          GITHUB_WORKSPACE_MOUNT_POINT=/tmp/geos
```
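The certificate-staging step above can be exercised without root by substituting temporary directories for the runner paths. A sketch under those assumptions (the certificate file name is hypothetical):

```shell
# Sketch of the cert staging, with temp dirs standing in for
# /etc/pki/ca-trust/source/anchors/ and ${GITHUB_WORKSPACE} (which need root).
RUNNER_CERTIFICATES_DIR=$(mktemp -d)
WORKSPACE=$(mktemp -d)
touch "${RUNNER_CERTIFICATES_DIR}/llnl.bundle.crt.pem"   # hypothetical cert

# Same pattern as the workflow: stage every *.crt* file into the workspace
# so it gets mounted into the container alongside the sources.
mkdir -p "${WORKSPACE}/certificates"
cp "${RUNNER_CERTIFICATES_DIR}"/*.crt* "${WORKSPACE}/certificates"

ls "${WORKSPACE}/certificates"
```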
|
```diff
@@ -126,11 +140,19 @@ jobs:
          docker_args+=(-e ENABLE_HYPRE_DEVICE=${ENABLE_HYPRE_DEVICE:-CPU})
          docker_args+=(-e ENABLE_TRILINOS=${ENABLE_TRILINOS:-ON})

-          docker_args+=(--cap-add=SYS_PTRACE)
+          docker_args+=(--cap-add=SYS_PTRACE --rm)

          script_args+=(--cmake-build-type ${{ inputs.CMAKE_BUILD_TYPE }})
          script_args+=(${{ inputs.BUILD_AND_TEST_CLI_ARGS }})

+          # SPLIT_DOCKER_REPOSITORY=(${DOCKER_REPOSITORY//// })
+          # CONTAINER_NAME=geosx_build_${SPLIT_DOCKER_REPOSITORY[1]}_${GITHUB_SHA:0:7}
+          # if [ "$(docker ps -aq -f name=${CONTAINER_NAME})" ]; then
+          #   docker rm -f ${CONTAINER_NAME}
+          # fi
+          # docker_args+=(--name ${CONTAINER_NAME})

          if ${{ inputs.CODE_COVERAGE }} == 'true'; then
            script_args+=(--code-coverage)
          fi
```

(rrsettgast marked this conversation as resolved.)
|
```diff
@@ -142,6 +164,7 @@ jobs:
          set +e
          docker run \
            ${docker_args[@]} \
+            -h=`hostname` \
            ${{ inputs.DOCKER_REPOSITORY }}:${{ inputs.DOCKER_IMAGE_TAG }} \
            ${GITHUB_WORKSPACE_MOUNT_POINT}/scripts/ci_build_and_test_in_container.sh \
            ${script_args[@]}
```
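The `docker run` invocation is assembled from a bash array, which keeps optional flags composable across the conditional blocks above. A minimal sketch of the pattern, with `echo` standing in for `docker run` and illustrative values:

```shell
# Sketch of the array-based argument assembly used in the workflow.
docker_args=()
docker_args+=(--volume=/mnt/geos-exchange:/tmp/exchange)
docker_args+=(--cap-add=SYS_PTRACE --rm)   # --rm removes the container on exit
docker_args+=(-h="$(hostname)")            # container hostname mirrors the runner's

# `echo` stands in for `docker run`; the image name is illustrative.
echo docker run "${docker_args[@]}" some/repository:some-tag
```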
|
```diff
@@ -156,6 +179,12 @@ jobs:
            echo "Download the bundle at https://storage.googleapis.com/${{ inputs.GCP_BUCKET }}/${DATA_BASENAME}"
          fi
          fi

+          # Remove the container and the workspace to avoid any conflict with the next run.
+          echo github.workspace = ${{ github.workspace }}
+          #rm -rf ${{ github.workspace }}/*
+          #docker rm -f ${CONTAINER_NAME}
+
          exit ${EXIT_STATUS}

      - name: Upload coverage to Codecov
```

Comment on lines +187 to +188:

> So do you still need to clean up?

> see above

> Then can you add a small comment about it?
This file was deleted.
```diff
@@ -143,6 +143,26 @@ EOT
# The path to the `sccache` executable is available through the SCCACHE environment variable.
SCCACHE_CMAKE_ARGS="-DCMAKE_CXX_COMPILER_LAUNCHER=${SCCACHE} -DCMAKE_CUDA_COMPILER_LAUNCHER=${SCCACHE}"

+if [[ ${HOSTNAME} == 'streak.llnl.gov' ]]; then
+  DOCKER_CERTS_DIR=/usr/local/share/ca-certificates
+  for file in "${GEOS_SRC_DIR}"/certificates/*.crt.pem; do
+    if [ -f "$file" ]; then
+      filename=$(basename -- "$file")
+      filename_no_ext="${filename%.*}"
+      new_filename="${DOCKER_CERTS_DIR}/${filename_no_ext}.crt"
+      cp "$file" "$new_filename"
+      echo "Copied $filename to $new_filename"
+    fi
+  done
+  update-ca-certificates
+  # gcloud config set core/custom_ca_certs_file cert.pem'
+
+  NPROC=8
+else
+  NPROC=$(nproc)
+fi
+echo "Using ${NPROC} cores."

echo "sccache initial state"
${SCCACHE} --show-stats
fi
```

Comment on the `NPROC=8` line:

> this is hard coded right now for streak.
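The renaming loop above leans on `${filename%.*}`, which strips the shortest trailing `.suffix`. A one-line sketch with a hypothetical file name:

```shell
# ${var%.*} removes the shortest trailing ".suffix", which is how the
# script turns a *.crt.pem name into a *.crt name for the
# ca-certificates directory. The file name is hypothetical.
filename="llnl.bundle.crt.pem"
filename_no_ext="${filename%.*}"
echo "${filename_no_ext}"   # → llnl.bundle.crt
```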
|
```diff
@@ -154,7 +174,7 @@ if [[ "${RUN_INTEGRATED_TESTS}" = true ]]; then
  or_die apt-get install -y virtualenv python3-dev python-is-python3
  ATS_PYTHON_HOME=/tmp/run_integrated_tests_virtualenv
  or_die virtualenv ${ATS_PYTHON_HOME}
-  ATS_CMAKE_ARGS="-DATS_ARGUMENTS=\"--machine openmpi --ats openmpi_mpirun=/usr/bin/mpirun --ats openmpi_args=--allow-run-as-root --ats openmpi_procspernode=2 --ats openmpi_maxprocs=2\" -DPython3_ROOT_DIR=${ATS_PYTHON_HOME}"
+  ATS_CMAKE_ARGS="-DATS_ARGUMENTS=\"--machine openmpi --ats openmpi_mpirun=/usr/bin/mpirun --ats openmpi_args=--allow-run-as-root --ats openmpi_procspernode=${NPROC} --ats openmpi_maxprocs=${NPROC}\" -DPython3_ROOT_DIR=${ATS_PYTHON_HOME}"
fi
```
|
```diff
@@ -208,9 +228,9 @@ fi

# Performing the requested build.
if [[ "${BUILD_EXE_ONLY}" = true ]]; then
-  or_die ninja -j $(nproc) geosx
+  or_die ninja -j $NPROC geosx
else
-  or_die ninja -j $(nproc)
+  or_die ninja -j $NPROC
  or_die ninja install

  if [[ ! -z "${DATA_BASENAME_WE}" ]]; then
```
```diff
@@ -232,7 +252,11 @@ fi

# Run the unit tests (excluding previously ran checks).
if [[ "${RUN_UNIT_TESTS}" = true ]]; then
-  or_die ctest --output-on-failure -E "testUncrustifyCheck|testDoxygenCheck"
+  if [[ ${HOSTNAME} == 'streak.llnl.gov' ]]; then
+    or_die ctest --output-on-failure -E "testUncrustifyCheck|testDoxygenCheck|testLifoStorage|testExternalSolvers"
+  else
+    or_die ctest --output-on-failure -E "testUncrustifyCheck|testDoxygenCheck"
+  fi
fi

if [[ "${RUN_INTEGRATED_TESTS}" = true ]]; then
```

Comment thread on the streak-specific exclusion:

> This is to exclude the two failing unit tests on GPU systems so we can merge this and follow up with fix PRs.

> Can you create an issue pointing at this line, though? If someone other than you fixes the tests, I am pretty sure they'll forget to remove this.

> I'd add a

> I would, but I actually don't think we should be excluding unit tests once they are fixed. This is just a hack to get us to the next step (passing tests), after which this goes away. I actually do want to make it as hard as possible to exclude specific tests 😁

> This makes sense. I kind of prefer having an easy way to do things with a Cerberus at the code review, but it's OK.

> well... that is a good enough reason.
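`ctest -E` takes a regular expression, so the pipe-separated list above excludes any test whose name matches one of the alternatives. This behavior can be checked directly with bash's `=~` operator (the non-excluded test name below is hypothetical):

```shell
# ctest -E <regex> skips tests whose names match. This is the pattern
# used on streak.llnl.gov; alternation means any listed name is excluded.
EXCLUDE="testUncrustifyCheck|testDoxygenCheck|testLifoStorage|testExternalSolvers"

matches() { [[ "$1" =~ ${EXCLUDE} ]]; }

matches "testLifoStorage"    && echo "testLifoStorage excluded"
matches "testSolidMechanics" || echo "testSolidMechanics kept"   # hypothetical name
```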
|
```diff
@@ -256,8 +280,14 @@ if [[ "${RUN_INTEGRATED_TESTS}" = true ]]; then
  or_die tar cfM ${DATA_EXCHANGE_DIR}/${DATA_BASENAME_WE}.tar --directory ${GEOS_SRC_DIR} --transform "s/^integratedTests/${DATA_BASENAME_WE}\/repo/" integratedTests
  or_die tar rfM ${DATA_EXCHANGE_DIR}/${DATA_BASENAME_WE}.tar --directory ${GEOSX_BUILD_DIR} --transform "s/^integratedTests/${DATA_BASENAME_WE}\/logs/" integratedTests
  or_die gzip ${DATA_EXCHANGE_DIR}/${DATA_BASENAME_WE}.tar

+  # want to clean the integrated tests folder to avoid polluting the next build.
+  or_die integratedTests/geos_ats.sh -a clean
fi

+# Cleaning the build directory.
+or_die ninja clean

# If we're here, either everything went OK or we have to deal with the integrated tests manually.
if [[ ! -z "${INTEGRATED_TEST_EXIT_STATUS+x}" ]]; then
  echo "Exiting the build process with exit status ${INTEGRATED_TEST_EXIT_STATUS} from the integrated tests."
```

Comment thread on the integrated-tests cleanup:

> With @cssherman's patch to build the baselines out of source, cleaning the integrated tests will surely not be mandatory anymore, since they disappear with the

> I think this was before the

> Oh, I think @cssherman's PR has not been merged yet. Am I wrong?

> An option would be to mount the workspace read-only in the container, then copy it to a temporary location and work in it (waiting for out-of-source integrated tests, which should solve the last item that touches the src dir). Eventually, we'll be able to skip the workspace copy, but not right now.

Comment on lines +288 to +289 (`ninja clean`):

> This is most likely superfluous since the build dir is in

> I think you are right.
```diff
@@ -98,7 +98,7 @@ class SolidBaseUpdates
  arrayView1d< real64 const > const m_thermalExpansionCoefficient;

  /// Flag to disable inelasticity
-  const bool & m_disableInelasticity;
+  const bool m_disableInelasticity;

  /**
   * @brief Get bulkModulus
```

Comment thread:

> This was a bug on heterogeneous memory systems. @klevzoff This was the problem with the constitutive unit tests from the remove

> Is there any way to automatically check for this kind of bug? Any way to at least get a warning whenever a reference is captured by value in a RAJA
```diff
@@ -192,6 +192,8 @@ void compute2DLaplaceOperator( MPI_Comm comm,

  // Construct the 2D Laplace matrix
  laplace2D.create( matrix.toViewConst(), matrix.numRows(), comm );

+  // laplace2D.write( "matrix.txt", LAIOutputFormat::NATIVE_ASCII);
}

/**
```

Comment on the commented-out `laplace2D.write` line:

> keep this because it will be needed when we fix this test on GPU.
Final review thread:

> Could we use the `DOCKER_RUN_ARGS` input to add `-v /etc/pki/ca-trust/source/anchors/.../bundle.crt:/etc/pki/ca-trust/.../llnl.bundle.crt:ro`? That way the certificate would be at the proper location, read-only, with no copy required from within the container. Then, inside the docker container, simply update the certificates list (we can do it in every case: it's fast and should do no harm). Or, if we really want it, we can add an option.

> We certainly can... and probably should. Would you mind if we did this in a follow-up PR? We are getting another workstation to run integratedTests on the CPU, and we will require a generic treatment at that time. I would rather just get this merged as is, and do the rest in separate PRs: we can do all of these steps independently of each other, IMO.

> OK, I'm good with it. We need to be sure we actually have time to revisit this.