Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add flaky retry in weekly CI #662

Merged
merged 27 commits into from
May 23, 2024

Conversation

RomanBredehoft
Copy link
Collaborator

@RomanBredehoft RomanBredehoft commented Apr 30, 2024

Clean the weekly CI :

  • new workflow : scheduled + on demand , with dedicated status history
  • re-run flaky issues like in regular PRs (+ report them on slack notifications for weeklys)
  • remove slack reports for regular CIs (keep for weekly + push on main/release branches)
  • better slack report message (ex: https://zama-ai.slack.com/archives/C02RX6CF3FV/p1715090858656719)

refs https://github.com/zama-ai/concrete-ml-internal/issues/4030
refs https://github.com/zama-ai/concrete-ml-internal/issues/4428
closes https://github.com/zama-ai/concrete-ml-internal/issues/4390

@cla-bot cla-bot bot added the cla-signed label Apr 30, 2024
@RomanBredehoft RomanBredehoft force-pushed the chore/add_flaky_retry_in_weekly_ci_4390 branch from fbee1a7 to e25ceff Compare April 30, 2024 15:35
@@ -936,7 +952,7 @@ jobs:
needs: [build-linux]
runs-on: ubuntu-20.04
timeout-minutes: 2
if: ${{ always() }}
if: success() || failure()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid running this if all previous jobs are skipped

- name: PyTest (no flaky) with PyPI local wheel of Concrete ML (release)
# Run Pytest on all of our tests (except flaky ones) using PyPI's local wheel in the weekly
# or during the release process
- name: PyTest (no flaky) with PyPI local wheel of Concrete ML (weekly, release)
Copy link
Collaborator Author

@RomanBredehoft RomanBredehoft Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid flaky tests in weekly tests with pypi as well (felt redundant)

@RomanBredehoft RomanBredehoft force-pushed the chore/add_flaky_retry_in_weekly_ci_4390 branch 13 times, most recently from 2460879 to e5be845 Compare May 3, 2024 15:08
Copy link

github-actions bot commented May 3, 2024

⚠️ Known flaky tests have been rerun ⚠️

One or several tests initially failed but were identified as known flaky. tests. Therefore, they have been rerun and passed. See below for more details.

Failed tests details

Known flaky tests that initially failed:

  • tests/torch/test_compile_torch.py::test_compile_torch_or_onnx_conv_networks[True-True-CNN_grouped-relu]

@RomanBredehoft RomanBredehoft force-pushed the chore/add_flaky_retry_in_weekly_ci_4390 branch from 18601fb to f234521 Compare May 6, 2024 11:41
@@ -60,12 +58,6 @@ on:
type: boolean
required: false
default: false

schedule:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to the new weekly_tests.yaml

@RomanBredehoft RomanBredehoft force-pushed the chore/add_flaky_retry_in_weekly_ci_4390 branch from f4ec792 to 6e86840 Compare May 7, 2024 14:26
IS_WEEKLY: ${{ github.event_name == 'schedule' || ((github.event_name == 'workflow_dispatch') && (inputs.event_name == 'weekly')) }}
# Run the weekly CI if it has been triggered manually by the weekly workflow, meaning
# 'inputs.event_name' is set to "weekly"
IS_WEEKLY: ${{ inputs.event_name == 'weekly'}}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now weekly CIs are only triggerd by the new weekly_tests.yaml CI

@@ -147,7 +144,7 @@ jobs:
# Manage instance type
INSTANCE_TYPE="c5.4xlarge"
if [[ "${BUILD_TYPE}" == "weekly" ]]; then
INSTANCE_TYPE="m6i.metal"
INSTANCE_TYPE="m6i.16xlarge"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as for the release, m6i.metal started to be less available then before

)
&& steps.conformance.outcome == 'success'
&& !cancelled()
shell: bash +e {0}
run: |
make pytest_and_report
if [[ "${{ env.IS_WEEKLY }}" == "true" ]]; then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run weekly tests in weekly CIs, but keep the same flaky re-run system

@@ -951,20 +961,8 @@ jobs:
exit 1
fi

- name: Slack Notification
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep a single slack notif at the end of the whole CI

- name: PyTest Source Code
run: |
make pytest_macOS_for_GitHub

decide-slack-report:
Copy link
Collaborator Author

@RomanBredehoft RomanBredehoft May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only report on slack :

  • weekly CIs
  • push to main/release branch CIs

we can discuss about it but I feel it's a bit useless to report all regular CIs on slack as it makes things very verbose and not very easy to follow

for linux_python_version in ${LINUX_PYTHON_VERSIONS}; do
file_name="failed_tests_slack_list_${linux_python_version}.txt"

if [ -f ${file_name} ]; then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print flaky tests that got re-run if they were some in the slack notification

@@ -0,0 +1,22 @@
name: Weekly Tests
Copy link
Collaborator Author

@RomanBredehoft RomanBredehoft May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new workflow that only handles weekly CIs, which run:

  • each sunday night
  • on demand using GH's interface

we will be able to find all weekly CI's status history much more easily (weeklys won't be mixed up with regular CIs)

@@ -724,7 +735,8 @@ jobs:
poetry run python ./script/actions_utils/pytest_failed_test_report.py \
--pytest-input-report "pytest_report.json" \
--failed-tests-report "failed_tests_report.json" \
--failed-tests-comment "failed_tests_comment.txt"
--failed-tests-comment "failed_tests_comment_${{ matrix.python_version }}.txt"
--failed-tests-list "failed_tests_slack_list_${{ matrix.python_version }}.txt"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write the list of failed tests that will later be used for reporting in slack flaky tests that got re-run within a weekly CI

@RomanBredehoft RomanBredehoft marked this pull request as ready for review May 7, 2024 14:36
@RomanBredehoft RomanBredehoft requested a review from a team as a code owner May 7, 2024 14:36
Copy link

github-actions bot commented May 7, 2024

Coverage passed ✅

Coverage details

---------- coverage: platform linux, python 3.8.18-final-0 -----------
Name    Stmts   Miss  Cover   Missing
-------------------------------------
TOTAL    7582      0   100%

59 files skipped due to complete coverage.

@RomanBredehoft RomanBredehoft marked this pull request as draft May 7, 2024 16:06
@RomanBredehoft RomanBredehoft marked this pull request as ready for review May 23, 2024 09:49
@RomanBredehoft RomanBredehoft merged commit bf47498 into main May 23, 2024
12 checks passed
@RomanBredehoft RomanBredehoft deleted the chore/add_flaky_retry_in_weekly_ci_4390 branch May 23, 2024 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants