[Heartbeat]: fix status field when monitors are down #36623

Closed

Conversation

vigneshshanmugam
Member

@vigneshshanmugam vigneshshanmugam commented Sep 19, 2023

  • Fixes a bug introduced by the summarizer change in [Heartbeat] Fix summarizer #36519, which did not produce the correct monitor.status: down when a monitor was retried on its second attempt.
  • Previously, the status was set to an empty string (monitor.status: ""), which is incorrect. This PR fixes the issue by correctly setting the Up/Down fields and by preserving the previous status when a retry is attempted.
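
To make the symptom concrete, here is a minimal Go sketch. It is not the code from this PR: the jobSummary type and the deriveStatus helper are hypothetical stand-ins, and the rule "any down result means down" is an assumption made for illustration. It only shows how a status field can stay empty on a retry unless something sets it explicitly.

```go
package main

import "fmt"

// Status strings as they appear in monitor.status.
const (
	statusUp   = "up"
	statusDown = "down"
)

// jobSummary is a hypothetical, simplified stand-in for a per-attempt summary.
type jobSummary struct {
	Attempt     int
	MaxAttempts int
	Up          int
	Down        int
	Status      string // remains "" until something assigns it
}

// deriveStatus returns "down" if the attempt recorded any down result,
// otherwise "up". This rule is an assumption for the sketch.
func deriveStatus(js jobSummary) string {
	if js.Down > 0 {
		return statusDown
	}
	return statusUp
}

func main() {
	// Second attempt of a monitor that is still failing.
	retry := jobSummary{Attempt: 2, MaxAttempts: 2, Down: 1}
	fmt.Printf("before: monitor.status=%q\n", retry.Status)

	retry.Status = deriveStatus(retry)
	fmt.Printf("after:  monitor.status=%q\n", retry.Status)
}
```

Deriving the status from the counters at the end of each attempt, rather than relying on a value set elsewhere, is one way to guarantee the field is never emitted empty.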

Test monitor

heartbeat.monitors:
- type: browser
  id: test-hb
  enabled: true
  name: Test HB dev
  schedule: '@every 1m'
  screenshots: "off"
  max_attempts: 2
  source:
    inline:
      script: |-
        step("load homepage", async () => {

        });

@vigneshshanmugam vigneshshanmugam requested a review from a team as a code owner September 19, 2023 22:07
@botelastic botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Sep 19, 2023
@vigneshshanmugam vigneshshanmugam added the bug, Team:obs-ds-hosted-services (Label for the Observability Hosted Services team), and v8.11.0 labels and removed the needs_team label (Indicates that the issue/PR needs a Team:* label) Sep 19, 2023
@elasticmachine
Collaborator

Pinging @elastic/uptime (Team:Uptime)

@mergify
Contributor

mergify bot commented Sep 19, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @vigneshshanmugam? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Collaborator

💔 Tests Failed

Build stats

  • Start Time: 2023-09-19T22:07:41.299+0000

  • Duration: 32 min 54 sec

Test stats 🧪

Test Results
Failed 20
Passed 2454
Skipped 0
Total 2474

Test errors 20

> Show only the first 10 test failures

Build&Test / heartbeat-goIntegTest / TestSummarizer/start_down,_transition_to_up – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
     === RUN   TestSummarizer/start_down,_transition_to_up
    === PAUSE TestSummarizer/start_down,_transition_to_up
    === CONT  TestSummarizer/start_down,_transition_to_up
        summarizer_test.go:215: 
            	Error Trace:	/var/lib/jenkins/workspace/PR-36623-1-87f1d5d7-e8ce-41e5-bdd1-5ccb19677d69/src/github.com/elastic/beats/heartbeat/monitors/wrappers/summarizer/summarizer_test.go:215
            	Error:      	Not equal: 
            	            	expected: "du"
            	            	actual  : "dd"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1 @@
            	            	-du
            	            	+dd
            	Test:       	TestSummarizer/start_down,_transition_to_up
    --- FAIL: TestSummarizer/start_down,_transition_to_up (0.00s)
     
    

Build&Test / heartbeat-goIntegTest / TestSummarizer/start_up,_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
     === RUN   TestSummarizer/start_up,_transient_down,_recover
    === PAUSE TestSummarizer/start_up,_transient_down,_recover
    === CONT  TestSummarizer/start_up,_transient_down,_recover
        summarizer_test.go:215: 
            	Error Trace:	/var/lib/jenkins/workspace/PR-36623-1-87f1d5d7-e8ce-41e5-bdd1-5ccb19677d69/src/github.com/elastic/beats/heartbeat/monitors/wrappers/summarizer/summarizer_test.go:215
            	Error:      	Not equal: 
            	            	expected: "uuuuuuuu"
            	            	actual  : "uuuuduuu"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1 @@
            	            	-uuuuuuuu
            	            	+uuuuduuu
            	Test:       	TestSummarizer/start_up,_transient_down,_recover
    --- FAIL: TestSummarizer/start_up,_transient_down,_recover (0.00s)
     
    

Build&Test / heartbeat-goIntegTest / TestSummarizer/start_up,_multiple_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
     === RUN   TestSummarizer/start_up,_multiple_transient_down,_recover
    === PAUSE TestSummarizer/start_up,_multiple_transient_down,_recover
    === CONT  TestSummarizer/start_up,_multiple_transient_down,_recover
        summarizer_test.go:215: 
            	Error Trace:	/var/lib/jenkins/workspace/PR-36623-1-87f1d5d7-e8ce-41e5-bdd1-5ccb19677d69/src/github.com/elastic/beats/heartbeat/monitors/wrappers/summarizer/summarizer_test.go:215
            	Error:      	Not equal: 
            	            	expected: "uuuuuuuuu"
            	            	actual  : "uuuuddddd"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1 @@
            	            	-uuuuuuuuu
            	            	+uuuuddddd
            	Test:       	TestSummarizer/start_up,_multiple_transient_down,_recover
    --- FAIL: TestSummarizer/start_up,_multiple_transient_down,_recover (0.01s)
     
    

Build&Test / heartbeat-goIntegTest / TestSummarizer – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
     === RUN   TestSummarizer
    === PAUSE TestSummarizer
    === CONT  TestSummarizer
    --- FAIL: TestSummarizer (0.00s)
     
    

Build&Test / heartbeat-rhel-9-rhel-9 / TestSummarizer/start_down,_transition_to_up – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
     === RUN   TestSummarizer/start_down,_transition_to_up
    === PAUSE TestSummarizer/start_down,_transition_to_up
    === CONT  TestSummarizer/start_down,_transition_to_up
        summarizer_test.go:215: 
            	Error Trace:	/var/lib/jenkins/workspace/PR-36623-1-0fdbed63-4a42-4e60-8f95-1f7bace59546/src/github.com/elastic/beats/heartbeat/monitors/wrappers/summarizer/summarizer_test.go:215
            	Error:      	Not equal: 
            	            	expected: "du"
            	            	actual  : "dd"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1 @@
            	            	-du
            	            	+dd
            	Test:       	TestSummarizer/start_down,_transition_to_up
    --- FAIL: TestSummarizer/start_down,_transition_to_up (0.00s)
     
    

Build&Test / heartbeat-rhel-9-rhel-9 / TestSummarizer/start_up,_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
     === RUN   TestSummarizer/start_up,_transient_down,_recover
    === PAUSE TestSummarizer/start_up,_transient_down,_recover
    === CONT  TestSummarizer/start_up,_transient_down,_recover
        summarizer_test.go:215: 
            	Error Trace:	/var/lib/jenkins/workspace/PR-36623-1-0fdbed63-4a42-4e60-8f95-1f7bace59546/src/github.com/elastic/beats/heartbeat/monitors/wrappers/summarizer/summarizer_test.go:215
            	Error:      	Not equal: 
            	            	expected: "uuuuuuuu"
            	            	actual  : "uuuuduuu"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1 @@
            	            	-uuuuuuuu
            	            	+uuuuduuu
            	Test:       	TestSummarizer/start_up,_transient_down,_recover
    --- FAIL: TestSummarizer/start_up,_transient_down,_recover (0.00s)
     
    

Build&Test / heartbeat-rhel-9-rhel-9 / TestSummarizer/start_up,_multiple_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
     === RUN   TestSummarizer/start_up,_multiple_transient_down,_recover
    === PAUSE TestSummarizer/start_up,_multiple_transient_down,_recover
    === CONT  TestSummarizer/start_up,_multiple_transient_down,_recover
        summarizer_test.go:215: 
            	Error Trace:	/var/lib/jenkins/workspace/PR-36623-1-0fdbed63-4a42-4e60-8f95-1f7bace59546/src/github.com/elastic/beats/heartbeat/monitors/wrappers/summarizer/summarizer_test.go:215
            	Error:      	Not equal: 
            	            	expected: "uuuuuuuuu"
            	            	actual  : "uuuuddddd"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1 @@
            	            	-uuuuuuuuu
            	            	+uuuuddddd
            	Test:       	TestSummarizer/start_up,_multiple_transient_down,_recover
    --- FAIL: TestSummarizer/start_up,_multiple_transient_down,_recover (0.01s)
     
    

Build&Test / heartbeat-rhel-9-rhel-9 / TestSummarizer – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
     === RUN   TestSummarizer
    === PAUSE TestSummarizer
    === CONT  TestSummarizer
    --- FAIL: TestSummarizer (0.00s)
     
    

Build&Test / heartbeat-unitTest / TestSummarizer/start_up,_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
     === RUN   TestSummarizer/start_up,_transient_down,_recover
    === PAUSE TestSummarizer/start_up,_transient_down,_recover
    === CONT  TestSummarizer/start_up,_transient_down,_recover
        summarizer_test.go:215: 
            	Error Trace:	/var/lib/jenkins/workspace/PR-36623-1-48a6fd16-efb9-4202-b9e6-03315e516251/src/github.com/elastic/beats/heartbeat/monitors/wrappers/summarizer/summarizer_test.go:215
            	Error:      	Not equal: 
            	            	expected: "uuuuuuuu"
            	            	actual  : "uuuuduuu"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1 @@
            	            	-uuuuuuuu
            	            	+uuuuduuu
            	Test:       	TestSummarizer/start_up,_transient_down,_recover
    --- FAIL: TestSummarizer/start_up,_transient_down,_recover (0.00s)
     
    

Build&Test / heartbeat-unitTest / TestSummarizer/start_down,_transition_to_up – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
     === RUN   TestSummarizer/start_down,_transition_to_up
    === PAUSE TestSummarizer/start_down,_transition_to_up
    === CONT  TestSummarizer/start_down,_transition_to_up
        summarizer_test.go:215: 
            	Error Trace:	/var/lib/jenkins/workspace/PR-36623-1-48a6fd16-efb9-4202-b9e6-03315e516251/src/github.com/elastic/beats/heartbeat/monitors/wrappers/summarizer/summarizer_test.go:215
            	Error:      	Not equal: 
            	            	expected: "du"
            	            	actual  : "dd"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1 @@
            	            	-du
            	            	+dd
            	Test:       	TestSummarizer/start_down,_transition_to_up
    --- FAIL: TestSummarizer/start_down,_transition_to_up (0.00s)
     
    

Steps errors 16

Show only the first 10 steps failures

heartbeat-rhel-9-rhel-9 - mage build unitTest
  • Took 3 min 1 sec
heartbeat-rhel-9-rhel-9 - mage build unitTest
  • Took 0 min 28 sec
heartbeat-rhel-9-rhel-9 - mage build unitTest
  • Took 0 min 28 sec
heartbeat-windows-2022-windows-2022 - mage build unitTest
  • Took 5 min 14 sec
heartbeat-windows-2022-windows-2022 - mage build unitTest
  • Took 1 min 9 sec
heartbeat-windows-2022-windows-2022 - mage build unitTest
  • Took 1 min 43 sec
heartbeat-windows-2016-windows-2016 - mage build unitTest
  • Took 4 min 28 sec
heartbeat-windows-2016-windows-2016 - mage build unitTest
  • Took 1 min 42 sec
heartbeat-windows-2016-windows-2016 - mage build unitTest
  • Took 1 min 42 sec
Error signal
  • Took 0 min 0 sec
  • Description: Error 'hudson.AbortException: script returned exit code 1'

🐛 Flaky test report

❕ There are test failures but no known flaky tests.

Genuine test errors 20

💔 There are test failures but no known flaky tests, so these are most likely genuine test failures.

  • Name: Build&Test / heartbeat-goIntegTest / TestSummarizer/start_down,_transition_to_up – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-goIntegTest / TestSummarizer/start_up,_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-goIntegTest / TestSummarizer/start_up,_multiple_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-goIntegTest / TestSummarizer – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-rhel-9-rhel-9 / TestSummarizer/start_down,_transition_to_up – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-rhel-9-rhel-9 / TestSummarizer/start_up,_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-rhel-9-rhel-9 / TestSummarizer/start_up,_multiple_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-rhel-9-rhel-9 / TestSummarizer – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-unitTest / TestSummarizer/start_up,_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-unitTest / TestSummarizer/start_down,_transition_to_up – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-unitTest / TestSummarizer/start_up,_multiple_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-unitTest / TestSummarizer – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-windows-2022-windows-2022 / TestSummarizer/start_up,_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-windows-2022-windows-2022 / TestSummarizer/start_down,_transition_to_up – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-windows-2022-windows-2022 / TestSummarizer/start_up,_multiple_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-windows-2022-windows-2022 / TestSummarizer – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-windows-2016-windows-2016 / TestSummarizer/start_up,_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-windows-2016-windows-2016 / TestSummarizer/start_down,_transition_to_up – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-windows-2016-windows-2016 / TestSummarizer/start_up,_multiple_transient_down,_recover – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer
  • Name: Build&Test / heartbeat-windows-2016-windows-2016 / TestSummarizer – github.com/elastic/beats/v7/heartbeat/monitors/wrappers/summarizer

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Contributor

@shahzad31 shahzad31 left a comment

I only tested the functionality, and it seems to be working now!

Contributor

@andrewvc andrewvc left a comment

This needs tests, and I'm not sure this code works

newJs := *NewJobSummary(js.Attempt+1, js.MaxAttempts, js.RetryGroup)
newJs.Up = js.Up
newJs.Down = js.Down
newJs.Status = js.Status
Contributor

I don't understand why a new job would inherit the status of the old job. The same goes for the Up and Down fields; those should be set during the execution of the new job. If this fixes the bug, I don't understand why. Also, we need tests.

Does this code work if the first attempt is down but the second is up? As I read it, the answer would be no, but I may be misunderstanding.
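
The down-then-up concern can be illustrated with a small Go sketch. Everything here is hypothetical (attemptSummary, nextFresh, nextInherited, run) and the "any down result means down" rule is an assumption; it does not reproduce the summarizer's real code. It only compares starting each attempt with zeroed counters against carrying the previous counters forward, encoding each attempt's status as "u" or "d" in the same way the failing TestSummarizer cases report their expected and actual strings.

```go
package main

import "fmt"

// attemptSummary is a hypothetical, simplified per-attempt summary.
type attemptSummary struct {
	Attempt int
	Up      int
	Down    int
}

// status reports "d" if the summary holds any down result, otherwise "u".
func status(s attemptSummary) string {
	if s.Down > 0 {
		return "d"
	}
	return "u"
}

// nextFresh starts the next attempt with zeroed counters.
func nextFresh(prev attemptSummary) attemptSummary {
	return attemptSummary{Attempt: prev.Attempt + 1}
}

// nextInherited starts the next attempt with the previous counters copied over.
func nextInherited(prev attemptSummary) attemptSummary {
	return attemptSummary{Attempt: prev.Attempt + 1, Up: prev.Up, Down: prev.Down}
}

// run records one result per attempt (true = up) and returns the
// concatenated per-attempt statuses.
func run(results []bool, next func(attemptSummary) attemptSummary) string {
	out := ""
	cur := attemptSummary{Attempt: 1}
	for i, ok := range results {
		if i > 0 {
			cur = next(cur)
		}
		if ok {
			cur.Up++
		} else {
			cur.Down++
		}
		out += status(cur)
	}
	return out
}

func main() {
	downThenUp := []bool{false, true}
	fmt.Printf("fresh counters:     %s\n", run(downThenUp, nextFresh))
	fmt.Printf("inherited counters: %s\n", run(downThenUp, nextInherited))
}
```

Under these assumptions, fresh counters yield "du" while inherited counters yield "dd", because the Down count from the first attempt is still present when the second attempt is evaluated. That matches the shape of the CI failure for start_down,_transition_to_up (expected "du", actual "dd"), though the sketch alone does not prove that this is the cause.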

@andrewvc
Contributor

I think we should close this in favor of #36704

@vigneshshanmugam vigneshshanmugam deleted the fix-status-summarizer branch September 29, 2023 03:48
andrewvc added a commit that referenced this pull request Sep 29, 2023
[Heartbeat] Fix missing monitor.status value in initial attempt where max_attempts > 2. Introduced in #36623 adding tests to the scenario runner as well.

Original cause was this PR #36519 that did not produce the correct monitor.status: down when the monitor is retried with the second attempt.
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
)

[Heartbeat] Fix missing monitor.status value in initial attempt where max_attempts > 2. Introduced in elastic#36623 adding tests to the scenario runner as well.

Original cause was this PR elastic#36519 that did not produce the correct monitor.status: down when the monitor is retried with the second attempt.
Labels
bug Heartbeat Team:obs-ds-hosted-services Label for the Observability Hosted Services team v8.11.0
4 participants