Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix-16978][Master] Fix AbstractDelayEvent compare method is incorrect #16980

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

reele
Copy link
Contributor

@reele reele commented Jan 23, 2025

Purpose of the pull request

Fix AbstractDelayEventBus block events issue fix #16978.

Brief change log

  1. Fix AbstractDelayEvent compare method is incorrect
  2. Modify TaskFailureStateAction to allow killEventAction when task instance can retry.

Verify this pull request

Stop a workflow with failed task when waiting to retry.

Pull Request Notice

Pull Request Notice

If your pull request contains incompatible change, you should also add it to docs/docs/en/guide/upgrade/incompatible.md

@SbloodyS SbloodyS added the bug Something isn't working label Jan 23, 2025
@SbloodyS SbloodyS added this to the 3.3.0 milestone Jan 23, 2025
Comment on lines +134 to +143
// for now, killEventAction can be fired on failure with retry
// there is no task executor now, shouldn't call workflowExecutionGraph.isTaskExecutionRunnableActive()
// it's better to call super.killedEventAction() direct
if (taskExecutionRunnable.isTaskInstanceCanRetry()
// && workflowExecutionGraph.isTaskExecutionRunnableActive(taskExecutionRunnable)
) {
super.killedEventAction(workflowExecutionRunnable, taskExecutionRunnable,
TaskKilledLifecycleEvent.of(taskExecutionRunnable));
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do this change? or this is only your test code?

Copy link
Contributor Author

@reele reele Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not test code, before this pr, the kill event always triggered after retry event, so the kill or start(retry) action always fire on submitted state

this code use to finish the workflow's Topology graph.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I got you.
Please use

if (taskExecutionRunnable.isTaskInstanceCanRetry()
                && workflowExecutionGraph.isTaskExecutionRunnableActive(taskExecutionRunnable)) {

to replace

if (taskExecutionRunnable.isTaskInstanceCanRetry()
        // && workflowExecutionGraph.isTaskExecutionRunnableActive(taskExecutionRunnable)
        ) 

You can add a new private method for this, this logic also exist in killedEventAction

private boolean isTaskRetrying(
            final IWorkflowExecutionRunnable workflowExecutionRunnable,
            final ITaskExecutionRunnable taskExecutionRunnable) {
        final IWorkflowExecutionGraph workflowExecutionGraph = taskExecutionRunnable.getWorkflowExecutionGraph();
        return taskExecutionRunnable.isTaskInstanceCanRetry()
                && workflowExecutionGraph.isTaskExecutionRunnableActive(taskExecutionRunnable);
    }

Copy link
Contributor Author

@reele reele Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add a new private method for this, this logic also exist in killedEventAction

sure.

but i need test workflowExecutionGraph.isTaskExecutionRunnableActive(taskExecutionRunnable) more for some case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a bit confused about this:
when task is failed and going to waiting retry, this task was not removed from WorkflowExecutionGraph.activeTaskExecutionRunnable, activeTaskExecutionRunnable.add() in isTaskExecutionRunnableActive will always return false, this result prevented the execution of if-block.

or WorkflowExecutionGraph.isTaskExecutionRunnableActive should call activeTaskExecutionRunnable.contains instead of activeTaskExecutionRunnable.add() ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is another bug, we should use contains rather than add here. You can fix this in this PR or submib another pr.

Copy link
Member

@ruanwenjun ruanwenjun Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, it's better to add a judge method isTaskRetrying (ITaskExecutionRunnable taskExecutionRunnable) at IWorkflowExecutionGraph.

public boolean isTaskExecutionRunnableRetrying(final ITaskExecutionRunnable taskExecutionRunnable) {
        if (!taskExecutionRunnable.isTaskInstanceInitialized()) {
            return false;
        }
        final TaskInstance taskInstance = taskExecutionRunnable.getTaskInstance();
        return taskInstance.getState() == TaskExecutionStatus.FAILURE && taskExecutionRunnable.isTaskInstanceCanRetry() && isTaskExecutionRunnableActive(taskExecutionRunnable);
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok!

@ruanwenjun
Copy link
Member

BTW, we are missing test cases in WorkflowInstanceStopTestCase, it's better to add a case kill a workflow which contains retry task.

@reele
Copy link
Contributor Author

reele commented Jan 23, 2025

BTW, we are missing test cases in WorkflowInstanceStopTestCase, it's better to add a case kill a workflow which contains retry task.

sure, i'll do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend bug Something isn't working priority:high
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] [Master] Fix AbstractDelayEvent compare method is incorrect
3 participants