Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MEDIUM: watcher: fix race condition & plumbing stop for test #42

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

amelhusic
Copy link
Collaborator

No description provided.

@amelhusic
Copy link
Collaborator Author

this fix is related to: #10

@aiharos aiharos requested a review from ShimmerGlass April 28, 2020 10:00
@ShimmerGlass
Copy link
Collaborator

This will not stop the goroutines from running once the Watcher is stopped.
One way to do that is to add a check in the infinite loops to stop them when the Watcher is stopped:

for {
  select {
    case <-w.shutdownCh:
      return
    default:
  }
  
  // work
}

Additionally, I'd prefer exposing a Stop() method rather than the chan directly.

@amelhusic amelhusic force-pushed the feat/watcher-data-race branch 2 times, most recently from 7648703 to 93d24e5 Compare May 14, 2020 14:57
@amelhusic
Copy link
Collaborator Author

I applied your recommendations.

@amelhusic amelhusic force-pushed the feat/watcher-data-race branch 5 times, most recently from a9e7aca to 1093717 Compare May 15, 2020 21:13
func (w *Watcher) notifyShutdownCh() {
select {
case <-w.shutdownCh:
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return here will only return from this function, not the caller, this will not stop the for loops.
You need to put the select directly in the loop from the caller to have the effect you want.
Alternatively, if the goal here is to avoid repeating this select statement over and over, it could be rewritten as :

func (w *Watcher) isStopped() bool {
	select {
	case <-w.shutdownCh:
		return true
        default:
                return false
        }
}

And is the for loop :

for {
    if w.isStopped() {
        return
    }

    // work
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return here will only return from this function, not the caller, this will not stop the for loops.

You are correct about this and I updated this part.

However, placing isStopped func at the top of for loop requires using a time.Sleep in order to execute these routines, just above this line: 4a88643#diff-df428c63426402bd3667b15209ed395fR73 Otherwise, test is hanging forever.

On the other hand, placing isStopped func at the bottom of for loops works and code in these goroutines is executed at least once before exit. Also, there is no need for time.Sleep.
What do you think about this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are errors from consul w.isStopped() will never be called because there are continue statements on error. Having them at the top ensure the check is done.
The time.Sleep() issue can be fixed by defering watcher.Stop()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ShimmerGlass

Sorry for delay. I think w.isStopped() should be at the end of each for loop, because watcher.Run() internally spawns these goroutines and it uses sync.WaitGroup: https://github.com/haproxytech/haproxy-consul-connect/blob/master/consul/watcher.go#L108. In order for watcher.Run() to complete, we need to ensure that each of these functions have called w.ready.Done(). So, watcher.Stop() can only work if watcher.Run() is completed.

Placing w.isStopped() at the top of for loops will immediately return from there. So, w.ready.Done() will never execute and test will never pass.

If error occurs there is a timeout, errorWaitTime configured for 5s, so before test reach its timeout(30s) there would be 6 goroutines per functions, which should not be a problem.

@amelhusic amelhusic force-pushed the feat/watcher-data-race branch 3 times, most recently from 4a88643 to 5b5e260 Compare May 26, 2020 09:11
@amelhusic amelhusic force-pushed the feat/watcher-data-race branch from 5b5e260 to 6fe7c61 Compare July 8, 2020 23:20
@aiharos aiharos requested a review from ShimmerGlass August 24, 2020 13:35
@aiharos
Copy link
Collaborator

aiharos commented Aug 24, 2020

Please recheck and give a thumbs up for mergeing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants