
[Go Hack] feature addition and refactor #18

Open · wants to merge 22 commits into master

Conversation


@yadunandan004 commented on May 15, 2022

This PR includes the following main changes:

##===================================================================##

Retry mechanism:

This module is used to attempt retries of an operation in case of failure.

lbcluster/retry_module.go

type Retry struct {
    signal            chan int  // provides the retry signal
    done              chan bool // marks the end of retry
    tick              chan bool
    currentCount      int
    maxCount          int // maximum count of retries permitted
    retryStarted      bool
    maxDuration       time.Duration // max duration for which retries are allowed
    retryDuration     time.Duration
    prevRetryDuration time.Duration
    logger            logger.Logger
}

  • Max duration can be set via SetMaxDuration(maxDuration time.Duration) error.
  • Max retry count can be set via SetMaxCount(maxCount int) error.
  • The retry backoff time grows according to the Fibonacci sequence.
  • Retries are allowed until either the max count or the max duration is reached, whichever comes first.
  • Any function that needs to be retried (mostly network-IO related) can be wrapped by passing it to the retry executor Execute(executor func() error) error. The implementation is in lbcluster/lbcluster_dns.go:78; a usage sketch follows this list.
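
For illustration, a minimal usage sketch of the retry executor. The constructor name NewRetry, its arguments, the limit values, the log variable (a logger.Logger instance), and the wrapped refreshDNS call are assumptions made for this example; only SetMaxCount, SetMaxDuration, and Execute come from the module described above.

// Hypothetical construction; the real constructor name and signature may differ.
retry := NewRetry(log, 10*time.Second)

if err := retry.SetMaxCount(5); err != nil {
    log.Error(err.Error())
}
if err := retry.SetMaxDuration(2 * time.Minute); err != nil {
    log.Error(err.Error())
}

// Execute keeps invoking the function with Fibonacci backoff until it
// succeeds or one of the limits (max count or max duration) is reached.
if err := retry.Execute(func() error {
    return refreshDNS() // placeholder for the network-IO call being retried
}); err != nil {
    log.Error("retries exhausted: " + err.Error())
}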

##===================================================================##

Logging Refactor:

This includes a general refactor of logging and also adds snapshot support.

logger/lbcluster_log.go

  • A factory function is now used to create the logger: NewLoggerFactory(logFilePath string) (Logger, error). Any logging-related errors are handled during factory initialization.

type Logger interface {
    EnableDebugMode()
    EnableWriteToSTd()
    StartSnapshot(d time.Duration)
    GetLogFilePath() string
    Info(s string)
    Warning(s string)
    Debug(s string)
    Error(s string)
}

  • The above interface is used as the logger type everywhere. Its methods should be used instead of mutating variables directly.
  • Log-based snapshots: logs can now be rotated into snapshots by providing a cycle time via StartSnapshot(d time.Duration). Once the cycle time expires, a new snapshot is created.
  • Log snapshots are created in the following format (a usage sketch follows the example below):

sample.0.log, sample.1.log
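
A minimal usage sketch based on the factory and interface above; the file name, snapshot cycle, and message are illustrative values rather than defaults taken from the code.

log, err := logger.NewLoggerFactory("sample.log")
if err != nil {
    // log-related errors surface here, during factory initialization
    return err
}
log.EnableDebugMode()
log.EnableWriteToSTd()
log.StartSnapshot(24 * time.Hour) // rotates into sample.0.log, sample.1.log, ...
log.Info("load balancer daemon started")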

##===================================================================##

Metric Collection:

Adds a feature to collect metrics for DNS updates. This can be used for telemetry purposes.

Default port: 8081

  • Exposes a GET API /metric to fetch all the metrics recorded by the server for that day.
  • The server can be started via NewMetricServer(metricDirectoryPath string) error. Implemented in lbd.go:189 (but disabled by default).
  • A new record can be written via WriteRecord(property Property) error. Implemented in lbcluster/lbcluster_dns.go:86; a usage sketch follows the struct below.

type Property struct {
    RoundTripStartTime time.Time     `json:"start_time" csv:"start_time" rw:"r"`
    RoundTripEndTime   time.Time     `json:"end_time" csv:"end_time" rw:"r"`
    RoundTripDuration  time.Duration `json:"round_trip_duration"`
}

The above are the currently supported metrics; the functionality can be extended to add more.
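
For illustration, a sketch of how the server might be started and how a record could be written around a DNS round trip. The directory path, the updateDNS call, and the log variable are placeholders; only NewMetricServer, WriteRecord, and Property come from the changes described above.

// Start the metric server (left disabled by default in lbd.go).
if err := NewMetricServer("/var/lib/lbd/metrics"); err != nil {
    log.Error("could not start metric server: " + err.Error())
}

// Record one DNS round trip.
start := time.Now()
dnsErr := updateDNS() // placeholder for the actual DNS update
end := time.Now()

record := Property{
    RoundTripStartTime: start,
    RoundTripEndTime:   end,
    RoundTripDuration:  end.Sub(start),
}
if err := WriteRecord(record); err != nil {
    log.Error("could not write metric record: " + err.Error())
}
if dnsErr != nil {
    log.Error("DNS update failed: " + dnsErr.Error())
}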

##===================================================================##

Concurrency Addition

  • Concurrency has been added to the following two methods (a sketch of the pattern follows this list):
    • EvaluateHosts(hostsToCheck map[string]lbhost.Host) in lbcluster/lbcluster.go:294
    • SNMPDiscovery() in lbhost/lbhost.go:93
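
The diff is not reproduced here, so the following is only a sketch of the usual fan-out shape such concurrency takes in Go (a sync.WaitGroup over per-host goroutines); the helper name and the body are placeholders, not the PR's actual implementation.

// evaluateConcurrently is a hypothetical helper showing the fan-out pattern.
func evaluateConcurrently(hostsToCheck map[string]lbhost.Host) {
    var wg sync.WaitGroup
    for name, host := range hostsToCheck {
        wg.Add(1)
        go func(name string, host lbhost.Host) {
            defer wg.Done()
            // The per-host work (e.g. the SNMP discovery) runs here;
            // any shared results need a mutex or a channel.
        }(name, host)
    }
    wg.Wait()
}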

##===================================================================##

General Refactors

Refactors were done with the following in mind (a short sketch of the first two points follows this list):

  • Avoid direct use of variables to read or change state; getter and setter methods are used instead.
  • Avoid passing too many or redundant params; common params such as the logger are stored on the underlying struct instead.
  • Remove redundant code.
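
As an illustration of the first two points, a short sketch; the type and field names here are hypothetical and not taken from the diff.

// clusterManager is a hypothetical type illustrating the pattern.
type clusterManager struct {
    logger logger.Logger // common dependency stored on the struct, not passed around
    state  string
}

// State reads the current state instead of exposing the field directly.
func (c *clusterManager) State() string { return c.state }

// SetState mutates the state through a method rather than the raw field.
func (c *clusterManager) SetState(s string) {
    c.state = s
    c.logger.Debug("state changed to " + s)
}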

Appropriate test cases have been added for all of the changes mentioned above.

@@ -0,0 +1,7 @@
package model

type CluserConfig struct {
Contributor:

Could it be that this is supposed to be ClusterConfig? Note that it will have to be changed in the other 14 places

Author:

Sure, will change this.


func (r *Retry) run() {
    start := make(chan bool)
    go func() {
Contributor:

First of all, thanks for this contribution! I appreciate a lot that you took the time to look into the code and think of ways of improving it.

I'm concerned about the retry approach. If I get it right, the retry is done on a different goroutine. Imagine that this is going over 200 aliases and, for some reason, there are issues with the DNS and they start failing. All those retries will start piling up. Taking into account that the main loop retries every 10 seconds, couldn't this turn into a DoS?

I have the feeling that the current approach of failing once and hoping that the next iteration will fix it might be less risky.

Author:

Sure, that scenario does make a case for removing the retry call. Should I remove the whole retry module or just the call (and keep the module for possible future use)?

@mehrankamal:

@yadunandan004 could we try to use a CircuitBreaker pattern here instead of a plain retry? It works like this:

Once you retry a previous message, say after 10 seconds, and it fails, each subsequent retry gets a longer cooldown time before the next attempt, for example retryTime = retryTime * 2 or some other increasing function. Once a retry is successful, we can either set retryTime = DEFAULT_RETRY_TIME or decrease it by a multiplicative factor.

Reference: https://github.com/cloud-native-go/examples/blob/main/ch04/circuitbreaker.go

What do you think, @psaiz ?

Author (@yadunandan004, Jun 13, 2022):

I see. Yes, I do have a computeNextRetryTime method (line 126) that computes the next retry time using a Fibonacci algorithm. Would that suffice, or should I use a different algorithm for that purpose?

@mehrankamal left a comment:

Just some feedback and suggested improvements for the retry_module. Take them with a pinch of salt.



func (r *Retry) run() {
    start := make(chan bool)
    go func() {
        ticker := time.NewTicker(1 * time.Minute)


The ticker duration should be configurable from the Retry type, I think, since we may want different retry durations for different services.

Author:

Sure. That can be configured when instantiating via the constructor itself (line 24).

Comment on lines +72 to +75
if r.currentCount == r.maxCount {
    r.done <- true
    return
}


Is it semantically correct to send true on the r.done channel even in the case r.currentCount == r.maxCount? Also, should it not return an error when all retries have failed?

Author (@yadunandan004, Jun 13, 2022):

The module never returns that true value to the caller; it is only there to terminate the loop. The module does return an error once the maximum number of retries has been reached (line 122).

Author (@yadunandan004):

I have addressed all the comments; I hope that is sufficient. Please let me know if there are any remaining gaps. If it looks good, could the branch be merged?
