Add script checks #12

Merged
merged 9 commits into from
Jan 11, 2019
Changes from 8 commits
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -30,7 +30,7 @@ jobs:
- checkout
- attach_workspace:
at: /go/src/github.com/gruntwork-io/health-checker
- run: run-go-tests --circle-ci-2 --path test
- run: run-go-tests --circle-ci-2

build:
<<: *defaults
56 changes: 53 additions & 3 deletions Gopkg.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

43 changes: 38 additions & 5 deletions README.md
@@ -1,6 +1,7 @@
# health-checker

A simple HTTP server that will return `200 OK` if the given TCP ports are all successfully accepting connections.
A simple HTTP server that will return `200 OK` if the configured checks are all successful. If any of the checks fail,
it will return `HTTP 504 Gateway Not Found`.

## Motivation

@@ -14,15 +15,23 @@ a single TCP port, or an HTTP(S) endpoint. As a result, our use case just isn't
We wrote health-checker so that we could run a daemon on the server that reports the true health of the server by
attempting to open a TCP connection to more than one port when it receives an inbound HTTP request on the given listener.

Using the `--script` -option, the `health-checker` can be extended to check many other targets. One concrete exeample is monitoring
Member

s/exeample/example/

`ZooKeeper` node status during rolling deployment. Just polling the `ZooKeeper`'s TCP client port doesn't necessarily guarantee
that the node has (re-)joined the cluster. Using the `health-check` with a custom script target, we can
[monitor ZooKeeper](https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_monitoring) using the
[4 letter words](https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_zkCommands), ensuring we report health back to the
[Load Balancer](https://aws.amazon.com/documentation/elastic-load-balancing/) correctly.

## How It Works

When health-checker is started, it will listen for inbound HTTP requests for any URL on the IP address and port specified
by `--listener`. When it receives a request, it will attempt to open TCP connections to each of the ports specified by
an instance of `--port`. If all TCP connections succeed, it will return `HTTP 200 OK`. If any TCP connection fails, it
will return `HTTP 504 Gateway Not Found`.
an instance of `--port` and/or execute the script target specified by `--script`. If all configured checks - all TCP
connections and zero exit status for the script - succeed, it will return `HTTP 200 OK`. If any of the checks fail,
it will return `HTTP 504 Gateway Not Found`.

Configure your AWS Health Check to only pass the Health Check on `HTTP 200 OK`. Now when an HTTP Health Check request
comes in, all desired TCP ports will be checked.
comes in, all desired TCP ports will be checked and the script target executed.
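
The request flow described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the names `check`, `tcpCheck`, `statusFor`, and `healthHandler` are hypothetical, and script checks are left out so that any function returning an `error` can stand in for them.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// check is one health probe (hypothetical type); a nil error means it passed.
type check func() error

// tcpCheck returns a check that passes if a TCP connection to addr opens
// within timeoutSec seconds.
func tcpCheck(addr string, timeoutSec int) check {
	return func() error {
		conn, err := net.DialTimeout("tcp", addr, time.Duration(timeoutSec)*time.Second)
		if err != nil {
			return err
		}
		return conn.Close()
	}
}

// statusFor runs every check and maps the combined result to an HTTP status:
// 200 if all pass, 504 as soon as one fails.
func statusFor(checks []check) int {
	for _, c := range checks {
		if err := c(); err != nil {
			return http.StatusGatewayTimeout // 504; net/http's name for this code
		}
	}
	return http.StatusOK
}

// healthHandler answers a request for any URL with the combined status.
func healthHandler(checks []check) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(statusFor(checks))
	}
}

func main() {
	// Open a throwaway local listener so the check has a port to connect to.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	h := healthHandler([]check{tcpCheck(ln.Addr().String(), 1)})
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/any/path", nil))
	fmt.Println("status:", rec.Code)
}
```

Returning on the first failure keeps the response fast when a dependency is down; a real implementation might instead run the checks concurrently and log every failure.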

For stability, we recommend running health-checker under a process supervisor such as [supervisord](http://supervisord.org/)
or [systemd](https://www.freedesktop.org/wiki/Software/systemd/) to automatically restart health-checker in the unlikely
@@ -46,9 +55,13 @@ health-checker [options]
| `--listener` | The IP address and port on which inbound HTTP connections will be accepted. | `0.0.0.0:5000` |
| `--log-level` | Set the log level to LEVEL. Must be one of: `panic`, `fatal`, `error`, `warning`, `info`, or `debug` | `info` |
| `--help` | Show the help screen | |
| `--script` | Path to script to run - will pass if it completes within configured timeout with a zero exit status. Specify one or more times. | |
| `--script-timeout` | Timeout, in seconds, to wait for the scripts to exit. Applies to all configured script targets. | `5` |
| `--version` | Show the program's version | |
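
The pass condition for `--script` (zero exit status within `--script-timeout` seconds) could be implemented roughly as below. This is a sketch, not the PR's implementation: `runScript` is a hypothetical helper, and the demo in `main` assumes a Unix-like system with `sleep` on the `PATH`.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runScript (hypothetical) runs name with args and returns nil only if the
// process exits with status zero before timeoutSec seconds elapse.
func runScript(timeoutSec int, name string, args ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutSec)*time.Second)
	defer cancel()

	// CommandContext kills the process once ctx expires, so Run returns.
	err := exec.CommandContext(ctx, name, args...).Run()
	if ctx.Err() == context.DeadlineExceeded {
		return fmt.Errorf("%s timed out after %ds", name, timeoutSec)
	}
	// A non-zero exit status or a failure to start surfaces here as a non-nil error.
	return err
}

func main() {
	// Assumption: `sleep` exists on this machine.
	fmt.Println("fast script:", runScript(5, "sleep", "0"))
	fmt.Println("slow script:", runScript(1, "sleep", "3"))
}
```

Because the table says the timeout applies to all configured script targets, the same `timeoutSec` value would be passed for every script.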

#### Example
If you execute a shell script, ensure you have a `shebang` line in your script, otherwise the script will fail with an `exec format error`.
Member

Could our code catch that? Seems like a simple strings.HasPrefix("#!") would do the trick to avoid some confusing errors?

Contributor Author

Catch here is that we can execute other targets than shell scripts. We would then first have to identify the script as shell script (via the extension or some other way) and only after that check the prefix within. I think it would unnecessarily clutter the code and potentially lead to even more confusing errors 😄
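
For readers following this thread, the prefix check being discussed might look like the sketch below. It was not adopted in this PR, and `startsWithShebang` and `fileHasShebang` are hypothetical names. The third call in `main` illustrates the objection raised in the reply: a compiled binary is a valid target yet has no shebang, so the check alone cannot tell a broken shell script from a working executable.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// startsWithShebang reports whether head, the first bytes of a file,
// begins with "#!".
func startsWithShebang(head []byte) bool {
	return bytes.HasPrefix(head, []byte("#!"))
}

// fileHasShebang reads the first two bytes of the file at path and checks them.
func fileHasShebang(path string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	head := make([]byte, 2)
	n, err := io.ReadFull(f, head)
	// Files shorter than two bytes are not an error; they just have no shebang.
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return false, err
	}
	return startsWithShebang(head[:n]), nil
}

func main() {
	fmt.Println(startsWithShebang([]byte("#!/bin/sh\necho ok\n"))) // shell script with a shebang
	fmt.Println(startsWithShebang([]byte("echo ok\n")))            // shell script missing its shebang
	fmt.Println(startsWithShebang([]byte("\x7fELF\x02\x01\x01")))  // compiled binary: valid target, no shebang
}
```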


#### Example 1

Run a listener on port 6000 that accepts all inbound HTTP connections for any URL. When the request is received,
attempt to open TCP connections to port 5432 and 3306. If both succeed, return `HTTP 200 OK`. If any fails, return `HTTP
@@ -58,3 +71,23 @@ attempt to open TCP connections to port 5432 and 3306. If both succeed, return `
health-checker --listener "0.0.0.0:6000" --port 5432 --port 3306
```

#### Example 2

Run a listener on port 6000 that accepts all inbound HTTP connections for any URL. When the request is received,
attempt to open a TCP connection to port 5432 and run the script with a 10 second timeout. If the TCP connection succeeds and the script's exit code is zero, return `HTTP 200 OK`. If the TCP connection fails or the script exits with a non-zero code, return `HTTP
504 Gateway Not Found`.

```
health-checker --listener "0.0.0.0:6000" --port 5432 --script /path/to/script.sh --script-timeout 10
```

#### Example 3

Run a listener on port 6000 that accepts all inbound HTTP connections for any URL. When the request is received,
attempt to run the configured scripts. If both return exit code zero, return `HTTP 200 OK`. If either returns non-zero exit code, return `HTTP
504 Gateway Not Found`.

```
health-checker --listener "0.0.0.0:6000" --script "/usr/local/bin/exhibitor-health-check.sh --exhibitor-port 8080" --script "/usr/local/bin/zk-health-check.sh --zk-port 2191"
```
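
Each `--script` value in Example 3 carries both the executable path and its arguments in one string, so it has to be split before it can be executed. The PR does this in `options.ParseScripts`, whose body is not shown in this diff; the sketch below is only one plausible approach, with a hypothetical `script` type and `parseScript` function that split on whitespace.

```go
package main

import (
	"fmt"
	"strings"
)

// script is a hypothetical parsed form of one --script value.
type script struct {
	Name string   // path to the executable
	Args []string // arguments passed to it
}

// parseScript splits raw on whitespace: the first field is the executable,
// the rest are its arguments. It does not understand shell quoting, so an
// argument containing spaces would be split apart.
func parseScript(raw string) (script, error) {
	fields := strings.Fields(raw)
	if len(fields) == 0 {
		return script{}, fmt.Errorf("empty --script value")
	}
	return script{Name: fields[0], Args: fields[1:]}, nil
}

func main() {
	s, err := parseScript("/usr/local/bin/zk-health-check.sh --zk-port 2191")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %q\n", s.Name, s.Args)
}
```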

12 changes: 8 additions & 4 deletions commands/cli.go
@@ -34,7 +34,7 @@ func CreateCli(version string) *cli.App {
app.HelpName = app.Name
app.Author = "Gruntwork, Inc. <www.gruntwork.io> | https://github.com/gruntwork-io/health-checker"
app.Version = version
app.Usage = "A simple HTTP server that returns a 200 OK when all given TCP ports accept inbound connections."
app.Usage = "A simple HTTP server that will return 200 OK if the configured checks are all successful."
app.Commands = nil
app.Flags = defaultFlags
app.Action = runHealthChecker
@@ -52,11 +52,15 @@ func runHealthChecker(cliContext *cli.Context) error {
opts.Logger.Infof("Note: To enable debug mode, set %s to \"true\"", ENV_VAR_NAME_DEBUG_MODE)
return err
}
if err != nil {
if err != nil {
return errors.WithStackTrace(err)
}

opts.Logger.Infof("The Health Check will attempt to connect to the following ports via TCP: %v", opts.Ports)
if len(opts.Ports) > 0 {
opts.Logger.Infof("The Health Check will attempt to connect to the following ports via TCP: %v", opts.Ports)
}
if len(opts.Scripts) > 0 {
opts.Logger.Infof("The Health Check will attempt to run the following scripts: %v", opts.Scripts)
}
opts.Logger.Infof("Listening on Port %s...", opts.Listener)
err = server.StartHttpServer(opts)
if err != nil {
51 changes: 41 additions & 10 deletions commands/flags.go
@@ -2,21 +2,33 @@ package commands

import (
"fmt"
"github.com/gruntwork-io/health-checker/options"
"github.com/gruntwork-io/gruntwork-cli/logging"
"github.com/urfave/cli"
"github.com/gruntwork-io/health-checker/options"
"github.com/sirupsen/logrus"
"github.com/urfave/cli"
"os"
"strings"
)

const DEFAULT_LISTENER_IP_ADDRESS = "0.0.0.0"
const DEFAULT_LISTENER_PORT = 5500
const DEFAULT_SCRIPT_TIMEOUT_SEC = 5
const ENV_VAR_NAME_DEBUG_MODE = "HEALTH_CHECKER_DEBUG"

var portFlag = cli.IntSliceFlag{
Name: "port",
Usage: fmt.Sprintf("[Required] The port number on which a TCP connection will be attempted. Specify one or more times. Example: 8000"),
Name: "port",
Usage: fmt.Sprintf("[One of port/script Required] The port number on which a TCP connection will be attempted. Specify one or more times. Example: 8000"),
}

var scriptFlag = cli.StringSliceFlag{
Name: "script",
Usage: fmt.Sprintf("[One of port/script Required] The path to script that will be run. Specify one or more times. Example: \"/usr/local/bin/health-check.sh --http-port 8000\""),
}

var scriptTimeoutFlag = cli.IntFlag{
Name: "script-timeout",
Usage: fmt.Sprintf("[Optional] Timeout, in seconds, to wait for the scripts to complete. Example: 10"),
Value: DEFAULT_SCRIPT_TIMEOUT_SEC,
}

var listenerFlag = cli.StringFlag{
@@ -33,6 +45,8 @@ var logLevelFlag = cli.StringFlag{

var defaultFlags = []cli.Flag{
portFlag,
scriptFlag,
scriptTimeoutFlag,
listenerFlag,
logLevelFlag,
}
@@ -58,19 +72,27 @@ func parseOptions(cliContext *cli.Context) (*options.Options, error) {
logger.SetLevel(level)

ports := cliContext.IntSlice("port")
if len(ports) == 0 {
return nil, MissingParam(portFlag.Name)

scriptArr := cliContext.StringSlice("script")
scripts := options.ParseScripts(scriptArr)

if len(ports) == 0 && len(scripts) == 0 {
return nil, OneOfParamsRequired{portFlag.Name, scriptFlag.Name}
Member

Nice error handling 👍

}

scriptTimeout := cliContext.Int("script-timeout")

listener := cliContext.String("listener")
if listener == "" {
return nil, MissingParam(listenerFlag.Name)
}

return &options.Options{
Ports: ports,
Listener: listener,
Logger: logger,
Ports: ports,
Scripts: scripts,
ScriptTimeout: scriptTimeout,
Listener: listener,
Logger: logger,
}, nil
}

@@ -95,4 +117,13 @@ type MissingParam string

func (paramName MissingParam) Error() string {
return fmt.Sprintf("Missing required parameter --%s", string(paramName))
}
}

type OneOfParamsRequired struct {
param1 string
param2 string
}

func (paramNames OneOfParamsRequired) Error() string {
return fmt.Sprintf("Missing required parameter, one of --%s / --%s required", string(paramNames.param1), string(paramNames.param2))
}
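
As a standalone illustration of how the new error type behaves, the sketch below reproduces `OneOfParamsRequired` from this diff and wraps the "at least one of `--port` / `--script`" rule in a small function. `validateChecks` and `isUsageError` are hypothetical helpers added for the example; the PR performs the check inline in `parseOptions`.

```go
package main

import (
	"errors"
	"fmt"
)

// OneOfParamsRequired mirrors the error type added in this diff.
type OneOfParamsRequired struct {
	param1 string
	param2 string
}

func (e OneOfParamsRequired) Error() string {
	return fmt.Sprintf("Missing required parameter, one of --%s / --%s required", e.param1, e.param2)
}

// validateChecks (hypothetical) returns an error when neither ports nor
// scripts are configured, matching the condition in parseOptions.
func validateChecks(ports []int, scripts []string) error {
	if len(ports) == 0 && len(scripts) == 0 {
		return OneOfParamsRequired{"port", "script"}
	}
	return nil
}

// isUsageError (hypothetical) reports whether err is, or wraps, a
// OneOfParamsRequired, so a caller could print usage help instead of a
// stack trace.
func isUsageError(err error) bool {
	var target OneOfParamsRequired
	return errors.As(err, &target)
}

func main() {
	err := validateChecks(nil, nil)
	fmt.Println(err)
	fmt.Println(isUsageError(fmt.Errorf("parsing options: %w", err)))
}
```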