generated from runwhen-contrib/codecollection-template
Merge pull request #15 from infracloudio/f/docs
Overhaul of docs
Showing 4 changed files with 264 additions and 22 deletions.
# CodeBundle - RDS MySQL Connection Count

This CodeBundle detects and resolves incidents caused by too many sleeping connections in MySQL.

- Target Service - MySQL
- Cloud Platform - AWS/RDS

## SLX
```YAML
statement: RDS MySQL connections should stay within 80% of max connections.
alias: RDS MySQL Connections Count
metricType: gauge
asMeasuredBy: Score based on Prometheus query
icon: Cloud
owners:
  - <owner-email>
imageURL: >-
  https://storage.googleapis.com/runwhen-nonprod-shared-images/icons/kubernetes/resources/labeled/ns.svg
```
## SLO / Service Level Objective
Example:
```YAML
codeBundle:
  repoUrl: https://github.com/infracloudio/ifc-rw-codecollection
  pathToYaml: codebundles/slo-default/queries.yaml
  ref: main
sloSpecType: simple-mwmb
objective: 95
threshold: 48
operand: lt
```
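The SLX statement targets 80% of the instance's maximum connections, while the SLO uses a fixed `threshold: 48` with `operand: lt`. The sketch below shows one way to derive such a threshold. The helper name `slo_threshold` is hypothetical, and the 80% default simply mirrors the SLX statement. As a worked example, `max_connections` = 60 gives 60 × 80 / 100 = 48, the threshold used in the example above. Your instance's value will likely differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive an SLO threshold as a percentage of max_connections.
# Uses integer arithmetic, so the result is rounded down.
slo_threshold() {
  local max_connections="$1"
  local percent="${2:-80}"   # default mirrors the SLX statement (80%)
  echo $(( max_connections * percent / 100 ))
}

# To look up max_connections on your instance (requires the mysql client;
# -p prompts for the password, -N suppresses the column header):
#   mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p -N -e "SELECT @@max_connections;"
```

For example, `slo_threshold 60` prints `48`, and `slo_threshold 1000 90` prints `900`.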
## SLI / Service Level Indicator
```YAML
displayUnitsLong: OK
displayUnitsShort: ok
locations:
  - location-01-us-west1
description: >-
  Watch RDS MySQL connection count
codeBundle:
  repoUrl: https://github.com/infracloudio/ifc-rw-codecollection
  ref: main
  pathToRobot: codebundles/rds-mysql-conn-count/sli.robot
# read more about intervalStrategy here: https://docs.runwhen.com/public/runwhen-platform/feature-overview/points-on-the-map-slxs/service-level-indicators-slis/interval-strategies
intervalStrategy: intermezzo
intervalSeconds: 30
configProvided:
  # Change PROMETHEUS_HOSTNAME to your endpoint; currently the endpoint must be publicly exposed.
  - name: PROMETHEUS_HOSTNAME
    value: >-
      http://aeccfb7ff9bfb4705b6218294a7346c3-2081802229.us-west-2.elb.amazonaws.com/prometheus/api/v1
  - name: QUERY
    value: >-
      aws_rds_database_connections_average{dimension_DBInstanceIdentifier="robotshopmysql"} > 1
  - name: TRANSFORM
    value: RAW
  - name: STEP
    value: '30'
  - name: DATA_COLUMN
    value: '1'
  - name: NO_RESULT_OVERWRITE
    value: 'Yes'
  - name: NO_RESULT_VALUE
    value: '0'
servicesProvided:
  - name: curl
    locationServiceName: curl-service.shared
```
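The SLI configuration above asks for an instant query against the Prometheus HTTP API, followed by the `NO_RESULT_OVERWRITE` / `NO_RESULT_VALUE` fallback. The sketch below shows roughly what that involves. It sends `GET <PROMETHEUS_HOSTNAME>/query` because the example hostname already ends in `/api/v1`. This is not the RunWhen library implementation:

- Treating `DATA_COLUMN: '1'` as index 1 of Prometheus's `[timestamp, value]` pair is an assumption.
- `STEP` and `TRANSFORM` are ignored.
- The helper names are hypothetical.

Actually querying needs `curl` and `jq`, and the script only does so when `PROMETHEUS_HOSTNAME` and `QUERY` are set.

```bash
#!/usr/bin/env bash
# Rough sketch of the SLI above; NOT the RunWhen library implementation.

# Mirror NO_RESULT_OVERWRITE / NO_RESULT_VALUE: if the query returned nothing
# and overwriting is enabled, report the fallback value instead.
resolve_sli_value() {
  local raw="$1" overwrite="$2" fallback="$3"
  if [ -z "$raw" ] && [ "$overwrite" = "Yes" ]; then
    printf '%s\n' "$fallback"
  else
    printf '%s\n' "$raw"
  fi
}

# Instant query against the Prometheus HTTP API. Prints the sample value of the
# first series (element 1 of [timestamp, value]), or nothing if there is no result.
query_prometheus() {
  curl -sfG "${PROMETHEUS_HOSTNAME}/query" --data-urlencode "query=${QUERY}" \
    | jq -r '.data.result[0].value[1] // empty'
}

# Only hit the network when the endpoint and query are configured.
if [ -n "${PROMETHEUS_HOSTNAME:-}" ] && [ -n "${QUERY:-}" ]; then
  resolve_sli_value "$(query_prometheus)" "${NO_RESULT_OVERWRITE:-Yes}" "${NO_RESULT_VALUE:-0}"
fi
```

With the example `QUERY`, the `> 1` filter drops series whose value is 1 or less. The query then returns no result, and the fallback reports `0`.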
## RunBook / Mitigation
```YAML
location: location-01-us-west1
codeBundle:
  repoUrl: https://github.com/infracloudio/ifc-rw-codecollection
  ref: main
  pathToRobot: codebundles/rds-mysql-conn-count/runbook.robot
servicesProvided:
  - name: curl
    locationServiceName: curl-service.shared
configProvided:
  - name: MYSQL_USER
    value: admin
  - name: MYSQL_HOST
    value: robotshopmysql.example.us-west-2.rds.amazonaws.com
  - name: PROCESS_USER
    value: shipping
```
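The runbook's inputs (`MYSQL_HOST`, `MYSQL_USER`, `PROCESS_USER`) suggest it targets idle connections held by one application user. The sketch below shows one way to inspect them by hand. It is not necessarily what `runbook.robot` does:

- It lists connections in the `Sleep` state from `information_schema.PROCESSLIST`.
- It can generate `CALL mysql.rds_kill(<id>);` statements. `mysql.rds_kill` is the stored procedure Amazon RDS for MySQL provides for ending sessions.
- Helper names are hypothetical, and `PROCESS_USER` is assumed to be trusted because it is interpolated into SQL as-is.

```bash
#!/usr/bin/env bash
# Sketch only: inspect (and optionally end) sleeping connections for one user.
# Assumes PROCESS_USER is trusted; it is interpolated into SQL without escaping.

# SQL listing idle ("Sleep") connections for a user, longest idle first.
sleeping_connections_sql() {
  local user="$1"
  printf "SELECT ID, USER, HOST, DB, TIME FROM information_schema.PROCESSLIST WHERE COMMAND = 'Sleep' AND USER = '%s' ORDER BY TIME DESC;\n" "$user"
}

# SQL that generates one "CALL mysql.rds_kill(<id>);" per sleeping connection.
# Review the generated statements before running them.
rds_kill_statements_sql() {
  local user="$1"
  printf "SELECT CONCAT('CALL mysql.rds_kill(', ID, ');') FROM information_schema.PROCESSLIST WHERE COMMAND = 'Sleep' AND USER = '%s';\n" "$user"
}

# Only connect when all inputs are configured; -p prompts for the password.
if [ -n "${MYSQL_HOST:-}" ] && [ -n "${MYSQL_USER:-}" ] && [ -n "${PROCESS_USER:-}" ]; then
  mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p -e "$(sleeping_connections_sql "$PROCESS_USER")"
fi
```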
### Assumptions & Pitfalls
These configs are placeholder YAML. Modify them to suit your environment, then paste them into the RunWhen Platform.

# RunWhen Concepts
- [RunWhen Concepts](#runwhen-concepts)
  - [RunWhen Local](#runwhen-local)
    - [CheatSheet Generator](#cheatsheet-generator)
    - [Uploading Cluster Topology to the Platform](#uploading-cluster-topology-to-the-platform)
  - [CodeCollections](#codecollections)
  - [CodeBundles](#codebundles)

# RunWhen Local
- [Source code](https://github.com/runwhen-contrib/runwhen-local)
- [Helm Chart](https://github.com/runwhen-contrib/helm-charts/tree/main/charts/runwhen-local)
- [Upstream docs](https://docs.runwhen.com/public/v/runwhen-local/)

RunWhen Local has two core functions:
- Generate remediation scripts / CheatSheets for your local cluster from the included templates
- Upload the cluster topology to the RunWhen Platform

## CheatSheet Generator
At the moment, RunWhen Local **cannot discover issues** in your cluster or
suggest mitigation runbooks / CodeBundles.

**However, it does discover your Kubernetes resources and object names.**
From these, it generates a wide set of runbooks that are useful once you already
know the root cause. Each runbook contains documentation and copy-pastable shell
snippets for the issue you search for. The snippets are pre-templated with your
namespaces and Kubernetes resource names.

This collection of cheatsheets / runbooks is not exhaustive. It still covers a
significant portion of recurring issues and health-check failures, and it can
help SREs resolve incidents quickly.

[Upstream Examples](https://docs.runwhen.com/public/v/runwhen-local/user-guide/features/user_guide-feature_overview)

## Uploading Cluster Topology to the Platform
The second core function of RunWhen Local is to upload the cluster topology to the
RunWhen Platform. You can then visualize the cluster workload map from a configured
RunWhen workspace.

- First, follow the documentation at [Upload to RunWhen Platform](https://docs.runwhen.com/public/v/runwhen-local/user-guide/features/upload-to-runwhen-platform#upload-from-the-cli)
  to generate the `uploadInfo.yaml` file.
- Next, copy the contents of that YAML object into the `uploadInfo:[]` section
  of the Helm chart's [`values.yaml` file](https://github.com/runwhen-contrib/helm-charts/blob/main/charts/runwhen-local/values.yaml#L121).
- Once configured, it should look like this:
  ```YAML
  uploadInfo:
    workspaceName: <your-workspace-name>
    token: <your-token> # Do NOT add the token and commit it to git
    workspaceOwnerEmail: <your-email>
    papiURL: https://papi.beta.runwhen.com
    defaultLocation: location-01-us-west1 # one of the available RunWhen locations
  ```
- Pass the token through the Helm CLI so that you do not leak it via git:
  ```bash
  helm upgrade --install "${HELM_RELEASE_NAME}" runwhen-contrib/runwhen-local \
    --set uploadInfo.token="${RUNWHEN_PLATFORM_TOKEN}" \
    -f "${VALUES_FILE}" -n "${NAMESPACE}"
  ```

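Passing the token with `--set` only helps if the values file never contains a real token. A small pre-commit style check can catch one that slipped in. The sketch below is a heuristic under an assumption: any `token:` line whose value does not start with `<` (as in `<your-token>`) is treated as a real token. Empty and commented-out values pass, but a quoted empty string (`token: ""`) is flagged as a false positive. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed (exit 0) if the values file appears to contain a
# real uploadInfo token rather than a placeholder such as <your-token>.
has_committed_token() {
  local values_file="$1"
  # Match "token:" followed by a value that does not start with "<" or "#".
  grep -Eq '^[[:space:]]*token:[[:space:]]*[^<[:space:]#]' "$values_file"
}

# Example use before committing:
#   if has_committed_token values.yaml; then
#     echo "values.yaml contains a token; pass it with --set instead" >&2
#     exit 1
#   fi
```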
# CodeCollections
A CodeCollection is a group of CodeBundles that can be referenced and used in the RunWhen Platform.

*N.B. CodeCollections currently cannot be explicitly imported into RunWhen Local and run against your local cluster.*

RunWhen has currently published two CodeCollections:
- [runwhen-public-codecollection](https://github.com/runwhen-contrib/rw-public-codecollection)
  - Contains CodeBundles that usually run against services and don't involve a shell / CLI component.
- [runwhen-cli-codecollection](https://github.com/runwhen-contrib/rw-cli-codecollection)
  - Generally targeted at SRE workloads; wraps various shell scripts and CLI tools.

# CodeBundles
CodeBundles are specific detectors/mitigators of known SLI/SLO violations in a live software stack.

A CodeBundle comprises:
- Robot files
  - Scripts / playbooks / task sets written using [Robot Framework](https://robotframework.org/) that either
    - create and enforce RunWhen SLIs (`sli.robot`), or
    - create mitigation runbooks in response to an SLO/SLI violation (`runbook.robot`)
- Platform definitions of `{SLX, SLO, SLI, Runbook}` as `YAML` configurations
  - These do not need to live in your repo, but it's good practice to commit them to git.
  - These configurations wrap standard behaviors for interacting with the RunWhen Platform API, `papi`
    - Endpoint: `https://papi.beta.runwhen.com`
  - The RunWhen `YAML` configurations only matter once your CodeBundle is live on the RunWhen Platform. For now they play no role in either local testing or RunWhen Local.
- Test resources / scripts

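The pieces listed above map naturally onto one directory per CodeBundle. The sketch below creates that layout. It assumes the `codebundles/<name>/sli.robot` and `codebundles/<name>/runbook.robot` layout seen in the `pathToRobot` values of the RDS MySQL CodeBundle example. It also assumes a `README.md` that holds the platform YAML, as that example does. The helper name is hypothetical, and the files are created empty.

```bash
#!/usr/bin/env bash
# Hypothetical scaffold for a new CodeBundle under <root>/codebundles/<name>.
scaffold_codebundle() {
  local root="$1" name="$2"
  local dir="${root}/codebundles/${name}"
  mkdir -p "$dir"
  # Empty placeholders: SLI task set, runbook task set, and a README that can
  # hold the SLX/SLO/SLI/Runbook YAML for the platform.
  touch "${dir}/sli.robot" "${dir}/runbook.robot" "${dir}/README.md"
  printf '%s\n' "$dir"
}

# Example:
#   scaffold_codebundle . my-new-bundle   # prints ./codebundles/my-new-bundle
```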
In a local testing environment, you only need to execute the `*.robot` files inside one of the provided container configurations:
- [Dockerfile](../../Dockerfile)
- [vscode/devcontainer](../../.devcontainer.json)

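A minimal sketch of running a CodeBundle's robot file inside a container built from the repository's Dockerfile. It rests on unconfirmed assumptions, and the image tag `codecollection-dev`, the `/app` mount path, and the helper names are hypothetical:

- The image has Robot Framework's `robot` command on its `PATH` and no `ENTRYPOINT` that would swallow it.
- The CodeBundle reads its `configProvided` values from environment variables when run locally.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for local testing; image tag and mount path are assumptions.

# Map a CodeBundle name and kind (sli or runbook) to its robot file path.
robot_file() {
  local bundle="$1" kind="$2"
  case "$kind" in
    sli|runbook) printf 'codebundles/%s/%s.robot\n' "$bundle" "$kind" ;;
    *) echo "unknown kind: $kind (expected sli or runbook)" >&2; return 1 ;;
  esac
}

# Run a robot file in a container, mounting the repo at /app.
# Extra arguments go to `docker run`, e.g. -e PROMETHEUS_HOSTNAME -e QUERY.
run_in_container() {
  local image="$1" robot_path="$2"
  shift 2
  docker run --rm -v "$PWD":/app -w /app "$@" "$image" \
    robot --outputdir /app/output "$robot_path"
}

# Example (not executed here):
#   docker build -t codecollection-dev .
#   run_in_container codecollection-dev "$(robot_file rds-mysql-conn-count sli)" \
#     -e PROMETHEUS_HOSTNAME -e QUERY
```

`-e NAME` without a value forwards the variable from your shell, which keeps secrets out of the command line.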
The usual call chain is as follows:
- Robot scripts
  - User variable and secret injection
  - RunWhen libraries
    - RunWhen services
      - Wrapped shell CLI commands / platform SDK code execution
      - or direct shims to your shell scripts / Python code when services are unavailable
    - These tasks fetch the current value of a metric / state
- The metric value is then compared against the thresholds defined in `sli/slo.yaml` on the platform.
- If the Robot script only runs a set of tasks as a mitigation step, it returns either success or failure.

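The comparison step can be pictured with the sketch below. It assumes, without confirmation from this document, that a `simple-mwmb` SLO works as follows:

- Each SLI sample counts as good when `value <operand> threshold` holds.
- The SLO is met when the percentage of good samples reaches `objective`.

The real platform evaluates burn rates over multiple windows, which this ignores entirely. Only the `lt` and `gt` operands are handled, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Percentage of SLI samples satisfying "sample <operand> threshold".
# Usage: slo_compliance THRESHOLD OPERAND SAMPLE...
slo_compliance() {
  local threshold="$1" operand="$2"
  shift 2
  printf '%s\n' "$@" | awk -v t="$threshold" -v op="$operand" '
    NF == 0 { next }  # skip blank lines (e.g. when no samples are given)
    {
      n++
      if ((op == "lt" && $1 + 0 < t + 0) || (op == "gt" && $1 + 0 > t + 0)) good++
    }
    END {
      if (n == 0) { print "0.00"; exit }
      printf "%.2f\n", 100 * good / n
    }'
}

# Exit 0 when compliance (percent) meets or exceeds the objective (percent).
slo_met() {
  awk -v c="$1" -v o="$2" 'BEGIN { exit !(c + 0 >= o + 0) }'
}
```

With the example SLO (`threshold: 48`, `operand: lt`, `objective: 95`), the samples `30 40 50 20` give `75.00`, so `slo_met 75.00 95` fails.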
More concepts and non-trivial FAQs about writing CodeBundles are covered in [Contributing to CodeCollections/CodeBundles](contrib.md).
# Contributing to CodeCollections/CodeBundles

## Creating a New CodeCollection
### Forking the template repository

## Writing a Non-trivial CodeBundle
### Directory structure / Scaffolding

#########
- Repository Setup
- Introduction to Robot Framework Scripts (how they interact with RunWhen)
  - Calling bash with relative paths
  - Secret handling
  - Suite Initialization
  - Library usage
    - Explain the call chain
- Library Setup
  - How to get an exhaustive list of available libraries
    - CLI repo
    - Public repo
  - Explain which libraries are auto-fetched by the devcontainer tooling
    - Core
    - CLI
  - What needs to be added for specific libraries used in a Robot script
    - Paths
- Running a test with local Docker
  - Adding additional binaries to the devcontainer as needed
    - MySQL client
    - Postgres client
    - Redis client
  - Configuring env / secrets
  - Expose endpoints
    - Local Docker network
    - Expose from test cluster
  - Test by using `docker run` on localhost
- Test in your live environment
  - Deploy as a k8s job
    - Give an example
- Testing on the RunWhen Platform
  - Connecting the test env/cluster to RunWhen
    - RunWhen Local upload
  - If a Robot script needs additional dependencies, such as CLI tools, the devs need to be informed; for now they will handle the update on the platform side
    - MySQL client
    - Postgres client
    - Redis client
  - Registering your first CodeCollection with the RunWhen Platform
    - Mention that this may be private, at the developer's discretion
  - How to configure the YAML to test
    - Branch name length limitations
  - Expose metric endpoints so that they are accessible to RunWhen Platform CodeBundles
  - Configuring env / secrets
  - Running the test
    - Checking logs
    - Checking for errors