From 0a647783f29651c2f8d25fbe44f395b3cae6a17a Mon Sep 17 00:00:00 2001 From: migueldavid Date: Wed, 5 Feb 2020 11:32:28 +0100 Subject: [PATCH] Fix typos --- allocation-api.md | 54 ++++++++++++++-------------- architecture.md | 25 +++++++------ audit.md | 16 ++++----- availability-tiers.md | 10 +++--- aws-out-of-cluster.md | 4 +-- aws-service-account-thanos.md | 2 +- bug-report.md | 4 +-- cost-allocation.md | 26 +++++++------- custom-prom.md | 38 ++++++++++---------- federated-clusters.md | 24 ++++++------- gcp-out-of-cluster.md | 4 +-- getting-started.md | 56 ++++++++++++++--------------- google-service-account-thanos.md | 6 ++-- install.md | 10 +++--- long-term-storage.md | 35 +++++++++---------- multi-cluster.md | 6 ++-- network-allocation.md | 24 ++++++------- partner-metrics.md | 14 ++++---- security.md | 2 +- troubleshoot-install.md | 60 ++++++++++++++++---------------- user-management.md | 8 ++--- 21 files changed, 212 insertions(+), 216 deletions(-) diff --git a/allocation-api.md b/allocation-api.md index b4d001f23..be2be0d5b 100644 --- a/allocation-api.md +++ b/allocation-api.md @@ -1,4 +1,4 @@ -Kubecost exposes multiple APIs to obtain cost, resource allocation, and utilization data. Below is documentation on two options: the cost model API and aggregated cost model API. +Kubecost exposes multiple APIs to obtain cost, resource allocation, and utilization data. Below is documentation on two options: the cost model API and aggregated cost model API. # Cost model API @@ -12,7 +12,7 @@ Here's an example use: API parameters include the following: -* `timeWindow` dictates the applicable window for measuring cost metrics. Supported units are d, h, and m. +* `timeWindow` dictates the applicable window for measuring cost metrics. Supported units are d, h, and m. * `offset` shifts timeWindow backwards relative to the current time. Supported units are d, h, and m. This API returns a set of JSON elements in this format: @@ -35,50 +35,50 @@ This API returns a set of JSON elements in this format: ramreq: [{timestamp: 1567531940, value: 55000000}] ramused: [{timestamp: 1567531940, value: 19463457.32}] services: ["cost-model"] -} +} ``` - -Optional request parameters include the following: + +Optional request parameters include the following: -Field | Description ---------- | ----------- +Field | Description +--------- | ----------- `filterFields` | Blacklist of fields to be filtered from response. For example, appending `&filterFields=cpuused,cpureq,ramreq,ramused` will remove request and usage data. `namespace` | Filter results by namespace. For example, appending `&namespace=kubecost` only returns data for the `kubecost` namespace # Aggregated cost model API -The aggregated cost model API retrieves data similiar to the Kubecost Allocation frontend view (e.g. cost by namespace, label, deployment and more) and is available at the following endpoint: +The aggregated cost model API retrieves data similar to the Kubecost Allocation frontend view (e.g. 
cost by namespace, label, deployment and more) and is available at the following endpoint:

`http://<kubecost-address>/model/aggregatedCostModel`

Here are example uses:

-* `http://localhost:9090/model/aggregatedCostModel?window=1d&aggregation=namespace` 
+* `http://localhost:9090/model/aggregatedCostModel?window=1d&aggregation=namespace`
* `http://localhost:9090/model/aggregatedCostModel?window=1d&aggregation=label&aggregationSubfield=product`
* `http://localhost:9090/model/aggregatedCostModel?window=1d&aggregation=namespace&sharedNamespaces=kube-system`

API parameters include the following:

-* `window` dictates the applicable window for measuring cost metrics. Supported units are d, h, m, and s. 
-* `offset` (optional) shifts window backwards from current time. Supported units are d, h, m, and s. 
-* `aggregation` is the field used to consolidate cost model data. Supported types are cluster, namespace, deployment, service, and label. 
+* `window` dictates the applicable window for measuring cost metrics. Supported units are d, h, m, and s.
+* `offset` (optional) shifts window backwards from current time. Supported units are d, h, m, and s.
+* `aggregation` is the field used to consolidate cost model data. Supported types are cluster, namespace, deployment, service, and label.
* `aggregationSubfield` is used for aggregation types that require subfields, e.g. aggregation type equals `label` and the value of the label (aggregationSubfield) equals `app`.
* `allocateIdle` (optional) when set to `true` applies the cost of all idle compute resources to tenants, default `false`.
-* `sharedNamespaces` (optional) provide a comma separated list of namespaces (e.g. kube-system) to be allocated to other tenants. These resources are evenly allocated to other tenants as `sharedCost`. 
-* `sharedLabelNames` (optional) provide a comma separated list of kubernetes labels (e.g. app) to be allocated to other tenants. Must provide corresponding set of label values in `sharedLabelValues`. 
-* `sharedLabelValues` (optional) label value (e.g. prometheus) associated with `sharedLabelNames` parameter. 
+* `sharedNamespaces` (optional) provide a comma-separated list of namespaces (e.g. kube-system) to be allocated to other tenants. These resources are evenly allocated to other tenants as `sharedCost`.
+* `sharedLabelNames` (optional) provide a comma-separated list of kubernetes labels (e.g. app) to be allocated to other tenants. Must provide the corresponding set of label values in `sharedLabelValues`.
+* `sharedLabelValues` (optional) label value (e.g. prometheus) associated with `sharedLabelNames` parameter.
* `sharedSplit` (optional) Shared costs are split evenly across tenants unless `weighted` is passed for this request parameter. When allocating shared costs on a weighted basis, these expenses are distributed based on the resource costs of the individual pods in the particular aggregation, e.g. namespace.
-* `disableCache` this API caches recently fetched data by default. Set this variable to `false` to avoid cache entirely. 
+* `disableCache` this API caches recently fetched data by default. Set this parameter to `true` to bypass the cache entirely.

- 
-Optional filter parameters include the following: 
+
+Optional filter parameters include the following:

-Filter | Description 
---------- | ----------- 
-`cluster` | Filter results by cluster ID. For example, appending `&cluster=cluster-one` will restrict data only to the `cluster-one` cluster. Note: cluster ID is generated from `cluster_id` provided during installation. 
+Filter | Description
+--------- | -----------
+`cluster` | Filter results by cluster ID. For example, appending `&cluster=cluster-one` will restrict data only to the `cluster-one` cluster. Note: cluster ID is generated from `cluster_id` provided during installation.
`namespace` | Filter results by namespace. For example, appending `&namespace=kubecost` only returns data for the `kubecost` namespace.
-`labels` | Filter results by label. For example, appending `&labels=app%3Dcost-analyzer` only returns data for pods with label `app=cost-analyzer`. CSV list of label values supported. Note that parameters must be url encoded. 
+`labels` | Filter results by label. For example, appending `&labels=app%3Dcost-analyzer` only returns data for pods with label `app=cost-analyzer`. CSV list of label values supported. Note that parameters must be URL encoded.

This API returns a set of JSON objects in this format:

@@ -87,7 +87,7 @@ This API returns a set of JSON objects in this format:
 aggregation: "namespace"
 subfields: "" // value(s) of aggregationSubfield parameter
 cluster: "cluster-1"
- cpuCost: 100.031611 
+ cpuCost: 100.031611
 environment: "default" // value of aggregation field
 gpuCost: 0
 networkCost: 0
@@ -95,14 +95,14 @@ This API returns a set of JSON objects in this format:
 ramCost: 70.000529625
 sharedCost: 0 // value of costs allocated via sharedNamespaces or sharedLabelNames
 totalCost: 180.032140625
-} 
+}
 ```

-### Caching Overview 
+### Caching Overview

-Kubecost implements a two-layer caching system for cost allocation metrics. The unaggregated cost model (Layer 1 cache) is pre-cached for commonly used time windows, e.g. 1,2,7 and 30 days. This data is refreshed every ~5 minutes for shorter time windows and up to every 1 hour for long windows, e.g. 30 days. Aggregated cost data (Layer 2 cache) is pre-cached for commonly used aggregatedCostModel API requests, e.g. costs by namespace over the last 7 days. Returning cached data from Layer 1 typically takes < 500ms and Layer 2 < 100ms, not including data transfer times. 
+Kubecost implements a two-layer caching system for cost allocation metrics. The unaggregated cost model (Layer 1 cache) is pre-cached for commonly used time windows, e.g. 1, 2, 7, and 30 days. This data is refreshed every ~5 minutes for shorter time windows and up to every 1 hour for long windows, e.g. 30 days. Aggregated cost data (Layer 2 cache) is pre-cached for commonly used aggregatedCostModel API requests, e.g. costs by namespace over the last 7 days. Returning cached data from Layer 1 typically takes < 500ms and Layer 2 < 100ms, not including data transfer times.

-When a custom cost model request misses both layers of the Kubecost cache then this request remains in the cache for ~10 minutes. On larger clusters, requests that miss both caching layers can take longer periods, e.g. > 10 seconds. 
+When a custom cost model request misses both layers of the Kubecost cache, the computed result remains in the cache for ~10 minutes. On larger clusters, requests that miss both caching layers can take longer, e.g. > 10 seconds.

Have questions? Email us at team@kubecost.com.

diff --git a/architecture.md b/architecture.md
index 058472d7e..53a6ff9dd 100644
--- a/architecture.md
+++ b/architecture.md
@@ -1,24 +1,23 @@
## Core architecture overview

Below are the major components of the Kubecost helm chart:
- 
-1. **Kubecost Cost-Analyzer Pod** 
- a. Frontend that runs Nginx -- handles routing to Prometheus/Grafana 
- b. Kubecost server -- backend for API calls 
+
+1. **Kubecost Cost-Analyzer Pod** 
+ a. 
Frontend that runs Nginx -- handles routing to Prometheus/Grafana
+ b. Kubecost server -- backend for API calls
 c. Cost-model -- provides cost allocation calculations and metrics, reads/writes to Prometheus
2. **Cost-Analyzer Jobs** -- used for product alerts & email updates
-3. **Prometheus** 
- a. Prometheus server -- time series data store for cost & health metrics 
- b. Kube-state-metrics -- provides Kubernetes requests and other core metrics 
- c. Node-exporter -- provides node-level utilization metrics for right-sizing recommendations and cluster utlization 
- d. Pushgateway -- ability to push new metrics to Prometheus 
- e. Alertmanager -- used for custom alerts 
+3. **Prometheus** 
+ a. Prometheus server -- time-series data store for cost & health metrics
+ b. Kube-state-metrics -- provides Kubernetes requests and other core metrics
+ c. Node-exporter -- provides node-level utilization metrics for right-sizing recommendations and cluster utilization
+ d. Pushgateway -- ability to push new metrics to Prometheus
+ e. Alertmanager -- used for custom alerts
4. **Network costs** -- optional daemonset for collecting network metrics
-5. **Grafana** -- supporting dashboards 
+5. **Grafana** -- supporting dashboards

-Today, the core Kubecost product can be run with just components 1, 3a, 3b, 3c. 
+Today, the core Kubecost product can be run with just components 1, 3a, 3b, 3c. 

See core components on this diagram:

![Architecture Overview](images/arch.png)
-
diff --git a/audit.md b/audit.md
index 5ad3ede4a..78ab3e0bf 100644
--- a/audit.md
+++ b/audit.md
@@ -1,13 +1,13 @@
-Auditing the cost of workloads can be complex in dynamic Kubernetes environments. 
-We've created this guide to help you spot check costs and ensure they are calculated as expected. 
+Auditing the cost of workloads can be complex in dynamic Kubernetes environments. 
+We've created this guide to help you spot check costs and ensure they are calculated as expected. 

-1. **Identify a pod or namespace to audit.** In this example we will audit the `default` namespace. 
-2. **Open Prometheus console.** We recommend going directly to the underlying data in Prometheus for an audit. Complete the following steps to view the console for our bundled Prometheus: 
+1. **Identify a pod or namespace to audit.** In this example, we will audit the `default` namespace. 
+2. **Open Prometheus console.** We recommend going directly to the underlying data in Prometheus for an audit. Complete the following steps to view the console for our bundled Prometheus: 

* Execute `kubectl port-forward --namespace kubecost service/kubecost-prometheus-server 9003:80`
* Point your browser to http://localhost:9003

-3. **Verify raw allocation metrics.** Run the following queries and then visit the Prometheus graph tab. Note that allocations are the max of resource requests and usage. Ensure these values are consistent with Kubernetes API and/or cadvisor metrics. 
+3. **Verify raw allocation metrics.** Run the following queries and then visit the Prometheus graph tab. Note that allocations are the max of resource requests and usage. Ensure these values are consistent with Kubernetes API and/or cAdvisor metrics. 
* `container_cpu_allocation{namespace="default"}` * `container_memory_allocation_bytes{namespace="default"}` @@ -18,15 +18,15 @@ We've created this guide to help you spot check costs and ensure they are calcul * `node_ram_hourly_cost * 730` * `node_total_hourly_cost * 730` * `kube_node_status_capacity_cpu_cores * on(node) group_left() node_cpu_hourly_cost * 730 + kube_node_status_capacity_memory_bytes * on(node) group_left() node_ram_hourly_cost * 730 / 1024 / 1024 / 1024` - + **Note:** Prometheus values do not account for sustained use, custom prices, or other discounts applied in Settings. -5. **Calculate total resource costs.** Multiply the previously audited allocation by the previously audited price. +5. **Calculate total resource costs.** Multiply the previously audited allocation by the previously audited price. * `container_cpu_allocation{namespace="default"} * on(instance) group_left() node_cpu_hourly_cost * 730` * `container_memory_allocation_bytes{namespace="default"} / 1024 / 1024 / 1024 * on(instance) group_left() node_ram_hourly_cost * 730` -6. **Confirm consistency with monthly Allocation view.** Visit the Allocation tab in the Kubecost product. Filter by `default ` namespace. Select `monthly run rate` by `pod` then view the time series chart to confirm the values in the previous step are consistent. +6. **Confirm consistency with monthly Allocation view.** Visit the Allocation tab in the Kubecost product. Filter by `default ` namespace. Select `monthly run rate` by `pod` then view the time series chart to confirm the values in the previous step are consistent. ![Timeseries graph](images/audit-graph.png) diff --git a/availability-tiers.md b/availability-tiers.md index 971c47566..a3dd87dbd 100644 --- a/availability-tiers.md +++ b/availability-tiers.md @@ -1,14 +1,14 @@ -Availability Tiers impact capacity recommendations, health ratings and more in the Kubecost product. As an example, production jobs receive higher resource request recommendations than dev workloads. Another example is health scores for high availability workloads are heavily penalized for not having multiple replicas availabile. +Availability Tiers impact capacity recommendations, health ratings and more in the Kubecost product. As an example, production jobs receive higher resource request recommendations than dev workloads. Another example is health scores for high availability workloads are heavily penalized for not having multiple replicas available. Today our product supports the following tiers: Tier | Priority | Default --------- | ----------- | ------- -`Highly Available` or `Critical` | 0 | If true, recommendations and health scores heavily prioritize availability. This is the default tier if none is supplied. -`Production` | 1 | Intended for production jobs that are not necessarily mission critical. -`Staging` or `Dev` | 2 | Meant for experimental or development resources. Redundancy or availability is not a high priority. +`Highly Available` or `Critical` | 0 | If true, recommendations and health scores heavily prioritize availability. This is the default tier if none is supplied. +`Production` | 1 | Intended for production jobs that are not necessarily mission-critical. +`Staging` or `Dev` | 2 | Meant for experimental or development resources. Redundancy or availability is not a high priority. -To apply a namespace tier, add a `tier` namespace label to reflect the desired value. +To apply a namespace tier, add a `tier` namespace label to reflect the desired value. Have questions or feedback? 
Contact us at team@kubecost.com.

diff --git a/aws-out-of-cluster.md b/aws-out-of-cluster.md
index 383d5585e..f6e697621 100644
--- a/aws-out-of-cluster.md
+++ b/aws-out-of-cluster.md
@@ -33,7 +33,7 @@ To access billing data in Athena tables, and to enable other Kubecost functional

*We recommend [kiam](https://github.com/uswitch/kiam) as a solution for adding IAM credentials directly to the Kubecost pod(s).*

### Cost and Usage Permissions Policy
-The below policy is designed to provide Kubecost least-priviledge access to AWS Cost and Usage data.
+The below policy is designed to provide Kubecost least-privilege access to AWS Cost and Usage data.

Validate the following resource names in the below IAM policy before applying to your account:
* `"Sid": "ReadAccessToAthenaCurDataViaGlue"`: Validate the `database` and `table` ARNs listed. If you used the AWS managed deployment, as described in Step #4, this should already be set correctly. If you set up the Cost and Usage report to Athena flow manually, you may need to adjust this value.

@@ -196,4 +196,4 @@ Visit the Kubecost Settings page to provide the AWS access credentials and Athen

## Having issues?

* You may need to upgrade your AWS Glue if you are running an old version https://docs.aws.amazon.com/athena/latest/ug/glue-upgrade.html
-* Visit the Allocation view in the Kubecost product. If external costs are not shown, open your browser's Developer Tools > Console to see any reported errors. 
+* Visit the Allocation view in the Kubecost product. If external costs are not shown, open your browser's Developer Tools > Console to see any reported errors.

diff --git a/aws-service-account-thanos.md b/aws-service-account-thanos.md
index 49c7e899d..5268662ab 100644
--- a/aws-service-account-thanos.md
+++ b/aws-service-account-thanos.md
@@ -39,7 +39,7 @@ In order to create an AWS IAM policy for use with Thanos:

  6. Provide a User name (e.g. `kubecost-thanos-service-account`) and select `Programmatic access`

-  7. Select Attach existing policies directly, search for the policy name provided in step 4, and then create user.
+  7. Select Attach existing policies directly, search for the policy name provided in step 4, and then create the user.

![image](/attach-existing.png)

diff --git a/bug-report.md b/bug-report.md
index 46c62aef4..b202c7771 100644
--- a/bug-report.md
+++ b/bug-report.md
@@ -1,11 +1,11 @@
# Capture a bug report

-The Kubecost bug report feature captures relevent product configuration data and logs to debug an outstanding issue.
+The Kubecost bug report feature captures relevant product configuration data and logs to debug an outstanding issue.

To capture a bug report: visit __Settings__, scroll to the bottom, and select __CAPTURE BUG REPORT__.

![Bug report button in settings](images/bug-report.png)

-We recommend sharing a bug report directly with our team and not distributing broadly becuase log data is included.
+We recommend sharing a bug report directly with our team and not distributing broadly because log data is included.

__Note:__ the bug report feature requires namespace logs access, which is granted by default in Kubecost v1.51.2.

diff --git a/cost-allocation.md b/cost-allocation.md
index 055257f60..984a3f655 100644
--- a/cost-allocation.md
+++ b/cost-allocation.md
@@ -1,26 +1,26 @@
# Kubernetes Cost Allocation

-The Kubecost Allocation view allows you to quickly see allocated spend across all native Kubernetes concepts, e.g. namespace and service. 
It also allows for allocating cost to organizational concepts like team, product/project, department, or environment. This document explains the metrics presented and describes how you can control the data displayed in this view.
+The Kubecost Allocation view allows you to quickly see allocated spend across all native Kubernetes concepts, e.g. namespace and service. It also allows for allocating cost to organizational concepts like team, product/project, department, or environment. This document explains the metrics presented and describes how you can control the data displayed in this view.

![Cost allocation view](cost-allocation.png)

-### 1. Cost metrics 
-View either cumulative costs measured over the selected time window, or run rate (e.g. hourly, daily, monthly) based on the resources allocated. Costs allocations are based on the following: 
+### 1. Cost metrics 
+View either cumulative costs measured over the selected time window or run rate (e.g. hourly, daily, monthly) based on the resources allocated. Cost allocations are based on the following:

1) resources allocated, i.e. max of requests and usage
2) the cost of each resource
3) the amount of time resources were provisioned

-For more information, refer to this [FAQ](https://github.com/kubecost/cost-model#frequently-asked-questions) on how each of these inputs are determined based on your environment. 
+For more information, refer to this [FAQ](https://github.com/kubecost/cost-model#frequently-asked-questions) on how each of these inputs is determined based on your environment.

-### 2. Aggregation 
-Aggregate cost by namespace, deployment, service and other native Kubernetes concepts. Costs are also visible by other meaningful organizational concepts, e.g. Team, Department, or Product. These aggregations are based on Kubernetes labels or annotations, referenced at both the pod and namespace-level, with labels at the pod-level being favored over the namespace label when both are present. The label name used for these concepts can be configured in Settings. Resources without a label/annotation will be shown as _unassigned_. 
+### 2. Aggregation 
+Aggregate cost by namespace, deployment, service and other native Kubernetes concepts. Costs are also visible by other meaningful organizational concepts, e.g. Team, Department, or Product. These aggregations are based on Kubernetes labels or annotations, referenced at both the pod and namespace-level, with labels at the pod-level being favored over the namespace label when both are present. The label name used for these concepts can be configured in Settings. Resources without a label/annotation will be shown as _unassigned_.

### 3. Time window
-The designated time window for measuring costs. Results for 1d, 2d, 7d, and 30d queries are cached by default. 
+The designated time window for measuring costs. Results for 1d, 2d, 7d, and 30d queries are cached by default.

### 4. Filter
-Filter resouces by namespace, clusterId, and Kubernetes label to more closely investigate a rise in spend or key cost drivers at different aggregations, e.g. Deployments or Pods.
+Filter resources by namespace, clusterId, and Kubernetes label to more closely investigate a rise in spend or key cost drivers at different aggregations, e.g. Deployments or Pods.

### 5. Allocate Idle Cost
Allocating idle costs proportionately assigns total cluster costs, including slack resources, to tenants. As an example, if your cluster is only 25% utilized, applying idle costs will increase the cost of each pod/namespace by 4x. 
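To make that multiplier concrete, here is a small worked example with hypothetical numbers; the scale-up factor is simply total cluster cost divided by the cost of utilized resources:

```
total cluster cost             = $1,000/mo
utilized resources (25%)       = $250/mo attributed to tenants
idle multiplier                = $1,000 / $250 = 4

namespace A without idle costs = $50/mo
namespace A with idle applied  = $50 x 4 = $200/mo
```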
@@ -30,16 +30,16 @@ Toggle to the bar chart view to see aggregated costs over the selected window, o ### 7. Additional options View other options to export cost data to CSV or view help documentation. - + ### Cost metrics Cost allocation metrics are available for both in-cluster and out-of-cluster resources. Here are short descriptions of each metric: -| Metric | Description | +| Metric | Description | |-------------------- |--------------------- | -| Memory cost | The total cost of memory allocated to this object, e.g. namespace or deployment. The amount of memory allocated is the greater of memory usage and memory requested over the measured time window. The price of allocated memory is based on cloud billing APIs or custom pricing sheets. [Learn more](https://github.com/kubecost/cost-model#questions)| -| CPU Cost | The total cost of CPU allocated to this object, e.g. namespace or deployment. The amount of CPU allocated is the greater of CPU usage and CPU requested over the measured time window. The price of allocated CPU is based on cloud billing APIs or custom pricing sheets. [Learn more](https://github.com/kubecost/cost-model#questions) | -| Network Cost | The cost of network traffic based on internet egress, cross-zone egress, and other billed transfer.. Note: these costs must be enabled. [Learn more](http://docs.kubecost.com/network-allocation)| +| Memory cost | The total cost of memory allocated to this object, e.g. namespace or deployment. The amount of memory allocated is the greater of memory usage and memory requested over the measured time window. The price of allocated memory is based on cloud billing APIs or custom pricing sheets. [Learn more](https://github.com/kubecost/cost-model#questions)| +| CPU Cost | The total cost of CPU allocated to this object, e.g. namespace or deployment. The amount of CPU allocated is the greater of CPU usage and CPU requested over the measured time window. The price of allocated CPU is based on cloud billing APIs or custom pricing sheets. [Learn more](https://github.com/kubecost/cost-model#questions) | +| Network Cost | The cost of network traffic based on internet egress, cross-zone egress, and other billed transfer. Note: these costs must be enabled. [Learn more](http://docs.kubecost.com/network-allocation)| | PV Cost | The cost of persistent storage volumes claimed by this object. Prices are based on cloud billing prices or custom pricing sheets for on-prem deployments. | | GPU Cost | The cost of GPUs requested by this object, as measured by [resource limits](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/). Prices are based on cloud billing prices or custom pricing sheets for on-prem deployments. | | External Cost | The cost of out-of-cluster resources allocated to this object. For example, S3 buckets allocated to a particular Kubernetes deployment. Prices are based on cloud billing data and require a key. This feature is currently available for AWS ([learn more](http://docs.kubecost.com/aws-out-of-cluster.html)) and GCP ([learn more](http://docs.kubecost.com/gcp-out-of-cluster.html)). | diff --git a/custom-prom.md b/custom-prom.md index ae280f0fa..7f733b78d 100644 --- a/custom-prom.md +++ b/custom-prom.md @@ -1,26 +1,26 @@ # Custom Prometheus -Integrating Kubecost with an existing Prometheus installation can be nuanced. We recommend first installing Kubecost with a bundled Prometheus ([instructions](http://kubecost.com/install)) as a dry run before integrating with an external Prometheus deployment. 
We also recommend getting in touch (team@kubecost.com) for assistance.

**Note:** integrating with an existing Prometheus is only supported under Kubecost paid plans.

-__Requirements__ 
+__Requirements__

Kubecost requires the following dependency versions:

  - node-exporter - v0.16 (May 18)
  - kube-state-metrics - v1.6.0 (May 19)
-  - cadvisor - kubelet v1.11.0 (May 18)
+  - cAdvisor - kubelet v1.11.0 (May 18)

__Implementation Steps__

1. Copy [values.yaml](https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/values.yaml) and update the following parameters:
- 
-   - `promtheus.fqdn` to match your local Prometheus with this format `http://<service-name>.<namespace>.svc.cluster.local` 
-   - `prometheus.enabled` set to `false` 
- 
-   Pass this updated file to the Kubecost helm install command with `--values values.yaml` 
+
+   - `prometheus.fqdn` to match your local Prometheus with this format `http://<service-name>.<namespace>.svc.cluster.local`
+   - `prometheus.enabled` set to `false`
+
+   Pass this updated file to the Kubecost helm install command with `--values values.yaml`

2. Have your Prometheus scrape the cost-model `/metrics` endpoint. These metrics are needed for reporting accurate pricing data. Here is an example scrape config:

@@ -36,32 +36,32 @@ __Implementation Steps__
      - kubecost-cost-analyzer.<kubecost-namespace>
        type: 'A'
        port: 9003
-``` 
+```

This config needs to be added under `extraScrapeConfigs` in Prometheus configuration. [Example](https://github.com/kubecost/cost-analyzer-helm-chart/blob/0758d5df54d8963390ca506ad6e58c597b666ef8/cost-analyzer/values.yaml#L74)

-You can confirm that this job is successfully running with the Targets view in Prometheus. 
+You can confirm that this job is successfully running with the Targets view in Prometheus.

![Prometheus Targets](/prom-targets.png)

-__Recording Rules__ 
+__Recording Rules__
Kubecost uses [Prometheus recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) to enable certain product features and to help improve product performance. These are recommended additions, especially for medium and large-sized clusters using their own Prometheus installation. You can find our recording rules under _rules_ in this [values.yaml file](https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/values.yaml#L169).

__Troubleshooting Issues__

-Common issues include the following: 
+Common issues include the following:

-* Wrong Prometheus FQDN: evidenced by the following pod error message `No valid prometheus config file at ...`. We recommend running `curl <prometheus-url>/api/v1/status/config` from a pod in the cluster to confirm that your [Prometheus config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#configuration-file) is returned. If not, this is an indication that an incorrect Prometheus Url has been provided. If a config file is returned, then the Kubecost pod likely has it's access restricted by a cluster policy, service mesh, etc. 
+* Wrong Prometheus FQDN: evidenced by the following pod error message `No valid prometheus config file at ...`. We recommend running `curl <prometheus-url>/api/v1/status/config` from a pod in the cluster to confirm that your [Prometheus config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#configuration-file) is returned. If not, this is an indication that an incorrect Prometheus URL has been provided. If a config file is returned, then the Kubecost pod likely has its access restricted by a cluster policy, service mesh, etc.

* Prometheus throttling -- ensure Prometheus isn't being CPU throttled due to a low resource request.
-* Wrong dependency version -- see section above about Requirements 
+* Wrong dependency version -- see the section above about Requirements
* Missing scrape configs -- visit Prometheus Target page (screenshot above)
-* Data incorrectly is a single namespace -- make sure that [honor_labels](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) is enabled 
+* Data incorrectly attributed to a single namespace -- make sure that [honor_labels](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) is enabled

You can visit Settings in Kubecost to see basic diagnostic information on these Prometheus metrics:

@@ -71,9 +71,9 @@

# Custom Grafana

-Using an existing Grafana deployment can be accomplished with either of the following two options: 
+Using an existing Grafana deployment can be accomplished with either of the following two options:

-1) _Option: Configure in Kubecost product._ After the default Kubecost installation, visit Settings and update __Grafana Address__ to a URL (e.g. http://demo.kubecost.com/grafana) that is visisble to users accessing Grafana dashboards. Next, import Kubecost Grafana dashboards as JSON from this [folder](https://github.com/kubecost/cost-analyzer-helm-chart/tree/master/cost-analyzer).
+1) _Option: Configure in Kubecost product._ After the default Kubecost installation, visit Settings and update __Grafana Address__ to a URL (e.g. http://demo.kubecost.com/grafana) that is visible to users accessing Grafana dashboards. Next, import Kubecost Grafana dashboards as JSON from this [folder](https://github.com/kubecost/cost-analyzer-helm-chart/tree/master/cost-analyzer). 
![Kubecost Settings](/images/settings-grafana.png)

@@ -94,5 +94,5 @@ grafana:

      For Option 2, ensure that the following flags are set in your Operator deployment:

-      1. sidecar.dashboards.enabled = true 
-      2. sidecar.dashboards.searchNamespace isn't restrictive, use `ALL` if Kubecost runs in another ns 
+      1. sidecar.dashboards.enabled = true
+      2. sidecar.dashboards.searchNamespace isn't restrictive, use `ALL` if Kubecost runs in another ns

diff --git a/federated-clusters.md b/federated-clusters.md
index 5a3d28319..f87cfe7a7 100644
--- a/federated-clusters.md
+++ b/federated-clusters.md
@@ -1,17 +1,17 @@
-To view data from multipe clusters simultaneously, Kubecost cluster federation must be enabled. 
-This documents walks through the necessary steps for enabling this feature. 
+To view data from multiple clusters simultaneously, Kubecost cluster federation must be enabled.
+This document walks through the necessary steps for enabling this feature.

-**Note:** this feature today requires an Enterprise license. 
+**Note:** This feature today requires an Enterprise license.

# Master cluster (Postgres)

1. Follow steps [here](long-term-storage.md) to enable long-term storage.
-2. Ensure `remoteWrite.postgres.installLocal` is set to `true` in values.yaml 
+2. Ensure `remoteWrite.postgres.installLocal` is set to `true` in values.yaml
3. Provide a unique identifier for your cluster in `prometheus.server.global.external_labels.cluster_id`
-4. Create a service definition to make Postgres accessible by your other clusters. Below is a sample service definition. 
-Warning: this specific service defition may expose your database externally with just basic auth protecting. 
-Be sure the follow the necessary guidelines of your organiztion. 
+4. Create a service definition to make Postgres accessible by your other clusters. Below is a sample service definition. 
+Warning: this specific service definition may expose your database externally with just basic auth protecting it. 
+Be sure to follow the necessary guidelines of your organization.

```
apiVersion: v1
@@ -37,10 +37,10 @@ spec:

# Secondary clusters (Postgres)

-Following these steps for clusters that send data to the master cluster: 
+Follow these steps for clusters that send data to the master cluster:

1. Same as you did for the master, follow steps [here](long-term-storage.md) to enable long-term storage.
-2. Set `remoteWrite.postgres.installLocal` to `false` in values.yaml so you do not redeploy Postgres in this cluster. 
+2. Set `remoteWrite.postgres.installLocal` to `false` in values.yaml so you do not redeploy Postgres in this cluster.
3. Set `prometheus.server.global.external_labels.cluster_id` to any unique identifier of your cluster, e.g. dev-cluster-7.
4. Set `prometheus.remoteWrite.postgres.remotePostgresAddress` to the externally accessible IP from the master cluster.
5. Ensure `postgres.auth.password` is updated to reflect the value set at the master.

@@ -61,21 +61,21 @@ You should see data with both `cluster_id` values in this response.

1. Follow steps [here](long-term-storage.md#option-b-out-of-cluster-storage-thanos) to enable Thanos durable storage on a Master cluster.

-2. Complete the process in Step 1 for each additional secondary cluster by reusing your existing storage bucket and access credentials. Note: it is not necessary to deploy another instance of `thanos-compact` or `thanos-bucket` in each additional cluster. 
These are optional, but they can easily be disabled in [thanos/values.yaml](https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/charts/thanos/values.yaml) or by passing these parameters directly via helm install or upgrade as follows:
+2. Complete the process in Step 1 for each additional secondary cluster by reusing your existing storage bucket and access credentials. Note: it is not necessary to deploy another instance of `thanos-compact` or `thanos-bucket` in each additional cluster. These are optional, but they can easily be disabled in [thanos/values.yaml](https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/charts/thanos/values.yaml) or by passing these parameters directly via helm install or upgrade as follows:

```
--set thanos.compact.enabled=false 
--set thanos.bucket.enabled=false
```

-You can also optionally disable `thanos.store` and `thanos.query` with thanos/values.yaml or with these flags: 
+You can also optionally disable `thanos.store` and `thanos.query` with thanos/values.yaml or with these flags:

```
--set thanos.query.enabled=false 
--set thanos.store.enabled=false
```

-Clusters with store/query disabled will only have access to their metrics but will still write to the global bucket. 
+Clusters with store/query disabled will only have access to their metrics but will still write to the global bucket.

3. Ensure you provide a unique identifier for `prometheus.server.global.external_labels.cluster_id` to have additional clusters be visible in the Kubecost product, e.g. `cluster-two`.

diff --git a/gcp-out-of-cluster.md b/gcp-out-of-cluster.md
index 8a5770b48..2f538903d 100644
--- a/gcp-out-of-cluster.md
+++ b/gcp-out-of-cluster.md
@@ -8,7 +8,7 @@ The following guide provides the steps required for allocating out of cluster co

## Step 2: Visit Kubecost setup page and provide configuration info

-In Kubecost, vist the Cost Allocation page and select "Add Key".
+In Kubecost, visit the Cost Allocation page and select "Add Key".

![Add key](/add-key.png)

@@ -21,7 +21,7 @@ On this page, you will see instructions for providing a service key, project ID,

## Step 3: Label cloud assets

-You can now label assets with the following schema to allocate costs back to their appropriate Kubernetes owner. 
+You can now label assets with the following schema to allocate costs back to their appropriate Kubernetes owner. 

Learn more [here](https://cloud.google.com/compute/docs/labeling-resources) on GCP labeling.
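As a rough illustration of this labeling step, the sketch below applies a label to a GCE instance with `gcloud`. The label key and value are placeholders (use the schema described above and your configured label names), as are the instance and zone names:

```
# Hypothetical example: label a GCE instance so Kubecost can allocate its
# cost back to a Kubernetes owner. Replace the label key/value with your
# configured schema, and the instance/zone with your own resources.
gcloud compute instances add-labels my-instance \
    --zone=us-central1-a \
    --labels=kubernetes_namespace=kubecost
```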
diff --git a/getting-started.md b/getting-started.md
index 6be97a9b2..8a7b0bec2 100644
--- a/getting-started.md
+++ b/getting-started.md
@@ -1,49 +1,49 @@
 # Getting Started
 
-Welcome to Kubecost! This page provides commonly used product configurations and feature overviews to help get you up and running after the Kubecost product has been [installed](http://kubecost.com/install). 
-
-__Configuration__  
-[Configuring metric storage](#storage-config)  
-[Setting Requests & Limits](#requests-limits)  
-[Using an existing Prometheus or Grafana installation](#custom-prom)  
-[Using an existing node exporter installation](#node-exporter)  
-[Exposing Kubecost with an Ingress](#basic-auth)  
-[Adding a spot instance configuration (AWS only)](#spot-nodes)  
+Welcome to Kubecost! This page provides commonly used product configurations and feature overviews to help get you up and running after the Kubecost product has been [installed](http://kubecost.com/install).
+
+__Configuration__
+[Configuring metric storage](#storage-config)
+[Setting Requests & Limits](#requests-limits)
+[Using an existing Prometheus or Grafana installation](#custom-prom)
+[Using an existing node exporter installation](#node-exporter)
+[Exposing Kubecost with an Ingress](#basic-auth)
+[Adding a spot instance configuration (AWS only)](#spot-nodes)
 [Allocating out of cluster costs](#out-of-cluster)
 
-__Next Steps__  
-[Measure cluster cost efficiency](#cluster-efficiency)  
-[Cost monitoring best practices](http://blog.kubecost.com/blog/cost-monitoring/)   
+__Next Steps__
+[Measure cluster cost efficiency](#cluster-efficiency)
+[Cost monitoring best practices](http://blog.kubecost.com/blog/cost-monitoring/)
 [Understanding cost allocation metrics](/cost-allocation.md)
 

## Storage configuration

-The default Kubecost installation comes with a 32Gb persistent volume and 15-day retention period for Prometheus metrics. This is enough space to retain data for ~300 pods, depending on your exact node and container count. See the Kubecost Helm chart [configuration options](https://github.com/kubecost/cost-analyzer-helm-chart) to adjust both retention period and storage size. **Note:** we do not recommend retaining greater than 30 days of data in Prometheus. For long-term data retention, contact us (team@kubecost.com) about using Kubecost with durable storage enabled.
+The default Kubecost installation comes with a 32GB persistent volume and a 15-day retention period for Prometheus metrics. This is enough space to retain data for ~300 pods, depending on your exact node and container count. See the Kubecost Helm chart [configuration options](https://github.com/kubecost/cost-analyzer-helm-chart) to adjust both retention period and storage size. **Note:** We do not recommend retaining greater than 30 days of data in Prometheus. For long-term data retention, contact us (team@kubecost.com) about using Kubecost with durable storage enabled.

## Bring your own Prometheus or Grafana

-The Kubecost Prometheus deployment is used as both as source and sink for cost & capacity metrics. It's optimized to not interfere with other observability instrimentation and by default only contains metrics that are useful to the Kubecost product. This amounts to retaining 70-90% less metrics than a standard Prometheus deployment.
+The Kubecost Prometheus deployment is used as both a source and a sink for cost & capacity metrics. It's optimized to not interfere with other observability instrumentation and by default only contains metrics that are useful to the Kubecost product. This amounts to retaining 70-90% fewer metrics than a standard Prometheus deployment.

For the best experience, we generally recommend teams use the bundled `prometheus-server` & `grafana` but reuse their existing `kube-state-metrics` and `node-exporter` deployments if they already exist. This setup allows for the easiest installation process, easiest on-going maintenance, minimal duplication of metrics, and more flexible metric retention.

-That being said, we do support using an existing Grafana & Prometheus installation in our paid products today. You can see basic setup instructions [here](/custom-prom.md). In our free product, we only provide best efforts support for this integration because of nuances required in completing this integration successfully. Please contact us (team@kubecost.com) if you want to learn more or if you think we can help!
+That being said, we do support using an existing Grafana & Prometheus installation in our paid products today. You can see basic setup instructions [here](/custom-prom.md). In our free product, we only provide best-effort support for this integration because of the nuances required in completing this integration successfully. Please contact us (team@kubecost.com) if you want to learn more or if you think we can help!

## Setting Requests & Limits

-It's recommended that users set and/or update resource requests and limits before taking Kubecost into production at scale. These inputs can be configured in the Kubecost [values.yaml](https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/values.yaml) for Kubecost modules + subcharts. 
+It's recommended that users set and/or update resource requests and limits before taking Kubecost into production at scale. These inputs can be configured in the Kubecost [values.yaml](https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/values.yaml) for Kubecost modules + subcharts.

-Exact recommended values for these parameters depends on the size of your cluster, availability requirements, and usage of the Kubecost product. Suggested values for each container can be found within Kubecost itself on the namespace page. More info on these recommendations is available [here](http://blog.kubecost.com/blog/requests-and-limits/).
+Exact recommended values for these parameters depend on the size of your cluster, availability requirements, and usage of the Kubecost product. Suggested values for each container can be found within Kubecost itself on the namespace page. More info on these recommendations is available [here](http://blog.kubecost.com/blog/requests-and-limits/).

In practice, we recommend running Kubecost for up to 7 days on a production cluster and then tuning resource requests/limits based on resource consumption. Reach out any time to team@kubecost.com if we can help give further guidance.

-## Using an existing node exporter 
+## Using an existing node exporter

You can use an existing node exporter DaemonSet by setting the `prometheus.nodeExporter.enabled` and `prometheus.serviceAccounts.nodeExporter.create` Kubecost helm chart config options to `false`. More config options are shown [here](https://github.com/kubecost/cost-analyzer-helm-chart). Note: this requires your existing node exporter to be configured to export metrics on the default endpoint/port.

-## Kubecost Ingress example 
+## Kubecost Ingress example

-Enabling external access to the Kubecost product simply requires exposing access to port 9090 on the `kubecost-cost-analyzer` pod. This can be accomplished with a number of approaches, including Ingress or Service definitions. The following definition provides an example Ingress with basic auth.
+Enabling external access to the Kubecost product simply requires exposing access to port 9090 on the `kubecost-cost-analyzer` pod. This can be accomplished with a number of approaches, including Ingress or Service definitions. The following definition provides an example of Ingress with basic auth.

Note: on GCP, you will need to update the `kubecost-cost-analyzer` service to become a `NodePort` instead of a `ClusterIP` type service.

@@ -72,9 +72,9 @@ spec:
      backend:
        serviceName: kubecost-cost-analyzer
        servicePort: 9090
-``` 
+```

-## Spot Instance Configuration (AWS only) 
+## Spot Instance Configuration (AWS only)

For more accurate Spot pricing data, visit Settings in the Kubecost frontend to configure a [data feed](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-data-feeds.html) for AWS Spot instances. This enables the Kubecost product to have actual Spot node prices vs user-provided estimates.

@@ -104,32 +104,32 @@ For more accurate Spot pricing data, visit Settings in the Kubecost frontend to

**Spot Verification**

-View logs from the `cost-model` container in the `kubecost-cost-analyzer` pod to confirm there are no Spot data feed access errors. You should also see a confirmation log statement like this: 
+View logs from the `cost-model` container in the `kubecost-cost-analyzer` pod to confirm there are no Spot data feed access errors. 
You should also see a confirmation log statement like this: ``` I1104 00:21:02.905327       1 awsprovider.go:1397] Found spot info {Timestamp:2019-11-03 20:35:48 UTC UsageType:USE2-SpotUsage:t2.micro Operation:RunInstances:SV050 InstanceID:i-05487b228492b1a54 MyBidID:sir-9s4rvgbj MyMaxPrice:0.010 USD MarketPrice:0.004 USD Charge:0.004 USD Version:1} I1104 00:21:02.922372       1 awsprovider.go:1376] Spot feed version is "#Version: 1.0" ``` -The Charge figures in logs should be reflected in your `node_total_hourly_cost` metrics in Prometheus. +The Charge figures in logs should be reflected in your `node_total_hourly_cost` metrics in Prometheus. ## Allocating out of cluster costs -**[AWS]** Provide your congifuration info in Settings. The information needs includes S3 bucket name, Athena table name, Athena table region, and Athena database name. View [this page](/aws-out-of-cluster.md) for more information on completing this process. +**[AWS]** Provide your configuration info in Settings. The information needs to include the S3 bucket name, the Athena table name, the Athena table region, and the Athena database name. View [this page](/aws-out-of-cluster.md) for more information on completing this process. **[GCP]** Provide configuration info by selecting "Add key" from the Cost Allocation Page. View [this page](/gcp-out-of-cluster.md) for more information on completing this process. ## Measuring cluster cost efficiency -For teams interested in reducing their Kubernetes costs, we have seen it be beneficial to first understand how efficiently provisioned resources have been used. This can be answered by measuring the cost of idle resources (e.g. compute, memory, etc) as a percentage of your overall cluster spend. This figure represents the impact of many infrastructure and application-level decision, i.e. machine type selection, bin packing efficiency, and more. The Kubecost product (Cluster Overview page) provides a view into this data for an initial assessment of resource efficiency and the cost of waste. +For teams interested in reducing their Kubernetes costs, we have seen it be beneficial to first understand how efficiently provisioned resources have been used. This can be answered by measuring the cost of idle resources (e.g. compute, memory, etc) as a percentage of your overall cluster spend. This figure represents the impact of many infrastructure and application-level decisions, i.e. machine type selection, bin packing efficiency, and more. The Kubecost product (Cluster Overview page) provides a view into this data for an initial assessment of resource efficiency and the cost of waste.
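As a sketch of the measurement implied here, with purely illustrative figures:

```
idle cost = total cluster cost - cost of resources allocated to workloads
idle %    = idle cost / total cluster cost

e.g. a $1,000/mo cluster with $350/mo allocated to workloads:
idle % = ($1,000 - $350) / $1,000 = 65%
```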
-With an overall understanding of idle spend you will have a better sense for where to focus efforts for efficiency gains. Each resource type can now be tuned for your business. Most teams we’ve seen end up targeting utilization in the following ranges: +With an overall understanding of idle spend, you will have a better sense of where to focus efforts for efficiency gains. Each resource type can now be tuned for your business. Most teams we’ve seen end up targeting utilization in the following ranges: * CPU: 50%-65% * Memory: 45%-60% * Storage: 65%-80% -Target figures are highly dependent on the predictability and distribution of your resource usage (e.g. P99 vs median), the impact of high utilization on your core product/business metrics, and more. While too low resource utilization is wasteful, too high utilization can lead to latency increases, reliability issues, and other negative behavior. +Target figures are highly dependent on the predictability and distribution of your resource usage (e.g. P99 vs median), the impact of high utilization on your core product/business metrics, and more. While too low resource utilization is wasteful, too high utilization can lead to latency increases, reliability issues, and other negative behavior. diff --git a/google-service-account-thanos.md b/google-service-account-thanos.md index 716fef0c1..2cab92fc5 100644 --- a/google-service-account-thanos.md +++ b/google-service-account-thanos.md @@ -13,13 +13,13 @@ In order to create a Google service account for use with Thanos: ![image](https://user-images.githubusercontent.com/334480/66667856-faf5cf00-ec21-11e9-817d-65c2dad92af4.png) -#### Press `Create`. +#### Press `Create`. You should now be at the `Service account permissions (optional)` screen. Click inside the `Role` box, and set the first entry to **Storage Object Creator**. Click the `+ Add Another Role` and set the second entry to **Storage Object Viewer**. ![image](https://user-images.githubusercontent.com/334480/66667955-2ed0f480-ec22-11e9-90cb-b160b8170aa4.png) #### Hit Continue -You should now be prompted to allow specific accounts access to this service account. This is should managed based on specific needs internally and is not a requirement. You can leave empty and press `Done` +You should now be prompted to allow specific accounts access to this service account. This should be based on specific internal needs and is not a requirement. You can leave empty and press `Done` #### Create a Key Once back to the service accounts menu, select the `...` at the end of the entry you just created and press `Create Key` @@ -27,4 +27,4 @@ Once back to the service accounts menu, select the `...` at the end of the entry ![image](https://user-images.githubusercontent.com/334480/66668267-d3ebcd00-ec22-11e9-9e8c-4f178b8dd265.png) #### Confirm JSON -Confirm a JSON key and hit `Create`. This will download a JSON service account key entry for use with the Thanos `object-store.yaml` mentioned in the initial setup step. +Confirm a JSON key and hit `Create`. This will download a JSON service account key entry for use with the Thanos `object-store.yaml` mentioned in the initial setup step. diff --git a/install.md b/install.md index b6df4910d..ca2376396 100644 --- a/install.md +++ b/install.md @@ -1,11 +1,11 @@ # Installing Kubecost
-* The recommended path to install and operate Kubecost is via the Helm chart install instructions available at [kubecost.com/install](http://kubecost.com/install). This chart contains all the required components to get started, and can scale to large deployments. It also provides the most flexibility for configuring Kubecost and its dependencies. +* The recommended path to install and operate Kubecost is via the Helm chart install instructions available at [kubecost.com/install](http://kubecost.com/install). This chart contains all the required components to get started and can scale to large deployments. It also provides the most flexibility for configuring Kubecost and its dependencies. + +* Alternatively, you can install via [flat manifest](https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/README.md#manifest), but this install path provides less flexibility for managing your deployment and has several product limitations. + +* Lastly, you can deploy the open-source cost model engine directly as a pod. This install path provides a subset of Kubecost functionality and is available [here](https://github.com/kubecost/cost-model/blob/master/deploying-as-a-pod.md). -* Alternatively, you can install via [flat manifest](https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/README.md#manifest), but this install path provides less flexiblity for managing your deployment and has several product limitations. - -* Lastly, you can deploy the open source, cost model engine directly as a pod. This install path provides a subset of Kubecost functionality and is available [here](https://github.com/kubecost/cost-model/blob/master/deploying-as-a-pod.md). -
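For reference, here is a minimal sketch of the recommended Helm path. The repository URL and `--name` flag reflect Helm 2-era instructions, and the token value is a placeholder; prefer the exact command generated for you at kubecost.com/install:

```
# Minimal install sketch -- kubecost.com/install generates the exact
# command, including a kubecostToken value, for your account.
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost/cost-analyzer --namespace kubecost --name kubecost \
    --set kubecostToken="<token-from-install-page>"
```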



diff --git a/long-term-storage.md b/long-term-storage.md
index 89559cf92..95ca10d29 100644
--- a/long-term-storage.md
+++ b/long-term-storage.md
@@ -1,28 +1,27 @@
-To enable 90+ days of data retention in Kubecost, we recommend deploying with durable storage enabled. We provide two options for doing this: 1) in your cluster and 2) out of cluster. This functionality also powers the Enterprise multi-cluster view, where data across clusters can be viewed in aggregate, as well as simple backup & restore capabilities.
+To enable 90+ days of data retention in Kubecost, we recommend deploying with durable storage enabled. We provide two options for doing this: 1) in your cluster and 2) out of the cluster. This functionality also powers the Enterprise multi-cluster view, where data across clusters can be viewed in aggregate, as well as simple backup & restore capabilities.

-**Note:** this feature today requires an Enterprise license.
+**Note:** This feature today requires an Enterprise license.

## Option A: In cluster storage (Postgres)

To enable Postgres-based long-term storage, complete the following:

-1. **Helm chart configuration** -- in [values.yaml](https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/values.yaml) set the `remoteWrite.postgres.enabled` attribute
-to true. The default backing disk is `200gb` but this can also be directly configured in values.yaml.
- 
+1. **Helm chart configuration** -- in [values.yaml](https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/values.yaml) set the `remoteWrite.postgres.enabled` attribute to true. The default backing disk is `200gb` but this can also be directly configured in values.yaml.
+ 
2. **Verify successful install** -- Deploy or upgrade via install instructions at kubecost.com/install, passing this updated values.yaml file, and verify pods with the prefix `kubecost-cost-analyzer-adapter` and `kubecost-cost-analyzer-postgres` are Running.

-3. **Confirm data is availabile** 
+3. **Confirm data is available**

       Visit this endpoint `http://<kubecost-address>/model/costDataModelRangeLarge`

       Here's an example use: `http://localhost:9090/model/costDataModelRangeLarge`

-## Option B: Out of cluster storage (Thanos) 
+## Option B: Out-of-cluster storage (Thanos)

Thanos-based durable storage provides long-term metric retention directly in a user-controlled bucket (e.g. S3 or GCS bucket) and can be enabled with the following steps:

-Step 1: **Create object store yaml file** 
+Step 1: **Create object-store yaml file**

This step creates a yaml file that contains your durable storage target (e.g. GCS, S3, etc.) configuration and access credentials. The details of this file are documented thoroughly in Thanos documentation: https://thanos.io/storage.md/

@@ -50,13 +49,13 @@ config:
      "client_x509_cert_url": ""
    }
```
-**Note:** given that this is yaml, it requires this specific indention. 
+**Note:** given that this is yaml, it requires this specific indention. 

**Warning:** do not apply a retention policy to your Thanos bucket, as it will prevent Thanos compaction from completing.

__AWS/S3__

-Start by creating a new S3 bucket with all public access blocked. No other bucket configuration changes should be required. The following example uses a bucket named `kc-thanos-store`.
+Start by creating a new S3 bucket with all public access blocked. No other bucket configuration changes should be required. The following example uses a bucket named `kc-thanos-store`. 
 Next, add an IAM policy to access this bucket ([instructions](/aws-service-account-thanos.md)).
 
@@ -83,9 +82,9 @@ config:
     part_size: 134217728
 ```
-**Note:** given that this is yaml, it requires this specific indention.
+**Note:** Given that this is YAML, it requires this specific indentation.
 
-Step 2: **Create object store secret**
+Step 2: **Create object-store secret**
 
 The final step prior to installation is to create a secret with the yaml file generated in the previous step:
 
 ```
@@ -110,11 +109,11 @@ $ helm install kubecost/cost-analyzer \
 
 Your deployment should now have Thanos enabled!
 
-> Note: the `thanos-store` pod is by default configured to request 2 Gb in memory.
+> Note: the `thanos-store` pod is by default configured to request 2GB of memory.
 
-**Verify Installation**
-In order to verify a correct installation, start by ensuring all pods are running without issue. If the pods mentioned above are not running successfully, then view pod logs for more detail. A common error is as follows, which means you do not have the correct access to the supplied bucket:
+**Verify Installation**
+To verify a correct installation, start by ensuring all pods are running without issue. If the pods mentioned above are not running successfully, view their logs for more detail. A common error is as follows, which means you do not have the correct access to the supplied bucket:
 
 ```
 thanos-svc-account@project-227514.iam.gserviceaccount.com does not have storage.objects.list access to thanos-bucket., forbidden"
 ```
 
@@ -135,11 +134,11 @@ If you navigate to the *Stores* using the top navigation bar, you should be able
 
 Also note that the sidecar should identify with the unique `cluster_id` provided in your values.yaml in the previous step. Default value is `cluster-one`.
 
-The default retention period for when data is moved into the object storage is currently *2h* - This configuration is based on Thanos suggested values. __By default, it will be 2 hours before data is written to the provided bucket.__
+The default retention period before data is moved into object storage is currently *2h*; this configuration is based on Thanos' suggested values. __By default, it will be 2 hours before data is written to the provided bucket.__
 
-Instead of waiting *2h* to ensure that thanos was configured correctly, the default log level for the thanos workloads is `debug` (it's very light logging even on debug). You can get logs for the `thanos-sidecar`, which is part of the `prometheus-server` pod, and `thanos-store`. The logs should give you a clear indication whether or not there was a problem consuming the secret and what the issue is. For more on Thanos architecture, view [this resource](https://thanos.io/quick-tutorial.md/#components).
+Instead of waiting *2h* to confirm that Thanos was configured correctly, note that the default log level for the Thanos workloads is `debug` (logging is very light even on debug). You can get logs for the `thanos-sidecar`, which is part of the `prometheus-server` pod, and for `thanos-store`. The logs should give you a clear indication of whether there was a problem consuming the secret and what the issue is. For more on Thanos architecture, view [this resource](https://thanos.io/quick-tutorial.md/#components).
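For reference, the secret described in Step 2 can typically be created from the generated file with a command along these lines; the secret name `kubecost-thanos` and file name `object-store.yaml` are assumptions that must match whatever your values.yaml references:

```
# Create the object-store secret in the namespace where Kubecost runs
kubectl create secret generic kubecost-thanos -n kubecost \
  --from-file=./object-store.yaml   # assumed names; align with your values.yaml
```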
-If a cluster is not successfully writing data to the backet, we recommend reviewing `thanos-sidecar` logs with the following command:
+If a cluster is not successfully writing data to the bucket, we recommend reviewing `thanos-sidecar` logs with the following command:
 
 ```
 kubectl logs kubecost-prometheus-server- -n kubecost -c thanos-sidecar
diff --git a/multi-cluster.md b/multi-cluster.md
index 4d091df71..5e4695dd0 100644
--- a/multi-cluster.md
+++ b/multi-cluster.md
@@ -1,10 +1,10 @@
-Kubecost supports the ability to view cost and health data across multiple Kubernetes clusters and cloud providers.
+Kubecost supports the ability to view cost and health data across multiple Kubernetes clusters and cloud providers.
 
 Below are the steps for adding an additional cluster to your Kubecost frontend.
 
 **Steps**
 
-1. Install Kubecost on the additional cluster you would like to view. The recommended Kubecost install path is available at [kubecost.com/install](http://kubecost.com/install).
-2. Expose port 9090 of the `kubecost-cost-analyzer` pod. This can be done with a Kubernetes ingress ([example](https://github.com/kubecost/docs/blob/e82db0bff942dbb8abf6d74b979b10b121bce705/getting-started.md#basic-auth)) or loadbalancer ([example](https://github.com/kubecost/docs/blob/master/kubecost-lb.yaml)). **Warning:** by default a LoadBalancer exposes endpoints to the wide internet. Be careful about following the authentication requirements of your organization and environment.
+1. Install Kubecost on the additional cluster you would like to view. The recommended Kubecost install path is available at [kubecost.com/install](http://kubecost.com/install).
+2. Expose port 9090 of the `kubecost-cost-analyzer` pod. This can be done with a Kubernetes Ingress ([example](https://github.com/kubecost/docs/blob/e82db0bff942dbb8abf6d74b979b10b121bce705/getting-started.md#basic-auth)) or LoadBalancer ([example](https://github.com/kubecost/docs/blob/master/kubecost-lb.yaml)). **Warning:** by default, a LoadBalancer exposes endpoints to the public internet. Take care to follow the authentication requirements of your organization and environment.
 3. Select Add new cluster on the Kubecost home page and provide the accessible URL (with port included) for the target Kubecost installation. Here's an example: `http://e9a706220bae04199-1639813551.us-east-2.elb.amazonaws.com:9090`
 
 ![Add a cluster view](kubecost-index.png)
diff --git a/network-allocation.md b/network-allocation.md
index 4e9cd91c0..5a2c7d7e2 100644
--- a/network-allocation.md
+++ b/network-allocation.md
@@ -1,8 +1,8 @@
 # Network Traffic Cost Allocation
 
-This document summarizes Kubecost network cost allocation, how to enable it, and what it provides.
+This document summarizes Kubecost network cost allocation, how to enable it, and what it provides.
 
-When this feature is enabled, Kubecost gathers network traffic metrics in combination with provider specific network costs to provide insight on network data sources as well as the aggregate costs of transfers.
+When this feature is enabled, Kubecost gathers network traffic metrics in combination with provider-specific network costs to provide insight into network data sources as well as the aggregate costs of transfers.
 
 ### Enabling Network Costs
 
@@ -10,15 +10,15 @@ To enable this feature, set the following parameter in values.yaml during [Helm
 ```
 networkCosts.enabled=true
 ```
- You can view a list of common Kubecost chart config options [here](https://github.com/kubecost/cost-analyzer-helm-chart#config-options).
-
- **Note:** network cost, disabled by default, run as a privileged pod to access the relevent networking kernal module on the host.
+ You can view a list of common Kubecost chart config options [here](https://github.com/kubecost/cost-analyzer-helm-chart#config-options).
+
+ **Note:** the network costs feature, disabled by default, runs as a privileged pod on each node to access the relevant networking kernel module on the host.
 
 ### Kubernetes Network Traffic Metrics
 
 The primary source of network metrics comes from a daemonset pod hosted on each of the nodes in a cluster. Each daemonset pod uses `hostNetwork: true` such that it can leverage an underlying kernel module to capture network data. Network traffic data is gathered and the destination of any outbound networking is labeled as:
 
- * Internet Egress: Network target destination was not identified within the cluster.
+ * Internet Egress: Network target destination was not identified within the cluster.
  * Cross Region Egress: Network target destination was identified, but not in the same provider region.
  * Cross Zone Egress: Network target destination was identified, and was part of the same region but not the same zone.
 
@@ -28,7 +28,7 @@ These classifications are important because they correlate with network costing
 kubectl logs kubecost-network-costs- -n kubecost
 ```
 
-This will show you top source and destination IP addresses and bytes transfered on the node where this pod is running.
+This will show you the top source and destination IP addresses and bytes transferred on the node where this pod is running.
 
 ### Whitelisting internal addresses
 
@@ -38,14 +38,12 @@ For addresses that are outside of your cluster but inside your VPC, Kubecost sup
 
 To verify this feature is functioning properly, you can complete the following steps.
 
-1. Confirm the `kubecost-network-costs` pods are Running. If these pods are not in a Running state, _kubectl describe_ them and/or view their logs for errors.
-2. Ensure `kubecost-networking` target is Up in your Prometheus Targets list. View any visible errors if this target is not Up.
-3. Verify Network Costs are available in your Kubecost Allocation view. View your browser's Developer Console on this page for any access/permissions errors if costs are not shown.
+1. Confirm the `kubecost-network-costs` pods are Running. If these pods are not in a Running state, _kubectl describe_ them and/or view their logs for errors.
+2. Ensure the `kubecost-networking` target is Up in your Prometheus Targets list. View any visible errors if this target is not Up.
+3. Verify Network Costs are available in your Kubecost Allocation view. View your browser's Developer Console on this page for any access/permissions errors if costs are not shown.
 
 ### Feature Limitations
-
+
 * Today this feature is supported on Unix-based images with conntrack
 * Actively tested against GCP, AWS, and Azure to date
 * Daemonsets have shared IP addresses on certain clusters
-
-
diff --git a/partner-metrics.md b/partner-metrics.md
index ab7583342..3c0525928 100644
--- a/partner-metrics.md
+++ b/partner-metrics.md
@@ -1,12 +1,12 @@
 ## Standardizing Kubernetes cost allocation
 
-Measuring costs in Kubernetes environments is complex. Applications and their resources are often ephemeral.
-Teams and even departments share the same resources without transparent prices attached.
-Organizations are oftentimes running resources on disparate machine types or even multiple cloud providers.
-We created the Kubecost Partners project to help partners manage this complexity and provide cost visibility for their users.
+Measuring costs in Kubernetes environments is complex. Applications and their resources are often ephemeral.
+Teams and even departments share the same resources without transparent prices attached.
+Organizations are oftentimes running resources on disparate machine types or even multiple cloud providers.
+We created the Kubecost Partners project to help partners manage this complexity and provide cost visibility for their users.
 
-To assist, we have created a common definition for determining the cost of pods, services, namespaces and more.
-We provide partners a suite of APIs, documentation, and compliance tests designed to run in a range of Kubernetes environments, including all top cloud providers and on premise.
+To assist, we have created a common definition for determining the cost of pods, services, namespaces, and more.
+We provide partners with a suite of APIs, documentation, and compliance tests designed to run in a range of Kubernetes environments, including all top cloud providers and on-premises.
 We also provide a spec implementation in our open source repository to ensure consistency and accuracy across a broad range of projects.
 
-Reach out to to learn more.
+Reach out to to learn more.
diff --git a/security.md b/security.md
index d6c8c18cb..c2eb17d4a 100644
--- a/security.md
+++ b/security.md
@@ -1,4 +1,4 @@
-Privacy and security are incredibly important to our team. We believe that users should own and fully control their data, especially when it comes to senstive cost and product usage metrics.
+Privacy and security are incredibly important to our team. We believe that users should own and fully control their data, especially when it comes to sensitive cost and product usage metrics.
 
 For this reason, our product does not share any data out of your infrastructure to external Kubecost services, and it uses read-only Kubernetes privileges in our core product. You can even disable all internet egress in our namespace/product if desired; more [here](https://docs.projectcalico.org/v3.5/getting-started/kubernetes/tutorials/advanced-policy#4-deny-all-egress-traffic).
diff --git a/troubleshoot-install.md b/troubleshoot-install.md
index 1e89f3b07..cba9c2bfb 100644
--- a/troubleshoot-install.md
+++ b/troubleshoot-install.md
@@ -1,26 +1,26 @@
-[No persistent volumes available...](#persistent-volume)
-[Unable to establish a port-forward connection](#port-forward)
-[FailedScheduling node-exporter](#node-exporter)
-[No clusters found](#no-cluster)
+[No persistent volumes available...](#persistent-volume)
+[Unable to establish a port-forward connection](#port-forward)
+[FailedScheduling node-exporter](#node-exporter)
+[No clusters found](#no-cluster)
 [Pods running but app won't load](#app-wont-load)
-
-
-## Issue: no persistent volumes available for this claim and/or no storage class is set
-Your clusters needs a default storage class for the Kubecost and Prometheus persistent volumes to be successfully attached.
+
+## Issue: no persistent volumes available for this claim and/or no storage class is set
+
+Your cluster needs a default storage class for the Kubecost and Prometheus persistent volumes to be successfully attached.
 
 To check if a storage class exists, you can run
 
 ```kubectl get storageclass```
 
 You should see a storageclass name with (default) next to it, as in this example.
-NAME                PROVISIONER           AGE 
+NAME                PROVISIONER           AGE
 standard (default)  kubernetes.io/gce-pd  10d
 
-If you see a name but no (default) next to it, run
+If you see a name but no (default) next to it, run
 
 ```kubectl patch storageclass <storageclass-name> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'```
 
@@ -28,9 +28,9 @@ If you don’t see a name, you need to add a storage class. For help doing this, see:
 
 * AWS: [https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html](https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html)
 * Azure: [https://kubernetes.io/docs/concepts/storage/storage-classes/#azure-disk](https://kubernetes.io/docs/concepts/storage/storage-classes/#azure-disk)
-
-
-
+
+
+
 ## Issue: unable to establish a port-forward connection
 
 First, check the status of pods in the target namespace:
@@ -53,16 +53,16 @@ kubecost-prometheus-pushgateway-6f4f8bbfd9-k5r47 1/1 Running 0
 kubecost-prometheus-server-6fb8f99bb7-4tjwn 2/2 Running 0 5m
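For context, a typical port-forward to reach the Kubecost frontend looks something like the sketch below; the namespace and deployment name assume the default release name used elsewhere in these docs:

```
# Forward local port 9090 to the Kubecost frontend, then browse to http://localhost:9090
kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090
```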
-If the cost-analyzer or prometheus-server __pods are missing__, we recommend reinstalling with Helm using `--debug` which enables verbose output.
+If the cost-analyzer or prometheus-server __pods are missing__, we recommend reinstalling with Helm using `--debug`, which enables verbose output.
 
 If any __pod is not Running__ other than cost-analyzer-checks, you can use the following command to find errors in the recent event log:
 
 `kubectl describe pod <pod-name> -n kubecost`
 
-Should you encounter an unexpected error, please reach out for help on [Slack](https://join.slack.com/t/kubecost/shared_invite/enQtNTA2MjQ1NDUyODE5LWFjYzIzNWE4MDkzMmUyZGU4NjkwMzMyMjIyM2E0NGNmYjExZjBiNjk1YzY5ZDI0ZTNhZDg4NjlkMGRkYzFlZTU) or via email at [team@kubecost.com](team@kubecost.com).
-
-
-
+Should you encounter an unexpected error, please reach out for help on [Slack](https://join.slack.com/t/kubecost/shared_invite/enQtNTA2MjQ1NDUyODE5LWFjYzIzNWE4MDkzMmUyZGU4NjkwMzMyMjIyM2E0NGNmYjExZjBiNjk1YzY5ZDI0ZTNhZDg4NjlkMGRkYzFlZTU) or via email at [team@kubecost.com](team@kubecost.com).
+
+
+
 ## Issue: FailedScheduling kubecost-prometheus-node-exporter
 If one has an existing `node-exporter` daemonset, the Kubecost Helm chart may time out due to a conflict. You can disable the installation of `node-exporter` by passing the following parameters to the Helm install.
 
@@ -73,41 +73,41 @@ helm install kubecost/cost-analyzer --debug --wait --namespace kubecost --name k
 --set prometheus.nodeExporter.enabled=false \
 --set prometheus.serviceAccounts.nodeExporter.create=false
 ```
-
-
-
+
+
+
 ## Issue: Unable to connect to a cluster
 
-You may encounter the following screen if the Kubecost frontend is unabled to connect with a live Kubecost server.
+You may encounter the following screen if the Kubecost frontend is unable to connect with a live Kubecost server.
 
 ![No clusters found](images/no-cluster.png)
 
 Recommended troubleshooting steps are as follows:
 
-Start by reviewing messages in your browswer's developer console. Any meaningful errors or warnings may indicate an unexpted response from the Kubecost server.
+Start by reviewing messages in your browser's developer console. Any meaningful errors or warnings may indicate an unexpected response from the Kubecost server.
 
-Next, point your browser to the `/api` endpoint on your target URL. For example, visit `http://localhost:9090/api/` in the scenario shown above. You should expect to see a Prometheus config file at this endpoint. If your cluster address has changed, you can visit Settings in the Kubecost product to update or you can also add a new cluster.
+Next, point your browser to the `/api` endpoint on your target URL. For example, visit `http://localhost:9090/api/` in the scenario shown above. You should expect to see a Prometheus config file at this endpoint. If your cluster address has changed, you can visit Settings in the Kubecost product to update it, or you can add a new cluster.
 
 If you are unable to successfully retrieve your config file from this endpoint, we recommend the following:
 
 1. Check your connection to this host
 2. View the status of all Prometheus and Kubecost pods to see if any pods are experiencing errors or are in a Pending state. When performing the default Kubecost install we recommend inspecting this with `kubectl get pods -n kubecost`. All pods should be either Running or Completed.
-3. View relevent pod logs if any pod is not in the Running or Completed state.
+3. View relevant pod logs if any pod is not in the Running or Completed state.
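To act on step 3 above, pulling recent logs from a problematic pod might look like this sketch; `<pod-name>` is a placeholder taken from the pod listing:

```
kubectl get pods -n kubecost                     # identify the pod that is not Running/Completed
kubectl logs <pod-name> -n kubecost --tail=100   # <pod-name> is a placeholder
```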
-If you are able retrieve your config file from this endpoint, we recommend reviewing logs from the cost-analyzer pod to identify any errors.
+If you are able to retrieve your config file from this endpoint, we recommend reviewing logs from the cost-analyzer pod to identify any errors.
 
 Please contact us at team@kubecost.com or on Slack at any point.
 
 ## Issue: Unable to load app
 
-If all Kubecost pods are running and you can connect / port-forward to the kubecost-cost-analyzer pod but none of the app's UI will load, we recommend testing the following:
+If all Kubecost pods are running and you can connect / port-forward to the kubecost-cost-analyzer pod but none of the app's UI will load, we recommend testing the following:
 
-1. Connect directly to a backend service with the following command:
+1. Connect directly to a backend service with the following command:
 `kubectl port-forward --namespace kubecost service/kubecost-cost-analyzer 9001`
 2. Ensure that `http://localhost:9001` returns the Prometheus YAML file
 
-If this is true, you are likely to be hitting a CoreDNS routing isssue. We recommend using local routing as a solution:
+If this is true, you are likely hitting a CoreDNS routing issue. We recommend using local routing as a solution:
 
 1. Go to 
 2. Replace ```{% raw %}{{ $serviceName }}.{{ .Release.Namespace }}{% endraw %}``` with ```localhost```
diff --git a/user-management.md b/user-management.md
index f7e2e65da..149d56c4a 100644
--- a/user-management.md
+++ b/user-management.md
@@ -1,10 +1,10 @@
-Kubecost’s SSO/SAML support makes it easy to manage application access and works with top identity providers.
+Kubecost’s SSO/SAML support makes it easy to manage application access and works with top identity providers.
 
 Here are the high-level options for access supported:
 
 * **Basic auth** provides a simple mechanism to restrict application access internally and externally
-* **Pre-defined user roles** can be managed directly in product
-  * Admin: has permissions to manage users, configure model inputs, and application settings.
+* **Pre-defined user roles** can be managed directly in the product
+  * Admin: has permissions to manage users, model inputs, and application settings.
 * Viewer: user role with read-only permission
-* **Namespace-level access** restrict a user's access to limited set of namespaces
+* **Namespace-level access** restricts a user's access to a limited set of namespaces
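As a hedged illustration of the basic auth option above, an NGINX Ingress with auth annotations is sketched below; the Ingress API version, host, and secret name are assumptions for a 2020-era cluster, and the `basic-auth` secret would hold an htpasswd file you create yourself:

```
# Assumed: an nginx ingress controller and a pre-created htpasswd secret named basic-auth
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: kubecost-ingress
  namespace: kubecost
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
spec:
  rules:
  - host: kubecost.example.com   # placeholder host
    http:
      paths:
      - path: /
        backend:
          serviceName: kubecost-cost-analyzer
          servicePort: 9090
```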