Website: https://www.cast.ai
- Terraform 0.13+
A module to connect a GKE cluster to CAST AI.
Requires the castai/castai and hashicorp/google providers to be configured.
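A minimal sketch of what that provider setup might look like. The version constraints are taken from the requirements table in this README. The variable names, the data sources used to reach the cluster, and the block-style `kubernetes` syntax (Helm provider 2.x) are assumptions for illustration, not part of this module.

```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    castai = {
      source  = "castai/castai"
      version = "~> 7.4"
    }
    google = {
      source  = "hashicorp/google"
      version = ">= 2.49"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.0.0"
    }
  }
}

# Hypothetical input variables; name them to match your own configuration.
variable "castai_api_token" {
  type = string
}

variable "project_id" {
  type = string
}

variable "cluster_name" {
  type = string
}

variable "cluster_location" {
  type = string
}

provider "castai" {
  api_token = var.castai_api_token
}

provider "google" {
  project = var.project_id
}

# Look up the existing GKE cluster so the helm provider can connect to it.
data "google_client_config" "default" {}

data "google_container_cluster" "this" {
  name     = var.cluster_name
  location = var.cluster_location
  project  = var.project_id
}

# Block syntax shown here is the Helm provider 2.x form.
provider "helm" {
  kubernetes {
    host                   = "https://${data.google_container_cluster.this.endpoint}"
    token                  = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(data.google_container_cluster.this.master_auth[0].cluster_ca_certificate)
  }
}
```

The helm provider is included because the module installs the CAST AI components as Helm releases.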
For Phase 2 onboarding, credentials from terraform-gke-iam are required.
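As a rough sketch, the credentials could come from the companion IAM module like this. The registry source and the two inputs shown are assumptions based on that module's usual usage, so check its own README for the current interface. Only the `private_key` output is referenced by this README.

```hcl
# Assumed source address and inputs for the terraform-gke-iam module.
module "castai_gke_iam" {
  source = "castai/gke-iam/castai"

  project_id       = var.project_id
  gke_cluster_name = var.cluster_name
}

# The resulting key is then passed to this module as:
#   gke_credentials = module.castai_gke_iam.private_key
```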
module "castai_gke_cluster" {
  source = "castai/gke-cluster/castai"
  project_id = var.project_id
  gke_cluster_name = var.cluster_name
  gke_cluster_location = module.gke.location # cluster region or zone
  gke_credentials = module.castai_gke_iam.private_key
  delete_nodes_on_disconnect = var.delete_nodes_on_disconnect
  autoscaler_policies_json = var.autoscaler_policies_json
  default_node_configuration = module.castai_gke_cluster.castai_node_configurations["default"]
  node_configurations = {
    default = {
      disk_cpu_ratio = 25
      subnets = [module.vpc.subnets_ids[0]]
      tags = {
        "node-config" : "default"
      }
      max_pods_per_node = 110
      network_tags = ["dev"]
      disk_type = "pd-balanced"
    }
  }
  node_templates = {
    spot_tmpl = {
      configuration_id = module.castai_gke_cluster.castai_node_configurations["default"]
      should_taint = true
      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
        custom-label-key-2 = "custom-label-value-2"
      }
      custom_taints = [
        {
          key = "custom-taint-key-1"
          value = "custom-taint-value-1"
        },
        {
          key = "custom-taint-key-2"
          value = "custom-taint-value-2"
        }
      ]
      constraints = {
        fallback_restore_rate_seconds = 1800
        spot = true
        use_spot_fallbacks = true
        min_cpu = 4
        max_cpu = 100
        instance_families = {
          exclude = ["e2"]
        }
        compute_optimized_state = "disabled"
        storage_optimized_state = "disabled"
        is_gpu_only = false
        architectures = ["amd64"]
      }
      custom_instances_enabled = true
      custom_instances_with_extended_memory_enabled = true
    }
  }
  autoscaler_settings = {
    enabled = true
    node_templates_partial_matching_enabled = false
    unschedulable_pods = {
      enabled = true
      headroom = {
        enabled = true
        cpu_percentage = 10
        memory_percentage = 10
      }
      headroom_spot = {
        enabled = true
        cpu_percentage = 10
        memory_percentage = 10
      }
    }
    node_downscaler = {
      enabled = true
      empty_nodes = {
        enabled = true
      }
      evictor = {
        aggressive_mode = false
        cycle_interval = "5m10s"
        dry_run = false
        enabled = true
        node_grace_period_minutes = 10
        scoped_mode = false
      }
    }
    cluster_limits = {
      enabled = true
      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}
Version 4.x.x changed:
- Removed custom_label attribute in castai_node_template resource. Use custom_labels instead.
Old configuration:
module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_label = {
        key = "custom-label-key-1"
        value = "custom-label-value-1"
      }
    }
  }
}
New configuration:
module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
      }
    }
  }
}
Version 5.x.x changed:
- Removed compute_optimized and storage_optimized attributes in castai_node_template resource, constraints object. Use compute_optimized_state and storage_optimized_state instead.
Old configuration:
module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized = false
        storage_optimized = true
      }
    }
  }
}
New configuration:
module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized_state = "disabled"
        storage_optimized_state = "enabled"
      }
    }
  }
}
Version 6.3.x changed:
- Deprecated autoscaler_policies_json attribute. Use autoscaler_settings instead.
Old configuration:
module "castai-gke-cluster" {
  autoscaler_policies_json = <<-EOT
    {
      "enabled": true,
      "unschedulablePods": {
        "enabled": true
      },
      "nodeDownscaler": {
        "enabled": true,
        "emptyNodes": {
          "enabled": true
        },
        "evictor": {
          "aggressiveMode": false,
          "cycleInterval": "5m10s",
          "dryRun": false,
          "enabled": true,
          "nodeGracePeriodMinutes": 10,
          "scopedMode": false
        }
      },
      "nodeTemplatesPartialMatchingEnabled": false,
      "clusterLimits": {
        "cpu": {
          "maxCores": 20,
          "minCores": 1
        },
        "enabled": true
      }
    }
  EOT
}
New configuration:
module "castai-gke-cluster" {
  autoscaler_settings = {
    enabled = true
    node_templates_partial_matching_enabled = false
    unschedulable_pods = {
      enabled = true
    }
    node_downscaler = {
      enabled = true
      empty_nodes = {
        enabled = true
      }
      evictor = {
        aggressive_mode = false
        cycle_interval = "5m10s"
        dry_run = false
        enabled = true
        node_grace_period_minutes = 10
        scoped_mode = false
      }
    }
    cluster_limits = {
      enabled = true
      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}
Usage examples are located in the Terraform provider repo.
Name | Version |
---|---|
terraform | >= 0.13 |
castai | ~> 7.4 |
google | >= 2.49 |
helm | >= 2.0.0 |
Name | Version |
---|---|
castai | ~> 7.4 |
helm | >= 2.0.0 |
null | n/a |
No modules.
Name | Type |
---|---|
castai_autoscaler.castai_autoscaler_policies | resource |
castai_gke_cluster.castai_cluster | resource |
castai_node_configuration.this | resource |
castai_node_configuration_default.this | resource |
castai_node_template.this | resource |
helm_release.castai_agent | resource |
helm_release.castai_cluster_controller | resource |
helm_release.castai_cluster_controller_self_managed | resource |
helm_release.castai_evictor | resource |
helm_release.castai_evictor_ext | resource |
helm_release.castai_evictor_self_managed | resource |
helm_release.castai_kvisor | resource |
helm_release.castai_kvisor_self_managed | resource |
helm_release.castai_pod_pinner | resource |
helm_release.castai_pod_pinner_self_managed | resource |
helm_release.castai_spot_handler | resource |
null_resource.wait_for_cluster | resource |
Name | Description | Type | Default | Required |
---|---|---|---|---|
agent_values | List of YAML formatted string values for agent helm chart | list(string) | [] | no |
agent_version | Version of castai-agent helm chart. Default latest | string | null | no |
api_grpc_addr | CAST AI GRPC API address | string | "api-grpc.cast.ai:443" | no |
api_url | URL of alternative CAST AI API to be used during development or testing | string | "https://api.cast.ai" | no |
autoscaler_policies_json | Optional json object to override CAST AI cluster autoscaler policies. Deprecated, use autoscaler_settings instead. | string | null | no |
autoscaler_settings | Optional Autoscaler policy definitions to override current autoscaler settings | any | null | no |
castai_api_token | Optional CAST AI API token created in console.cast.ai API Access keys section. Used only when wait_for_cluster_ready is set to true | string | "" | no |
castai_components_labels | Optional additional Kubernetes labels for CAST AI pods | map(any) | {} | no |
cluster_controller_values | List of YAML formatted string values for cluster-controller helm chart | list(string) | [] | no |
cluster_controller_version | Version of castai-cluster-controller helm chart. Default latest | string | null | no |
default_node_configuration | ID of the default node configuration | string | n/a | yes |
delete_nodes_on_disconnect | Optionally delete CAST AI created nodes when the cluster is destroyed | bool | false | no |
evictor_ext_values | List of YAML formatted string with evictor-ext values | list(string) | [] | no |
evictor_ext_version | Version of castai-evictor-ext chart. Default latest | string | null | no |
evictor_values | List of YAML formatted string values for evictor helm chart | list(string) | [] | no |
evictor_version | Version of castai-evictor chart. Default latest | string | null | no |
gke_cluster_location | Location of the cluster to be connected to CAST AI. Can be region or zone for zonal clusters | string | n/a | yes |
gke_cluster_name | Name of the cluster to be connected to CAST AI. | string | n/a | yes |
gke_credentials | Optional GCP Service account credentials.json | string | n/a | yes |
grpc_url | gRPC endpoint used by pod-pinner | string | "grpc.cast.ai:443" | no |
install_security_agent | Optional flag for installation of security agent (https://docs.cast.ai/product-overview/console/security-insights/) | bool | false | no |
kvisor_values | List of YAML formatted string values for kvisor helm chart | list(string) | [] | no |
kvisor_version | Version of kvisor chart. If not provided, latest version will be used. | string | null | no |
kvisor_controller_extra_args | Map of extra arguments for the kvisor controller | map(string) | { kube-linter-enabled = true, image-scan-enabled = true, kube-bench-enabled = true } | no |
node_configurations | Map of GKE node configurations to create | any | {} | no |
node_templates | Map of node templates to create | any | {} | no |
pod_pinner_version | Version of pod-pinner helm chart. Default latest | string | null | no |
project_id | The project id from GCP | string | n/a | yes |
self_managed | Whether CAST AI components' upgrades are managed by the customer; by default upgrades are managed by the CAST AI central system. | bool | false | no |
spot_handler_values | List of YAML formatted string values for spot-handler helm chart | list(string) | [] | no |
spot_handler_version | Version of castai-spot-handler helm chart. Default latest | string | null | no |
wait_for_cluster_ready | Wait for cluster to be ready before finishing the module execution. This option requires castai_api_token to be set | bool | false | no |
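To illustrate how a few of the optional inputs fit together, here is a partial module block in the same style as the migration snippets in this README. The variable `var.castai_api_token` and the chart value `replicaCount` are hypothetical and only show the shape of the inputs. Required inputs are left out for brevity.

```hcl
module "castai_gke_cluster" {
  # Required inputs (project_id, gke_cluster_name, and so on) omitted here.

  # wait_for_cluster_ready only works when castai_api_token is also set.
  wait_for_cluster_ready = true
  castai_api_token       = var.castai_api_token # hypothetical variable

  # Installs the security agent (kvisor); disabled by default.
  install_security_agent = true

  # Each list element is a YAML formatted string passed to the agent chart.
  agent_values = [
    yamlencode({
      replicaCount = 2 # hypothetical chart value, check the chart for real keys
    })
  ]
}
```

Using `yamlencode` keeps the values as HCL rather than a hand-written YAML heredoc. A plain YAML string works just as well, since the input type is `list(string)`.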
Name | Description |
---|---|
castai_node_configurations | Map of node configuration IDs by name |
castai_node_templates | Map of node templates by name |
cluster_id | CAST.AI cluster id, which can be used for accessing cluster data using API |
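A short sketch of how these outputs could be re-exported from a root configuration, assuming the module is instantiated as `module.castai_gke_cluster` as in the usage example. The output names on the left are arbitrary.

```hcl
# Expose the CAST AI cluster id, e.g. for use in API calls.
output "castai_cluster_id" {
  value = module.castai_gke_cluster.cluster_id
}

# Look up a single node configuration ID by its name in the map.
output "default_node_configuration_id" {
  value = module.castai_gke_cluster.castai_node_configurations["default"]
}
```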