Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EKS in AWS Curvenote account #2652

Merged
merged 53 commits into from
Sep 25, 2023
Merged
Show file tree
Hide file tree
Changes from 46 commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
7e44698
curvenote-aws: bootstrap and vpc
manics May 18, 2023
6518eab
EKS with permissions boundaries and other workarounds
manics May 31, 2023
54057f1
Use remote state and dynamodb lock
manics May 31, 2023
ece451d
Add serviceaccounts.tf
manics May 31, 2023
de030c8
Update version
manics May 31, 2023
408510a
Service link role deployed separately, this now works!
manics Jun 1, 2023
e061b7c
terraform/aws/curvenote linting/formatting
manics Jun 1, 2023
16cccd9
Move terraform lint to separate workflow
manics Jun 6, 2023
8504574
Do not gitignore .terraform.lock.hcl
manics Jun 6, 2023
e964117
Update README, add OIDC tf that can be run by admin
manics Jun 6, 2023
c872505
Manually created OIDC IRSA is now indicated with var `oidc_created`
manics Jun 7, 2023
72411d6
`oidc_created=true`
manics Jun 7, 2023
565ebef
Add GitHub OIDC
manics Jun 7, 2023
d1feb31
Add GitHub OIDC for manics/mybinder.org-deploy for initial dev
manics Jun 7, 2023
0ccb81f
Add GH workflow to check EKS access
manics Jun 7, 2023
a94c8de
GitHub OIDC: fix typo (eks not ecr)
manics Jun 7, 2023
a2b83b6
Ignore .terraform.lock.hcl apart from aws/curvenote
manics Jun 12, 2023
b07dd43
Need kubernetes provider to manage EKS auth configmap
manics Jun 12, 2023
de4e919
Add hashicorp/kubernetes version, update versions
manics Jun 12, 2023
d5236eb
Add role to allow anyone in AWS account to access EKS as cluster admin
manics Jun 12, 2023
693bb50
Add `binderhub-eks-access` to aws-auth configmap
manics Jun 17, 2023
2449223
Parameterise number of nodes, disk size, bottlerocket
manics Jun 25, 2023
b91432b
Update VPC, fix ECR token IAM permissions
manics Jun 24, 2023
127eb63
Update serviceaccounts
manics Jun 27, 2023
f44c97c
Update curvenote terraform README
manics Jun 30, 2023
2089dac
aws-dev GH workflow: include main
manics Jun 30, 2023
604d592
Some tfsec fixes
manics Jun 30, 2023
ef445dd
Couple more cleanups
manics Jun 30, 2023
77e9198
aws-curvenote: allow all AWS roles to manage KMS key
manics Jul 9, 2023
814891d
Set kms_key_administrators to avoid changing when deployed by differe…
manics Jul 9, 2023
7a98cf2
aws/curvenote: Comment out outputs
manics Jul 2, 2023
995fa77
Deploy aws/curvenote Terraform
manics Jul 10, 2023
0978ed5
Update aws/curvenote README to explain GH workflow
manics Jul 10, 2023
5baff90
rm .github/workflows/aws-dev.yml
manics Jul 12, 2023
c6d21a1
aws-curvenote: use 2 nodes
manics Jul 31, 2023
19213de
Add walkthrough of how the terraform GH deployment approval works
manics Jul 31, 2023
80bdbe2
eks: node size cant be changed by terraform
manics Jul 31, 2023
6cb8d3c
Move curvenote terraform to generic binder-eks module
manics Aug 28, 2023
02cb10f
Add outputs to binder-eks module
manics Aug 28, 2023
ea59795
aws/curvenote is now just a single file that references the binder-ek…
manics Aug 28, 2023
55dd529
Use worker_group_1 instead of worker_group-1
manics Aug 28, 2023
2d4c439
binder-eks: uncomment outputs
manics Aug 28, 2023
a735a08
Update AWS/EKS readmes
manics Aug 28, 2023
432f115
Update pre-commit, cleanup old comments
manics Aug 28, 2023
aebaa80
Rename bootstrap state bucket var
manics Aug 28, 2023
8c1cfb1
terraform-deploy-aws-curvenote.yml: remove extra branch
manics Sep 18, 2023
d7036e0
curvenote tfplan: replace gpg with age
manics Sep 22, 2023
af3d573
curvenote: Install age in GH workflow
manics Sep 22, 2023
315ad74
Fix gh trigger paths for terraform-deploy-aws-curvenote.yml
manics Sep 22, 2023
f48c4f7
Remove aws_s3_bucket_server_side_encryption_configuration from bootstrap
manics Sep 22, 2023
55666c1
Remove dev OIDC access
manics Sep 22, 2023
1851b2a
Output Terraform plan to GH summary
manics Sep 24, 2023
c00c237
vpc.tf: Remove old commented azs
manics Sep 24, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
115 changes: 115 additions & 0 deletions .github/workflows/terraform-deploy-aws-curvenote.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
# See terraform/aws/curvenote/README.md
name: Terraform aws-curvenote

on:
push:
branches:
- main
paths:
- "terraform/aws/curvenote/**"
- .github/workflows/terraform-deploy.yml
workflow_dispatch:

# Only allow one workflow to run at a time
concurrency: terraform-deploy-aws-curvenote

env:
TFPLAN: aws-curvenote.tfplan
AWS_DEPLOYMENT_ROLE: arn:aws:iam::166088433508:role/binderhub-github-oidc-mybinderorgdeploy-terraform
AWS_REGION: us-east-2
WORKDIR: ./terraform/aws/curvenote

jobs:
terraform-plan:
runs-on: ubuntu-22.04
timeout-minutes: 10
# These permissions are needed to interact with GitHub's OIDC Token endpoint.
permissions:
id-token: write
contents: read
defaults:
run:
working-directory: ${{ env.WORKDIR }}
outputs:
apply: ${{ steps.terraform-plan.outputs.apply }}

steps:
- uses: actions/checkout@v3

- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
role-to-assume: ${{ env.AWS_DEPLOYMENT_ROLE }}
aws-region: ${{ env.AWS_REGION }}
role-session-name: terraform-plan

- name: Terraform plan
id: terraform-plan
run: |
terraform init
EXIT_CODE=0
terraform plan -out="${TFPLAN}" -detailed-exitcode || EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "No changes"
echo "apply=false" >> "$GITHUB_OUTPUT"
elif [ $EXIT_CODE -eq 2 ]; then
echo "Changes found"
echo "apply=true" >> "$GITHUB_OUTPUT"
else
echo "Terraform plan failed"
exit $EXIT_CODE
fi

- name: Encrypt plan
if: steps.terraform-plan.outputs.apply == 'true'
run: |
echo ${{ secrets.TFPLAN_ARTIFACT_PASSPHRASE }} | gpg --batch --yes --passphrase-fd 0 --symmetric --cipher-algo AES256 --output "${TFPLAN}.gpg" "${TFPLAN}"

- name: Upload plan
if: steps.terraform-plan.outputs.apply == 'true'
uses: actions/upload-artifact@v3
with:
name: ${{ env.TFPLAN }}
path: ${{ env.WORKDIR }}/${{ env.TFPLAN }}.gpg
if-no-files-found: error

terraform-apply:
needs:
- terraform-plan
runs-on: ubuntu-22.04
timeout-minutes: 60
# This environment requires approval before the deploy is run.
environment: aws-curvenote
# These permissions are needed to interact with GitHub's OIDC Token endpoint.
permissions:
id-token: write
contents: read
defaults:
run:
working-directory: ${{ env.WORKDIR }}
if: needs.terraform-plan.outputs.apply == 'true'

steps:
- uses: actions/checkout@v3

- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
role-to-assume: ${{ env.AWS_DEPLOYMENT_ROLE }}
aws-region: ${{ env.AWS_REGION }}
role-session-name: terraform-apply

- name: Download plan
uses: actions/download-artifact@v3
with:
name: ${{ env.TFPLAN }}
path: ${{ env.WORKDIR }}

- name: Decrypt plan
run: |
echo ${{ secrets.TFPLAN_ARTIFACT_PASSPHRASE }} | gpg --batch --yes --passphrase-fd 0 --decrypt --cipher-algo AES256 --output "${TFPLAN}" "${TFPLAN}.gpg"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use anything other than gpg? age perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using gpg because it's pre-installed in the base environment. This is purely for internal GH workflow use- there's no way to privately pass an artefact between separate jobs, and since this is a public repo it needs to be encrypted. It's not intended to be used by people/maintainers since if we need the full details we can run Terraform anyway, it's just a way to ensure the public can't view any sensitive information in the plan. My plan is to create a random GitHub secret TFPLAN_ARTIFACT_PASSPHRASE and then throw it away so we'll never know the secret.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still love to get rid of it. I basically can't trust myself to ever verify if a GPG commandline is actually doing what it is supposed to do. age can come from apt here, and we can just use it to symmetrically encrypt / decrypt. then:

  1. Make an private age key with age-keygen -o <some-file>. The file is just one line, we put it into a GitHub secret
  2. When we wanna encrypt, echo $KEY > <some-file>; age --identity <some-file> --encrypt --output <encrypted-output-file> <unencrypted-input-file>. We can also pass in input via stdin.
  3. And to decrypt, do it in reverse: echo $KEY > <some-file>; age --identity <some-file> --decrypt --output <unencrypted-output-file> <encrypted-input-file>. The input can come via stdin too.

IMO this is much simpler to understand and verify, and given there's already a lot of complexity here I'd love to not touch GPG at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll see if I can get that to work.


- name: Terraform apply
run: |
terraform init
terraform apply "${TFPLAN}"
33 changes: 33 additions & 0 deletions .github/workflows/terraform.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
name: Terraform static checks

on:
pull_request:
paths:
- "terraform/**"
push:
paths:
- "terraform/**"
workflow_dispatch:

# We can't run CI tests on Terraform, so use as many static linters as possible

jobs:
terraform-pre-commit:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version-file: ".python-version"

- name: Install dependencies
run: pip install pre-commit

# https://github.com/terraform-linters/setup-tflint
- name: Install tflint
uses: terraform-linters/[email protected]
with:
tflint_version: v0.47.0

- name: Run terraform pre-commit
run: pre-commit run --all --config .pre-commit-config-terraform.yaml
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -20,3 +20,5 @@ env

.terraform
.terraform.lock.hcl
# Keep .terraform.lock.hcl to ensure reproducible deployments
!terraform/aws/curvenote/.terraform.lock.hcl
22 changes: 22 additions & 0 deletions .pre-commit-config-terraform.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Config reference: https://pre-commit.com/#pre-commit-configyaml---top-level
#
# Common tasks
#
# - Run on all files: pre-commit run --all --config .pre-commit-config-terraform.yaml
#
# Prerequisites:
# - terraform
# - tflint

# Currently only aws/ is checked
files: "^terraform/aws/"
exclude: "^terraform/aws/pangeo/"

repos:
# We can't run any CI tests on production Terraform code, so use as many static linters as possible
- repo: https://github.com/antonbabenko/pre-commit-terraform
rev: v1.83.0
hooks:
- id: terraform_fmt
- id: terraform_tflint
- id: terraform_validate
7 changes: 7 additions & 0 deletions terraform/aws/binder-eks/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# BinderHub on AWS EKS

This module deploys an AWS EKS cluster with IRSA roles to support BinderHub ECR access.

The module has optional support for using a limited non-administrative AWS role with a permissions boundary to deploy the cluster.

For an example see [curvenote](../curvenote/README.md)
111 changes: 111 additions & 0 deletions terraform/aws/binder-eks/eks-cluster.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
# https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/19.15.2
# Full example:
# https://github.com/terraform-aws-modules/terraform-aws-eks/blame/v19.14.0/examples/complete/main.tf
# https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v19.14.0/docs/compute_resources.md

data "aws_caller_identity" "current" {}

locals {
permissions_boundary_arn = (
var.permissions_boundary_name != null ?
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:policy/${var.permissions_boundary_name}" :
null
)
}

# This assumes the EKS service linked role is already created (or the current user has permissions to create it)
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "19.15.3"
cluster_name = var.cluster_name
cluster_version = var.k8s_version
subnet_ids = module.vpc.public_subnets

cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
cluster_endpoint_public_access_cidrs = var.k8s_api_cidrs

vpc_id = module.vpc.vpc_id

# Allow all allowed roles to access the KMS key
kms_key_enable_default_policy = true
# This duplicates the above, but the default is the current user/role so this will avoid
# a deployment change when run by different users/roles
kms_key_administrators = [
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:root",
]

enable_irsa = var.enable_irsa
iam_role_permissions_boundary = local.permissions_boundary_arn

eks_managed_node_group_defaults = {
capacity_type = "SPOT"
iam_role_permissions_boundary = local.permissions_boundary_arn
}

eks_managed_node_groups = {
worker_group_1 = {
name = "${var.cluster_name}-wg1"
instance_types = [var.instance_type_wg1]
ami_type = var.use_bottlerocket ? "BOTTLEROCKET_x86_64" : "AL2_x86_64"
platform = var.use_bottlerocket ? "bottlerocket" : "linux"

# additional_userdata = "echo foo bar"
vpc_security_group_ids = [
aws_security_group.all_worker_mgmt.id,
aws_security_group.worker_group_all.id,
]
desired_size = var.wg1_size
min_size = 1
max_size = var.wg1_max_size

# Disk space can't be set with the default custom launch template
# disk_size = 100
block_device_mappings = [
{
# https://github.com/bottlerocket-os/bottlerocket/discussions/2011
device_name = var.use_bottlerocket ? "/dev/xvdb" : "/dev/xvda"
ebs = {
# Uses default alias/aws/ebs key
encrypted = true
volume_size = var.root_volume_size
volume_type = "gp3"
}
}
]

subnet_ids = slice(module.vpc.public_subnets, 0, var.number_azs)
},
# Add more worker groups here
}

manage_aws_auth_configmap = true
# Anyone in the AWS account with sufficient permissions can access the cluster
aws_auth_accounts = [
data.aws_caller_identity.current.account_id,
]
aws_auth_roles = [
{
# GitHub OIDC role
rolearn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${var.cluster_name}-${var.github_oidc_role_suffix}"
username = "binderhub-github-oidc"
groups = ["system:masters"]
},
{
# GitHub OIDC terraform role
rolearn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${var.cluster_name}-${var.github_oidc_role_suffix}-terraform"
username = "binderhub-github-oidc"
groups = ["system:masters"]
},
{
# BinderHub admins role
rolearn = aws_iam_role.eks_access.arn
username = "binderhub-admin"
groups = ["system:masters"]
}
]
}

data "aws_eks_cluster_auth" "binderhub" {
name = var.cluster_name
}
Loading