If you are interested in setting up your own Foldy instance, we would love to hear about it and would be happy to help; just send an email to [email protected]!
Once you are satisfied with the application, you can deploy it into production by following the procedure below.
*Foldy architecture: Kubernetes cluster resources are deployed and managed by Helm; other resources are deployed and managed manually, per the instructions below.*
This site is built on Kubernetes (specifically Google Kubernetes Engine, GKE). A few Google Cloud resources need to be created, including a GKE cluster, and then all resources within GKE can be deployed at once. The Kubernetes config and its resources are expressed using a tool called Helm.
Prior to deployment, you must choose the following variables:
- `GOOGLE_PROJECT_ID`: ID of your institution's Google Cloud project. Does not need to be Foldy specific; it can be retrieved from the Google Cloud console.
- `GKE_CLUSTER_NAME`: Name of the Foldy Kubernetes cluster, typically 'foldy'.
- `GOOGLE_SERVICE_ACCOUNT_ID`: Name of the service account that Foldy uses, typically 'foldy-sa'.
- `GOOGLE_SQL_DB_NAME`: Name of the SQL database used by the GKE cluster, typically 'foldy-db'.
- `GOOGLE_SQL_DB_PASSWORD`: Password for that SQL database. For example, generate a secure password with `python -c 'import secrets; print(secrets.token_urlsafe(32))'`.
- `FOLDY_DOMAIN`: Domain name selected for the Foldy application.
- `FOLDY_USER_EMAIL_DOMAIN`: Email domain allowed access, e.g. "lbl.gov" will allow all users with "@lbl.gov" email addresses to access.
- `GOOGLE_STORAGE_DIRECTORY`: Name of the Google Cloud Storage bucket, for example 'berkeley-foldy-bucket'. Note that, like an email address, the bucket name must be globally unique.
- `GOOGLE_ARTIFACT_REPO`: Name of the Google Cloud Docker image repository, typically 'foldy-repo'.
- `GOOGLE_CLOUD_STATIC_IP_NAME`: Name of the Google Cloud static IP resource, typically 'foldy-ip'.
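For convenience, you may want to export your chosen values as shell variables so the commands below can be copy-pasted. A sketch with placeholder values only; substitute your own:

```bash
# Placeholder values; replace each with your own choices from the list above.
export GOOGLE_PROJECT_ID=my-institution-project
export GKE_CLUSTER_NAME=foldy
export GOOGLE_SERVICE_ACCOUNT_ID=foldy-sa
export GOOGLE_SQL_DB_NAME=foldy-db
export GOOGLE_SQL_DB_PASSWORD=$(python -c 'import secrets; print(secrets.token_urlsafe(32))')
export FOLDY_DOMAIN=foldy.example.com
export FOLDY_USER_EMAIL_DOMAIN=example.com
export GOOGLE_STORAGE_DIRECTORY=example-foldy-bucket   # must be globally unique
export GOOGLE_ARTIFACT_REPO=foldy-repo
export GOOGLE_CLOUD_STATIC_IP_NAME=foldy-ip
```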
These variables are used throughout this procedure. Once you have chosen values for them, execute the following procedure:
- Clone this repo:

  ```bash
  git clone --recurse-submodules https://github.com/JBEI/foldy.git
  cd foldy
  ```
- Copy the following templates:

  ```bash
  cp deployment/helm/values_template.yaml deployment/helm/values.yaml
  cp deployment/helm/db_creation_resources_template.yaml deployment/helm/db_creation_resources.yaml
  ```
- Choose a domain! We named our instance LBL foldy and reserved the domain `foldy.lbl.gov` with our IT folks, and we think it reads pretty well. If you don't have an IT team who can provision a domain name / DNS record for you, you can reserve an address like `ourinstitute-foldy.com` through any commercial hostname provider.
- Enable the Cloud Logging API, for Prometheus / metrics.
- Install the local tools `gcloud`, `helm`, `kubectl`, and `yq` (a quick version check is sketched after this list):
  - Install the Google Cloud CLI [instructions here].
  - Install the Helm CLI [instructions here]; briefly, `brew install helm`.
  - Install the kubectl CLI [instructions here]; briefly, make sure you call `gcloud components install kubectl` and `gcloud components install gke-gcloud-auth-plugin`.
  - Install yq [instructions here]; briefly, on Mac you can call `brew install yq`.
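  If you want to confirm each tool is on your `PATH` before continuing, each of these should print a version string:

  ```bash
  # Each command prints version information for the corresponding tool.
  gcloud version
  helm version
  kubectl version --client
  yq --version
  ```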
- Create the following Google Cloud resources:
  - Create the Foldy service account, which has the scopes/permissions to access the necessary Foldy resources:
    - Create it from the Google Cloud console (a scripted sketch is shown below).
    - Make sure to grant the following roles:
      - Artifact Registry Administrator
      - Artifact Registry Reader
      - Cloud SQL Client
      - Compute Admin
      - Logging Admin
      - Monitoring Admin
      - Storage Admin
      - Storage Object Admin
    - Fill in the service account details in `deployment/helm/values.yaml`.
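    The console steps above can also be scripted. A sketch, assuming the roles listed above map to the IAM role IDs shown here; verify them against your organization's policies before running:

    ```bash
    # Create the service account (scripted equivalent of the console steps above).
    gcloud iam service-accounts create ${GOOGLE_SERVICE_ACCOUNT_ID} \
      --project=${GOOGLE_PROJECT_ID} \
      --display-name="Foldy service account"

    # Grant each of the roles listed above to the new service account.
    for ROLE in roles/artifactregistry.admin roles/artifactregistry.reader \
                roles/cloudsql.client roles/compute.admin roles/logging.admin \
                roles/monitoring.admin roles/storage.admin roles/storage.objectAdmin; do
      gcloud projects add-iam-policy-binding ${GOOGLE_PROJECT_ID} \
        --member="serviceAccount:${GOOGLE_SERVICE_ACCOUNT_ID}@${GOOGLE_PROJECT_ID}.iam.gserviceaccount.com" \
        --role="${ROLE}"
    done
    ```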
  - Create the Kubernetes cluster:

    ```bash
    gcloud container clusters create $GKE_CLUSTER_NAME --enable-managed-prometheus --region=us-central1-c --workload-pool=$GOOGLE_PROJECT_ID.svc.id.goog
    ```
  - Enable kubectl:

    ```bash
    gcloud container clusters get-credentials $GKE_CLUSTER_NAME
    ```
  - Create the PostgreSQL DB:

    ```bash
    gcloud sql instances create ${GOOGLE_SQL_DB_NAME} --tier=db-f1-micro --region=us-central1 --storage-size=100GB --database-version=POSTGRES_13 --root-password=${GOOGLE_SQL_DB_PASSWORD}
    ```

    - Then, through the cloud console, enable private IP at `https://console.cloud.google.com/sql/instances/${GOOGLE_SQL_DB_NAME}`, and note the DB IP address as `GOOGLE_SQL_DB_PRIVATE_IP`.
    - Now fill in `DATABASE_URL` in `deployment/helm/values.yaml` following this example: `postgresql://postgres:${GOOGLE_SQL_DB_PASSWORD}@${GOOGLE_SQL_DB_PRIVATE_IP}/postgres`
  - Allocate a static IP address:
    - From the Cloud Console, reserve an external static IP address.
    - Make it IPv4, Regional (us-central1, attached to None).

    ```bash
    gcloud compute addresses create ${GOOGLE_CLOUD_STATIC_IP_NAME} --global
    gcloud compute addresses describe ${GOOGLE_CLOUD_STATIC_IP_NAME} --global
    ```
  - Create an OAuth Client ID for production:
    - Use the Google Cloud console.
    - Application type: Web Application
    - Name: `${GKE_CLUSTER_NAME}-prod`
    - Authorized JavaScript origins: `https://${FOLDY_DOMAIN}`
    - Authorized redirect URIs: `https://${FOLDY_DOMAIN}/api/authorize`
    - Then paste the ID and secret into the `GOOGLE_CLIENT_{ID,SECRET}` fields in `deployment/helm/values.yaml`.
  - Create a Google Cloud Storage bucket using the cloud console with the following attributes (a `gcloud` sketch follows this list):
    - Name = `${GOOGLE_STORAGE_DIRECTORY}`
    - Multi-region
    - Autoclass storage class
    - Prevent public access
    - No object protection
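    If you prefer the CLI, the bucket itself can be created as below. This is only a sketch; the Autoclass, public-access-prevention, and object-protection settings listed above may still need to be configured in the console (or via the corresponding `gcloud storage buckets create` flags, not shown here):

    ```bash
    # Create a multi-region (US) bucket; remaining settings are configured in the console.
    gcloud storage buckets create gs://${GOOGLE_STORAGE_DIRECTORY} --project=${GOOGLE_PROJECT_ID} --location=US
    ```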
  - Create the Google Cloud Docker image repo by running:

    ```bash
    gcloud artifacts repositories create ${GOOGLE_ARTIFACT_REPO} --repository-format=docker --location=us-central1
    ```
  - Enable permission to push and pull images from Artifact Registry with:

    ```bash
    gcloud auth configure-docker us-central1-docker.pkg.dev
    ```
  - Create node pools by running:

    ```bash
    bash scripts/create_nodepools.sh
    ```
- Fill out the template files:
  - Fill in `SECRET_KEY` in `deployment/helm/values.yaml` with a random secure string; for example, generate one with `python -c 'import secrets; print(secrets.token_urlsafe(32))'` (see the `yq` sketch after this list).
  - `EMAIL_USERNAME` and `EMAIL_PASSWORD` in `deployment/helm/values.yaml` are optional. They will be used for status notifications, but they must be Gmail credentials if specified.
  - Fill in the remaining variables in `deployment/helm/values.yaml` with appropriate values.
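  Values can be edited by hand, or with `yq` (installed above). A sketch, assuming `SECRET_KEY` is a top-level key in `values.yaml`; check `values_template.yaml` for the actual structure and adjust the path accordingly:

  ```bash
  # Generate a random secret and write it into values.yaml in place.
  SECRET_KEY=$(python -c 'import secrets; print(secrets.token_urlsafe(32))')
  yq -i ".SECRET_KEY = \"${SECRET_KEY}\"" deployment/helm/values.yaml
  ```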
- Install the KEDA Helm/Kubernetes plugin, following its docs.
- Bind the service account to GKE:

  ```bash
  gcloud iam service-accounts add-iam-policy-binding ${GOOGLE_SERVICE_ACCOUNT_ID}@${GOOGLE_PROJECT_ID}.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member "serviceAccount:${GOOGLE_PROJECT_ID}.svc.id.goog[default/foldy-ksa]"
  ```
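  To confirm the binding took effect, you can inspect the service account's IAM policy; it should list a `roles/iam.workloadIdentityUser` binding for the `foldy-ksa` Kubernetes service account:

  ```bash
  # Print the IAM policy attached to the Foldy service account.
  gcloud iam service-accounts get-iam-policy ${GOOGLE_SERVICE_ACCOUNT_ID}@${GOOGLE_PROJECT_ID}.iam.gserviceaccount.com
  ```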
- Build and push the Docker images to your Google Artifact Registry with:

  ```bash
  bash scripts/build_and_deploy_containers.sh
  ```
- Make sure that `ImageVersion` is properly set in `deployment/helm/values.yaml`, then deploy the Kubernetes services using:

  ```bash
  helm install foldy deployment/helm
  ```
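  Optionally, confirm that the release installed and that pods are coming up (pod names will vary with the chart contents):

  ```bash
  # Inspect the Helm release and the pods/services it created.
  helm status foldy
  kubectl get pods
  kubectl get services
  ```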
- Initialize the tables in the PostgreSQL database:

  ```bash
  kubectl exec service/backend -- env FLASK_APP=main.py flask db upgrade
  ```
- Fill out `db_creation_resources.yaml` with the appropriate variables, then download the AlphaFold databases into a persistent volume with:

  ```bash
  kubectl apply -f db_creation_resources.yaml
  ```

  You can monitor the progress of the database download with:

  ```bash
  kubectl logs --follow --timestamps --previous create-dbs | less
  ```

  Note: don't run any jobs until the database download has completed.
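  One way to check whether the download has finished, assuming the download runs in a pod named `create-dbs` as the logs command above suggests, is to watch that pod's status until it reports `Completed`:

  ```bash
  # The create-dbs pod should move from Running to Completed when the download is done.
  kubectl get pod create-dbs
  ```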
- Reserve a domain name:
  - You can use this command to find the static IP address:

    ```bash
    gcloud compute addresses describe ${GOOGLE_CLOUD_STATIC_IP_NAME} --global
    ```

  - Add an A record pointing your domain at the static IP address provisioned above (a quick DNS check is sketched below).

  Note: using the us-central1-c region is required because most Google A100 GPUs are located in that region.
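  Once the DNS record has propagated, the two commands below should print the same IP address (assumes `dig` is available locally):

  ```bash
  # Compare the address your domain resolves to with the reserved static IP.
  dig +short ${FOLDY_DOMAIN}
  gcloud compute addresses describe ${GOOGLE_CLOUD_STATIC_IP_NAME} --global --format='value(address)'
  ```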
To deploy subsequent updates:

- Increment `ImageVersion` in `deployment/helm/values.yaml`.
- Rebuild the Docker images:

  ```bash
  scripts/build_and_deploy_containers.sh ${GOOGLE_PROJECT_ID} ${GOOGLE_ARTIFACT_REPO} ${IMAGE_VERSION}
  ```

- Update the Helm chart:

  ```bash
  helm upgrade foldy deployment/helm
  ```
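The update steps above can be strung together in a small script. A sketch, assuming `ImageVersion` is a top-level integer key in `values.yaml` and that the build script accepts the three arguments shown above:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Bump the image version in place (assumes ImageVersion is a top-level integer key).
NEW_VERSION=$(( $(yq '.ImageVersion' deployment/helm/values.yaml) + 1 ))
yq -i ".ImageVersion = ${NEW_VERSION}" deployment/helm/values.yaml

# Rebuild/push the images and roll out the updated chart.
bash scripts/build_and_deploy_containers.sh "${GOOGLE_PROJECT_ID}" "${GOOGLE_ARTIFACT_REPO}" "${NEW_VERSION}"
helm upgrade foldy deployment/helm
```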