Commit

Merge branch 'dev'

emlundell committed Dec 12, 2024
2 parents 685df14 + f3c9f2f commit bb95da5
Showing 15 changed files with 628 additions and 639 deletions.
15 changes: 15 additions & 0 deletions docs/dev-guides/about_opensciencelab.md
@@ -0,0 +1,15 @@
# About OpenScienceLab

OpenScienceLab is about Open Science.

Brought to you by...

the Alaska Satellite Facility: making remote sensing accessible.

And...

the OpenScienceLab team.

And...

by developers like you. Thank you.
110 changes: 110 additions & 0 deletions docs/dev-guides/cluster/build_and_deploy_opensarlab_cluster.md
@@ -0,0 +1,110 @@
# Build and Deploy OpenSARLab Cluster

1. Build the Docker images first, based on `opensarlab-container`.

1. Deploy the following in the same AWS account and region as the previous container images.

1. Create new GitHub repo

To organize repos, use the naming convention: `deployment-{location/owner}-{maturity?}-cluster`

1. Copy canonical `opensarlab-cluster` and commit.

    Either copy/paste the files or use `git remote add github https://github.com/ASFOpenSARlab/opensarlab-cluster.git`, as shown in the sketch below.

Make sure any hidden files (like .gitignore, .yamllint, etc.) are properly copied.
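
    A minimal sketch of this copy step, assuming the new deployment repo is `YourOrg/deployment-example-test-cluster` (both names hypothetical) and its default branch is `main`:

```bash
# Clone the new, empty deployment repo
git clone https://github.com/YourOrg/deployment-example-test-cluster.git
cd deployment-example-test-cluster

# Pull in the canonical opensarlab-cluster code
git remote add github https://github.com/ASFOpenSARlab/opensarlab-cluster.git
git fetch github
git merge github/main --allow-unrelated-histories

# Confirm hidden files (.gitignore, .yamllint, etc.) came across, then push
ls -la
git push origin main
```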

1. Within AWS, add a GitHub Connection. If this has been done before, the app should show your GitHub app name.

https://docs.aws.amazon.com/dtconsole/latest/userguide/connections-create-github.html

Make sure you are in the right region of your AWS account.

    Once the Connection is set up, save the Connection ARN for later.
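
    The Connection can also be created from the CLI; a sketch (the connection name is arbitrary, and the pending Connection must still be authorized through the console):

```bash
# Create a pending GitHub connection in the current region
aws codestar-connections create-connection \
    --provider-type GitHub \
    --connection-name deployment-example-test-cluster

# List connections to recover the Connection ARN later
aws codestar-connections list-connections --provider-type-filter GitHub
```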

1. Remember to add the current GitHub repo to the Connection app

GitHub > Settings > GitHub Apps > AWS Connector for GitHub > Repository Access

Add GitHub repo

1. Add an SSL certificate to AWS Certificate Manager.

You will need the ARN of the certificate.
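
    For example, a DNS-validated certificate can be requested and its ARN looked up with the AWS CLI (the domain below is a placeholder):

```bash
# Request a certificate for the lab domain (validate via DNS afterward)
aws acm request-certificate \
    --domain-name lab.example.org \
    --validation-method DNS

# List issued certificates to find the ARN
aws acm list-certificates --certificate-statuses ISSUED
```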

1. Update `opensciencelab.yaml` within the code. See an explanation of the various parts [here](../opensciencelab_yaml.md).

1. Deploy the CloudFormation template found at `pipeline/cf-setup-pipeline.yaml`.

Use the following parameters:

| Parameter | Description |
|-----------|-------------|
| Stack name | The CloudFormation stack name. For readability, append `-pipeline` to the end. |
| CodeStarConnectionArn | The ARN of the Connection made earlier. |
| CostTagKey | Useful if using billing allocation tags. |
| CostTagValue | Useful if using billing allocation tags. Note that many resources will have this in their name for uniqueness. It needs to be short in length. |
| GitHubBranchName | The branch name of the GitHub repo where the code resides. |
| GitHubFullRepo | The GitHub repo name. Needs to be in the format `{GitHub organization}/{GitHub repo}` from `https://github.com/OrgName/RepoName`. |

The pipeline will take a few seconds to form.

    If the CloudFormation stack fails to form completely, it will need to be fully deleted and the template re-uploaded.
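
    The stack can also be created from the CLI; a sketch with placeholder values (whether the template requires `CAPABILITY_NAMED_IAM` is an assumption here):

```bash
aws cloudformation create-stack \
    --stack-name deployment-example-test-cluster-pipeline \
    --template-body file://pipeline/cf-setup-pipeline.yaml \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameters \
        ParameterKey=CodeStarConnectionArn,ParameterValue=arn:aws:codestar-connections:us-west-2:123456789012:connection/abc123 \
        ParameterKey=CostTagKey,ParameterValue=osl-billing \
        ParameterKey=CostTagValue,ParameterValue=osl-test \
        ParameterKey=GitHubBranchName,ParameterValue=main \
        ParameterKey=GitHubFullRepo,ParameterValue=YourOrg/deployment-example-test-cluster
```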

1. The pipeline will start to build automatically in CodePipeline.

A successful run will take about 12 minutes.

    If it takes significantly less time than that, the build might have failed even if CodePipeline reports success.

    Sometimes the final build stage will error with something like "build role not found". In this case, just retry the stage. There is sometimes a race condition for AWS role creation.

    During the course of the build, other CloudFormation stacks will be created. One of these is for the cluster. Within its Outputs is the Load Balancer URL, which can be used for external DNS.

1. Add the Portal SSO Token to Secrets Manager.

Update `sso-token/{region}-{cluster name}`.
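
    A sketch of updating the secret from the CLI (region, cluster name, and token value are placeholders):

```bash
aws secretsmanager put-secret-value \
    --secret-id sso-token/us-west-2-osl-test \
    --secret-string 'REPLACE_WITH_PORTAL_SSO_TOKEN'
```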

1. Add deployment to Portal

Update `labs.{maturity}.yaml` and re-build Portal.

    Within the Portal Access page, create a lab sheet with the `lab_short_name` found in `opensciencelab.yaml`.

Within the Portal Access page, add usernames and profiles as needed.

1. Add CloudShell access

    From the AWS console, start CloudShell (preferably in its own browser tab).

    Copy and paste in CloudShell do not use the shifted shortcuts of a normal terminal; the standard keyboard shortcuts apply.

If needed, update default editor:

- Append to ~/.bashrc the command `export EDITOR=vim`

    Set up access to the K8s cluster

- From the AWS EKS page, get the cluster name for below.

    - From the AWS IAM page, get the ARN of the role `{region name}-{cluster name}-user-full-access`

    - On the CloudShell terminal, run `aws eks update-kubeconfig --name {EKS cluster name} --role-arn {role ARN}`

    - Run `kubectl get pods -A`. You should see the hub pods and any user pods.
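
    Put together, a hypothetical CloudShell session might look like this (cluster name, region, and account ID are placeholders):

```bash
# Find the EKS cluster name
aws eks list-clusters

# Find the user-full-access role ARN
aws iam list-roles --query "Roles[?contains(RoleName, 'user-full-access')].Arn"

# Point kubectl at the cluster
aws eks update-kubeconfig \
    --name osl-test-cluster \
    --role-arn arn:aws:iam::123456789012:role/us-west-2-osl-test-cluster-user-full-access

# Verify access
kubectl get pods -A
```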

1. Bump the AutoScaling Groups

    For reasons unknown, brand-new ASGs need to be "primed" by setting the desired capacity to one, as shown in the sketch below. JupyterHub's autoscaler will scale the groups back down to zero if there is no use. This normally only has to be done once.
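
    A sketch of priming a group from the CLI (the group name is a placeholder; find the real names with `describe-auto-scaling-groups`):

```bash
# List the cluster's ASG names
aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[].AutoScalingGroupName"

# Prime a fresh group by setting its desired capacity to one
aws autoscaling set-desired-capacity \
    --auto-scaling-group-name osl-test-user-nodes \
    --desired-capacity 1
```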

1. Start a JupyterLab server to make sure one works

1. Within CloudShell, check the PVC and PV of the user volume. Make sure the K8s annotation `pv.kubernetes.io/provisioned-by: ebs.csi.aws.com` is present.

    If not, then JupyterHub volume management will fail and volumes will become orphaned upon lifecycle deletion.
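
    One way to check from CloudShell, assuming user volumes live in the `jupyter` namespace and claims are named `claim-{username}` (both assumptions):

```bash
# List user volume claims
kubectl get pvc -n jupyter

# Inspect the annotations on the PV backing one claim
kubectl describe pv "$(kubectl get pvc claim-someuser -n jupyter -o jsonpath='{.spec.volumeName}')" \
    | grep provisioned-by
```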


## Destroy OpenSARLab Cluster

To take down, consult [destroy deployment docs](../destroy_deployment.md)
75 changes: 75 additions & 0 deletions docs/dev-guides/cluster/egress_config.md
@@ -0,0 +1,75 @@
# Egress Configuration

If enabled, the Istio service mesh can apply rules for rate limiting and domain blocking. To facilitate usability, a custom configuration with custom rules is employed. The rules in a particular configuration apply only to the user or dask pod assigned to the corresponding egress profile. The configuration files must be placed in the `{root}/egress_configs` directory.

## Schema

In general, any parameter starting with `@` is global, `%` is sequential, and `+` is one-time.

Wildcards (`*`) are not allowed.

Comment lines start with `#` and are ignored.

Other line entries:

| Parameter | Value Type | Description |
| --- | --- | ----------- |
| `@profile` | str | Required. Egress profile name that will be assigned to the lab profile. There can only be one `@profile` per egress config file. Other `@profile` references will be ignored. Because the profile name is part of the naming structure of some k8s resources, it must be FQDN-compatible. |
| `@rate` | int | Required. Rate limit (per 10 seconds) applied to the assigned pod. The value is the maximum number of requests per second. Any subsequent `@rate` is ignored. To turn off the rate limit, set the value to `None`. |
| `@list` | `white` or `black` | Required. Either the config is a whitelist or a blacklist. Any subsequent `@list` is ignored. |
| `@include` | str | Optional. Any named `.conf` file within a sibling `includes` folder will be copied/inserted at the point of the `@include`. Having `@rate`, `@include`, or `@profile` within the "included" configs will throw an error. Other rules for ordering still apply. |
| `%port` | int,int | Required. Port value for the host. Must have a value between 1 and 65535. Ports can be consolidated by comma separation. Ports separated by `=>` will be treated like a redirect (_this is currently not working; the ports will be treated as separated by a comma_). |
| `%timeout` | str | Optional. Timeout applied to any subsequent host. The value must end in `s` for seconds, `m` for minutes, etc. |
| `+ip` | num | Optional. Any valid IP address. |
| `^` | str | Optional. Globally negate the hostname value. Useful for disabling included hosts. |

Lines not prepended with `@`, `%`, `+`, `^`, or `#` will be treated as a hostname.

## Examples

**Blacklist with rate limiting**

``` conf
# Included blacklist
%timeout 10s
%port 80=>443
example.com
```

``` conf
# This conf is required!!
# This will be used by profiles that don't have any explicit whitelist and are not None
@profile default
@rate 30
@list black
@include blacklist
# Note that the explicit redirect is not working properly and should not be used
# Both port 80 and port 443 will be allowed, though
%port 80=>443
%timeout 1s
blackhole.webpagetest.org
```

**Whitelist with rate limiting**

```conf
@profile m6a-large-whitelist
@rate 30
@list white
@include asf
@include aws
@include earthdata
@include github
@include local
@include mappings
@include mintpy
@include others
@include packaging
@include ubuntu
```
97 changes: 97 additions & 0 deletions docs/dev-guides/cluster/opensciencelab_yaml.md
@@ -0,0 +1,97 @@
# Contents of `opensciencelab.yaml`

Schema for the egress config can be found [here](../egress_config.md).

```yaml
---

parameters:
lab_short_name: The url-friendly short name of the lab deployment.
cost_tag_key: Name of the cost allocation tag.
cost_tag_value: Value of the cost allocation tag. Also used by cloudformation during setup for naming.
admin_user_name: Username of initial JupyterHub admin
certificate_arn: AWS arn of the SSL certificate held in Certificate Manager
container_namespace: A namespaced path within AWS ECR containing custom images
lab_domain: Domain of JupyterHub deployment. Use `load balancer` if not known.
portal_domain: Domain of the OSL Portal. Used to communicate with email services, etc.

# Volume and snapshot lifecycle management
days_till_volume_deletion: The number of integer days after last server use when the user's volume is deleted. To never delete the volume, use the value 365000.
days_after_server_stop_till_warning_email: Comma-separated list of integer days after last server use when the user gets a warning email. Must have at least one value. To never send emails, use the value 365000.
days_till_snapshot_deletion: The number of integer days after last server use when the user's snapshot is deleted. To never delete the snapshot, use the value 365000.
days_after_server_stop_till_deletion_email: Number of integer days after last server use when the user gets an email notifying about permanent deletion of data. Must have at least one value. To never send emails, use the value 365000.
utc_hour_of_day_snapshot_cron_runs: Integer hour (UTC) when the daily snapshot cron runs.
utc_hour_of_day_volume_cron_runs: Integer hour (UTC) when the daily volume cron runs.

# Versions of software installed
eks_version: '1.31' # https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html
kubectl_version: '1.31.0/2024-09-12' # https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html
aws_ebs_csi_driver_version: '2.36.0' # https://github.com/kubernetes-sigs/aws-ebs-csi-driver/releases
jupyterhub_helm_version: '3.3.7' # https://jupyterhub.github.io/helm-chart/
jupyterhub_hub_image_version: '4.1.5' # Match App Version of JupyterHub Helm
aws_k8s_cni_version: 'v1.18.5' # https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html
cluster_autoscaler_helm_version: '9.43.1' # https://github.com/kubernetes/autoscaler/releases > cluster-autoscaler-chart
istio_version: '1.23.2' # https://github.com/istio/istio/releases; set to None if disabling Istio
dask_helm_version: '2024.1.0' # https://helm.dask.org/ > dask-gateway-{version}; Set to None if disabling Dask

nodes:
- name: hub # Required
instance: The EC2 instance for the hub node. Type t3a.medium is preferred.
min_number: 1 # Required
max_number: 1 # Required
node_policy: hub # Required
is_hub: True # Required

- name: daskcontroller # Required
instance: t3a.medium, t3.medium
min_number: 1 # Required
max_number: 1 # Required
node_policy: dask_controller # Required
is_dask_controller: True # Required
is_spot: True

- name: Name of node type. Must be alphanumeric (no special characters, whitespace, etc.)
instance: The EC2 instance type for the node. Fallback types separated by commas. (m6a.xlarge, m5a.xlarge)
min_number: Minimum number of running node of this type in the cluster (0)
max_number: Maximum number of running node of this type in the cluster (25)
node_policy: Node permission policy (user)
root_volume_size: Size of the root volume of the EC2 (GiB) (Optional, range 1 - 16,384)
is_dask_worker: The EC2 is a dask worker (Optional, True).
is_spot: The EC2 is part of a spot fleet (Optional, True).

# Service accounts allow a built-in way to interact with AWS resources from within a server.
# However, the default AWS profile is overwritten, which may have unintended consequences.
service_accounts:
- name: service_account_name
namespace: namespace of k8s resource (jupyter)
permissions:
- Effect: "Allow"
Action:
- "AWS Resource Action"
Resource: "AWS Resource ARN"

dask_profiles:
- name: Name of dask profile that the user can select (Example 1)
short_name: example_1
description: "Basic worker used by example notebook"
image_url: FQDN with docker tags (233535791844.dkr.ecr.us-west-2.amazonaws.com/smce-test-opensarlab/daskworker:180a826). If not public, the domain must be in the same AWS account as the cluster.
node_name: Node must be defined as a dask worker.
egress_profile: Name of the egress config to use. Do not include `.conf` suffix (Optional)

lab_profiles:
- name: Name of profile that users can select (SAR 1)
description: Description of profile
image_url: FQDN of JupyterLab single user image with docker tags ( 233535791844.dkr.ecr.us-west-2.amazonaws.com/smce-test-opensarlab/sar:ea3e147). If not public, the domain must be in the same AWS account as the cluster.
hook_script: Name of the script run on user server startup (sar.sh) (Optional)
memory_guarantee: RAM usage guaranteed per user (6G) (Optional. Defaults to 0% RAM.)
memory_limit: RAM usage limit per user (16G) (Optional. Defaults to 100% RAM of server.)
cpu_guarantee: CPU usage guaranteed per user (15) (Optional. Defaults to 0% CPU. Memory limits are preferable.)
cpu_limit: CPU usage limit per user (30) (Optional. Defaults to 100% CPU of server. Memory limits are preferable.)
storage_capacity: Size of each user's home directory (500Gi). Cannot be reduced after allocation.
node_name: Node name as given in the above section (sar1)
delete_user_volumes: If True, deletes user volumes upon server stopping (Optional. Defaults to False.)
desktop: If True, use Virtual Desktop by default (Optional. Defaults to False.) The desktop environment must be installed on the image.
default: If True, this specific profile is selected by default (Optional. False if not explicitly set.)
service_account: Name of previously defined service account to apply to profile (Optional)
egress_profile: Name of the egress config to use. Do not include `.conf` suffix (Optional)
```
59 changes: 59 additions & 0 deletions docs/dev-guides/container/build_and_deploy_opensarlab_image.md
@@ -0,0 +1,59 @@
# Build and Deploy OpenSARLab Image Container

## Setup Container Build in AWS

1. Create AWS account if needed

1. Gain GitHub access if needed

1. Create new GitHub repo

To organize repos, use the naming convention: `deployment-{location/owner}-{maturity?}-container`

1. Copy canonical `opensarlab-container` and commit

Either copy/paste or use `git remote add github https://github.com/ASFOpenSARlab/opensarlab-container.git`

1. Within AWS, add a GitHub Connection. If this has been done before, the app should show your GitHub app name.

https://docs.aws.amazon.com/dtconsole/latest/userguide/connections-create-github.html

Make sure you are in the right region of your AWS account.

    Once the Connection is set up, save the Connection ARN for later.

1. Remember to add the current GitHub repo to the Connection app

GitHub > Settings > GitHub Apps > AWS Connector for GitHub > Repository Access

Add GitHub repo

1. Within AWS CloudFormation, upload the template file `cf-container.yaml` and build.

When prompted, use the Parameters:

| Parameter | Description |
|-----------|-------------|
| Stack name | The CloudFormation stack name. For readability, append `-pipeline` to the end. |
| CodeStarConnectionArn | The ARN of the Connection made earlier. |
| ContainerNamespace | The ECR prefix acting as a namespace for the images. This will be needed for the cluster's `opensarlab.yaml`. |
| CostTagKey | Useful if using billing allocation tags. |
| CostTagValue | Useful if using billing allocation tags. Note that many resources will have this in their name for uniqueness. It needs to be short in length. |
| GitHubBranchName | The branch name of the GitHub repo where the code resides. |
| GitHubFullRepo | The GitHub repo name. Needs to be in the format `{GitHub organization}/{GitHub repo}` from `https://github.com/OrgName/RepoName`. |

The pipeline will take a few seconds to form.

    If the CloudFormation stack fails to form completely, it will need to be fully deleted and the template re-uploaded.
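
    As with the cluster pipeline, the stack can be created from the CLI; a sketch with placeholder values (whether the template requires `CAPABILITY_NAMED_IAM` is an assumption here):

```bash
aws cloudformation create-stack \
    --stack-name deployment-example-test-container-pipeline \
    --template-body file://cf-container.yaml \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameters \
        ParameterKey=CodeStarConnectionArn,ParameterValue=arn:aws:codestar-connections:us-west-2:123456789012:connection/abc123 \
        ParameterKey=ContainerNamespace,ParameterValue=example-opensarlab \
        ParameterKey=CostTagKey,ParameterValue=osl-billing \
        ParameterKey=CostTagValue,ParameterValue=osl-test \
        ParameterKey=GitHubBranchName,ParameterValue=main \
        ParameterKey=GitHubFullRepo,ParameterValue=YourOrg/deployment-example-test-container
```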

1. The pipeline will start to build automatically in CodePipeline.

A successful run will take about 20 minutes.

    If it takes significantly less time than that, the build might have failed even if CodePipeline reports success.


## Destroy OpenSARLab Image Container

To take down, consult [destroy deployment docs](../destroy_deployment.md)
@@ -1,4 +1,4 @@
[Return to Developer Guide](../dev.md)
[Return to Developer Guide](../../dev.md)

# There are a few options for creating conda environments in OpenSARLab.
Each option comes with benefits and drawbacks.
File renamed without changes.
