From 7c1717d25cc1348eb424a0f2857c18b628d6ea75 Mon Sep 17 00:00:00 2001 From: Eric Lundell Date: Tue, 19 Nov 2024 18:28:47 -0900 Subject: [PATCH 1/7] (Very much a WIP) Update dev guide --- docs/dev-guides/about_opensciencelab.md | 15 ++ .../build_and_deploy_opensarlab_cluster.md | 105 ++++++++ .../build_and_deploy_opensarlab_image.md | 54 ++++ docs/dev-guides/build_and_deploy_portal.md | 238 ++++++++++++++++++ docs/dev-guides/egress_config.md | 67 +++++ docs/dev-guides/opensciencelab_yaml.md | 65 +++++ docs/dev.md | 6 +- 7 files changed, 549 insertions(+), 1 deletion(-) create mode 100644 docs/dev-guides/about_opensciencelab.md create mode 100644 docs/dev-guides/build_and_deploy_opensarlab_cluster.md create mode 100644 docs/dev-guides/build_and_deploy_opensarlab_image.md create mode 100644 docs/dev-guides/build_and_deploy_portal.md create mode 100644 docs/dev-guides/egress_config.md create mode 100644 docs/dev-guides/opensciencelab_yaml.md diff --git a/docs/dev-guides/about_opensciencelab.md b/docs/dev-guides/about_opensciencelab.md new file mode 100644 index 0000000..e2f4c63 --- /dev/null +++ b/docs/dev-guides/about_opensciencelab.md @@ -0,0 +1,15 @@ +# About OpenScienceLab + +OpenScienceLab is about Open Science. + +Brought to you by... + +the Alaska Satellite Facility: making remote sensing accessible. + +And... + +the OpenScienceLab team. + +And... + +by developers like you. Thank you. diff --git a/docs/dev-guides/build_and_deploy_opensarlab_cluster.md b/docs/dev-guides/build_and_deploy_opensarlab_cluster.md new file mode 100644 index 0000000..00393e3 --- /dev/null +++ b/docs/dev-guides/build_and_deploy_opensarlab_cluster.md @@ -0,0 +1,105 @@ +# Build and Deploy OpenSARLab Cluster + +1. Build the docker images first based off `opensarlab-container`. + +1. Deploy the following in the same AWS account and region as the previous container images. + +1. Create new GitHub repo + + To organize repos, use the naming convention: `deployment-{location/owner}-{maturity?}-cluster` + +1. Copy canonical `opensarlab-cluster` and commit. + + Either copy/paste or use `git remote add github https://github.com/ASFOpenSARlab/opensarlab-cluster.git` + + Make sure any hidden files (like .gitignore, .yamllint, etc.) are properly copied. + +1. Within AWS add GitHub Connections. If done before, the app should show your GitHub app name. + + https://docs.aws.amazon.com/dtconsole/latest/userguide/connections-create-github.html + + Make sure you are in the right region of your AWS account. + + Once Connections is setup, save the Connection arn for later. + +1. Remember to add the current GitHub repo to the Connection app + + GitHub > Settings > GitHub Apps > AWS Connector for GitHub > Repository Access + + Add GitHub repo + +1. Add a SSL certificate to AWS Certification Manager. + + You will need the ARN of the certificate. + +1. Update `opensciencelab.yaml` within the code. See explaination of the various parts [../opensciencelab_yaml.md](here). + +1. Deploy the CloudFormation template found at `pipeline/cf-setup-pipeline.yaml`. + + Use the following parameters: + + | Parameter | Description | + |-----------|-------------| + | Stack name | The CloudFormation stack name. For readablity, append `-pipeline` to the end. | + | CodeStarConnectionArn | The ARN of the Connection made eariler. | + | CostTagKey | Useful if using billing allocation tags. | + | CostTagValue | USeful if using billing allocation tags. Note that many resources will have this in their name for uniqueness. It needs to be short in length. 
| GitHubBranchName | The branch name of the GitHub repo where the code resides. |
 | GitHubFullRepo | The GitHub repo name. Needs to be in the format `{GitHub organization}/{GitHub repo}` from `https://github.com/OrgName/RepoName`. |
 | | |

 The pipeline will take a few seconds to form.

 If the CloudFormation stack fails to fully form, it will need to be fully deleted and the template will need to be re-uploaded.

1. The pipeline will start to build automatically in CodePipeline.

 A successful run will take about 12 minutes.

 If it takes significantly less time, the build might have failed even if CodePipeline says it was successful.

 Sometimes the final build stage will error with something like "build role not found". In this case, just retry the stage. There is sometimes a race condition for AWS role creation.

 During the course of the build, other CloudFormation stacks will be created. One of these is for the cluster. Its Outputs will include the Load Balancer URL, which can be used for external DNS.

1. Add the Portal SSO Token to Secrets Manager.

 Update `sso-token/{region}-{cluster name}`.

1. Add deployment to Portal

 Update `labs.{maturity}.yaml` and re-build the Portal.

 Within the Portal Access page, create a lab sheet with the `lab_short_name` found in `opensciencelab.yaml`.

 Within the Portal Access page, add usernames and profiles as needed.

1. Add CloudShell access

 From the AWS console, start CloudShell (preferably in its own browser tab).

 CloudShell copy and paste are not shifted like in a normal terminal; they use your normal keyboard operations.

 If needed, update the default editor:

 - Append to ~/.bashrc the command `export EDITOR=vim`

 Set up access to the K8s cluster:

 - From the AWS EKS page, get the cluster name for below.

 - From the AWS IAM page, get the ARN of the role `{region name}-{cluster name}-user-full-access`

 - On the CloudShell terminal, run `aws eks update-kubeconfig --name {EKS cluster name} --role-arn {role ARN}`

 - Run `kubectl get pods -A`. You should see any user and hub pods.

1. Bump the AutoScaling Groups

 For reasons unknown, brand-new ASGs need to be "primed" by setting the desired number to one. JupyterHub's autoscaler will scale the groups back down to zero if there is no use. This normally only has to be done once.

1. Start a JupyterLab server to make sure one works

1. Within CloudShell, check the PVC and PV of the user volume. Make sure the K8s annotation `pv.kubernetes.io/provisioned-by: ebs.csi.aws.com` is present.

 If not, then JupyterHub volume management will fail and volumes will become orphaned upon lifecycle deletion.
diff --git a/docs/dev-guides/build_and_deploy_opensarlab_image.md b/docs/dev-guides/build_and_deploy_opensarlab_image.md
new file mode 100644
index 0000000..ceabc74
--- /dev/null
+++ b/docs/dev-guides/build_and_deploy_opensarlab_image.md
@@ -0,0 +1,54 @@
# Build and Deploy OpenSARLab Image

## Setup Container Build in AWS

1. Create AWS account if needed

1. Gain GitHub access if needed

1. Create new GitHub repo

    To organize repos, use the naming convention: `deployment-{location/owner}-{maturity?}-container`

1. Copy canonical `opensarlab-container` and commit

    Either copy/paste or use `git remote add github https://github.com/ASFOpenSARlab/opensarlab-container.git`

1. Within AWS add GitHub Connections. If done before, the app should show your GitHub app name.
+ + https://docs.aws.amazon.com/dtconsole/latest/userguide/connections-create-github.html + + Make sure you are in the right region of your AWS account. + + Once Connections is setup, save the Connection arn for later. + +1. Remember to add the current GitHub repo to the Connection app + + GitHub > Settings > GitHub Apps > AWS Connector for GitHub > Repository Access + + Add GitHub repo + +1. Within AWS CloudFormation, upload the template file `cf-container.yaml` and build. + + When prompted, use the Parameters: + + | Parameter | Description | + |-----------|-------------| + | Stack name | The CloudFormation stack name. For readablity, append `-pipeline` to the end. | + | CodeStarConnectionArn | The ARN of the Connection made eariler. | + | ContainerNamespace | The ECR prefix acting as a namespace for the images. This will be needed for the cluster's `opensarlab.yaml`. | + | CostTagKey | Useful if using billing allocation tags. | + | CostTagValue | USeful if using billing allocation tags. Note that many resources will have this in their name for uniqueness. It needs to be short in length. | + | GitHubBranchName | The branch name of the GitHub repo where the code resides. | + | GitHubFullRepo | The GitHub repo name. Needs to be in the format `{GitHub organization}/{GitHub repo}` from `https://github.com/OrgName/RepoName`. | + | | | + + The pipeline will take a few seconds to form. + + If the cloudformation stack fails to fully form it will need to be fully deleted and the template will need to be re-uploaded. + +1. The pipeline will start to build automatically in CodePipeline. + + A successful run will take about 20 minutes. + + If it takes signitifcantly less time then the build might have failed even if CodePipeline says successful. diff --git a/docs/dev-guides/build_and_deploy_portal.md b/docs/dev-guides/build_and_deploy_portal.md new file mode 100644 index 0000000..cb5699a --- /dev/null +++ b/docs/dev-guides/build_and_deploy_portal.md @@ -0,0 +1,238 @@ +# Build and Deploy the Portal + + +# Enable Under Construction page + +Sometimes the Portal must be taken down for updates. For instance, the EC2 the Portal runs on needs to be respawned for updates. + +To help facilitate communication with users, an Under Construction page can be enabled. All traffic to the Portal will be redirected to this page. + +1. To enable the page, log into the AWS account and go to EC2 console. + +1. Go to the Portal load balancer, select the `HTTPS:443` listener, and _check_ the Default rule. + +1. In the dropdown Actions menu, select Edit Rule. + +1. Set the instance target group weight to **0**. Set the lambda target group weight to **1**. + +1. At the bottom of the page Save Changes. + +1. Changes should take affect almost immediately. + +To revert changes after updating, repeat the above steps except change the target group weights so that the instance gets **1** and the lambda gets **0**. + + +# ---------- + +The following documentation is older and must be used with caution. + +# Prerequsites + +1. AWS SES: Store SES secrets + +These secrets will be used to communicate with SES to send emails. "SMTP credentials consist of a username and a password. When you click the Create button below, SMTP credentials will be generated for you." The credentials are AWS access keys, like as used in local aws configs. They are valid for the whole region. https://us-west-2.console.aws.amazon.com/ses/home + +- Create a verified email and take out of sandbox. + +- Create SES serets + +Go to `Account Dashboard`. 
+ `Create SMTP credentials`. The IAM User Name should be unique and easy to find within IAM. On user creation, SMTP credentials will be created. + +- Store SES secrets + +https://us-west-2.console.aws.amazon.com/secretsmanager/home + +Click on `Store New Secret` +`Other type of secret` +`Plaintext` +Delete all empty json content. +Add username and password as given previously in the following format: `USERNAME PASSWORD`. +Click `Next` +Secret Name: `portal/ses-creds` +Tags: `osl-billing: osl-portal` +Click `Next` +Click `Next` +Click `Store` + + +1. AWS Secrets Manager: Create SSO token + +This token will be used by the labs to communicate and authenticate with the portal. All labs and the portal share this token. It is imperative that this remains secret. The form of the token is very specific. Use the following process to create the token. + +- Create secret + +```bash +pip install cryptography +python3 +``` + +```python3 +from cryptography.fernet import Fernet + +api_token = Fernet.generate_key() +api_token +``` + +- Add to AWS Secret Manager + +https://us-west-2.console.aws.amazon.com/secretsmanager/home + +Click on `Store New Secret` +`Other type of secret` +`Plaintext` +Delete all empty json content. +Add value of _api_token_ +Click `Next` +Secret Name: `$CONTAINER_NAMESPACE/sso-token` +Tags: `osl-billing: osl-portal` +Click `Next` +Click `Next` +Click `Store` + + +1. Docker registry + +For local dev development, one can use a local docker registry. +`docker run -d -p 5000:5000 --restart=always --name registry registry:2` + +Otherwise, the remote docker images will be stored in AWS ECR, as setup by CloudFormation + +1. Docker repo + +Clone the portal code. +If production, push to CodeCommit the portal code. + + +# Setup + +If production, upload the Cloudformation template `cf-portal-setup.yaml` and build. + +Once the cloudformation is done, go to EC2 Connect, log onto the server and `cd /home/ec2-user/code`. +Then setup prerequisites via `make setup-ec2`. +Note that you will be warned about reformatting the DB volume. If this is the first time running (as it should be), do so. + +If locally, go to the root up the docker repo. +The setup prerequisites via `make setup-ubuntu`. + + +# Build + +`cp labs.example.yaml labs.maturity.yaml`. The name of the config doesn't matter (except it cannot be labs.run.yaml) +Update labs.maturity.yaml as needed + +`make config=labs.maturity.yaml` + + +# Destroy + +If production, clear out the registry images, delete the CloudFormation setup, delete snapshots, and delete logs. + +If locally, `make clean` and then stop the localhost registry (if being used). + +# Other less used procedures + +1. Logs + +In production, normally the logs will show up in CloudWatch. + +For both, `docker compose logs -f`. + +1. Replace Portal DB from snapshot + +If the Portal DB needs to be replaced by a snapshot backup, do the following. + +All of these steps take place within EC2 Connect. + +Elevated permissions will be needed via `sudo` or `sudo su -`. + +- Restore snapshot to volume + +This procedure assumes that the usual DB volume is present and being used. We only want to update the DB file. + +Within `cf-portal-setup.yaml`, the AZ of the EC2's subnet is set as us-west-2a. + +From the EC2 Connect console, select the snapshot that will be restored. Get the SNAPSHOT_ID, e.g. snap-0c0dbee2e7c9f0c12 + +From the EC2 Connect console, select the portal EC2. Get the EC2_INSTANCE_ID, e.g. i-0ca96843e97d9bd29 + +First run a dry run to make sure permissions are available. 
+``` +aws ec2 create-volume \ + --dry-run \ + --availability-zone us-west-2a \ + --snapshot-id $SNAPSHOT_ID \ + --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=portal-db-backup}]' +``` + +Then actually run it. +``` +aws ec2 create-volume \ + --availability-zone us-west-2a \ + --snapshot-id $SNAPSHOT_ID \ + --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=portal-db-backup}]' +``` + +From response output, get VOLUME_ID, e.g. vol-0a0869b5ab9f77090 + +- Attach volume to EC2 + +First run a dry run to make sure permissions are available. +``` +aws ec2 attach-volume \ + --dry-run \ + --device /dev/sdm \ + --instance-id $EC2_INSTANCE_ID \ + --volume-id $VOLUME_ID +``` + +Then actually run it. +``` +aws ec2 attach-volume \ + --device /dev/sdm \ + --instance-id $EC2_INSTANCE_ID \ + --volume-id $VOLUME_ID +``` + +- Mount device to filesystem + +`sudo mkdir -p /tmp/portal-db-from-snapshot/` +`sudo mount /dev/sdm /tmp/portal-db-from-snapshot/` + +If you get an error message something like + +> /wrong fs type, bad option, bad superblock + +then you cannot mount the filesystem. AWS's way of handling volumes makes things difficult. https://serverfault.com/questions/948408/mount-wrong-fs-type-bad-option-bad-superblock-on-dev-xvdf1-missing-codepage + +Since we are working with a temporay mount, run the following instead: + +`sudo mount -t xfs -o nouuid /dev/sdm /tmp/portal-db-from-snapshot/`. + +- Check for mounted directories + +`df` + +Look for something like + +``` +/dev/xvdj 1038336 34224 1004112 4% /srv/jupyterhub +/dev/xvdm 1038336 34224 1004112 4% /tmp/portal-db-from-snapshot +``` + +- Create a backup of the old DB file + +`sudo cp ./srv/portal/jupyterhub/jupyterhub.sqlite ./srv/portal/jupyterhub/jupyterhub.sqlite.$(date +"%F-%H-%M-%S")` + +- Copy over DB file + +`sudo cp /tmp/portal-db-from-snapshot/jupyterhub.sqlite ./srv/portal/jupyterhub/jupyterhub.sqlite` + +- Unmount and detach volume from EC2 + +`sudo umount /tmp/portal-db-from-snapshot/` +`sudo aws ec2 detach-volume --volume-id VOLUME_ID` + +- Delete volume + +`sudo aws ec2 delete-volume --volume-id VOLUME_ID` diff --git a/docs/dev-guides/egress_config.md b/docs/dev-guides/egress_config.md new file mode 100644 index 0000000..ef76c6d --- /dev/null +++ b/docs/dev-guides/egress_config.md @@ -0,0 +1,67 @@ +# Egress Configuration + +If enabled, the Istio service mesh can apply rules for rate limiting and domain blocking. These rules for a particular configuration will only apply to the user or dask pod assigned to the corresponsing egress profile. Thbe configurations need to be found in the root/egress_configs directory. + +## Schema + +In general, any parameter starting with `@` is global, `%` is sequential, and `+` is one-time. + +Wildcards `*` not allowed. + +Comment lines start with `#` and are ignored. + +Other line entries: + +| Parameter | Value Type | description | +| --- | --- | ----------- | +| `@profile` | str | Required. Egress profile name that will be assigned to the lab profile. There can only be one `@profile` per egress config file. Other `@profile` references will be ignored. Because the profile name is part of the naming structure of some k8s resources, it must be fqdn compatible. | +| `@rate` | int | Required. Rate limit (per 10 seconds) applied to the assigned pod. Value is the max value of requests per second. Any subsequent `@rate` is ignored. To turn off rate limit, set value to `None`.| +| `@list` | `white` or `black` | Required. Either the config is a whitelist or a blacklist. 
Any subsequent `@list` is ignored. |
 | `@include` | str | Optional. Any named `.conf` file within a sibling `includes` folder will be copied/inserted at the point of the `@include`. Having `@rate`, `@include`, or `@profile` within the "included" configs will throw an error. Other rules for ordering still apply. |
 | `%port` | int,int | Required. Port value for the host. Must have a value between 1 and 65535. Ports can be consolidated by comma separation. Ports separated by `=>` will be treated like a redirect (_this is currently not working. The ports will be treated as separated by a comma_). |
 | `%timeout` | str | Optional. A valid timeout applied to any subsequent host. The value must end in `s` for seconds, `m` for minutes, etc. |
 | `+ip` | num | Optional. Any valid IP address. |
 | `^` | str | Optional. Globally negate the hostname value. Useful for disabling included hosts. |
 |||

 Lines not prepended with `@`, `%`, `+`, `^`, or `#` will be treated as a hostname.

 ## Examples

 **Blacklist with rate limiting**

 ``` conf
 # This conf is required!!
 # This will be used by profiles that don't have any explicit whitelist and are not None
 @profile default
 @rate 30
 @list black

 @include blacklist

 # Note that the explicit redirect is not working properly and should not be used
 # Both port 80 and port 443 will be allowed, though
 %port 80=>443

 %timeout 1s
 blackhole.webpagetest.org
 ```

 **Whitelist with rate limiting**

 ```conf
 @profile m6a-large-whitelist
 @rate 30
 @list white

 @include asf
 @include aws
 @include earthdata
 @include github
 @include local
 @include mappings
 @include mintpy
 @include others
 @include packaging
 @include ubuntu
 ```
diff --git a/docs/dev-guides/opensciencelab_yaml.md b/docs/dev-guides/opensciencelab_yaml.md
new file mode 100644
index 0000000..01c5ca0
--- /dev/null
+++ b/docs/dev-guides/opensciencelab_yaml.md
@@ -0,0 +1,65 @@
# Contents of `opensciencelab.yaml`

Schema for the egress config can be found [here](../egress_config.md).

```yaml
---

parameters:
  lab_short_name: The url-friendly short name of the lab deployment.
  cost_tag_key: Name of the cost allocation tag
  cost_tag_value: Value of the cost allocation tag
  admin_user_name: Username of initial JupyterHub admin
  certificate_arn: AWS ARN of the SSL certificate held in Certificate Manager
  container_namespace: A namespaced path within AWS ECR containing custom images
  lab_domain: Domain of JupyterHub deployment. Use `load balancer` if not known.
  portal_domain: Domain of the OSL Portal. Used to communicate with email services, etc.
  days_till_volume_deletion: The number of integer days after last server use when the user's volume is deleted. To never delete the volume, use value 365000.
  days_till_snapshot_deletion: The number of integer days after last server use when the user's snapshot is deleted. To never delete the snapshot, use value 365000.
  days_after_server_stop_till_warning_email: Comma-separated list of integer days after last server use when the user gets a warning email. Must have a minimum of one value. To never send emails, use value 365000.
  days_after_server_stop_till_deletion_email: Number of integer days after last server use when the user gets an email notifying them about permanent deletion of data. Must have a minimum of one value. To never send emails, use value 365000.
  utc_hour_of_day_snapshot_cron_runs: Integer hour (UTC) when the daily snapshot cron runs.
  utc_hour_of_day_volume_cron_runs: Integer hour (UTC) when the daily volume cron runs.
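  # Illustrative example of the data-lifecycle settings above. These are
  # hypothetical values (not defaults) showing how the fields fit together:
  # warning emails at 30, 53, and 59 days, the deletion notice and volume
  # deletion at 60 days, snapshot deletion at 120 days, and the crons running
  # at 08:00 and 09:00 UTC.
  #
  #   days_till_volume_deletion: 60
  #   days_till_snapshot_deletion: 120
  #   days_after_server_stop_till_warning_email: 30, 53, 59
  #   days_after_server_stop_till_deletion_email: 60
  #   utc_hour_of_day_snapshot_cron_runs: 8
  #   utc_hour_of_day_volume_cron_runs: 9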
+ +nodes: + - name: hub # Required + instance: The EC2 instance for the hub node. Type t3a.medium is preferred. + min_number: 1 # Required + max_number: 1 # Required + node_policy: hub # Required + is_hub: True # Required + + - name: Name of node type. Must be alphanumeric (no special characters, whitespace, etc.) + instance: The EC2 instance for the hub node (m5a.2xlarge) + min_number: Minimum number of running node of this type in the cluster (0) + max_number: Maximum number of running node of this type in the cluster (25) + node_policy: Node permission policy (user) + root_volume_size: Size of the root volume of the EC2 (GiB) (Optional, range 1 - 16,384) + +service_accounts: + - name: service_account_name + namespace: namespace of k8s resource (jupyter) + permissions: + - Effect: "Allow" + Action: + - "AWS Resource Action" + Resource: "AWS Resource ARN" + +profiles: + - name: Name of profile that users can select (SAR 1) + description: Description of profile + image_name: Name of JupyterHub single user image found in ECR (sar1) + image_tag: Tag of JupyterHub single user image found in ECR (b1f4e84) + hook_script: Name of the script ran on user server startup (sar.sh) (optional) + memory_guarantee: RAM usage guaranteed per user (6G) (Optional. Defaults to 0% RAM.) + memory_limit: RAM usage guaranteed per user (16G) (Optional. Defaults to 100% RAM of server.) + cpu_guarantee: CPU usage guaranteed per user (15) (Optional. Defaults to 0% CPU.) + cpu_limit: CPU usage limit per user (30) (Optional. Defaults to 100% CPU of server.) + storage_capacity: Size of each user's home directory (500Gi). Cannot be reduced after allocation. + node_name: Node name as given in above section (sar1) + delete_user_volumes: If True, deletes user volumes upon server stopping (Optional. Defaults to False.) + classic: If True, use Classic Notebook interface (Optional. Defaults to False, i.e. JupyterLab.) + default: If True, the specific profile is selected by default (Optional. False if not explicity set.) + service_account: Name of previously defined service account to apply to profile (Optional) + egress_profile: Name of the egress config to use. Do not include `.conf` suffix (Optional) +``` \ No newline at end of file diff --git a/docs/dev.md b/docs/dev.md index 3c599b3..79d2fae 100644 --- a/docs/dev.md +++ b/docs/dev.md @@ -1,5 +1,9 @@ +1. [OpenScienceLab](dev-guides/about_opensciencelab.md) +1. [Build and Deploy the Portal](dev_guide/build_and_deploy_portal.md) +1. [Build and Deploy OpenSARLab Image](dev_guide/build_and_deploy_opensarlab_image.md) +1. [Build and Deploy OpenSARLab Cluster](dev_guide/build_and_deploy_opensarlab_cluster.md) +1. [(DEPEC) Deploy OpenSARLab to AWS](dev-guides/deploy_OpenSARLab.md) 1. [System Diagram](assets/system_diagrams/OpenSARLab_system_diagram_June_2021.png) -1. [Deploy OpenSARLab to AWS](dev-guides/deploy_OpenSARLab.md) 1. [Conda Environment Options](dev-guides/conda_environments.md) 1. [OpenSARLab Notifications](dev-guides/notifications.md) 1. 
[Troubleshooting](dev-guides/troubleshooting.md) From c8d9a0e2282941db63b811dd43f5eaa2ad2526b1 Mon Sep 17 00:00:00 2001 From: Eric Lundell Date: Tue, 19 Nov 2024 18:34:30 -0900 Subject: [PATCH 2/7] Update outer menu as well --- mkdocs.yml | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/mkdocs.yml b/mkdocs.yml index ec92646..828fdb1 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -36,12 +36,15 @@ nav: - Best Practices for Writing Notebooks: user-guides/class_notebooks_best_practices.md - Logging Out and Server Shutdown: user-guides/logging_out_and_server_shutdown.md - Developer Guide: + - OpenScienceLab: dev-guides/about_opensciencelab.md + - Build and Deploy the Portal: dev_guide/build_and_deploy_portal.md + - Build and Deploy OpenSARLab Image: dev_guide/build_and_deploy_opensarlab_image.md + - Build and Deploy OpenSARLab Cluster: dev_guide/build_and_deploy_opensarlab_cluster.md + - (DEPEC) Deploy OpenSARLab to AWS: dev-guides/deploy_OpenSARLab.md - System Diagram: assets/system_diagrams/OpenSARLab_system_diagram_June_2021.png - - Deploy OpenSARLab to AWS: dev-guides/deploy_OpenSARLab.md - - Destroy Deployments: dev-guides/destroy_deployment.md - Conda Environment Options: dev-guides/conda_environments.md - - Notifications: dev-guides/notifications.md - - Troubelshooting Guide: dev-guides/troubleshooting.md + - OpenSARLab Notifications: dev-guides/notifications.md + - Troubleshooting: dev-guides/troubleshooting.md - Custom Mintpy Conda Build Instructions: dev-guides/mintpy_conda.md - Release Notes: - June 2021: release-notes/release_06-2021.md From f7ddbf57dd3a8704c985a1e7fc278cb4339fd779 Mon Sep 17 00:00:00 2001 From: Eric Lundell Date: Tue, 19 Nov 2024 18:39:55 -0900 Subject: [PATCH 3/7] dev_guide`s` - not dev_guide --- docs/dev.md | 6 +++--- mkdocs.yml | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/dev.md b/docs/dev.md index 79d2fae..34443ef 100644 --- a/docs/dev.md +++ b/docs/dev.md @@ -1,7 +1,7 @@ 1. [OpenScienceLab](dev-guides/about_opensciencelab.md) -1. [Build and Deploy the Portal](dev_guide/build_and_deploy_portal.md) -1. [Build and Deploy OpenSARLab Image](dev_guide/build_and_deploy_opensarlab_image.md) -1. [Build and Deploy OpenSARLab Cluster](dev_guide/build_and_deploy_opensarlab_cluster.md) +1. [Build and Deploy the Portal](dev_guides/build_and_deploy_portal.md) +1. [Build and Deploy OpenSARLab Image](dev_guides/build_and_deploy_opensarlab_image.md) +1. [Build and Deploy OpenSARLab Cluster](dev_guides/build_and_deploy_opensarlab_cluster.md) 1. [(DEPEC) Deploy OpenSARLab to AWS](dev-guides/deploy_OpenSARLab.md) 1. [System Diagram](assets/system_diagrams/OpenSARLab_system_diagram_June_2021.png) 1. 
[Conda Environment Options](dev-guides/conda_environments.md) diff --git a/mkdocs.yml b/mkdocs.yml index 828fdb1..a5dedca 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -37,9 +37,9 @@ nav: - Logging Out and Server Shutdown: user-guides/logging_out_and_server_shutdown.md - Developer Guide: - OpenScienceLab: dev-guides/about_opensciencelab.md - - Build and Deploy the Portal: dev_guide/build_and_deploy_portal.md - - Build and Deploy OpenSARLab Image: dev_guide/build_and_deploy_opensarlab_image.md - - Build and Deploy OpenSARLab Cluster: dev_guide/build_and_deploy_opensarlab_cluster.md + - Build and Deploy the Portal: dev_guides/build_and_deploy_portal.md + - Build and Deploy OpenSARLab Image: dev_guides/build_and_deploy_opensarlab_image.md + - Build and Deploy OpenSARLab Cluster: dev_guides/build_and_deploy_opensarlab_cluster.md - (DEPEC) Deploy OpenSARLab to AWS: dev-guides/deploy_OpenSARLab.md - System Diagram: assets/system_diagrams/OpenSARLab_system_diagram_June_2021.png - Conda Environment Options: dev-guides/conda_environments.md From 2cfd5558a09bb78cd5ce90bcd86fc2354e87d049 Mon Sep 17 00:00:00 2001 From: Eric Lundell Date: Tue, 19 Nov 2024 18:43:36 -0900 Subject: [PATCH 4/7] dev`-`guides - not dev`_`guides --- docs/dev.md | 6 +++--- mkdocs.yml | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/dev.md b/docs/dev.md index 34443ef..5540148 100644 --- a/docs/dev.md +++ b/docs/dev.md @@ -1,7 +1,7 @@ 1. [OpenScienceLab](dev-guides/about_opensciencelab.md) -1. [Build and Deploy the Portal](dev_guides/build_and_deploy_portal.md) -1. [Build and Deploy OpenSARLab Image](dev_guides/build_and_deploy_opensarlab_image.md) -1. [Build and Deploy OpenSARLab Cluster](dev_guides/build_and_deploy_opensarlab_cluster.md) +1. [Build and Deploy the Portal](dev-guides/build_and_deploy_portal.md) +1. [Build and Deploy OpenSARLab Image](dev-guides/build_and_deploy_opensarlab_image.md) +1. [Build and Deploy OpenSARLab Cluster](dev-guides/build_and_deploy_opensarlab_cluster.md) 1. [(DEPEC) Deploy OpenSARLab to AWS](dev-guides/deploy_OpenSARLab.md) 1. [System Diagram](assets/system_diagrams/OpenSARLab_system_diagram_June_2021.png) 1. 
[Conda Environment Options](dev-guides/conda_environments.md) diff --git a/mkdocs.yml b/mkdocs.yml index a5dedca..b1f1d34 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -37,9 +37,9 @@ nav: - Logging Out and Server Shutdown: user-guides/logging_out_and_server_shutdown.md - Developer Guide: - OpenScienceLab: dev-guides/about_opensciencelab.md - - Build and Deploy the Portal: dev_guides/build_and_deploy_portal.md - - Build and Deploy OpenSARLab Image: dev_guides/build_and_deploy_opensarlab_image.md - - Build and Deploy OpenSARLab Cluster: dev_guides/build_and_deploy_opensarlab_cluster.md + - Build and Deploy the Portal: dev-guides/build_and_deploy_portal.md + - Build and Deploy OpenSARLab Image: dev-guides/build_and_deploy_opensarlab_image.md + - Build and Deploy OpenSARLab Cluster: dev-guides/build_and_deploy_opensarlab_cluster.md - (DEPEC) Deploy OpenSARLab to AWS: dev-guides/deploy_OpenSARLab.md - System Diagram: assets/system_diagrams/OpenSARLab_system_diagram_June_2021.png - Conda Environment Options: dev-guides/conda_environments.md From 7c88c222fc9bd4a38cdabfd1a836f9f19ce0fd14 Mon Sep 17 00:00:00 2001 From: Eric Lundell Date: Wed, 11 Dec 2024 14:55:00 -0900 Subject: [PATCH 5/7] Addtional docs --- docs/dev-guides/build_and_deploy_portal.md | 238 ------- .../build_and_deploy_opensarlab_cluster.md | 5 + .../dev-guides/{ => cluster}/egress_config.md | 12 +- .../{ => cluster}/opensciencelab_yaml.md | 52 +- .../build_and_deploy_opensarlab_image.md | 7 +- .../{ => container}/conda_environments.md | 2 +- .../{ => container}/mintpy_conda.md | 0 docs/dev-guides/deploy_OpenSARLab.md | 585 ------------------ docs/dev-guides/destroy_deployment.md | 43 +- .../portal/build_and_deploy_portal.md | 245 ++++++++ docs/dev-guides/{ => portal}/notifications.md | 4 +- 11 files changed, 313 insertions(+), 880 deletions(-) delete mode 100644 docs/dev-guides/build_and_deploy_portal.md rename docs/dev-guides/{ => cluster}/build_and_deploy_opensarlab_cluster.md (97%) rename docs/dev-guides/{ => cluster}/egress_config.md (84%) rename docs/dev-guides/{ => cluster}/opensciencelab_yaml.md (54%) rename docs/dev-guides/{ => container}/build_and_deploy_opensarlab_image.md (93%) rename docs/dev-guides/{ => container}/conda_environments.md (98%) rename docs/dev-guides/{ => container}/mintpy_conda.md (100%) delete mode 100644 docs/dev-guides/deploy_OpenSARLab.md create mode 100644 docs/dev-guides/portal/build_and_deploy_portal.md rename docs/dev-guides/{ => portal}/notifications.md (94%) diff --git a/docs/dev-guides/build_and_deploy_portal.md b/docs/dev-guides/build_and_deploy_portal.md deleted file mode 100644 index cb5699a..0000000 --- a/docs/dev-guides/build_and_deploy_portal.md +++ /dev/null @@ -1,238 +0,0 @@ -# Build and Deploy the Portal - - -# Enable Under Construction page - -Sometimes the Portal must be taken down for updates. For instance, the EC2 the Portal runs on needs to be respawned for updates. - -To help facilitate communication with users, an Under Construction page can be enabled. All traffic to the Portal will be redirected to this page. - -1. To enable the page, log into the AWS account and go to EC2 console. - -1. Go to the Portal load balancer, select the `HTTPS:443` listener, and _check_ the Default rule. - -1. In the dropdown Actions menu, select Edit Rule. - -1. Set the instance target group weight to **0**. Set the lambda target group weight to **1**. - -1. At the bottom of the page Save Changes. - -1. Changes should take affect almost immediately. 
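The console steps above are the documented procedure. If you prefer to script the swap, a rough AWS CLI equivalent is sketched below; the listener and target group ARNs are hypothetical placeholders and must be replaced with the values from your deployment.

```bash
# Hypothetical ARNs -- look these up in the EC2 console or with `aws elbv2 describe-listeners` / `describe-target-groups`.
LISTENER_ARN=arn:aws:elasticloadbalancing:us-west-2:111122223333:listener/app/portal/0123456789abcdef/0123456789abcdef
INSTANCE_TG=arn:aws:elasticloadbalancing:us-west-2:111122223333:targetgroup/portal-instance/0123456789abcdef
LAMBDA_TG=arn:aws:elasticloadbalancing:us-west-2:111122223333:targetgroup/portal-maintenance/0123456789abcdef

# Send all traffic to the Under Construction lambda (instance weight 0, lambda weight 1).
aws elbv2 modify-listener \
  --listener-arn "$LISTENER_ARN" \
  --default-actions "[{\"Type\":\"forward\",\"ForwardConfig\":{\"TargetGroups\":[{\"TargetGroupArn\":\"$INSTANCE_TG\",\"Weight\":0},{\"TargetGroupArn\":\"$LAMBDA_TG\",\"Weight\":1}]}}]"

# To revert, swap the weights (instance 1, lambda 0) and rerun the command.
```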
- -To revert changes after updating, repeat the above steps except change the target group weights so that the instance gets **1** and the lambda gets **0**. - - -# ---------- - -The following documentation is older and must be used with caution. - -# Prerequsites - -1. AWS SES: Store SES secrets - -These secrets will be used to communicate with SES to send emails. "SMTP credentials consist of a username and a password. When you click the Create button below, SMTP credentials will be generated for you." The credentials are AWS access keys, like as used in local aws configs. They are valid for the whole region. https://us-west-2.console.aws.amazon.com/ses/home - -- Create a verified email and take out of sandbox. - -- Create SES serets - -Go to `Account Dashboard`. - `Create SMTP credentials`. The IAM User Name should be unique and easy to find within IAM. On user creation, SMTP credentials will be created. - -- Store SES secrets - -https://us-west-2.console.aws.amazon.com/secretsmanager/home - -Click on `Store New Secret` -`Other type of secret` -`Plaintext` -Delete all empty json content. -Add username and password as given previously in the following format: `USERNAME PASSWORD`. -Click `Next` -Secret Name: `portal/ses-creds` -Tags: `osl-billing: osl-portal` -Click `Next` -Click `Next` -Click `Store` - - -1. AWS Secrets Manager: Create SSO token - -This token will be used by the labs to communicate and authenticate with the portal. All labs and the portal share this token. It is imperative that this remains secret. The form of the token is very specific. Use the following process to create the token. - -- Create secret - -```bash -pip install cryptography -python3 -``` - -```python3 -from cryptography.fernet import Fernet - -api_token = Fernet.generate_key() -api_token -``` - -- Add to AWS Secret Manager - -https://us-west-2.console.aws.amazon.com/secretsmanager/home - -Click on `Store New Secret` -`Other type of secret` -`Plaintext` -Delete all empty json content. -Add value of _api_token_ -Click `Next` -Secret Name: `$CONTAINER_NAMESPACE/sso-token` -Tags: `osl-billing: osl-portal` -Click `Next` -Click `Next` -Click `Store` - - -1. Docker registry - -For local dev development, one can use a local docker registry. -`docker run -d -p 5000:5000 --restart=always --name registry registry:2` - -Otherwise, the remote docker images will be stored in AWS ECR, as setup by CloudFormation - -1. Docker repo - -Clone the portal code. -If production, push to CodeCommit the portal code. - - -# Setup - -If production, upload the Cloudformation template `cf-portal-setup.yaml` and build. - -Once the cloudformation is done, go to EC2 Connect, log onto the server and `cd /home/ec2-user/code`. -Then setup prerequisites via `make setup-ec2`. -Note that you will be warned about reformatting the DB volume. If this is the first time running (as it should be), do so. - -If locally, go to the root up the docker repo. -The setup prerequisites via `make setup-ubuntu`. - - -# Build - -`cp labs.example.yaml labs.maturity.yaml`. The name of the config doesn't matter (except it cannot be labs.run.yaml) -Update labs.maturity.yaml as needed - -`make config=labs.maturity.yaml` - - -# Destroy - -If production, clear out the registry images, delete the CloudFormation setup, delete snapshots, and delete logs. - -If locally, `make clean` and then stop the localhost registry (if being used). - -# Other less used procedures - -1. Logs - -In production, normally the logs will show up in CloudWatch. 
- -For both, `docker compose logs -f`. - -1. Replace Portal DB from snapshot - -If the Portal DB needs to be replaced by a snapshot backup, do the following. - -All of these steps take place within EC2 Connect. - -Elevated permissions will be needed via `sudo` or `sudo su -`. - -- Restore snapshot to volume - -This procedure assumes that the usual DB volume is present and being used. We only want to update the DB file. - -Within `cf-portal-setup.yaml`, the AZ of the EC2's subnet is set as us-west-2a. - -From the EC2 Connect console, select the snapshot that will be restored. Get the SNAPSHOT_ID, e.g. snap-0c0dbee2e7c9f0c12 - -From the EC2 Connect console, select the portal EC2. Get the EC2_INSTANCE_ID, e.g. i-0ca96843e97d9bd29 - -First run a dry run to make sure permissions are available. -``` -aws ec2 create-volume \ - --dry-run \ - --availability-zone us-west-2a \ - --snapshot-id $SNAPSHOT_ID \ - --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=portal-db-backup}]' -``` - -Then actually run it. -``` -aws ec2 create-volume \ - --availability-zone us-west-2a \ - --snapshot-id $SNAPSHOT_ID \ - --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=portal-db-backup}]' -``` - -From response output, get VOLUME_ID, e.g. vol-0a0869b5ab9f77090 - -- Attach volume to EC2 - -First run a dry run to make sure permissions are available. -``` -aws ec2 attach-volume \ - --dry-run \ - --device /dev/sdm \ - --instance-id $EC2_INSTANCE_ID \ - --volume-id $VOLUME_ID -``` - -Then actually run it. -``` -aws ec2 attach-volume \ - --device /dev/sdm \ - --instance-id $EC2_INSTANCE_ID \ - --volume-id $VOLUME_ID -``` - -- Mount device to filesystem - -`sudo mkdir -p /tmp/portal-db-from-snapshot/` -`sudo mount /dev/sdm /tmp/portal-db-from-snapshot/` - -If you get an error message something like - -> /wrong fs type, bad option, bad superblock - -then you cannot mount the filesystem. AWS's way of handling volumes makes things difficult. https://serverfault.com/questions/948408/mount-wrong-fs-type-bad-option-bad-superblock-on-dev-xvdf1-missing-codepage - -Since we are working with a temporay mount, run the following instead: - -`sudo mount -t xfs -o nouuid /dev/sdm /tmp/portal-db-from-snapshot/`. - -- Check for mounted directories - -`df` - -Look for something like - -``` -/dev/xvdj 1038336 34224 1004112 4% /srv/jupyterhub -/dev/xvdm 1038336 34224 1004112 4% /tmp/portal-db-from-snapshot -``` - -- Create a backup of the old DB file - -`sudo cp ./srv/portal/jupyterhub/jupyterhub.sqlite ./srv/portal/jupyterhub/jupyterhub.sqlite.$(date +"%F-%H-%M-%S")` - -- Copy over DB file - -`sudo cp /tmp/portal-db-from-snapshot/jupyterhub.sqlite ./srv/portal/jupyterhub/jupyterhub.sqlite` - -- Unmount and detach volume from EC2 - -`sudo umount /tmp/portal-db-from-snapshot/` -`sudo aws ec2 detach-volume --volume-id VOLUME_ID` - -- Delete volume - -`sudo aws ec2 delete-volume --volume-id VOLUME_ID` diff --git a/docs/dev-guides/build_and_deploy_opensarlab_cluster.md b/docs/dev-guides/cluster/build_and_deploy_opensarlab_cluster.md similarity index 97% rename from docs/dev-guides/build_and_deploy_opensarlab_cluster.md rename to docs/dev-guides/cluster/build_and_deploy_opensarlab_cluster.md index 00393e3..b2da125 100644 --- a/docs/dev-guides/build_and_deploy_opensarlab_cluster.md +++ b/docs/dev-guides/cluster/build_and_deploy_opensarlab_cluster.md @@ -103,3 +103,8 @@ 1. Within CloudShell, check the PVC and PV of the user volume. Make sure the K8s annotation `pv.kubernetes.io/provisioned-by: ebs.csi.aws.com` is present. 
If not, then the JupyterHub volume managment will fail and volumes will become orpaned upon lifecycle deletion. + + +## Destroy OpenSARLab Cluster + +To take down, consult [destroy deployment docs](../destroy_deployment.md) \ No newline at end of file diff --git a/docs/dev-guides/egress_config.md b/docs/dev-guides/cluster/egress_config.md similarity index 84% rename from docs/dev-guides/egress_config.md rename to docs/dev-guides/cluster/egress_config.md index ef76c6d..4827499 100644 --- a/docs/dev-guides/egress_config.md +++ b/docs/dev-guides/cluster/egress_config.md @@ -1,6 +1,6 @@ # Egress Configuration -If enabled, the Istio service mesh can apply rules for rate limiting and domain blocking. These rules for a particular configuration will only apply to the user or dask pod assigned to the corresponsing egress profile. Thbe configurations need to be found in the root/egress_configs directory. +If enabled, the Istio service mesh can apply rules for rate limiting and domain blocking. To facilitate usability, a custom configuration is employed with custom rules. These rules for a particular configuration will only apply to the user or dask pod assigned to the corresponsing egress profile. The configurations need to be found in the {root}/egress_configs directory/useretc. ## Schema @@ -17,7 +17,7 @@ Other line entries: | `@profile` | str | Required. Egress profile name that will be assigned to the lab profile. There can only be one `@profile` per egress config file. Other `@profile` references will be ignored. Because the profile name is part of the naming structure of some k8s resources, it must be fqdn compatible. | | `@rate` | int | Required. Rate limit (per 10 seconds) applied to the assigned pod. Value is the max value of requests per second. Any subsequent `@rate` is ignored. To turn off rate limit, set value to `None`.| | `@list` | `white` or `black` | Required. Either the config is a whitelist or a blacklist. Any subsequent `@list` is ignored. | -| `@include` | str | Optional. Any named `.conf` file within a sibling `includes` folder will be copied/inserted at the point of the `@include`. Having `@rate`, `@include`, or `@profile` within the "included" configs will throw and error. Other rules for ordering still apply. | +| `@include` | str | Optional. Any named `.conf` file within a sibling `includes` folder will be copied/inserted at the point of the `@include`. Having `@rate`, `@include`, or `@profile` within the "included" configs will throw an error. Other rules for ordering still apply. | | `%port` | int,int | Required. Port value for the host. Must have a value between 1 and 65535. Ports can be consolidated by comma seperation. Ports seperated by `=>` will be treated like a redirect (_this is currently not working. The ports will be treated as seperated by a comma_). | |`%timeout` | str | Optional. Timeout for a valid timeout for any subsequent host. The vlaue must end in `s` for seconds, `m` for minutes, etc. | |`+ip` | num | Optional. Any valid fqdn ip address.| @@ -30,6 +30,14 @@ Lines not prepended with `@`, `%`, `+`, `^`, or `#` will be treated as a hostnam **Blacklist with rate limiting** +``` conf +# Included blacklist +%timeout 10s +%port 80=>443 + +example.com +``` + ``` conf # This conf is required!! 
# This will be used by profiles that don't have any explicit whitelist and are not None diff --git a/docs/dev-guides/opensciencelab_yaml.md b/docs/dev-guides/cluster/opensciencelab_yaml.md similarity index 54% rename from docs/dev-guides/opensciencelab_yaml.md rename to docs/dev-guides/cluster/opensciencelab_yaml.md index 01c5ca0..40dc5e2 100644 --- a/docs/dev-guides/opensciencelab_yaml.md +++ b/docs/dev-guides/cluster/opensciencelab_yaml.md @@ -7,20 +7,33 @@ Schema for the egress config can be found [../egress_config.md](here). parameters: lab_short_name: The url-friendly short name of the lab deployment. - cost_tag_key: Name of the cost allocation tag - cost_tag_value: Value of the cost allocation tag + cost_tag_key: Name of the cost allocation tag. + cost_tag_value: Value of the cost allocation tag. Also used by cloudformation during setup for naming. admin_user_name: Username of initial JupyterHub admin certificate_arn: AWS arn of the SSL certificate held in Certificate Manager container_namespace: A namespaced path within AWS ECR containing custom images lab_domain: Domain of JupyterHub deployment. Use `load balancer` if not known. portal_domain: Domain of the OSL Portal. Used to communicate with email services, etc. + + # Volume and snapshot lifecycle managament days_till_volume_deletion: The number of integer days after last server use when the user's volume is deleted. To never delete volume, use value 365000. - days_till_snapshot_deletion: The number of integer days after last server use when the user's snapshot is deleted. To never delete snapshot, use value 365000. days_after_server_stop_till_warning_email: Comma seperated list of integer days after last server use when user gets warning email. Must have minimum one value. To never send emails, use value 365000 + days_till_snapshot_deletion: The number of integer days after last server use when the user's snapshot is deleted. To never delete snapshot, use value 365000. days_after_server_stop_till_deletion_email: Number of integer days after last server use when user gets email notifiying about permanent deletion of data. Must have minimum one value. To never send emails, use value 365000 utc_hour_of_day_snapshot_cron_runs : Integer hour (UTC) when the daily snapshot cron runs. utc_hour_of_day_volume_cron_runs: Integer hour (UTC) when the daily snapshot cron runs. + # Versions of sofware installed + eks_version: '1.31' # https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html + kubectl_version: '1.31.0/2024-09-12' # https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html + aws_ebs_csi_driver_version: '2.36.0' # https://github.com/kubernetes-sigs/aws-ebs-csi-driver/releases + jupyterhub_helm_version: '3.3.7' # https://jupyterhub.github.io/helm-chart/ + jupyterhub_hub_image_version: '4.1.5' # Match App Version of JupyterHub Helm + aws_k8s_cni_version: 'v1.18.5' # https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html + cluster_autoscaler_helm_version: '9.43.1' # https://github.com/kubernetes/autoscaler/releases > cluster-autoscaler-chart + istio_version: '1.23.2' # https://github.com/istio/istio/releases; set to None if disabling Istio + dask_helm_version: '2024.1.0' # https://helm.dask.org/ > dask-gateway-{version}; Set to None if disabling Dask + nodes: - name: hub # Required instance: The EC2 instance for the hub node. Type t3a.medium is preferred. 
@@ -29,13 +42,25 @@ nodes: node_policy: hub # Required is_hub: True # Required + - name: daskcontroller # Required + instance: t3a.medium, t3.medium + min_number: 1 # Required + max_number: 1 # Required + node_policy: dask_controller # Required + is_dask_controller: True # Required + is_spot: True + - name: Name of node type. Must be alphanumeric (no special characters, whitespace, etc.) - instance: The EC2 instance for the hub node (m5a.2xlarge) + instance: The EC2 instance for the hub node. Fallback types seperated by commas. (m6a.xlarge, m5a.xlarge) min_number: Minimum number of running node of this type in the cluster (0) max_number: Maximum number of running node of this type in the cluster (25) node_policy: Node permission policy (user) root_volume_size: Size of the root volume of the EC2 (GiB) (Optional, range 1 - 16,384) + is_dask_worker: The EC2 is a dask worker (Optional, True). + is_spot: The EC2 is part of a spot fleet (Optional, True). +# Service accounts allow a built-in way to interact with AWS resources from within a server. +# However, the default AWS profile is overwritten and may have inintended consequences. service_accounts: - name: service_account_name namespace: namespace of k8s resource (jupyter) @@ -45,20 +70,27 @@ service_accounts: - "AWS Resource Action" Resource: "AWS Resource ARN" -profiles: +dask_profiles: + - name: Name of dask profile that the user can select (Example 1) + short_name: example_1 + description: "Basic worker used by example notebook" + image_url: FQDN with docker tags (233535791844.dkr.ecr.us-west-2.amazonaws.com/smce-test-opensarlab/daskworker:180a826). If not public, the domain must be in the same AWS account as the cluster. + node_name: Node must be defined as a dask worker. + egress_profile: Name of the egress config to use. Do not include `.conf` suffix (Optional) + +lab_profiles: - name: Name of profile that users can select (SAR 1) description: Description of profile - image_name: Name of JupyterHub single user image found in ECR (sar1) - image_tag: Tag of JupyterHub single user image found in ECR (b1f4e84) + image_url: FQDN of JupyterLab single user image with docker tags ( 233535791844.dkr.ecr.us-west-2.amazonaws.com/smce-test-opensarlab/sar:ea3e147). If not public, the domain must be in the same AWS account as the cluster. hook_script: Name of the script ran on user server startup (sar.sh) (optional) memory_guarantee: RAM usage guaranteed per user (6G) (Optional. Defaults to 0% RAM.) memory_limit: RAM usage guaranteed per user (16G) (Optional. Defaults to 100% RAM of server.) - cpu_guarantee: CPU usage guaranteed per user (15) (Optional. Defaults to 0% CPU.) - cpu_limit: CPU usage limit per user (30) (Optional. Defaults to 100% CPU of server.) + cpu_guarantee: CPU usage guaranteed per user (15) (Optional. Defaults to 0% CPU. Memory limits are preferable.) + cpu_limit: CPU usage limit per user (30) (Optional. Defaults to 100% CPU of server. Memory limits are preferable.) storage_capacity: Size of each user's home directory (500Gi). Cannot be reduced after allocation. node_name: Node name as given in above section (sar1) delete_user_volumes: If True, deletes user volumes upon server stopping (Optional. Defaults to False.) - classic: If True, use Classic Notebook interface (Optional. Defaults to False, i.e. JupyterLab.) + desktop: If True, use Virtual Desktop by default (Optional. Defaults to False) The desktop enviromnent must be installed on image. default: If True, the specific profile is selected by default (Optional. 
False if not explicity set.) service_account: Name of previously defined service account to apply to profile (Optional) egress_profile: Name of the egress config to use. Do not include `.conf` suffix (Optional) diff --git a/docs/dev-guides/build_and_deploy_opensarlab_image.md b/docs/dev-guides/container/build_and_deploy_opensarlab_image.md similarity index 93% rename from docs/dev-guides/build_and_deploy_opensarlab_image.md rename to docs/dev-guides/container/build_and_deploy_opensarlab_image.md index ceabc74..d9b90c8 100644 --- a/docs/dev-guides/build_and_deploy_opensarlab_image.md +++ b/docs/dev-guides/container/build_and_deploy_opensarlab_image.md @@ -1,4 +1,4 @@ -# Build and Deploy OpenSARLab Image +# Build and Deploy OpenSARLab Image Container ## Setup Container Build in AWS @@ -52,3 +52,8 @@ A successful run will take about 20 minutes. If it takes signitifcantly less time then the build might have failed even if CodePipeline says successful. + + +## Destroy OpenSARLab Image Container + +To take down, consult [destroy deployment docs](../destroy_deployment.md) \ No newline at end of file diff --git a/docs/dev-guides/conda_environments.md b/docs/dev-guides/container/conda_environments.md similarity index 98% rename from docs/dev-guides/conda_environments.md rename to docs/dev-guides/container/conda_environments.md index 9207ad5..93d7c12 100644 --- a/docs/dev-guides/conda_environments.md +++ b/docs/dev-guides/container/conda_environments.md @@ -1,4 +1,4 @@ -[Return to Developer Guide](../dev.md) +[Return to Developer Guide](../../dev.md) # There are a few options for creating conda environments in OpenSARLab. Each option come with benefits and drawbacks. diff --git a/docs/dev-guides/mintpy_conda.md b/docs/dev-guides/container/mintpy_conda.md similarity index 100% rename from docs/dev-guides/mintpy_conda.md rename to docs/dev-guides/container/mintpy_conda.md diff --git a/docs/dev-guides/deploy_OpenSARLab.md b/docs/dev-guides/deploy_OpenSARLab.md deleted file mode 100644 index 2608f1c..0000000 --- a/docs/dev-guides/deploy_OpenSARLab.md +++ /dev/null @@ -1,585 +0,0 @@ -[Return to Developer Guide](../dev.md) - -Deploy OpenSARLab to an AWS account -===================== - -**A note about deployments:** A deployment of OpenSARLab refers to a standalone instance of OpenSARLab. -If you are setting up OpenSARLab for several classes and/or collaborative groups with disparate needs or funding sources, -it may be useful to give them each their own standalone deployment. This separates user group authentication, -simplifies billing for each group, and allows for easy cleanup at the end of a project or class (just delete the deployment). -In the following instructions, replace any occurrence of "`deployment_name`" with the deployment name you have chosen. - -**Make your deployment name lowercase and use no special characters other than dashes (-). It will be used to -generate part of the Cognito callback URL and CloudFormation stack names also follow the same naming convention.** - -Take AWS SES out of sandbox --------------------- - -**The AWS Simple Email Service is used by OpenSARLab to send emails to users and administrators. These include -authentication related notifications and storage lifecycle management messages.** - -While SES is in sandbox, you are limited to sending 1 email per second with no more than 200 in a 24 hour period, and they -may only be sent from an SES verified address to other SES verified addresses. 
- -Note: Provide a detailed explanation of your SES use and email policies when applying to exit the sandbox or you will be denied. - -**Approval can take 24-48 hours** - -1. Follow these [instructions](https://docs.aws.amazon.com/ses/latest/DeveloperGuide/request-production-access.html) to -take your SES out of sandbox. - - -Create an AWS Cost Allocation Tag --------------------- -**Note: only management accounts can create cost allocation tags** - -1. Create a cost allocation tag or have one created by someone with access - 1. Give it an available name that makes sense for tracking deployment names associated with AWS resources - 1. i.e. "deployment_name" - -Add dockerhub credentials to AWS Secrets Manager --------------------- -**This deployment uses a few publicly available docker images. Due to dockerhub rate limits ([https://www.docker.com/increase-rate-limits](https://www.docker.com/increase-rate-limits)), -you will need to set up a dockerhub account. A free-tier account will suffice. CodePipeline's ip address is shared by many -users and you will likely hit the rate limit as an anonymous user -([details here](https://aws.amazon.com/blogs/containers/advice-for-customers-dealing-with-docker-hub-rate-limits-and-a-coming-soon-announcement/)).** - -Note: By default this secret will be used for multiple deployments. Optionally, you could edit the codebuild section in the cf-cluster.yml to -point to a different secret. - -1. If you don't have a dockerhub account, create one [here](https://hub.docker.com/signup) -1. Open the AWS Secrets Manager console -1. Click the "Store a new secret" button - 1. Page 1: - 1. Select "Other type of secrets" - 1. Select the "Plaintext" tab - 1. Delete the default content - 1. Add your username and password, separated by a space - 1. Example: `username password` - 1. Click the "Next" button - 1. Page 2: - 1. Secret name - 1. `dockerhub/creds` - 1. Click the "Next" button - 1. Page 3: - 1. Click the "Next" button - 1. Page 4: - 1. Click the "Store" button - -Setup an iCal calendar for notifications --------------------- -**Notifications are generated from iCal calendar events. ASF uses Google Calendar but any publicly accessible iCal -formatted calendar should work as well** - -1. Create a public iCal formatted calendar -1. The iCal formatted url will be needed in later -1. Notification calendar events must be properly formatted. - 1. Formatting details available in the [Take care of odds and ends](#Take-care-of-odds-and-ends) section - -Store your CA certificate --------------------- -**OpenSARLab will lack full functionality if not using https (SSL certification)** - -1. Follow these [instructions](https://docs.aws.amazon.com/acm/latest/userguide/import-certificate.html) to import your CA certificate into the AWS Certificate Manager - -Prepare CodeCommit Repos --------------------- -TODO Do this differently - -**All the public OpenSARlab repos are in the [ASFOpenSARlab](https://github.com/ASFOpenSARlab) Github Org** - -1. Create a `deployment_name`-container CodeCommit repo in your AWS account -1. Create a `deployment_name`-cluster CodeCommit repo -1. Clone the `deployment_name`-container and `deployment_name`-cluster repos to your local computer using ssh -1. cd into your local `deployment_name`-container repo - 1. add ASFOpenSARlab/opensarlab-container as a remote on your local `deployment_name`-container repo - 1. `git remote add github https://github.com/ASFOpenSARlab/opensarlab-container.git` - 1. 
Pull the remote opensarlab-container repo into your local `deployment_name`-container repo - 1. `git pull github main` - 1. Create a main branch in the `deployment_name`-container repo - 1. `git checkout -b main` - 1. Push to the remote `deployment_name`-container repo - 1. `git push origin main` -1. cd into your local `deployment_name`-cluster repo - 1. add ASFOpenSARlab/opensarlab-cluster as a remote on your local `deployment_name`-cluster repo - 1. `git remote add github https://github.com/ASFOpenSARlab/opensarlab-cluster.git` - 1. Pull the remote opensarlab-cluster repo into your local `deployment_name`-cluster repo - 1. `git pull github main` - 1. Create a main branch in the `deployment_name`-cluster repo - 1. `git checkout -b main` - 1. Push to the remote `deployment_name`-cluster repo - 1. `git push origin main` - -**You should now have container and cluster repos in CodeCommit that are duplicates of those found in ASFOpenSARlab** - -Customize opensarlab_container code for deployment --------------------- -**The opensarlab-container repo contains one example image named `helloworld`, which you can reference when creating new images. -Images can be used by multiple profiles** - -Note: It is easiest to work in your local repo and push your changes when you're done. - -1. Duplicate the `images/sar` directory and rename it, using your chosen image name - 1. The image name must be alpha-numeric with no whitespaces or special characters -1. Edit the dockerfile - 1. Adjust the packages in the 2nd apt install command to suit your image needs - 1. Add any pip packages you wish installed in the base conda environment - 1. Add any conda packages you wish installed in the base conda environment - 1. Create any conda environments you would like pre-installed before "USER jovyan" - 1. If using environment.yml files, store them in an "envs" directory in /jupyter-hooks, and they will be copied into the container - 1. RUN conda env create -f /etc/jupyter-hooks/envs/_env.yml --prefix /etc/jupyter-hooks/envs/ - 1. Run any tests for this image that you added to the tests directory under `FROM release as testing` -1. Remove the images/sar directory and sar.sh test script, unless you plan to use the sar image -1. Add a test script for your image - 1. use sar.sh as an example - 1. name it .sh -1. Add, commit, and push changes to the remote CodeCommit repo - -Customize opensarlab_cluster code for deployment --------------------- -1. Create and add any additional custom jupyter magic commands to the `opensarlab/jupyterhub/singleuser/custom_magics` directory Add any additional scripts you may have created for use in your image to the `opensarlab/jupyterhub/singleuser/hooks` directory -1. Duplicate `opensarlab/jupyterhub/singleuser/hooks/sar.sh`, renaming it after your image name - 1. Edit `opensarlab/jupyterhub/singleuser/hooks/.sh` - 1. Copy any additional custom Jupyter magic scripts to `$HOME/.ipython/image_default/startup/` (alongside 00-df.py) - 1. Edit the repos being pulled to suit your deployment and image needs -1. Rename `opensarlab/opensarlab.example.yaml` to `opensarlab/opensarlab.yaml` - 1. Use the example notes in `opensarlab/opensarlab.yaml` to define the required and optional fields -1. Update `opensarlab/jupyterhub/helm_config.yaml` - 1. `singleuser` - 1. Add any needed extraFiles - 1. `hub` - 1. Add any needed extraFiles -1. 
Add, commit, and push changes to the remote CodeCommit repo - -Build the container CloudFormation stack --------------------- -**This will create the hub image, images for each profile, and store them in namespaced ECR repos** - -1. Open CloudFormation in the AWS console - 1. Click the "Create stack" button and select "With new resources (standard)" - 1. Page 1 : **Create stack** - 1. Under "Specify template", check "Upload a template file" - 1. Use the file chooser to select **cf-container.py** from your local branch of the `deployment_name`-container repo - 1. Click the "Next" button - 1. Page 2: **Specify stack details** - 1. `Stack Name` - 1. Use a recognizable name that makes sense for your deployment - 1. `CodeCommitSourceRepo` - 1. The CodeCommit repo holding the container code (`deployment_name`-container) - 1. `CodeCommitSourceBranch` - 1. The name of the production branch of the `deployment_name`-container CodeCommit repo - 1. `CostTagKey` - 1. The cost allocation key you registered for tracking deployment costs - 1. `CostTagValue` - 1. `deployment_name` - 1. Page 3: **Configure stack options** - 1. Tags: - 1. Key: Cost allocation tag - 1. Value: `deployment_name` - 1. Click the "Next" button - 1. Page 4: **Review `Stack Name`** - 1. Review and confirm correctness - 1. Check the box next to "I acknowledge that AWS CloudFormation might create IAM resources" - 1. Click the "Create Stack Button" - 1. Monitor the stack build for errors and rollbacks - 1. The screen does not self-update - 1. Use the refresh buttons - 1. If the build fails and rolls back - 1. goto the CloudFormation stacks page - 1. select and delete the failed stack before correcting any errors and trying again - -Build the cluster CloudFormation stack --------------------- -**This CloudFormation stack dynamically creates 3 additional stacks.** - -1. Open CloudFormation in the AWS console - 1. Page 1 : **Create stack** - 1. Click the "Create stack" button and select "With new resources (standard)" - 1. Under "Specify template", check "Upload a template file" - 1. Use the file chooser to select `opensarlab/pipeline/cf-pipeline.yaml` from your local branch of the cluster repo - 1. Click the "Next" button - 1. Page 2: **Specify stack details** - 1. `Stack Name` - 1. Use a recognizable name that makes sense for your deployment. **Do not use a stack name that ends in `cluster`, `jupyterhub`, or `cognito`. These are reserved.** - 1. `CodeCommitRepoName` - 1. The CodeCommit repo holding the container code (`deployment_name`-cluster) - 1. `CodeCommitBranchName` - 1. The name of the production branch of the `deployment_name`-cluster CodeCommit repo - 1. `CostTagKey` - 1. The cost allocation key you registered for tracking deployment costs - 1. `CostTagValue` - 1. `deployment_name` - 1. Page 3: **Configure stack options** - 1. Tags: - 1. Key: Cost allocation tag - 1. Value: `deployment_name` - 1. Click the "Next" button - 1. Page 4: **Review `Stack name`** - 1. Review and confirm correctness - 1. Check the box next to "I acknowledge that AWS CloudFormation might create IAM resources" - 1. Click the "Create Stack" button - -Take care of odds and ends --------------------- - -1. Update `deployment_url` in the cluster repo `opensarlab/opensarlab.yaml` if you started off using `load balancer` - 1. Don't forget to update your DNS record -1. Add the cost allocation tag to the EKS cluster - 1. Navigate to the AWS EKS console - 1. click the "Clusters" link in the sidebar menu - 1. Click on cluster stack - 1. Click the "Tags" tab - 1. 
Click the "Manage tags" button - 1. Click the "Add tag" button - 1. Key: Cost allocation tag - 1. Value: `deployment_name` -1. Prime the Auto Scaling Group for each profile unless there are active users - 1. Navigate to the AWS EC2 console - 1. Select the "Auto Scaling Groups" sidebar link - 1. Select an autoscaling group - 1. Group details: - 1. Click the "Edit" button - 1. Desired capacity: - 1. Set to 1 - 1. Click the "Update" button -1. Create a test notification - 1. Navigate to your notification calendar - 1. Create an event - 1. Set the event to last as long as you wish the notification to display - 1. The event title will appear as the notification title - 1. The description includes a metadata and message section - 1. Example: - 1. ``` - - profile: MY PROFILE, OTHER PROFILE - type: info - - - This is a notification - ``` - 1. \ - 1. profile: - 1. Holds the name or names (comma separated) of the profiles where the notification will be displayed - 1. type: - 1. info - 1. blue notification - 1. success - 1. green notification - 1. warning - 1. yellow notification - 1. error - 1. red notification - 1. \ - 1. Your notification message -1. Sign up with your `admin_user_name` account, sign in, and add groups for each profile and sudo - 1. Open the `deployment_url` in a web browser - 1. Click the "Sign in" button - 1. Click the "Sign up" link - 1. Username: - 1. The name used for the `admin_user_name` parameter of the `opensarlab.yaml` - 1. Name: - 1. Your name - 1. Email: - 1. Enter the email address used for the AdminEmailAddress parameter in the `deployment_name`-auth CloudFormation stack - 1. Password: - 1. A password - 1. Click the "Sign up" button - 1. Verification Code: - 1. The verification code sent to your email address - 1. Click the "Confirm Account" button - 1. Add a group for each profile and for sudo - 1. After confirming your account you should be redirected to the Server Options page - 1. Click the "Groups" link at the top of the screen - 1. Click the "Add New Group" button - 1. Group Name: - 1. The group name as it appears in the helm_config.yaml group_list - 1. Note that this is not the display name and it contains underscores - 1. Group Description: - 1. (optional) Enter a group description - 1. Group Type: - 1. check "action" - 1. This has no effect, but is useful for tracking user groups vs. profile groups - 1. All Users?: - 1. Check if you wish the profile to be accessible to all users - 1. Is Enabled?: - 1. check the box - 1. Click the "Add Group" button - 1. Repeat for all profiles - 1. Repeat for a group named "sudo" - 1. Do not enable sudo for all users! - 1. This is useful for developers but avoid giving root privileges to regular users - 1. Click the "Home" link at the top of the screen - 1. Start up and test each profile - 1. Click the "Start My Server" button - 1. Select a profile - 1. Click the "Start" button - 1. Confirm that the profile runs as expected - 1. Test notebooks as needed - 1. Confirm that notifications appear - 1. Repeat for each profile -1. Configure your local K8s config so you can manage your EKS cluster with kubectl - 1. Add your AWS user to the trust relationship of the `deployment_name`-cluster-access IAM role - 1. Navigate to the AWS IAM console - 1. Click the "Roles" link from the sidebar menu - 1. Select the `deployment_name`-cluster-access IAM role - 1. Click the "Trust relationships" tab - 1. Click the "Edit trust relationship" button - 1. Add your AWS user ARN - 1. Example json: - 1. 
```json - { - "Version": "2008-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "AWS": [ - "arn:aws:iam:::user/" - ] - }, - "Action": "sts:AssumeRole" - } - ] - } - ``` - 1. Click the "Update Trust Policy" button - 1. Add an AWS profile on your local machine - 1. Example profile: - 1. ```yaml - [profile profile_name] - source_profile = your_source_profile - region = your_region - role_arn = arn:aws:iam:::role/--cluster-user-access - cluster_name = -cluster - ``` - 1. Run the helps/get_eks_kubeconfig.sh script in the opensarlab-cluster repo - 1. Note: you will use this a lot and it may be helpful to create an alias in ~/.bash_aliases - 1. Use kubectl - - - - \ No newline at end of file diff --git a/docs/dev-guides/destroy_deployment.md b/docs/dev-guides/destroy_deployment.md index b40a8d1..e51674f 100644 --- a/docs/dev-guides/destroy_deployment.md +++ b/docs/dev-guides/destroy_deployment.md @@ -11,7 +11,7 @@ It is essential to destroy a deployment at the end of its life cycle so that no - When deleting the `CloudFormation` stack, the deletion order matters. - Delete some of the `CloudFormation` stacks before deleting ECR. - The name of the items you're deleting may differ depending on the deployment you are taking down. For example, your deployment's `CloudFormation` stack may not have a `region` name. -- Do **NOT** take down the `Cognito` and `CloudWatch` logs. These are used for statistical analysis later on. +- Do **NOT** take down the `CloudWatch` logs. These are used for statistical analysis later on. --- @@ -113,8 +113,6 @@ Specifically, you will need to delete your stacks in the following order: _NB:*These stacks have additional steps._ -**WARNING: As mentioned earlier, do NOT delete the `-cognito` stack.** - In above order, follow these steps (except for stack 3): 1. Delete the `-` CloudFormation stack @@ -136,15 +134,6 @@ If you are deleting stack 3 (`-cluster-pipeline`), you will nee _NB: Refer to the **Delete ECR Repos** section for how to delete ECR repos._ - -1. Empty the `codepipeline` and `lambda` S3 bucket - 1. Navigate to the AWS S3 console - 1. Check the box next to the `codepipeline--` S3 bucket - 1. Click the `Empty` button - 1. Confirm the deletion of bucket contents by typing `permanently delete` in the provided field - 1. Click the `Empty` button - 1. Repeat the same process for the `--lambda` - --- ## **Delete EBS Snapshots and Volumes** @@ -174,34 +163,6 @@ First, navigate to the AWS EC2 console - this step should be identical for both 1. Select `Delete volumes` from the `Actions` menu 1. Confirm by clicking the `Yes, delete` button ---- - -## **(Optional) Delete the CodeCommit repositories** - -This section will guide you on how to remove the `-container` and `-cluster` repositories located in the CodeCommit. - -_**Important Note:**_ Often, it would be in your best interest to preserve the CodeCommit repositories since the cost of maintaining them are minuscule. - -If you believe that you may re-deploy the same deployment, you may want to ease future work in one of the following manners: - -1. Leaving these repositories in place, i.e., don't delete them. -1. Download the zip of your repositories, store them in S3, and then delete them. - -In another word, delete the CodeCommit repositories if and only if you are sure that you don't need them. - ---- - -First, navigate to the AWS CodeCommit console: - -![CodeCommit Repos](../assets/docs-codecommit.PNG) - -Then delete the `-container` and `-cluster` in any order. 
The deletion process for these two repositories is following:
-
-1. Check the option next to the repository
-1. Click the `Delete repository` button
-1. Confirm the deletion by typing `delete` in the provided field
-1. Click the `Delete` button
-
 ---
 
 ## **(Optional) Confirm that all resources have been deleted**
 
@@ -224,7 +185,7 @@ Once you've taken down the deployment, you may want to verify the resource usage
 
 ---
 
-## **Delete Calendar**
+## **(Optional) Delete Calendar**
 
 Now that you are done with taking down the deployment, you will need to delete the calendar notifications.
 
diff --git a/docs/dev-guides/portal/build_and_deploy_portal.md b/docs/dev-guides/portal/build_and_deploy_portal.md
new file mode 100644
index 0000000..76ed668
--- /dev/null
+++ b/docs/dev-guides/portal/build_and_deploy_portal.md
@@ -0,0 +1,245 @@
+# Build and Deploy the Portal
+
+
+# Enable Under Construction page
+
+Sometimes the Portal must be taken down for maintenance. For instance, the EC2 instance the Portal runs on may need to be respawned for updates.
+
+To help facilitate communication with users, an Under Construction page can be enabled. All traffic to the Portal will be redirected to this page.
+
+1. To enable the page, log into the AWS account and go to the EC2 console.
+
+1. Go to the Portal load balancer, select the `HTTPS:443` listener, and _check_ the Default rule.
+
+1. In the dropdown Actions menu, select Edit Rule.
+
+1. Set the instance target group weight to **0**. Set the lambda target group weight to **1**.
+
+1. At the bottom of the page, click Save Changes.
+
+1. Changes should take effect almost immediately.
+
+To revert the changes after updating, repeat the above steps, except set the target group weights so that the instance gets **1** and the lambda gets **0**.
+
+
+# ----------
+
+The following documentation is older and must be used with caution.
+
+# Prerequisites
+
+## AWS SES: Store SES secrets
+
+These secrets will be used to communicate with SES to send emails. "SMTP credentials consist of a username and a password. When you click the Create button below, SMTP credentials will be generated for you." The credentials are AWS access keys, like those used in local AWS configs. They are valid for the whole region. https://us-west-2.console.aws.amazon.com/ses/home
+
+- Create a verified email address and take SES out of the sandbox.
+
+- Create SES secrets
+
+  Go to `Account Dashboard` > `Create SMTP credentials`. The IAM User Name should be unique and easy to find within IAM. On user creation, SMTP credentials will be created.
+
+- Store SES secrets
+
+  1. Go to https://us-west-2.console.aws.amazon.com/secretsmanager/home
+
+  1. Click on `Store New Secret` > `Other type of secret` > `Plaintext`
+
+  1. Delete all empty JSON content.
+
+  1. Add the username and password as given previously in the following format: `USERNAME PASSWORD`.
+
+  1. Click `Next`
+
+  1. Secret Name: `portal/ses-creds`, Tags: `osl-billing: osl-portal`
+
+  1. Click `Next`
+
+  1. Click `Next`
+
+  1. Click `Store`
+
+
+## AWS Secrets Manager: Create SSO token
+
+This token will be used by the labs to communicate and authenticate with the portal. All labs and the portal share this token. It is imperative that this remains secret. The form of the token is very specific. Use the following process to create the token.
+
+- Create secret
+
+  ```bash
+  pip install cryptography
+  ```
+
+  ```python3
+  from cryptography.fernet import Fernet
+
+  api_token = Fernet.generate_key()
+  api_token
+  ```
+
+- Add to AWS Secrets Manager
+
+  1. 
Go to https://us-west-2.console.aws.amazon.com/secretsmanager/home
+
+  1. Click on `Store New Secret` > `Other type of secret` > `Plaintext`
+
+  1. Delete all empty JSON content.
+
+  1. Add _api_token_.
+
+  1. Click `Next`
+
+  1. Secret Name: `$CONTAINER_NAMESPACE/sso-token`, Tags: `osl-billing: osl-portal`
+
+  1. Click `Next`
+
+  1. Click `Next`
+
+  1. Click `Store`
+
+
+## Docker registry
+
+For local development, one can use a local Docker registry:
+`docker run -d -p 5000:5000 --restart=always --name registry registry:2`
+
+Otherwise, the remote Docker images will be stored in AWS ECR, as set up by CloudFormation.
+
+## Docker repo
+
+Clone the portal code: `git clone git@github.com:ASFOpenSARlab/deployment-opensciencelab-prod-portal.git`
+
+
+# Setup
+
+If production, upload the CloudFormation template `cf-portal-setup.yaml` and build.
+
+Once the CloudFormation stack is done, go to EC2 Connect of the portal EC2, log onto the server, and `cd /home/ec2-user/code`.
+Then set up prerequisites via `make setup-ec2`.
+Note that you will be warned about reformatting the DB volume. If this is the first time running (as it should be), do so.
+
+If locally, go to the root of the docker repo.
+Then set up prerequisites via `make setup-ubuntu`.
+
+
+# Build
+
+`cp labs.example.yaml labs.{maturity}.yaml`. The name of the config doesn't matter (except it cannot be `labs.run.yaml`).
+Update `labs.{maturity}.yaml` as needed.
+
+`make config=labs.{maturity}.yaml`
+
+
+# Destroy
+
+If production, clear out the registry images, delete the CloudFormation setup, delete DB EBS snapshots, and delete logs.
+
+If locally, `make clean` and then stop the localhost registry (if being used).
+
+# Other less used procedures
+
+## Logs
+
+In production, the logs will normally show up in CloudWatch.
+
+For both, `docker compose logs -f`.
+
+
+## Replace Portal DB from snapshot
+
+If the Portal DB needs to be replaced by a snapshot backup, do the following.
+
+_Unless otherwise stated, all of these steps take place within EC2 Connect._
+
+Elevated permissions will be needed via `sudo` or `sudo su -`.
+
+
+1. Restore backup snapshot to volume
+
+   This procedure assumes that the usual DB volume is present and being used. We only want to update the DB file.
+
+   Within `cf-portal-setup.yaml`, it is assumed that the AZ of the EC2's subnet is set as us-west-2a.
+
+   a. From the EC2 console, select the snapshot that will be restored. Get the SNAPSHOT_ID, e.g. snap-0c0dbee2e7c9f0c12
+
+   b. From the EC2 console, select the portal EC2. Get the EC2_INSTANCE_ID, e.g. i-0ca96843e97d9bd29
+
+   c. First run a dry run to make sure permissions are available for EBS volume creation.
+   ```
+   aws ec2 create-volume \
+       --dry-run \
+       --availability-zone us-west-2a \
+       --snapshot-id $SNAPSHOT_ID \
+       --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=portal-db-backup}]'
+   ```
+
+   Then actually create an EBS volume from the backup snapshot.
+   ```
+   aws ec2 create-volume \
+       --availability-zone us-west-2a \
+       --snapshot-id $SNAPSHOT_ID \
+       --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=portal-db-backup}]'
+   ```
+
+   From the response output, get the VOLUME_ID, e.g. vol-0a0869b5ab9f77090
+
+1. Attach backup volume to EC2
+
+   First run a dry run to make sure permissions are available.
+   ```
+   aws ec2 attach-volume \
+       --dry-run \
+       --device /dev/sdm \
+       --instance-id $EC2_INSTANCE_ID \
+       --volume-id $VOLUME_ID
+   ```
+
+   Then actually run it.
+   ```
+   aws ec2 attach-volume \
+       --device /dev/sdm \
+       --instance-id $EC2_INSTANCE_ID \
+       --volume-id $VOLUME_ID
+   ```
+
+1. Mount device to filesystem
+
+   `sudo mkdir -p /tmp/portal-db-from-snapshot/`
+   `sudo mount /dev/sdm /tmp/portal-db-from-snapshot/`
+
+   If you get an error message like
+
+   > /wrong fs type, bad option, bad superblock
+
+   then you cannot mount the filesystem. AWS's way of handling volumes makes things difficult. https://serverfault.com/questions/948408/mount-wrong-fs-type-bad-option-bad-superblock-on-dev-xvdf1-missing-codepage
+
+   Since we are working with a temporary mount, run the following instead:
+
+   `sudo mount -t xfs -o nouuid /dev/sdm /tmp/portal-db-from-snapshot/`
+
+1. Check for mounted directories
+
+   `df`
+
+   Look for the newly mounted backup volume, something like the /dev/xvdm entry below (not /dev/xvdj)
+
+   ```
+   /dev/xvdj        1038336   34224   1004112   4% /srv/jupyterhub
+   /dev/xvdm        1038336   34224   1004112   4% /tmp/portal-db-from-snapshot
+   ```
+
+1. Create a backup of the old DB file within the filesystem, just in case
+
+   `sudo cp ./srv/portal/jupyterhub/jupyterhub.sqlite ./srv/portal/jupyterhub/jupyterhub.sqlite.$(date +"%F-%H-%M-%S")`
+
+1. Copy over the backup DB file
+
+   `sudo cp /tmp/portal-db-from-snapshot/jupyterhub.sqlite ./srv/portal/jupyterhub/jupyterhub.sqlite`
+
+1. Unmount and detach the backup volume from the EC2 instance
+
+   `sudo umount /tmp/portal-db-from-snapshot/`
+   `sudo aws ec2 detach-volume --volume-id VOLUME_ID`
+
+1. Delete the backup volume
+
+   `sudo aws ec2 delete-volume --volume-id VOLUME_ID`
diff --git a/docs/dev-guides/notifications.md b/docs/dev-guides/portal/notifications.md
similarity index 94%
rename from docs/dev-guides/notifications.md
rename to docs/dev-guides/portal/notifications.md
index 325c3e5..d5b6122 100644
--- a/docs/dev-guides/notifications.md
+++ b/docs/dev-guides/portal/notifications.md
@@ -1,4 +1,4 @@
-[Return to Developer Guide](../dev.md)
+[Return to Developer Guide](../../dev.md)
 
 # Create OpenSARLab Notifications
 
@@ -27,5 +27,5 @@
 1. Turn off text automated formatting
 1. Select all the text in the message body
 1. Click the remove formatting button in the message toolbar
-    ![Image of a notification event being created in Google Calendar](../assets/notification.png)
+    ![Image of a notification event being created in Google Calendar](../../assets/notification.png)
 

From bfb41be66c05c28181c9f26a2e00285848e1d86d Mon Sep 17 00:00:00 2001
From: Eric Lundell
Date: Wed, 11 Dec 2024 15:07:34 -0900
Subject: [PATCH 6/7] Update menu paths

---
 docs/dev.md                           | 13 ++++++-------
 docs/release-notes/release_12-2024.md |  5 +++++
 mkdocs.yml                            | 14 +++++++-------
 3 files changed, 18 insertions(+), 14 deletions(-)
 create mode 100644 docs/release-notes/release_12-2024.md

diff --git a/docs/dev.md b/docs/dev.md
index 5540148..3530d67 100644
--- a/docs/dev.md
+++ b/docs/dev.md
@@ -1,10 +1,9 @@
 1. [OpenScienceLab](dev-guides/about_opensciencelab.md)
-1. [Build and Deploy the Portal](dev-guides/build_and_deploy_portal.md)
-1. [Build and Deploy OpenSARLab Image](dev-guides/build_and_deploy_opensarlab_image.md)
-1. [Build and Deploy OpenSARLab Cluster](dev-guides/build_and_deploy_opensarlab_cluster.md)
-1. [(DEPEC) Deploy OpenSARLab to AWS](dev-guides/deploy_OpenSARLab.md)
+1. [Build and Deploy the Portal](dev-guides/portal/build_and_deploy_portal.md)
+1. [Build and Deploy OpenSARLab Image](dev-guides/container/build_and_deploy_opensarlab_image.md)
+1. [Build and Deploy OpenSARLab Cluster](dev-guides/cluster/build_and_deploy_opensarlab_cluster.md)
 1. 
[System Diagram](assets/system_diagrams/OpenSARLab_system_diagram_June_2021.png)
-1. [Conda Environment Options](dev-guides/conda_environments.md)
-1. [OpenSARLab Notifications](dev-guides/notifications.md)
+1. [Conda Environment Options](dev-guides/container/conda_environments.md)
+1. [OpenSARLab Notifications](dev-guides/portal/notifications.md)
 1. [Troubleshooting](dev-guides/troubleshooting.md)
-1. [Custom Mintpy Conda Build Instructions](dev-guides/mintpy_conda.md)
+1. [Custom Mintpy Conda Build Instructions](dev-guides/container/mintpy_conda.md)
diff --git a/docs/release-notes/release_12-2024.md b/docs/release-notes/release_12-2024.md
new file mode 100644
index 0000000..25a1ea4
--- /dev/null
+++ b/docs/release-notes/release_12-2024.md
@@ -0,0 +1,5 @@
+# Welcome to the December 2024 OpenScienceLab Update!
+
+### Changes:
+
+Documentation updated to reflect current code for container, cluster, and portal.
diff --git a/mkdocs.yml b/mkdocs.yml
index b1f1d34..a2649c5 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -37,18 +37,18 @@ nav:
     - Logging Out and Server Shutdown: user-guides/logging_out_and_server_shutdown.md
   - Developer Guide:
     - OpenScienceLab: dev-guides/about_opensciencelab.md
-    - Build and Deploy the Portal: dev-guides/build_and_deploy_portal.md
-    - Build and Deploy OpenSARLab Image: dev-guides/build_and_deploy_opensarlab_image.md
-    - Build and Deploy OpenSARLab Cluster: dev-guides/build_and_deploy_opensarlab_cluster.md
-    - (DEPEC) Deploy OpenSARLab to AWS: dev-guides/deploy_OpenSARLab.md
+    - Build and Deploy the Portal: dev-guides/portal/build_and_deploy_portal.md
+    - Build and Deploy OpenSARLab Image: dev-guides/container/build_and_deploy_opensarlab_image.md
+    - Build and Deploy OpenSARLab Cluster: dev-guides/cluster/build_and_deploy_opensarlab_cluster.md
     - System Diagram: assets/system_diagrams/OpenSARLab_system_diagram_June_2021.png
-    - Conda Environment Options: dev-guides/conda_environments.md
-    - OpenSARLab Notifications: dev-guides/notifications.md
+    - Conda Environment Options: dev-guides/container/conda_environments.md
+    - OpenSARLab Notifications: dev-guides/portal/notifications.md
     - Troubleshooting: dev-guides/troubleshooting.md
-    - Custom Mintpy Conda Build Instructions: dev-guides/mintpy_conda.md
+    - Custom Mintpy Conda Build Instructions: dev-guides/container/mintpy_conda.md
   - Release Notes:
     - June 2021: release-notes/release_06-2021.md
     - October 2021: release-notes/release_10-2021.md
    - February 2022: release-notes/release_02-2022.md
    - February 2023: release-notes/release_02-2023.md
    - February 2024: release-notes/release_02-2024.md
+  - December 2024: release-notes/release_12-2024.md

From f3c9f2fd8a4ec9f3854692ecc9f524c792bd9b75 Mon Sep 17 00:00:00 2001
From: Eric Lundell
Date: Wed, 11 Dec 2024 15:17:03 -0900
Subject: [PATCH 7/7] Fix broken links

---
 docs/dev-guides/cluster/build_and_deploy_opensarlab_cluster.md | 2 +-
 docs/release_notes.md                                          | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/dev-guides/cluster/build_and_deploy_opensarlab_cluster.md b/docs/dev-guides/cluster/build_and_deploy_opensarlab_cluster.md
index b2da125..b8ebc9d 100644
--- a/docs/dev-guides/cluster/build_and_deploy_opensarlab_cluster.md
+++ b/docs/dev-guides/cluster/build_and_deploy_opensarlab_cluster.md
@@ -32,7 +32,7 @@
 
    You will need the ARN of the certificate.
 
-1. Update `opensciencelab.yaml` within the code. See explaination of the various parts [../opensciencelab_yaml.md](here).
+1. Update `opensciencelab.yaml` within the code. 
See an explanation of the various parts [here](../opensciencelab_yaml.md).
 
 1. Deploy the CloudFormation template found at `pipeline/cf-setup-pipeline.yaml`.
 
diff --git a/docs/release_notes.md b/docs/release_notes.md
index aa5daee..681eef9 100644
--- a/docs/release_notes.md
+++ b/docs/release_notes.md
@@ -3,3 +3,4 @@
 1. [February 2022](release-notes/release_02-2022.md)
 1. [February 2023](release-notes/release_02-2023.md)
 1. [February 2024](release-notes/release_02-2024.md)
+1. [December 2024](release-notes/release_12-2024.md)