Merge pull request #3725 from LBNL-UCB-STI/do/#3652-execute-beam-on-google-cloud-compute

Execute BEAM on Google Cloud
dimaopen authored Feb 21, 2023
2 parents 7db5894 + efa82cf commit b028f62
Showing 11 changed files with 776 additions and 7 deletions.
6 changes: 5 additions & 1 deletion aws/build.gradle
@@ -32,7 +32,7 @@ lambda {
socketTimeout = 900000
}

-task deploy(type: AWSLambdaInvokeTask) {
+tasks.register('deployToEC2', AWSLambdaInvokeTask) {
doFirst {
def propsFileName = "../gradle.deploy.properties"
if (project.hasProperty('propsFile')) {
@@ -355,4 +355,8 @@ def getGitResultFromWorkingDirUsing(command, defaultResult) {
} catch (ignored) {
}
return gitResult
}

ext {
getCurrentGitUserEmail = this.&getCurrentGitUserEmail
}
31 changes: 31 additions & 0 deletions build.gradle
@@ -738,3 +738,34 @@ jmh {
duplicateClassesStrategy = 'exclude'
zip64 = true
}

tasks.register('deploy') {
def cloudPlatform = null
def paramName = "cloudPlatform"
if (project.hasProperty(paramName)) {
cloudPlatform = project.findProperty(paramName)
} else {
def propsFileName = "./gradle.deploy.properties"
if (project.hasProperty('propsFile')) {
propsFileName = project.findProperty('propsFile')
}
def propsFile = new Properties()
propsFile.load(project.file(propsFileName).newDataInputStream())
cloudPlatform = propsFile.getProperty(paramName)
}
if (cloudPlatform == null) {
cloudPlatform = ""
}

switch (cloudPlatform.trim().toLowerCase()) {
case "amazon":
dependsOn ':aws:deployToEC2'
break
case "google":
dependsOn ':gcp:deployToGCE'
break
default:
throw new InvalidUserDataException("Cannot deploy! Please specify cloudPlatform property to one of [Google, Amazon]")
break
}
}
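
The platform selection in the `deploy` task above can be mimicked in shell. This is a minimal sketch of the lookup order (command-line property first, properties file as fallback) and of the `trim().toLowerCase()` switch. The function names `read_prop` and `resolve_deploy_task` are hypothetical and not part of BEAM. Unlike the Gradle task, the sketch treats an empty command-line value as unset and does not fail on a missing properties file.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the cloudPlatform lookup and switch of the root `deploy` task.
# `read_prop` and `resolve_deploy_task` are hypothetical names, not part of BEAM.

# Print the value of one key from a simple key=value properties file.
# Prints nothing if the file or the key is missing.
read_prop() {
  local key="$1" file="$2"
  [ -f "$file" ] || return 0
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# $1: value passed as -PcloudPlatform (may be empty), $2: properties file path.
resolve_deploy_task() {
  local platform="$1"
  # The command-line property wins; the properties file is only a fallback.
  if [ -z "$platform" ]; then
    platform="$(read_prop cloudPlatform "$2")"
  fi
  # Lowercase and strip surrounding whitespace, like trim().toLowerCase().
  platform="$(printf '%s' "$platform" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
  case "$platform" in
    amazon) echo ":aws:deployToEC2" ;;
    google) echo ":gcp:deployToGCE" ;;
    *)
      echo "Cannot deploy! Please specify cloudPlatform property to one of [Google, Amazon]" >&2
      return 1
      ;;
  esac
}
```

For example, `resolve_deploy_task "" ./gradle.deploy.properties` prints the task that the root `deploy` task would depend on for the platform named in that file.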
54 changes: 50 additions & 4 deletions docs/developers.rst
@@ -201,6 +201,7 @@ To run a BEAM simulation or experiment on amazon ec2, use following command with
The command will start an ec2 instance based on the provided configurations and run all simulations in serial. At the end of each simulation/experiment, outputs are uploaded to a public Amazon S3 bucket_. The default behavior is to run each simulation/experiment parallel on separate instances. For customized runs, you can also use following parameters that can be specified from command line:

* **propsFile**: to specify file with default values
+* **cloudPlatform**: Amazon
* **runName**: to specify instance name.
* **beamBranch**: To specify the branch for simulation, current source branch will be used as default branch.
* **beamCommit**: The commit SHA to run simulation. use `HEAD` if you want to run with latest commit, default is `HEAD`.
@@ -211,9 +212,9 @@ The command will start an ec2 instance based on the provided configurations and
* **beamExperiments**: A comma `,` separated list of `experiment.yml` files. It should be a relative path under the project home. You can create branch-level defaults, same as for configs, by specifying the branch name with the `.experiments` suffix, like `master.experiments`. The branch-level default will be used if `beamExperiments` is not present. `beamConfigs` has priority over this; in other words, if both are provided then `beamConfigs` will be used.
* **executeClass** and **executeArgs**: to specify class and args to execute if `execute` was chosen as deploy mode
* **maxRAM**: to specify MAXRAM environment variable for simulation.
-* **storageSize**: to specfy storage size of instance. May be from `64` to `256`.
+* **storageSize**: to specify storage size of instance. May be from `64` to `256`.
* **s3Backup**: to specify if copying results to s3 bucket is needed, default is `true`.
-* **instanceType**: to specify s2 instance type.
+* **instanceType**: to specify EC2 instance type.
* **region**: Use this parameter to select the AWS region for the run, all instances would be created in specified region. Default `region` is `us-east-2`.
* **shutdownWait**: As simulation ends, ec2 instance would automatically terminate. In case you want to use the instance, please specify the wait in minutes, default wait is 30 min.
* **shutdownBehaviour**: to specify shutdown behaviour after the end of the simulation. May be `stop` or `terminate`, default is `terminate`.
@@ -231,11 +232,11 @@ The order which will be used to look for parameter values is follow:

To run a batch simulation, you can specify multiple configuration files separated by commas::

-./gradlew deploy -PbeamConfigs=test/input/beamville/beam.conf,test/input/sf-light/sf-light.conf
+./gradlew deploy -PcloudPlatform=Amazon -PbeamConfigs=test/input/beamville/beam.conf,test/input/sf-light/sf-light.conf

Similarly for experiment batch, you can specify comma-separated experiment files::

-./gradlew deploy -PbeamExperiments=test/input/beamville/calibration/transport-cost/experiments.yml,test/input/sf-light/calibration/transport-cost/experiments.yml
+./gradlew deploy -PcloudPlatform=Amazon -PbeamExperiments=test/input/beamville/calibration/transport-cost/experiments.yml,test/input/sf-light/calibration/transport-cost/experiments.yml

For demo and presentation material, please follow the link_ on google drive.

@@ -258,6 +259,51 @@ You need to define the deploy properties that are similar to the ones for AWS de

Your task is going to be added to the queue, and when it starts/finishes you receive a notification at your git user email. It may take 1-24 hours (or even more) for the task to get started, depending on the NERSC workload. In your user home directory on NERSC you can find the output file of your task, which looks like `slurm-<job id>.out`. The BEAM output directory resides at `$SCRATCH/beam_runs/`. The output is also uploaded to s3 if `s3Backup` is set to true.

BEAM run on Google Compute Engine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to run BEAM on GCE one needs to have `cloudfunctions.functions.invoke` permission on function `projects/beam-core/locations/us-central1/functions/deploy_beam`.

One needs to install the `gcloud <https://cloud.google.com/sdk/docs/install>`_ utility in order to authenticate against the Google Cloud Platform.

The project id is `beam-core`. One can set it using::

gcloud config set project beam-core

There are `several ways to provide credentials <https://cloud.google.com/docs/authentication/provide-credentials-adc>`_. One option is to run the following command::

gcloud auth application-default login

Currently all instances are created in the `us-central1-a` zone.
The deployment script does not yet determine the zone automatically.
One needs to define the deploy properties, which are similar to the ones for the AWS deploy. These are the properties that are used on GCE:

* **propsFile**: to specify file with default values
* **cloudPlatform**: Google
* **runName**: to specify instance name.
* **beamBranch**: To specify the branch for simulation, current source branch will be used as default branch.
* **beamCommit**: The commit SHA to run simulation. Comment it out if you want to run with latest commit.
* **dataBranch**: To specify the branch for production data, 'develop' branch will be used as default branch.
* **dataCommit**: The commit SHA for the data branch, default is `HEAD`
* **beamConfigs**: The `beam.conf` file. It should be relative path under the project home. A single file is supported right now.
* **shutdownWait**: As the simulation ends, the GCE instance would automatically terminate. In case you want to use the instance, please specify the wait in minutes, default wait is 15 min.
* **shutdownBehaviour**: to specify shutdown behaviour after the end of the simulation. May be `stop` or `terminate`, default is `terminate`.
* **instanceType**: To specify GCE instance type.
* **forcedMaxRAM**: This parameter can be set according to the **instanceType** memory size.
* **storageSize**: to specify storage size (Gb) of instance. May be from `100` to `256`. Default value is `100`.
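
As an illustration of the properties listed above, the following sketch writes an example properties file for a GCE run. Every value is a placeholder chosen for this example: the file name `gradle.deploy.gce.properties`, the run name, the branches, and the machine type `n2d-standard-4` are assumptions, not project defaults. `forcedMaxRAM` is left out because its expected format is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: writes an example properties file for a GCE run.
# All values below are placeholders; adjust them to your own run.
PROPS_FILE="${PROPS_FILE:-$PWD/gradle.deploy.gce.properties}"

cat > "$PROPS_FILE" <<'EOF'
cloudPlatform=Google
runName=beamville-gce-test
beamBranch=develop
dataBranch=develop
beamConfigs=test/input/beamville/beam.conf
instanceType=n2d-standard-4
shutdownWait=15
shutdownBehaviour=terminate
storageSize=100
EOF

echo "Wrote $PROPS_FILE"
# Then, from the BEAM project root (not executed here):
#   ./gradlew deploy -PcloudPlatform=Google -PpropsFile="$PROPS_FILE"
```

The sketch passes an absolute path to `propsFile` on purpose. The root `build.gradle` and `gcp/build.gradle` both resolve the file with `project.file`, so a relative path would be interpreted against two different project directories.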

The deployment command is:

.. code-block:: bash

    ./gradlew -PcloudPlatform=Google deploy

The simulation output is uploaded to the `Google Cloud Storage <https://console.cloud.google.com/storage/browser/beam-core-outputs/output>`_.

To ssh into the running instance, one can run the following command::

gcloud compute ssh --zone=us-central1-a clu@<instance_name>


PILATES run on EC2
~~~~~~~~~~~~~~~~~~
88 changes: 88 additions & 0 deletions gcp/build.gradle
@@ -0,0 +1,88 @@
import com.google.api.client.http.GenericUrl;
import com.google.api.client.http.HttpRequest;
import com.google.api.client.http.HttpResponse;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.HttpContent;
import com.google.api.client.http.ByteArrayContent;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.auth.oauth2.IdTokenCredentials;
import com.google.auth.oauth2.IdTokenProvider;

apply from: "$rootDir/aws/build.gradle"

group = 'beam'
version = '0.8.0'

buildscript {
repositories {
mavenLocal()
mavenCentral()
maven { url "https://plugins.gradle.org/m2/" }
gradlePluginPortal()
}
dependencies {
classpath "com.google.auth:google-auth-library-oauth2-http:1.3.0"
}
}

tasks.register('deployToGCE') {
doLast {
def propsFileName = "../gradle.deploy.properties"
if (project.hasProperty('propsFile')) {
propsFileName = project.findProperty('propsFile')
}

def propsFile = new Properties()
propsFile.load(project.file(propsFileName).newDataInputStream())

ext.getParameterValue = { paramName ->
if (project.hasProperty(paramName)) {
return project.findProperty(paramName)
} else {
return propsFile.getProperty(paramName)
}
}

if (!ext.getParameterValue('runName')) {
throw new GradleException('Please name the run by specifying `runName` argument. e.g; ./gradlew deploy -PrunName=sfbay-performance-run')
}
def tempInstanceType = "${ext.getParameterValue('instanceType') ?: (project.hasProperty('defaultInstanceType') ? defaultInstanceType : '')}"
def finalInstanceType = tempInstanceType.isEmpty() ? null : tempInstanceType
GString pload = """{
"run_name": "${ext.getParameterValue('runName') + '_' + getCurrentGitUserEmail()}",
"instance_type": "${finalInstanceType}",
"forced_max_ram": "${ext.getParameterValue('forcedMaxRAM')}",
"beam_branch": "${ext.getParameterValue('beamBranch') ?: getCurrentGitBranch()}",
"beam_commit": "${ext.getParameterValue('beamCommit') ?: 'HEAD'}",
"data_branch": "${ext.getParameterValue('dataBranch') ?: 'develop'}",
"data_commit": "${ext.getParameterValue('dataCommit') ?: 'HEAD'}",
"shutdown_wait": "${ext.getParameterValue('shutdownWait')}",
"storage_size": "${ext.getParameterValue('storageSize')}",
"shutdown_behaviour": "${ext.getParameterValue('shutdownBehaviour')}",
"storage_publish": ${"false".equalsIgnoreCase(ext.getParameterValue('s3Backup')) ? "false" : "true"},
"config": "${ext.getParameterValue('beamConfigs')}"
}"""
logger.warn(pload)
HttpResponse result = makeJsonPostRequest("https://us-central1-beam-core.cloudfunctions.net/deploy_beam", pload)
logger.warn("response status: ${result.statusCode}, response message: ${result.statusMessage}, payload: ${result.content}")
}
}

static HttpResponse makeJsonPostRequest(String functionUrl, String requestBody) {
GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
IdTokenCredentials tokenCredential =
IdTokenCredentials.newBuilder()
.setIdTokenProvider((IdTokenProvider) credentials)
.setTargetAudience(functionUrl)
.build();

GenericUrl genericUrl = new GenericUrl(functionUrl);
HttpCredentialsAdapter adapter = new HttpCredentialsAdapter(tokenCredential);
HttpTransport transport = new NetHttpTransport();

HttpContent requestContent = ByteArrayContent.fromString("application/json", requestBody)
HttpRequest request = transport.createRequestFactory(adapter).buildPostRequest(genericUrl, requestContent);
return request.execute();
}
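
For readers who want to see the shape of the request outside Gradle, here is a rough shell counterpart of `deployToGCE`. It is a sketch under several assumptions. The names `build_payload` and `invoke_deploy` and the environment variable names are hypothetical. The `git rev-parse` fallback stands in for `getCurrentGitBranch()`, whose body is not shown in this diff. The identity token comes from `gcloud auth print-identity-token` instead of Application Default Credentials. Values are not JSON-escaped, unset values become empty strings rather than `null`, and the git user email is not appended to the run name.

```shell
#!/usr/bin/env bash
# Sketch only: a shell counterpart of the request made by `deployToGCE`.
# `build_payload` and `invoke_deploy` are hypothetical names, not part of BEAM.
FUNCTION_URL="https://us-central1-beam-core.cloudfunctions.net/deploy_beam"

# Build the JSON body from environment variables, using the same defaults
# as the Gradle task for beam_commit, data_branch and data_commit.
build_payload() {
  local publish="true"
  # Mirrors: "false".equalsIgnoreCase(s3Backup) ? "false" : "true"
  if [ "$(printf '%s' "${S3_BACKUP:-}" | tr '[:upper:]' '[:lower:]')" = "false" ]; then
    publish="false"
  fi
  cat <<EOF
{
  "run_name": "${RUN_NAME:-}",
  "instance_type": "${INSTANCE_TYPE:-}",
  "forced_max_ram": "${FORCED_MAX_RAM:-}",
  "beam_branch": "${BEAM_BRANCH:-$(git rev-parse --abbrev-ref HEAD 2>/dev/null)}",
  "beam_commit": "${BEAM_COMMIT:-HEAD}",
  "data_branch": "${DATA_BRANCH:-develop}",
  "data_commit": "${DATA_COMMIT:-HEAD}",
  "shutdown_wait": "${SHUTDOWN_WAIT:-}",
  "storage_size": "${STORAGE_SIZE:-}",
  "shutdown_behaviour": "${SHUTDOWN_BEHAVIOUR:-}",
  "storage_publish": ${publish},
  "config": "${BEAM_CONFIGS:-}"
}
EOF
}

# POST the payload to the cloud function with an identity token.
# Assumes an authenticated gcloud session that may invoke the function.
invoke_deploy() {
  curl --fail --silent --show-error -X POST "$FUNCTION_URL" \
    -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
    -H "Content-Type: application/json" \
    -d "$(build_payload)"
}
```

Calling `RUN_NAME=my-run BEAM_CONFIGS=test/input/beamville/beam.conf build_payload` prints the body without sending anything, which is a convenient way to check the defaults before running `invoke_deploy`.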