
Added dev example and an AG deploy
Fixed permissions throughout (by making them overly broad!)
andrewpatto committed May 14, 2024
1 parent f0da9f6 commit 452185c
Showing 11 changed files with 142 additions and 20 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,3 +5,6 @@ node_modules/
cdk.context.json

cdk.out/


.DS_Store
11 changes: 10 additions & 1 deletion README.md
@@ -50,7 +50,16 @@ test invocations.
{
"sourceFilesCsvBucket": "bucket-with-csv",
"sourceFilesCsvKey": "key-of-source-files.csv",
"destinationBucket": "a-target-bucket-in-same-region",
"destinationBucket": "a-target-bucket-in-same-region-but-not-same-account",
"maxItemsPerBatch": 10
}
```

The copy will fan out wide (to a sensible width of ~100) - but there is a small cost to the startup/shutdown
of the Fargate tasks. The `maxItemsPerBatch` setting controls how many individual files are attempted per
Fargate task - though note that we request SPOT tasks.

So there is a balance between the likelihood of SPOT interruptions and the re-use of Fargate tasks. If
tasks are SPOT interrupted, the next invocation will skip files that have already been transferred (assuming
at least one was copied) - so it is probably safest and cheapest to leave the items per batch at 10
and be prepared to re-execute the copy if needed.
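
For reference, a minimal sketch of starting the copy with the AWS SDK for JavaScript v3 - the state machine ARN below is a placeholder for whatever your deployment produced, and the region/client setup is an assumption:

```ts
import { SFNClient, StartExecutionCommand } from "@aws-sdk/client-sfn";

const client = new SFNClient({ region: "ap-southeast-2" });

// placeholder ARN - substitute the state machine ARN from your own deployment
const stateMachineArn =
  "arn:aws:states:ap-southeast-2:123456789012:stateMachine:CopyOut";

await client.send(
  new StartExecutionCommand({
    stateMachineArn: stateMachineArn,
    input: JSON.stringify({
      sourceFilesCsvBucket: "bucket-with-csv",
      sourceFilesCsvKey: "key-of-source-files.csv",
      destinationBucket: "a-target-bucket-in-same-region-but-not-same-account",
      maxItemsPerBatch: 10,
    }),
  }),
);
```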
18 changes: 18 additions & 0 deletions dev/EXAMPLE-COPY-README.md
@@ -0,0 +1,18 @@
How to do a full-scale invoke test.

Go to "elsa-data-tmp" bucket in dev.
It probably will be empty as objects auto-expire.
Make a folder "copy-out-test-working".
Copy "example-copy-manifest.csv" to that folder.

THE FOLDER MUST BE NAMED EXACTLY AS SPECIFIED, AS THAT PERMISSION IS BAKED INTO
THE DEV DEPLOYMENT (IN ORDER TO TEST PERMISSIONS!)
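
A sketch of that setup step with the AWS SDK for JavaScript v3 (assumes dev account credentials; the console or CLI works just as well):

```ts
import { readFileSync } from "node:fs";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "ap-southeast-2" });

// upload the example manifest into the exact folder the dev deployment expects
await s3.send(
  new PutObjectCommand({
    Bucket: "elsa-data-tmp",
    Key: "copy-out-test-working/example-copy-manifest.csv",
    Body: readFileSync("dev/example-copy-manifest.csv"),
  }),
);
```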

Invoke the dev Steps state machine with the input:

{
"sourceFilesCsvBucket": "elsa-data-tmp",
"sourceFilesCsvKey": "example-copy-manifest.csv",
"destinationBucket": "elsa-data-copy-target-sydney",
"maxItemsPerBatch": 2
}
40 changes: 39 additions & 1 deletion dev/dev.ts
@@ -1,5 +1,5 @@
import { CopyOutStateMachineConstruct } from "aws-copy-out-sharer";
import { SubnetType } from "aws-cdk-lib/aws-ec2";
import { SubnetType, Vpc } from "aws-cdk-lib/aws-ec2";
import { App, Stack, StackProps } from "aws-cdk-lib";
import { InfrastructureClient } from "@elsa-data/aws-infrastructure";
import { Service } from "aws-cdk-lib/aws-servicediscovery";
@@ -12,6 +12,7 @@ const description =
"Bulk copy-out service for Elsa Data - an application for controlled genomic data sharing";

const devId = "ElsaDataDevCopyOutStack";
const agId = "ElsaDataAgCopyOutStack";

/**
* Wraps the copy out construct for development purposes. We don't do this Stack definition in the
@@ -70,3 +71,40 @@ new ElsaDataCopyOutStack(app, devId, {
},
description: description,
});

/**
* Wraps an even simpler deployment directly for AG. We have a need to do AG copies
* outside of Elsa. This is also a good test of the copy-out mechanics, so this
* allows us to directly deploy/destroy.
*/
class ElsaDataSimpleCopyOutStack extends Stack {
constructor(scope?: Construct, id?: string, props?: StackProps) {
super(scope, id, props);

const vpc = Vpc.fromLookup(this, "Vpc", { vpcName: "main-vpc" });

const copyOut = new CopyOutStateMachineConstruct(this, "CopyOut", {
vpc: vpc,
vpcSubnetSelection: SubnetType.PRIVATE_WITH_EGRESS,
workingBucket: "elsa-data-copy-working",
workingBucketPrefixKey: "temp/",
aggressiveTimes: false,
allowWriteToInstalledAccount: true,
});

//stateMachineArn: copyOut.stateMachine.stateMachineArn,
}
}

new ElsaDataSimpleCopyOutStack(app, agId, {
// the stack can only be deployed to 'dev'
env: {
account: "602836945884",
region: "ap-southeast-2",
},
tags: {
"umccr-org:Product": "ElsaData",
"umccr-org:Stack": agId,
},
description: description,
});
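
The commented-out stateMachineArn line above suggests surfacing the new state machine's ARN; a hedged sketch of doing that with a stack output (the output id is illustrative), placed inside the ElsaDataSimpleCopyOutStack constructor:

```ts
import { CfnOutput } from "aws-cdk-lib";

// illustrative - exposes the deployed state machine ARN for manual invokes
new CfnOutput(this, "CopyOutStateMachineArn", {
  value: copyOut.stateMachine.stateMachineArn,
});
```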
12 changes: 12 additions & 0 deletions dev/example-copy-manifest.csv
@@ -0,0 +1,12 @@
umccr-10f-data-dev,"ASHKENAZIM/HG002-HG003-HG004.joint.filter.bcf"
umccr-10f-data-dev,"ASHKENAZIM/HG002-HG003-HG004.joint.filter.bcf.csi"
umccr-10f-data-dev,"ASHKENAZIM/HG002-HG003-HG004.joint.filter.vcf"
umccr-10f-data-dev,"ASHKENAZIM/HG002-HG003-HG004.joint.filter.vcf.gz"
umccr-10f-data-dev,"ASHKENAZIM/HG002-HG003-HG004.joint.filter.vcf.gz.csi"
umccr-10f-data-dev,"ASHKENAZIM/HG002-HG003-HG004.joint.filter.vcf.gz.tbi"
umccr-10f-data-dev,ASHKENAZIM/HG002.bam
umccr-10f-data-dev,ASHKENAZIM/HG002.bam.bai
umccr-10f-data-dev,ASHKENAZIM/HG003.bam
umccr-10f-data-dev,ASHKENAZIM/HG003.bam.bai
umccr-10f-data-dev,ASHKENAZIM/HG004.bam
umccr-10f-data-dev,ASHKENAZIM/HG004.bam.bai
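
Each manifest row is a source bucket and key, with the key quoted when needed; a minimal parsing sketch of that row shape (the helper is hypothetical, not part of this repo):

```ts
// hypothetical helper - splits a "bucket,key" manifest row, tolerating quoted keys
function parseManifestRow(row: string): { bucket: string; key: string } {
  const comma = row.indexOf(",");
  const bucket = row.slice(0, comma);
  const key = row.slice(comma + 1).replace(/^"|"$/g, "");
  return { bucket, key };
}

// parseManifestRow('umccr-10f-data-dev,"ASHKENAZIM/HG002.bam"')
//   -> { bucket: "umccr-10f-data-dev", key: "ASHKENAZIM/HG002.bam" }
```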
6 changes: 4 additions & 2 deletions dev/package.json
@@ -4,8 +4,10 @@
"version": "0.0.0",
"description": "Manual CDK deployment for development",
"scripts": {
"deploy": "pnpm -w run build && cdk deploy",
"destroy": "pnpm -w run build && cdk destroy",
"deploy": "pnpm -w run build && cdk deploy ElsaDataDevCopyOutStack",
"destroy": "pnpm -w run build && cdk destroy ElsaDataDevCopyOutStack",
"agdeploy": "pnpm -w run build && cdk deploy ElsaDataAgCopyOutStack",
"agdestroy": "pnpm -w run build && cdk destroy ElsaDataAgCopyOutStack",
"test": "ts-node --prefer-ts-exts test.ts",
"test-quick": "ts-node --prefer-ts-exts test.ts"
},
@@ -31,12 +31,14 @@ export async function handler(event: InvokeEvent) {
// the manifest.json is generated by an AWS Steps DISTRIBUTED map and shows the results
// of all the individual map run parts
const getManifestCommand = new GetObjectCommand({
Bucket: event.destinationBucket,
Bucket: (event as any).sourceFilesCsvBucket,
Key: event.rcloneResults.manifestAbsoluteKey,
});

const getManifestResult = await client.send(getManifestCommand);

const getManifestContent = await getManifestResult.Body.transformToString();

// A sample manifest
// {"DestinationBucket":"elsa-data-tmp",
// "MapRunArn":"arn:aws:states:ap-southeast-2:12345678:mapRun:CopyOutStateMachineABCD/4474d22f-4056-30e3-978c-027016edac90:0c17ffd6-e8ad-44c0-a65b-a8b721007241",
@@ -48,6 +50,8 @@

const manifest = JSON.parse(getManifestContent);

console.debug(JSON.stringify(manifest, null, 2));

const rf = manifest["ResultFiles"];

if (!rf)
@@ -80,7 +84,7 @@ export async function handler(event: InvokeEvent) {

for (const s of succeeded) {
const getSuccessCommand = new GetObjectCommand({
Bucket: event.destinationBucket,
Bucket: (event as any).sourceFilesCsvBucket,
Key: s["Key"],
});

@@ -123,8 +127,8 @@
// looking
const errors: number = rcloneRow["errors"];
const lastError: number = rcloneRow["lastError"];
const copiedBytes: number = rcloneRow["serverSideCopyBytes"];
const copySeconds = rcloneRow["elapsedTime"];
const serverSideCopyBytes: number = rcloneRow["serverSideCopyBytes"];
const elapsedTime = rcloneRow["elapsedTime"];
const totalTransfers = rcloneRow["totalTransfers"];
const retryError = rcloneRow["retryError"];

@@ -160,11 +164,13 @@
};
} else {
// if we did do a copy then elapsedTime will normally be a value and we can compute a speed
if (copySeconds)
if (elapsedTime)
fileResults[b] = {
name: b,
status: "COPIED",
speed: Math.floor(copiedBytes / copySeconds / 1024 / 1024),
speed: Math.floor(
serverSideCopyBytes / elapsedTime / 1024 / 1024,
),
message: "",
};
}
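
The renamed fields feed the speed calculation above - bytes over seconds, scaled to MiB/s. A worked sketch with illustrative numbers:

```ts
// illustrative numbers - a 1 GiB server-side copy that took 8 seconds
const serverSideCopyBytes = 1073741824;
const elapsedTime = 8;

// same computation as the handler: whole MiB per second
const speed = Math.floor(serverSideCopyBytes / elapsedTime / 1024 / 1024);

console.assert(speed === 128); // 1024 MiB / 8 s = 128 MiB/s
```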
@@ -165,9 +165,10 @@ export class CopyOutStateMachineConstruct extends Construct {
"s3:AbortMultipartUpload",
],
resources: [
`arn:aws:s3:::${props.workingBucket}/${
props.workingBucketPrefixKey ?? ""
}*`,
"*",
//`arn:aws:s3:::${props.workingBucket}/${
// props.workingBucketPrefixKey ?? ""
//}*`,
],
}),
);
23 changes: 23 additions & 0 deletions packages/aws-copy-out-sharer/src/rclone-run-task-construct.ts
@@ -73,6 +73,25 @@
}),
);

/*
import { LinuxParameters } from "aws-cdk-lib/aws-ecs";
const linux = new LinuxParameters(this, "Linux", {
});
linux.addTmpfs(
{
"mountOptions": [ TmpfsMountOption.RW ],
"containerPath": "/run",
"size": 10
},
{
"mountOptions": [ TmpfsMountOption.RW],
"containerPath": "/tmp",
"size": 10
}
); */

const containerDefinition = taskDefinition.addContainer("RcloneContainer", {
// set the stop timeout to the maximum allowed under Fargate Spot
// potentially this will let us finish our rclone operation (!!! - we don't actually try to let rclone finish - see Docker image - we should)
@@ -84,6 +103,10 @@
platform: Platform.LINUX_AMD64,
},
),
readonlyRootFilesystem: true,
// https://stackoverflow.com/questions/68933848/how-to-allow-container-with-read-only-root-filesystem-writing-to-tmpfs-volume
// DOESN'T WORK FOR FARGATE SO NEED TO THINK ABOUT THIS OTHER WAY
// linuxParameters: linux,
logging: LogDriver.awsLogs({
streamPrefix: "elsa-data-copy-out",
logRetention: RetentionDays.ONE_WEEK,
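
Since Fargate ignores the tmpfs Linux parameters, one possible workaround (a sketch only - not what this commit does) is task-level volumes mounted writable into the otherwise read-only container, backed by Fargate's ephemeral task storage:

```ts
// sketch only - assumes the taskDefinition/containerDefinition from this construct;
// Fargate backs host-less volumes with ephemeral task storage
taskDefinition.addVolume({ name: "tmp" });
taskDefinition.addVolume({ name: "run" });

containerDefinition.addMountPoints(
  { containerPath: "/tmp", sourceVolume: "tmp", readOnly: false },
  { containerPath: "/run", sourceVolume: "run", readOnly: false },
);
```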
11 changes: 10 additions & 1 deletion packages/aws-copy-out-sharer/src/s3-csv-distributed-map.ts
@@ -7,7 +7,12 @@ import {
StateGraph,
StateMachine,
} from "aws-cdk-lib/aws-stepfunctions";
import { Effect, Policy, PolicyStatement } from "aws-cdk-lib/aws-iam";
import {
Effect,
ManagedPolicy,
Policy,
PolicyStatement,
} from "aws-cdk-lib/aws-iam";
import { Construct } from "constructs";

export interface S3CsvDistributedMapProps {
@@ -91,6 +96,10 @@ export class S3CsvDistributedMap
);

this.policy.attachToRole(stateMachine.role);

stateMachine.role.addManagedPolicy(
ManagedPolicy.fromAwsManagedPolicyName("AmazonS3FullAccess"),
);
}

protected makeNext(next: State) {
@@ -83,12 +83,13 @@ export class SummariseCopyLambdaStepConstruct extends Construct {
summariseCopyLambda.addToRolePolicy(
new PolicyStatement({
effect: Effect.ALLOW,
actions: ["s3:GetObject"],
resources: [
`arn:aws:s3:::${props.workingBucket}/${
props.workingBucketPrefixKey ?? ""
}*`,
],
actions: ["s3:*"],
resources: ["*"],
//[
//`arn:aws:s3:::${props.workingBucket}/${
// props.workingBucketPrefixKey ?? ""
//}*`,
//],
}),
);

