-
Notifications
You must be signed in to change notification settings - Fork 70
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #77 from miykael/move_user_docs
Moving user documentation from main homepage to nipype_tutorial - Pull Request
- Loading branch information
Showing
61 changed files
with
4,603 additions
and
851 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,166 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Using Nipype with Amazon Web Services (AWS)\n", | ||
"\n", | ||
"Several groups have been successfully using Nipype on AWS. This procedure\n", | ||
"involves setting a temporary cluster using StarCluster and potentially\n", | ||
"transferring files to/from S3. The latter is supported by Nipype through\n", | ||
"`DataSink` and `S3DataGrabber`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Using DataSink with S3\n", | ||
"\n", | ||
"The `DataSink` class now supports sending output data directly to an AWS S3\n", | ||
"bucket. It does this through the introduction of several input attributes to the\n", | ||
"`DataSink` interface and by parsing the `base_directory` attribute. This class\n", | ||
"uses the [boto3](https://boto3.readthedocs.org/en/latest/) and\n", | ||
"[botocore](https://botocore.readthedocs.org/en/latest/) Python packages to\n", | ||
"interact with AWS. To configure the `DataSink` to write data to S3, the user must\n", | ||
"set the ``base_directory`` property to an S3-style filepath.\n", | ||
"\n", | ||
"For example:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from nipype.interfaces.io import DataSink\n", | ||
"ds = DataSink()\n", | ||
"ds.inputs.base_directory = 's3://mybucket/path/to/output/dir'" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"With the `\"s3://\"` prefix in the path, the `DataSink` knows that the output\n", | ||
"directory to send files is on S3 in the bucket `\"mybucket\"`. `\"path/to/output/dir\"`\n", | ||
"is the relative directory path within the bucket `\"mybucket\"` where output data\n", | ||
"will be uploaded to (***Note***: if the relative path specified contains folders that\n", | ||
"don’t exist in the bucket, the `DataSink` will create them). The `DataSink` treats\n", | ||
"the S3 base directory exactly as it would a local directory, maintaining support\n", | ||
"for containers, substitutions, subfolders, `\".\"` notation, etc. to route output\n", | ||
"data appropriately.\n", | ||
"\n", | ||
"There are four new attributes introduced with S3-compatibility: ``creds_path``,\n", | ||
"``encrypt_bucket_keys``, ``local_copy``, and ``bucket``." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"ds.inputs.creds_path = '/home/neuro/aws_creds/credentials.csv'\n", | ||
"ds.inputs.encrypt_bucket_keys = True\n", | ||
"ds.local_copy = '/home/neuro/workflow_outputs/local_backup'" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"``creds_path`` is a file path where the user's AWS credentials file (typically\n", | ||
"a csv) is stored. This credentials file should contain the AWS access key id and\n", | ||
"secret access key and should be formatted as one of the following (these formats\n", | ||
"are how Amazon provides the credentials file by default when first downloaded).\n", | ||
"\n", | ||
"Root-account user:\n", | ||
"\n", | ||
"\tAWSAccessKeyID=ABCDEFGHIJKLMNOP\n", | ||
"\tAWSSecretKey=zyx123wvu456/ABC890+gHiJk\n", | ||
"\n", | ||
"IAM-user:\n", | ||
"\n", | ||
"\tUser Name,Access Key Id,Secret Access Key\n", | ||
"\t\"username\",ABCDEFGHIJKLMNOP,zyx123wvu456/ABC890+gHiJk\n", | ||
"\n", | ||
"The ``creds_path`` is necessary when writing files to a bucket that has\n", | ||
"restricted access (almost no buckets are publicly writable). If ``creds_path``\n", | ||
"is not specified, the DataSink will check the ``AWS_ACCESS_KEY_ID`` and\n", | ||
"``AWS_SECRET_ACCESS_KEY`` environment variables and use those values for bucket\n", | ||
"access.\n", | ||
"\n", | ||
"``encrypt_bucket_keys`` is a boolean flag that indicates whether to encrypt the\n", | ||
"output data on S3, using server-side AES-256 encryption. This is useful if the\n", | ||
"data being output is sensitive and one desires an extra layer of security on the\n", | ||
"data. By default, this is turned off.\n", | ||
"\n", | ||
"``local_copy`` is a string of the filepath where local copies of the output data\n", | ||
"are stored in addition to those sent to S3. This is useful if one wants to keep\n", | ||
"a backup version of the data stored on their local computer. By default, this is\n", | ||
"turned off.\n", | ||
"\n", | ||
"``bucket`` is a boto3 Bucket object that the user can use to overwrite the\n", | ||
"bucket specified in their ``base_directory``. This can be useful if one has to\n", | ||
"manually create a bucket instance on their own using special credentials (or\n", | ||
"using a mock server like [fakes3](https://github.com/jubos/fake-s3)). This is\n", | ||
"typically used for developers unit-testing the DataSink class. Most users do not\n", | ||
"need to use this attribute for actual workflows. This is an optional argument.\n", | ||
"\n", | ||
"Finally, the user needs only to specify the input attributes for any incoming\n", | ||
"data to the node, and the outputs will be written to their S3 bucket." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"```python\n", | ||
"workflow.connect(inputnode, 'subject_id', ds, 'container')\n", | ||
"workflow.connect(realigner, 'realigned_files', ds, 'motion')\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"So, for example, outputs for `sub001`’s `realigned_file1.nii.gz` will be in:\n", | ||
"\n", | ||
" s3://mybucket/path/to/output/dir/sub001/motion/realigned_file1.nii.gz" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Using S3DataGrabber\n", | ||
"Coming soon..." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python [default]", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
Oops, something went wrong.