Backup Cognito user pool and copy it to S3 #2
Conversation
And add babel-node as a dependency
…nd bin script all function
LGTM; @mjgiarlo I'll be happy to merge when this is a real PR 😃
Includes:

* Add `CMD` to Docker configuration to reflect the container's entrypoint
* Split export and copy functionality into separate classes
* Isolate AWS configuration within its own module
* Move "glue" code to `index.js`
* Set default value for S3 bucket
* Add docker-compose configuration to mimic how we will run and configure the container in Fargate
@@ -0,0 +1,13 @@
{
  "plugins": [["babel-plugin-dotenv", {
    "replacedModuleName": "babel-dotenv"
Needed to make babel and dotenv play nicely together.
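If I'm reading the plugin correctly, `replacedModuleName` means values from `.env` get inlined wherever code imports from the dummy `babel-dotenv` module. A minimal sketch of that usage, assuming an `AWS_REGION` entry in `.env` (the variable name is illustrative, not from this PR):

```js
// Illustrative only — with the .babelrc above, my understanding is that the
// plugin rewrites this import into the corresponding value from .env at
// build time. AWS_REGION is a made-up example variable.
import { AWS_REGION } from 'babel-dotenv'

console.log(`Using region ${AWS_REGION}`)
```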
  ],
  "env": {
    "test": {
      "plugins": ["@babel/plugin-transform-runtime"]
Needed to use `async`/`await` in the test suite.
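For context, a minimal sketch of the kind of test this enables; the `export()` method name and the assertion are assumptions, not taken from the diff:

```js
// Without @babel/plugin-transform-runtime in the "test" env, an async test
// like this typically fails with "regeneratorRuntime is not defined".
import ExportUsers from '../src/ExportUsers'

test('exports users from the Cognito pool', async () => {
  const users = await new ExportUsers().export() // method name is an assumption
  expect(Array.isArray(users)).toBe(true)
})
```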
@@ -0,0 +1,9 @@
import ExportUsers from './src/ExportUsers'
This module is untested because it's using code that's already covered in our test suite.
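A hedged sketch of what that glue might look like, assuming `export()` and `copy()` method names; the constructor argument matches the `userList` handling shown in the `CopyUsers` diff below:

```js
// Hypothetical shape of the index.js "glue": export the users, then hand the
// list to CopyUsers for the S3 upload. Method names are assumptions.
import ExportUsers from './src/ExportUsers'
import CopyUsers from './src/CopyUsers'

new ExportUsers().export()
  .then(userList => new CopyUsers(userList).copy())
  .catch(err => {
    console.error(err)
    process.exit(1)
  })
```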
"backup": "bin/backup" | ||
}, | ||
"scripts": { | ||
"ci": "AWS_PROFILE=foobarbazquux jest --colors --ci --coverage", |
I don't love the env vars here either, but it works and I wasn't sure there was a better approach. Explained here: https://github.com/LD4P/sinopia_user_backup/pull/2/files#diff-5a61b7aef07cd51690e0d92206685355R13
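One possible alternative, sketched here purely for discussion and not part of this PR: set the variable in a Jest setup file (a hypothetical `jest.setup.js`, wired up via Jest's `setupFiles` option) so the npm scripts stay free of inline env vars.

```js
// jest.setup.js — hypothetical file, referenced from "setupFiles" in the Jest
// config. Sets the same profile the `ci` script sets inline above.
process.env.AWS_PROFILE = process.env.AWS_PROFILE || 'foobarbazquux'
```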
src/CopyUsers.js
Outdated
    AWS.config = configureAWS(AWS.config)
    this.userListString = JSON.stringify(userList)
    this.s3 = new AWS.S3()
    this.objectKey = `${new Date().toISOString()}/${config.get('userPoolId')}.json`
Store users in a JSON file named after the user pool in a subdirectory that is the datestamp of the operation.
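For readers skimming the thread, a rough sketch of the upload that key feeds into, using the aws-sdk v2 `S3` client shown in the diff; the `exportBucket` config key and the standalone function shape are assumptions, not copied from the PR:

```js
import AWS from 'aws-sdk'
import config from 'config'

const s3 = new AWS.S3()

// Key layout described above: <ISO datestamp>/<user pool id>.json
const objectKey = `${new Date().toISOString()}/${config.get('userPoolId')}.json`

// Uploads the serialized user list to S3; 'exportBucket' is an assumed config key.
export const copyUsers = userList =>
  s3.putObject({
    Bucket: config.get('exportBucket'),
    Key: objectKey,
    Body: JSON.stringify(userList)
  }).promise()
```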
wouldn't we want the directory named for the userPool, and the filename being the datestamp? Possibly even preceded by "users", so `234zzv9wer9ads8f/usersTIMESTAMP.json`? then I'd have all the files for the same userpool together and just look for the last one.
Also ... would we want to delete older files?
> wouldn't we want the directory named for the userPool, and the filename being the datestamp?

one possible argument for organizing by date before pool id is that it makes it easier to see things in chronological order by directory structure and file name, using lexical sorting in either `tree` or `ls`, which would make it easy to see at a glance where the latest backup file was, without having to remember what order the pool IDs were used in.
> Possibly even preceded by "users", so 234zzv9wer9ads8f/usersTIMESTAMP.json

i do like the idea of a filename that's more descriptive than just a pool id or a timestamp, and i do like having files that sort well as siblings even when their parent directory structures get flattened. but i also like the idea of timestamp preceding pool id. so my personal pref would be something like `DATE/user-backup_DATE_POOLID.json`, e.g. a filename like `2019-05-01/user-backup_2019-05-01_234zzv9wer9ads8f.json` (a concrete sketch follows this comment).
> Also ... would we want to delete older files?
i'm not sure if this is captured in a ticket at the moment, but my understanding from slack discussion yesterday was that we'd punt on cleanup till later. i think @mjgiarlo's back of the envelope calculation expected < 25 MB of storage use per year if we don't purge anything, which seems pretty minimal.
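A concrete sketch of the `DATE/user-backup_DATE_POOLID.json` scheme suggested above, using the example pool id from the thread:

```js
// Build the suggested key: date-first directory, descriptive filename.
const date = new Date().toISOString().slice(0, 10)  // e.g. '2019-05-01'
const poolId = '234zzv9wer9ads8f'                   // example id from the thread
const objectKey = `${date}/user-backup_${date}_${poolId}.json`
// => '2019-05-01/user-backup_2019-05-01_234zzv9wer9ads8f.json'
```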
@ndushay 💬
> wouldn't we want the directory named for the userPool, and the filename being the datestamp?
Perhaps. But AFAIK we only have one user pool per env, so the top-level user pool dir is only so valuable in an env that won't have multiple pools. I've just created a poll in Slack to see what folks want.
> Possibly even preceded by "users", so 234zzv9wer9ads8f/usersTIMESTAMP.json ? then I'd have all the files for the same userpool together and just look for the last one.
Hmmm, how do you think the `users` prefix helps?
> Also ... would we want to delete older files?
I defer to @jermnelson. Jeremy, how much do we care about cleaning up our user pool backups in M3 (or even in M4)? It's fairly cheap to keep the files hanging around, and we can always clean later. I'm inclined to keep that out of this PR, but I'm happy to write up a new issue if this is something we care about!
I don't think we need to delete the files at least for a couple of months into the project. To me this would be ongoing maintenance or part of the next work-cycle if we wanted to add a programmatic way to remove the old files.
@jermnelson OK. I'll write up a ticket and it'll sit in the backlog.
@@ -0,0 +1,12 @@
import AWS from 'aws-sdk'
This function lives outside the other classes to reduce copypasta.
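A hedged sketch of what a shared module like this might look like, given the `AWS.config = configureAWS(AWS.config)` call in the `CopyUsers` diff; the `awsRegion` config key and the credentials choice are assumptions, not taken from the PR:

```js
import AWS from 'aws-sdk'
import config from 'config'

// Applies shared settings (region, credentials profile) to the AWS config
// object passed in and returns it, so callers can reassign AWS.config.
export default function configureAWS (awsConfig) {
  awsConfig.update({
    region: config.get('awsRegion'),                 // assumed config key
    credentials: new AWS.SharedIniFileCredentials({
      profile: process.env.AWS_PROFILE || 'default'  // profile set by the npm scripts
    })
  })
  return awsConfig
}
```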
One genuine question and a few quibbles.
Includes: `npm init` and add `jest` and `eslint`

Fixes #1