Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backup Cognito user pool and copy it to S3 #2

Merged
merged 18 commits into from
May 3, 2019
Merged

Conversation

mjgiarlo
Copy link
Member

@mjgiarlo mjgiarlo commented May 1, 2019

Includes:

  • Run npm init and add jest and eslint
  • Set up CircleCI, CodeClimate, jest, and eslint and update the README
  • Dockerize app and correct typo in README
  • Add bin/backup stub along with supporting source stub and add babel-node as a dependency
  • Add config, dotenv, and babel packages and make sure linter, tests, and bin script all function
  • Export user data from Cognito
  • Copy user data to S3
  • 💯 test coverage

Fixes #1

@mjgiarlo mjgiarlo added M3 Milestone 3 (for filtering github project board) needs review labels May 1, 2019
Copy link
Contributor

@jermnelson jermnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; @mjgiarlo I'll be happy to merge when this is a real PR 😃

mjgiarlo added 4 commits May 2, 2019 09:41
Includes:

* Add `CMD` to Docker configuration to reflect the container's entrypoint
* Split export and copy functionality into separate classes
* Isolate AWS configuration within its own module
* Move "glue" code to `index.js`
* Set default value for S3 bucket
* Add docker-compose configuration to mimic how we will run and configure the container in Fargate
@mjgiarlo mjgiarlo marked this pull request as ready for review May 2, 2019 21:52
@@ -0,0 +1,13 @@
{
"plugins": [["babel-plugin-dotenv", {
"replacedModuleName": "babel-dotenv"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to make babel and dotenv play nicely together.

],
"env": {
"test": {
"plugins": ["@babel/plugin-transform-runtime"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to use async/await in test suite.

@@ -0,0 +1,9 @@
import ExportUsers from './src/ExportUsers'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module is untested because it's using code that's already covered in our test suite.

"backup": "bin/backup"
},
"scripts": {
"ci": "AWS_PROFILE=foobarbazquux jest --colors --ci --coverage",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the env vars here either, but it works and I wasn't sure there was a better approach. Explained here: https://github.com/LD4P/sinopia_user_backup/pull/2/files#diff-5a61b7aef07cd51690e0d92206685355R13

src/CopyUsers.js Outdated
AWS.config = configureAWS(AWS.config)
this.userListString = JSON.stringify(userList)
this.s3 = new AWS.S3()
this.objectKey = `${new Date().toISOString()}/${config.get('userPoolId')}.json`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Store users in a JSON file named after the user pool in a subdirectory that is the datestamp of the operation.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't we want the directory named for the userPool, and the filename being the dataestamp? Possibly even preceded by "users", so 234zzv9wer9ads8f/usersTIMESTAMP.json ? then I'd have all the files for the same userpool together and just look for the last one.

Also ... would we want to delete older files?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't we want the directory named for the userPool, and the filename being the dataestamp?

one possible argument for organizing by date before pool id is that it makes it easier to see things in chronological order by directory structure and file name, using lexical sorting in either tree or ls. which would make it easy to see at a glance where the latest backup file was, without having to remember what order the pool IDs were used in.

Possibly even preceded by "users", so 234zzv9wer9ads8f/usersTIMESTAMP.json

i do like the idea of a filename that's more descriptive than just a pool id or a timestamp, and i do like having files that sort well as siblings even when their parent directory structures get flattened. but i also like the idea of timestamp preceding pool id. so my personal pref would be something like DATE/user-backup_DATE_POOLID.json, for e.g. a filename like 2019-05-01/user-backup_2019-05-01_234zzv9wer9ads8f.json.

Also ... would we want to delete older files?

i'm not sure if this is captured in a ticket at the moment, but my understanding from slack discussion yesterday was that we'd punt on cleanup till later. i think @mjgiarlo's back of the envelope calculation expected < 25 MB of storage use per year if we don't purge anything, which seems pretty minimal.

Copy link
Member Author

@mjgiarlo mjgiarlo May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ndushay 💬

wouldn't we want the directory named for the userPool, and the filename being the datestamp?

Perhaps. But AFAIK we only have one user pool per env, so the top-level user pool dir is only so valuable in an env that won't have multiple pools. I've just created a poll in Slack to see what folks want.

Possibly even preceded by "users", so 234zzv9wer9ads8f/usersTIMESTAMP.json ? then I'd have all the files for the same userpool together and just look for the last one.

Hmmm, how do you think the users prefix helps?

Also ... would we want to delete older files?

I defer to @jermnelson. Jeremy, how much do we care about cleaning up our user pool backups in M3 (or even in M4)? It's fairly cheap to keep the files hanging around, and we can always clean later. I'm inclined to keep that out of this PR, but I'm happy to write up a new issue if this is something we care about!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to delete the files at least for a couple of months into the project. To me this would be ongoing maintenance or part of the next work-cycle if we wanted to add a programmatic way to remove the old files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jermnelson OK. I'll write up a ticket and it'll sit in the backlog.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3

@@ -0,0 +1,12 @@
import AWS from 'aws-sdk'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function lives outside the other classes to reduce copypasta.

Copy link

@ndushay ndushay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One gen-uine question and a few quibbles.

__mocks__/cognito-fake.js Show resolved Hide resolved
__mocks__/cognito-pagination-fake.js Show resolved Hide resolved
__tests__/CopyUsers.test.js Show resolved Hide resolved
src/CopyUsers.js Outdated
AWS.config = configureAWS(AWS.config)
this.userListString = JSON.stringify(userList)
this.s3 = new AWS.S3()
this.objectKey = `${new Date().toISOString()}/${config.get('userPoolId')}.json`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't we want the directory named for the userPool, and the filename being the dataestamp? Possibly even preceded by "users", so 234zzv9wer9ads8f/usersTIMESTAMP.json ? then I'd have all the files for the same userpool together and just look for the last one.

Also ... would we want to delete older files?

@mjgiarlo mjgiarlo merged commit 5555f8b into master May 3, 2019
@mjgiarlo mjgiarlo deleted the initial_setup branch May 3, 2019 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M3 Milestone 3 (for filtering github project board) needs review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

how to preserve user list across cognito user pool rebuilds
4 participants