Backup Cognito user pool and copy it to S3 #2

mjgiarlo · 2019-05-01T22:11:59Z

Includes:

Run npm init and add jest and eslint
Set up CircleCI, CodeClimate, jest, and eslint and update the README
Dockerize app and correct typo in README
Add bin/backup stub along with supporting source stub and add babel-node as a dependency
Add config, dotenv, and babel packages and make sure linter, tests, and bin script all function
Export user data from Cognito
Copy user data to S3
💯 test coverage

Fixes #1

jermnelson

LGTM; @mjgiarlo I'll be happy to merge when this is a real PR 😃

Includes:

* Add `CMD` to Docker configuration to reflect the container's entrypoint
* Split export and copy functionality into separate classes
* Isolate AWS configuration within its own module
* Move "glue" code to `index.js`
* Set default value for S3 bucket
* Add docker-compose configuration to mimic how we will run and configure the container in Fargate

mjgiarlo · 2019-05-02T22:01:06Z

.babelrc

@@ -0,0 +1,13 @@
+{
+  "plugins": [["babel-plugin-dotenv", {
+    "replacedModuleName": "babel-dotenv"


Needed to make babel and dotenv play nicely together.

mjgiarlo · 2019-05-02T22:01:23Z

.babelrc

+  ],
+  "env": {
+    "test": {
+      "plugins": ["@babel/plugin-transform-runtime"]


Needed to use async/await in test suite.

__mocks__/cognito-fake.js

__mocks__/cognito-pagination-fake.js

__tests__/CopyUsers.test.js

mjgiarlo · 2019-05-02T22:05:45Z

index.js

@@ -0,0 +1,9 @@
+import ExportUsers from './src/ExportUsers'


This module is untested because it's using code that's already covered in our test suite.

mjgiarlo · 2019-05-02T22:06:53Z

package.json

+    "backup": "bin/backup"
+  },
+  "scripts": {
+    "ci": "AWS_PROFILE=foobarbazquux jest --colors --ci --coverage",


I don't love the env vars here either, but it works and I wasn't sure there was a better approach. Explained here: https://github.com/LD4P/sinopia_user_backup/pull/2/files#diff-5a61b7aef07cd51690e0d92206685355R13

mjgiarlo · 2019-05-02T22:09:44Z

src/CopyUsers.js

+    AWS.config = configureAWS(AWS.config)
+    this.userListString = JSON.stringify(userList)
+    this.s3 = new AWS.S3()
+    this.objectKey = `${new Date().toISOString()}/${config.get('userPoolId')}.json`


Store users in a JSON file named after the user pool in a subdirectory that is the datestamp of the operation.
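For reference, a minimal sketch of what the template literal in the diff evaluates to. `currentObjectKey` is a hypothetical wrapper, not code from this PR, and `userPoolId` stands in for `config.get('userPoolId')`; the pool ID is a placeholder.

```javascript
// Hypothetical wrapper around the expression in the diff above.
// userPoolId stands in for config.get('userPoolId').
function currentObjectKey(userPoolId, now = new Date()) {
  // toISOString() returns a full UTC timestamp, not just a date
  return `${now.toISOString()}/${userPoolId}.json`
}

console.log(currentObjectKey('234zzv9wer9ads8f', new Date('2019-05-02T22:09:44Z')))
// prints: 2019-05-02T22:09:44.000Z/234zzv9wer9ads8f.json
```

Note that the "subdirectory" is a full timestamp down to the millisecond, so each run gets its own prefix.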

wouldn't we want the directory named for the userPool, and the filename being the datestamp? Possibly even preceded by "users", so 234zzv9wer9ads8f/usersTIMESTAMP.json? then I'd have all the files for the same userpool together and just look for the last one.

Also ... would we want to delete older files?

wouldn't we want the directory named for the userPool, and the filename being the datestamp?

one possible argument for organizing by date before pool id is that it makes it easier to see things in chronological order by directory structure and file name, using lexical sorting in either tree or ls. which would make it easy to see at a glance where the latest backup file was, without having to remember what order the pool IDs were used in.
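A small sketch of the sorting argument, using made-up keys (the pool names and timestamps are hypothetical). It relies on the fact that fixed-width ISO 8601 UTC timestamps sort chronologically under plain string comparison; `lastKey` is an illustrative helper, not code from this PR.

```javascript
// Returns the key that sorts last under default (lexical) string ordering.
function lastKey(keys) {
  const sorted = [...keys].sort()
  return sorted[sorted.length - 1]
}

// Same three hypothetical backups, laid out both ways.
const dateFirst = [
  '2019-05-03T00:00:00.000Z/poolA.json',
  '2019-05-01T00:00:00.000Z/poolB.json',
  '2019-05-02T00:00:00.000Z/poolB.json'
]
const poolFirst = [
  'poolA/2019-05-03T00:00:00.000Z.json',
  'poolB/2019-05-01T00:00:00.000Z.json',
  'poolB/2019-05-02T00:00:00.000Z.json'
]

console.log(lastKey(dateFirst)) // the newest backup overall
console.log(lastKey(poolFirst)) // the newest backup of the last-sorting pool only
```

With the date first, the last entry is always the most recent backup; with the pool first, it is only the most recent backup within whichever pool ID happens to sort last.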

Possibly even preceded by "users", so 234zzv9wer9ads8f/usersTIMESTAMP.json

i do like the idea of a filename that's more descriptive than just a pool id or a timestamp, and i do like having files that sort well as siblings even when their parent directory structures get flattened. but i also like the idea of timestamp preceding pool id. so my personal pref would be something like DATE/user-backup_DATE_POOLID.json, e.g. 2019-05-01/user-backup_2019-05-01_234zzv9wer9ads8f.json.
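The proposed DATE/user-backup_DATE_POOLID.json layout could be sketched roughly as follows. `buildObjectKey` is a hypothetical helper for illustration only, and it assumes DATE means the UTC calendar date.

```javascript
// Hypothetical helper, not part of this PR.
// Assumes DATE is the UTC date portion (YYYY-MM-DD) of the run time.
function buildObjectKey(userPoolId, now = new Date()) {
  // toISOString() is always UTC; the first 10 characters are YYYY-MM-DD
  const datestamp = now.toISOString().slice(0, 10)
  return `${datestamp}/user-backup_${datestamp}_${userPoolId}.json`
}

console.log(buildObjectKey('234zzv9wer9ads8f', new Date('2019-05-01T22:11:59Z')))
// prints: 2019-05-01/user-backup_2019-05-01_234zzv9wer9ads8f.json
```

One consequence of a date-only stamp: two runs on the same UTC day would produce the same key, so the later upload would replace the earlier one unless a time component is added.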

Also ... would we want to delete older files?

i'm not sure if this is captured in a ticket at the moment, but my understanding from slack discussion yesterday was that we'd punt on cleanup till later. i think @mjgiarlo's back of the envelope calculation expected < 25 MB of storage use per year if we don't purge anything, which seems pretty minimal.

@ndushay 💬

wouldn't we want the directory named for the userPool, and the filename being the datestamp?

Perhaps. But AFAIK we only have one user pool per env, so the top-level user pool dir is only so valuable in an env that won't have multiple pools. I've just created a poll in Slack to see what folks want.

Possibly even preceded by "users", so 234zzv9wer9ads8f/usersTIMESTAMP.json ? then I'd have all the files for the same userpool together and just look for the last one.

Hmmm, how do you think the users prefix helps?

Also ... would we want to delete older files?

I defer to @jermnelson. Jeremy, how much do we care about cleaning up our user pool backups in M3 (or even in M4)? It's fairly cheap to keep the files hanging around, and we can always clean later. I'm inclined to keep that out of this PR, but I'm happy to write up a new issue if this is something we care about!

I don't think we need to delete the files at least for a couple of months into the project. To me this would be ongoing maintenance or part of the next work-cycle if we wanted to add a programmatic way to remove the old files.

@jermnelson OK. I'll write up a ticket and it'll sit in the backlog.

mjgiarlo · 2019-05-02T22:10:38Z

src/configureAWS.js

@@ -0,0 +1,12 @@
+import AWS from 'aws-sdk'


This function lives outside the other classes to reduce copypasta.

ndushay

One genuine question and a few quibbles.

ndushay · 2019-05-02T22:41:01Z

src/CopyUsers.js

+    AWS.config = configureAWS(AWS.config)
+    this.userListString = JSON.stringify(userList)
+    this.s3 = new AWS.S3()
+    this.objectKey = `${new Date().toISOString()}/${config.get('userPoolId')}.json`


wouldn't we want the directory named for the userPool, and the filename being the datestamp? Possibly even preceded by "users", so 234zzv9wer9ads8f/usersTIMESTAMP.json? then I'd have all the files for the same userpool together and just look for the last one.

Also ... would we want to delete older files?

mjgiarlo added 6 commits May 1, 2019 13:01

Ran npm init then added jest and eslint

b465dfb

Set up CircleCI, CodeClimate, jest, and eslint and update the README

8c9a688

Dockerize app and correct typo in README

746a124

Add bin/backup stub along with supporting source stub

3903dd3

And add babel-node as a dependency

Add config, dotenv, and babel packages and make sure linter, tests, a…

e83502c

…nd bin script all function

Backup Cognito user pool and copy it to S3

a07e8e7

Fixes #1

mjgiarlo added the M3 (Milestone 3, for filtering github project board) and needs review labels May 1, 2019

mjgiarlo added this to the M03: User Login and “Profile Opening Night” milestone May 1, 2019

mjgiarlo mentioned this pull request May 1, 2019

Backup Cognito user pool data LD4P/sinopia_acl#70

Closed

Add a stub test

0258213

jermnelson approved these changes May 1, 2019

mjgiarlo added 4 commits May 2, 2019 09:41

Update test stubs and rename S3 bucket argument to be more accurate

6441a52

Update badge URLs in README

9f43853

Add test coverage to CI build

58c7507

mjgiarlo force-pushed the initial_setup branch from 9d30a78 to 58c7507 May 2, 2019 17:08

mjgiarlo added 3 commits May 2, 2019 10:14

Configure jest to collect coverage from code in src/ dir

08153f6

Add test coverage for configureAWS module

7c3cd49

Add test coverage for CopyUsers class

6ce1065

Add test coverage for ExportUsers class

945e45a

mjgiarlo marked this pull request as ready for review May 2, 2019 21:52