Setup logical replication #481

sravfeyn · 2025-01-27T11:04:13Z

Product Description

Utils for setting up logical replication

https://docs.google.com/document/d/1r9uFUQK8cOreaUm6E8nx9nyeKZL8tcNLJwJiWXh8kGI/edit?tab=t.0#heading=h.igbj18ssbv2z

Technical Summary

Adds a SECONDARY_DATABASE_URL option to local env
Adds migrate_multi and setup_logical_replication commands to set up replication

Safety Assurance

Safety story

I have tested this locally and will test on staging

Automated test coverage

NA

Labels & Review

The set of people pinged as reviewers is appropriate for the level of risk of the change

sravfeyn · 2025-01-27T11:06:23Z

docker/start_migrate

@@ -6,6 +6,6 @@ set -o nounset


 echo "Django migrate"
-python manage.py migrate --noinput
+python manage.py migrate_multi --noinput


It's simpler to have migrate_multi than conditionally run migrate --database=secondary based on env configuration.

sravfeyn · 2025-01-27T11:09:16Z

config/db_router.py

+        if db == DEFAULT_DB_ALIAS:
+            return True
+        elif db == settings.SECONDARY_DB_ALIAS:
+            # Data migrations using RunPython don't need to be


Note that we need all the table schemas on the secondary (not just the Superset tables), even though we enable replication only for REPLICATION_ALLOWED_MODELS. This is because Django migrations can't run unless all schemas are in place.

Another notable thing here is that data migrations that use RunPython won't be applied on the secondary database, since the data is replicated at the database level.
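A minimal sketch of the routing rule described above, not the PR's actual router. It assumes Django calls allow_migrate with a model_name for schema operations and without one for RunPython/RunSQL, which pass only their own hints. The class name and the plain constants are stand-ins so the snippet runs without Django; the run_on_secondary hint name is taken from a later commit message in this PR, and how the PR uses it is an assumption.

```python
DEFAULT_DB_ALIAS = "default"      # same value as django.db.DEFAULT_DB_ALIAS
SECONDARY_DB_ALIAS = "secondary"  # assumed; the PR reads this from settings


class MultiDatabaseRouter:
    """Hypothetical router that decides migrations for every configured database."""

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if db == DEFAULT_DB_ALIAS:
            # The primary runs every migration operation.
            return True
        if db == SECONDARY_DB_ALIAS:
            if model_name is not None:
                # Schema operations name a model: the secondary needs all tables.
                return True
            # RunPython/RunSQL carry only their own hints; run them on opt-in only,
            # because the data itself arrives through replication.
            return bool(hints.get("run_on_secondary", False))
        # Unknown aliases are not migrated in this sketch.
        return False
```

Under this sketch, a data migration would opt in with migrations.RunPython(forwards, hints={"run_on_secondary": True}); any RunPython without the hint is skipped on the secondary.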

sravfeyn · 2025-01-27T11:10:11Z

@calellowitz I have tested this locally and added a few comments wherever this diverged (only slightly) from the spec.

sravfeyn · 2025-01-27T11:11:16Z

commcare_connect/opportunity/management/commands/setup_logical_replication.py

+]
+
+
+class Command(BaseCommand):


I have set this up as a management command instead of a migration as that allows us to run this only on envs where it's needed.
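For context, a rough sketch of the two PostgreSQL statements a command like this would need to issue: a publication on the primary and a subscription on the secondary. The helper names, publication/subscription names, and table names below are hypothetical and not taken from the PR; the primary is also expected to run with wal_level = logical.

```python
def publication_sql(publication, tables):
    """SQL to run on the primary: publish changes for the listed tables only."""
    table_list = ", ".join(f'"{name}"' for name in tables)
    return f'CREATE PUBLICATION "{publication}" FOR TABLE {table_list};'


def subscription_sql(subscription, publication, conninfo):
    """SQL to run on the secondary: subscribe to the primary's publication."""
    escaped = conninfo.replace("'", "''")  # escape quotes inside the SQL literal
    return (
        f'CREATE SUBSCRIPTION "{subscription}" '
        f"CONNECTION '{escaped}' "
        f'PUBLICATION "{publication}";'
    )


# Hypothetical names, for illustration only.
print(publication_sql("connect_pub", ["opportunity_payment", "users_user"]))
print(subscription_sql("connect_sub", "connect_pub", "host=primary dbname=connect user=repl"))
```

Keeping these in a management command rather than a migration means they can be executed only against the environments that actually have a secondary database.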

sravfeyn · 2025-01-27T14:56:23Z

@calellowitz Here are a few scenarios I have tested locally so far; let me know if you can think of any more.

Test basic replication: initial sync, updates, and deletes on the primary propagate to the secondary DB
Run a migration that includes a data migration, and make sure only the schema part is applied on the secondary DB while the data update arrives through subsequent replication from the primary

sravfeyn · 2025-01-28T15:02:29Z

Bumping this one, I will be able to move forward after an initial review from you @calellowitz

calellowitz

Left a few comments and suggestions as well as a few questions. It would be great to have tests for this as well

calellowitz · 2025-01-28T18:36:57Z

commcare_connect/opportunity/management/commands/migrate_multi.py

@@ -0,0 +1,85 @@
+import sys


these commands shouldn't be in the opportunity app since they have nothing to do with opportunities.

Yeah, I originally tried putting these in the utils dir, but that is not an app, so the commands weren't recognized. Any suggestions for another place? Or shall I just create a new app for this?

I would say a new app, since this is not related to any of our existing apps, which I think is better than utils anyway.

This is addressed

calellowitz · 2025-01-28T18:41:27Z

commcare_connect/opportunity/management/commands/migrate_multi.py

+        if migration_name is not None:
+            args.append(migration_name)
+
+        options["verbosity"] = 0


What does this do?

This overrides the command's default verbosity of 1, setting it to 0.

Why is that preferable, and what impact does it have on the output? Just curious: since our current migrate uses the default verbosity, I was wondering why the change here.

To be honest, this is the default on commcare-hq, so I trusted that it makes a sensible default. We can tweak it later if it doesn't work well for us.

I removed this.

calellowitz · 2025-01-28T18:42:08Z

commcare_connect/opportunity/management/commands/migrate_multi.py

+        dbs_to_migrate = [
+            db_alias for db_alias in settings.DATABASES.keys() if settings.DATABASES[db_alias].get("MIGRATE", True)
+        ]
+        dbs_to_skip = list(set(settings.DATABASES) - set(dbs_to_migrate))


Is this used at all?

calellowitz · 2025-01-28T18:42:56Z

commcare_connect/opportunity/management/commands/migrate_multi.py

+        if dbs_to_skip:
+            print("\nThe following databases will be skipped:\n * {}\n".format("\n * ".join(dbs_to_skip)))
+
+        jobs = [gevent.spawn(migrate_db, db_alias) for db_alias in dbs_to_migrate]


Is there any concern about ordering? Do we need the migrations to run on the primary (or secondary) first?

Good point, I will test this out
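If ordering does turn out to matter, one option is to drop the concurrent gevent.spawn calls and migrate the databases one at a time, primary first. The helper below is a hypothetical sketch of that ordering step only; it takes a dict shaped like settings.DATABASES and is not part of the PR.

```python
def ordered_migration_targets(databases, primary_alias="default"):
    """Return aliases to migrate, primary first, skipping MIGRATE=False entries."""
    targets = [alias for alias, conf in databases.items() if conf.get("MIGRATE", True)]
    # sorted() is stable, so non-primary aliases keep their configured order.
    return sorted(targets, key=lambda alias: alias != primary_alias)


# Example with a dict shaped like settings.DATABASES (aliases are illustrative).
example = {"secondary": {}, "default": {}, "archive": {"MIGRATE": False}}
print(ordered_migration_targets(example))
```

The command could then loop over the result and call call_command("migrate", database=alias) for each alias in turn, trading some speed for a deterministic order.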

calellowitz · 2025-01-28T18:45:26Z

commcare_connect/opportunity/management/commands/setup_logical_replication.py

+    LearnModule,
+    CompletedModule,
+    Payment,
+    User,


Is there a way to allow selective replication so we don't replicate over the passwords?

It looks like there is no native way (using replication) to do this. I can think of a few ways, which I am testing:

Add a trigger on the secondary DB that deletes the password column value (this should immediately delete the password replicated from the primary)

Remove the read permission on that column for users on the secondary DB

ok, thanks for looking. that should be fine for now. we can look more into it later
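As an illustration of the second option (column read permissions), PostgreSQL supports column-level SELECT grants. The sketch below only builds the statements; the helper name, table, column, and role names are hypothetical, and this is not something the PR implements.

```python
def column_grant_sql(table, columns, role):
    """Let `role` read only the listed columns of `table` on the secondary."""
    column_list = ", ".join(f'"{col}"' for col in columns)
    return [
        # Drop any table-wide read access first, then grant per column.
        f'REVOKE SELECT ON "{table}" FROM "{role}";',
        f'GRANT SELECT ({column_list}) ON "{table}" TO "{role}";',
    ]


# Hypothetical names; the password column is simply left out of the list.
for statement in column_grant_sql("users_user", ["id", "username", "email"], "superset_ro"):
    print(statement)
```

One consequence of this approach is that SELECT * on the table fails for that role, so queries have to name the permitted columns explicitly.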

calellowitz · 2025-01-28T19:00:48Z

config/settings/base.py

@@ -33,6 +33,14 @@
 DATABASES["default"]["ATOMIC_REQUESTS"] = True
 DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"

+SECONDARY_DB_ALIAS = "default"


I don't think we should use a default value that represents a broken setup. It should either be None or a valid alias.

Should this be in the env?

None makes sense. It doesn't need to be in the env, since it can be inferred based on SECONDARY_DATABASE_URL

calellowitz · 2025-01-28T19:03:27Z

config/settings/base.py

@@ -33,6 +33,14 @@
 DATABASES["default"]["ATOMIC_REQUESTS"] = True
 DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"

+SECONDARY_DB_ALIAS = "default"
+
+if env.db("SECONDARY_DATABASE_URL", default=""):


should this look up env rather than env.db?

env.db is just a util to read the flat URL style spec for DB, right?

Yeah, it transforms the url into a database dict, but there is no reason to do that operation since we are throwing away the result? It isn't particularly expensive but doesn't buy us anything and throws warnings when it hits the default value. We just care if the key exists, which env does without .db

It's easier to define it this way, with fewer vars to keep track of (like PASSWORD, DB, USERNAME etc). And the current primary DB is also set up that way (so it is consistent). Do you have strong feelings against this?

I'm not sure I understand. It's exactly the same number of vars; I'm just asking for this line to remove .db. Definitely use .db when we set up the actual dict lower down, just not for the existence check (since that isn't what it is meant for, and is what the plain env call does).

calellowitz · 2025-01-28T19:04:23Z

config/settings/base.py

+SECONDARY_DB_ALIAS = "default"
+
+if env.db("SECONDARY_DATABASE_URL", default=""):
+    SECONDARY_DB_ALIAS = "secondary"


this should probably come from the env and be set in line 36. then it can be used in the conditional in line 38

Raised and responded in the above comment

calellowitz · 2025-01-28T19:05:37Z

config/settings/base.py

+
+if env.db("SECONDARY_DATABASE_URL", default=""):
+    SECONDARY_DB_ALIAS = "secondary"
+    DATABASES[SECONDARY_DB_ALIAS] = env.db("SECONDARY_DATABASE_URL")


Nit: this would be easier to find if it were next to the other database line so all the databases defined are right next to each other. You can move the conditional settings above the database dict and do it all in one place.
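Taken together, the settings comments above suggest roughly the following shape: the alias defaults to None, the existence check uses the plain variable rather than a parsed URL, and all databases are defined in one place. This is a self-contained sketch, not the PR's code: parse_db_url is a simplified stand-in for django-environ's env.db, build_databases is a hypothetical helper, and the DATABASE_URL variable name is an assumption.

```python
import os
from urllib.parse import urlparse


def parse_db_url(url):
    """Simplified stand-in for env.db(): URL -> Django-style DATABASES entry."""
    parts = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        "USER": parts.username or "",
        "PASSWORD": parts.password or "",
        "HOST": parts.hostname or "",
        "PORT": parts.port or "",
    }


def build_databases(environ=os.environ):
    """Define every database in one place; the secondary alias is None when unset."""
    databases = {"default": parse_db_url(environ["DATABASE_URL"])}
    secondary_alias = None
    # Plain existence check: the URL is parsed only when it is actually used.
    if environ.get("SECONDARY_DATABASE_URL"):
        secondary_alias = "secondary"
        databases[secondary_alias] = parse_db_url(environ["SECONDARY_DATABASE_URL"])
    return databases, secondary_alias
```

In the real settings module the same structure would use env("SECONDARY_DATABASE_URL", default="") for the check and env.db("SECONDARY_DATABASE_URL") for the dict, as discussed in the thread.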

calellowitz · 2025-01-28T19:06:53Z

config/db_router.py

+from django.db import DEFAULT_DB_ALIAS
+
+
+class SecondaryDatabaseRouter:


This name and docstring are pretty confusing because they imply that this is the router for the secondary db, but it is in fact the router for all DBs.

I will change it. Though, it is possible to have multiple routers as well.

sravfeyn · 2025-01-29T15:33:11Z

@calellowitz Thanks for the comments. I have addressed most of them and updated the PR. At what point do we want to test this on staging?

calellowitz · 2025-01-31T18:52:40Z

I think I mixed standalone comments and a review by mistake there, but tried to get to your questions. This looks fine to test on staging now, since my remaining comments won't affect behavior or are about what happens if we make later changes to the db list, which we will need to test after rollout anyway.

However, I don't see anywhere how you plan to set up the second database, what kinds of permissions you plan to give the user, and how. Ideally we would want to review those as well before rolling this out.

sravfeyn · 2025-02-03T06:35:29Z

Here is what I am thinking re: staging

Create a new RDS instance in our AWS CCC account
Set up logical replication using this PR (in the public schema)
Create a readonly user that has a readonly role on the public schema and write permission on a separate schema called extended
Use this user inside Superset.

Shall we go ahead with this or shall I put this in google docs for further review/discussion?

sravfeyn · 2025-02-07T15:04:23Z

@calellowitz

I have addressed your comments
Tested this on staging (sent you an output of the replication status command).
Added rollout instructions to the google doc under Production rollout

I have tested the replication part, but didn't test the Superset database switching part since we don't have a staging Superset. But I think this is a straightforward and short process.

Please review this and let me know if you have any comments on the code here. Or comment on the doc for any rollout related comments.

sravfeyn requested a review from calellowitz January 27, 2025 11:04

Setup logical replication

630d48c

sravfeyn force-pushed the sr/replication branch from bb58e8a to 630d48c Compare January 27, 2025 11:05

Address review comments

ab7acf3

make a new app for multidb

73401df

sravfeyn force-pushed the sr/replication branch from c1e35c2 to 73401df Compare February 2, 2025 15:41

sravfeyn force-pushed the sr/replication branch 2 times, most recently from 5a1c37a to 8d5191c Compare February 6, 2025 10:40

Be explicit about which RunPython mig can run on secondary DB

b05b166

sravfeyn force-pushed the sr/replication branch from 8d5191c to b05b166 Compare February 6, 2025 11:29

sravfeyn added 8 commits February 6, 2025 17:07

Merge branch 'main' into sr/replication

555c0a8

set run_on_secondary flag

f35fdc8

add few more tables

87fc73d

add few more tables

80fb6af

move common setting to base.py

d1223e5

move common setting to base.py

debe0f3

sslmode off for replication setup;same VPC

3d209a7

Add command to get repl status

f58b3f3
