Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup logical replication #481

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Setup logical replication #481

wants to merge 12 commits into from

Conversation

sravfeyn
Copy link
Member

Product Description

Utils for setting up logical replication

https://docs.google.com/document/d/1r9uFUQK8cOreaUm6E8nx9nyeKZL8tcNLJwJiWXh8kGI/edit?tab=t.0#heading=h.igbj18ssbv2z

Technical Summary

  • Adds a SECONDARY_DATABASE_URL option to local env
  • Adds migrate_multi and setup_logical_replication commands to setup replication

Safety Assurance

Safety story

I have tested this locally and will test on staging

Automated test coverage

NA

Labels & Review

  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@sravfeyn sravfeyn requested a review from calellowitz January 27, 2025 11:04
@@ -6,6 +6,6 @@ set -o nounset


echo "Django migrate"
python manage.py migrate --noinput
python manage.py migrate_multi --noinput
Copy link
Member Author

@sravfeyn sravfeyn Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's simpler to have migrate_multi than conditionally run migrate --database=secondary based on env configuration.

if db == DEFAULT_DB_ALIAS:
return True
elif db == settings.SECONDARY_DB_ALIAS:
# Data migrations using RunPython don't need to be
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note we do need all the table schemas (not just the superset tables) even though we enable replication for just REPLICATION_ALLOWED_MODELS. This is because django migrations can't work without all schemas in place.

Another notable thing here is that data migrations targetted using RunPython module won't be propagated to secondary database as data is replicated at database level.

@sravfeyn
Copy link
Member Author

@calellowitz I have tested this locally and added few comments wherever this diverged (only slightly) from the spec.

]


class Command(BaseCommand):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have set this up as a management command instead of a migration as that allows us to run this only on envs where it's needed.

@sravfeyn
Copy link
Member Author

@calellowitz Here are few scenarios I tested locally so far, let me know if you can think of any more

  1. Test basic replication; initial-sync/update/delete from primary propagates to secondary DB
  2. Run a migration with data migration in it, make sure only the schema part of the migration is applied on secondary DB and data update occurs from subsequent replication from main

@sravfeyn
Copy link
Member Author

Bumping this one, I will be able to move forward after an initial review from you @calellowitz

Copy link
Collaborator

@calellowitz calellowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments and suggestions as well as a few questions. It would be great to have tests for this as well

@@ -0,0 +1,85 @@
import sys
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these commands shouldn't be in the opportunity app since they have nothing to do with opportunities.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I tried putting these originally in the utils dir, but that is not an app so it didn't recognize these commands. Any suggestions for other place? Or shall I just create a new app for this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say a new app, since this is not related to any of our existing apps, which I thik is better than utils anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is addressed

if migration_name is not None:
args.append(migration_name)

options["verbosity"] = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overrides the default verbosity of 1 to 0 of the command.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that preferable? and what impact does that have on what is output? Just curious since our current migrate uses the default verbosity, I was wondering why the change here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, this is the default on commcare-hq, so I trusted it makes a sensible default, we can tweak it latter, if it doesn't work well for us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this.

Comment on lines 61 to 64
dbs_to_migrate = [
db_alias for db_alias in settings.DATABASES.keys() if settings.DATABASES[db_alias].get("MIGRATE", True)
]
dbs_to_skip = list(set(settings.DATABASES) - set(dbs_to_migrate))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

if dbs_to_skip:
print("\nThe following databases will be skipped:\n * {}\n".format("\n * ".join(dbs_to_skip)))

jobs = [gevent.spawn(migrate_db, db_alias) for db_alias in dbs_to_migrate]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any concern about ordering? Do we need the migrations to run on the primary (or secondary) first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will test this out

LearnModule,
CompletedModule,
Payment,
User,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to allow selective replication so we don't replicate over the passwords?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there is no native way (using replication) to do this, I can think of few ways that I am testing

  1. Add a delete password column trigger on secondary DB (this should immediately delete the password on primary)
  2. Remove the column read permission for users on secondary DB

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks for looking. that should be fine for now. we can look more into it later

@@ -33,6 +33,14 @@
DATABASES["default"]["ATOMIC_REQUESTS"] = True
DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"

SECONDARY_DB_ALIAS = "default"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use a default value that represents a broken setup. It should either be None or a valid alias.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in the env?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None makes sense. It doesn't need to be in the env, since it can be inferred based on SECONDARY_DATABASE_URL

@@ -33,6 +33,14 @@
DATABASES["default"]["ATOMIC_REQUESTS"] = True
DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"

SECONDARY_DB_ALIAS = "default"

if env.db("SECONDARY_DATABASE_URL", default=""):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this look up env rather than env.db?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env.db is just a util to read the flat URL style spec for DB, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it transforms the url into a database dict, but there is no reason to do that operation since we are throwing away the result? It isn't particularly expensive but doesn't buy us anything and throws warnings when it hits the default value. We just care if the key exists, which env does without .db

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easier to define it in this way, with less num of vars to keep track of (like PASSWORD, DB, USERNAME etc). And the current primary DB is also setup that way (so is inconsistent). Do you have strong feelings against this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand, its exactly the same number of vars, I'm just asking for this line to remove .db. Definitely use .db when we setup the actual dict lower down, just not for the existence check (since that isn't what it is meant for, and is what the plain env call does).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

SECONDARY_DB_ALIAS = "default"

if env.db("SECONDARY_DATABASE_URL", default=""):
SECONDARY_DB_ALIAS = "secondary"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably come from the env and be set in line 36. then it can be used in the conditional in line 38

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised and responded in the above comment


if env.db("SECONDARY_DATABASE_URL", default=""):
SECONDARY_DB_ALIAS = "secondary"
DATABASES[SECONDARY_DB_ALIAS] = env.db("SECONDARY_DATABASE_URL")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this would be easier to find if it were next to the other database line so all the databases defined are right next to each other. You can move the conditional settings above the database dict and do it all in one place.

from django.db import DEFAULT_DB_ALIAS


class SecondaryDatabaseRouter:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name and docstring are pretty confusing because they imply that this is the router for the secondary db, but it is in fact the router for all DBs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change it. Though, it is possible to have multiple routers as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@sravfeyn
Copy link
Member Author

sravfeyn commented Jan 29, 2025

@calellowitz Thanks for the comments. I have addressed most of them and updated the PR. At what point do we want to test this on staging?

@calellowitz
Copy link
Collaborator

I think I mixed standalone comments and a review by mistake there, but tried to get to your questions. This looks fine to test on staging now, since my remaining comments won't effect behavior or are about what happens if we make later changes to the db list, which we will need to test after rollout anyway.

However, I don't see how you plan to setup the second database anywhere, what kinds of permissions you plan to give the user, and how. Ideally we would want to review those as well before rolling this out.

@sravfeyn
Copy link
Member Author

sravfeyn commented Feb 3, 2025

Here is what I am thinking re.staging

  • Create a new RDS instance in our AWS CCC account
  • Setup logical replication using this PR (in the public schema)
  • Create a readonly user that has readonly role on public schema and a write permissions on a separate schema called extended
  • Use this user inside superset.

Shall we go ahead with this or shall I put this in google docs for further review/discussion?

@sravfeyn sravfeyn force-pushed the sr/replication branch 2 times, most recently from 5a1c37a to 8d5191c Compare February 6, 2025 10:40
@sravfeyn
Copy link
Member Author

sravfeyn commented Feb 7, 2025

@calellowitz

  1. I have addressed your comments
  2. Tested this on staging (sent you an output of the replication status command).
  3. Added rollout instructions to the google doc under Production rollout

I have tested the replication part, but didn't test the superset database switching part since we don't have staging-superset. But I think this is straighforward and short process

Please review this and let me know if you have any comments on the code here. Or comment on the doc for any rollout related comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants