-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Abort on DDL in binlog stream #320
Conversation
Co-authored-by: Pawan Dubey <[email protected]> Co-authored-by: Shiv Nagarajan <[email protected]>
Co-authored-by: Shiv Nagarajan <[email protected]>
Co-authored-by: Shiv Nagarajan <[email protected]>
Co-authored-by: Shiv Nagarajan <[email protected]>
Co-authored-by: Shiv Nagarajan <[email protected]>
Co-authored-by: Shiv Nagarajan <[email protected]>
The ruby tests are failing CI due to timeouts, but I don't think it's because of changes we've introduced. We've moved the integration test from the deprecated Go section to being written in Ruby. This causes the early termination of Ghostferry, which lets the test pass. We've also rewritten things to use a smaller SQL parser (although it is still not a trivial dependency). Shiv and I have looked at the SQL parser landscape and a correct parser is simply going to be a large dependency (vs. doing something with regex which I strongly dislike). In the future, now that we have a parser, I could see this being used to check if a DDL is happening to a table which is in the allow list / deny list, for example. Alternatively, if we don't want to have this dependency at all, then I think action that's taken on QueryEvent could take a callback, and then other implementers who are using Ghostferry as a library could choose to do the parsing in their own code base? |
I'm okay with this dependency for now. I would make the DDL event a callback instead of erroring right here. As you mentioned, we can pass the event based on the filters passed to Ghostferry as well, but we can do that in a follow up PR if you would like. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look pretty good. Some minor feedback on the tests and the code. We can merge after they're addressed. if you want, in a follow up, we can turn this into callbacks.
@@ -230,6 +231,22 @@ func (s *BinlogStreamer) Run() { | |||
// the full query that was executed on the master (with annotations) | |||
// that is otherwise not possible to reconstruct | |||
query = ev.Event.(*replication.RowsQueryEvent).Query | |||
case *replication.QueryEvent: | |||
query = ev.Event.(*replication.QueryEvent).Query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a switch case structure with a type, so I think you can just do e.Query
here?
@@ -230,6 +231,22 @@ func (s *BinlogStreamer) Run() { | |||
// the full query that was executed on the master (with annotations) | |||
// that is otherwise not possible to reconstruct | |||
query = ev.Event.(*replication.RowsQueryEvent).Query | |||
case *replication.QueryEvent: | |||
query = ev.Event.(*replication.QueryEvent).Query | |||
if string(query) == "BEGIN" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave a comment on why this is ignored here? It's not super obvious to me.
if string(query) == "BEGIN" { | ||
break | ||
} | ||
statement, err := sqlparser.Parse(string(query)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I see string(query)
twice here. Can we modify the above line to simply go: query := string(e.Query)
?
ghostferry = new_ghostferry(MINIMAL_GHOSTFERRY) | ||
|
||
ghostferry.on_status(GhostferryHelper::Ghostferry::Status::BINLOG_STREAMING_STARTED) do | ||
datawriter.start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The datawriter
concept is mostly for when you want a background thread hitting MySQL in the test.
In this case, I think you can just go directly to:
datawriter.start | |
source_db.query(....) | |
source_db.query(....) |
See types_test for more examples of this. This should save you from using another class.
source_count = source[DEFAULT_FULL_TABLE_NAME][:row_count] | ||
target_count = target[DEFAULT_FULL_TABLE_NAME][:row_count] | ||
|
||
refute_equal(source_count, target_count, "target should have fewer rows than source") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ghostferry.stderr
should also give you the log messages, if you want to match the log message here. I just realized that nothing actually uses it so maybe what you have is fine 🤷
related to #318
Co-authored-by: Pawan Dubey [email protected]