-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pxc 4469 sst clone #1961
base: 8.0
Are you sure you want to change the base?
Pxc 4469 sst clone #1961
Conversation
This version has a patch to go straight to ist so is not fully functional unless removing that bypass
SSL is NOT supported
bug. Wrong comma
fixing config missed wsrep allowed methods
fixing config missed wsrep allowed methods
fixing config missed wsrep allowed methods
…tcat is not running
I have a question... |
Need to ad extension or make script will not identify them correctly
…the user Some edit/cleanup as suggested by Kamil
As we spoke today, we need to test it with encrypted tables and keyring component/plugin enabled |
- removed the need to perform the restart to clean up the final instance. - use wsrep-sst-receive-address to get IP and port definition - use one port for Netcat and MySQL - some code cleanup - Add message suppression when reporting clone Status %. It will be reported only if different from previous value.
If you outline what kind of operation are you expecting to test when talking about data encryption I can see how clone deal with it. Also if think this can be step 2 |
CLONE_EXECUTE_SQL=$(wsrep_mktemp_in_dir "$WSREP_SST_OPT_DATA" --suffix=.sql clone_execute_XXXX) | ||
CLONE_PREPARE_SQL=$(wsrep_mktemp_in_dir "$WSREP_SST_OPT_DATA" --suffix=.sql clone_prepare_XXXX) | ||
|
||
cleanup_donor # Remove potentially existing clone user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we call cleanup_donor
to remove the potentially existing clone user, however this will run a DROP USER IF EXISTS 'clone_sst'@'%'
in TOI and is replicated to other nodes as well. Is this something we want?
2024-11-22T08:58:32.975616Z 0 [Note] [MY-000000] [WSREP-SST] (debug) Cleanup MySQL ADMIN_PSWD: yx9!A-ezgMrQz5Q7HdKLwD7rEqHzJ0Ot
2024-11-22T08:58:32.975760Z 0 [Note] [MY-000000] [WSREP-SST] (debug) Cleanup MySQL MYSQL_ACLIENT: /home/venki/work/pxc/merge/80/bld/bin/mysql --no-defaults -umysql.pxc.sst.user -S/home/venki/work/pxc/merge/80/bld/mysql-test/var/tmp/mysqld.1.sock --batch --skip_column_names --silent
2024-11-22T08:58:32.986956Z 13 [Note] [MY-000000] [WSREP] wsrep_sync_wait: thd->variables.wsrep_sync_wait= 15, mask= 1, thd->variables.wsrep_on= 1
2024-11-22T08:58:32.987820Z 0 [Note] [MY-000000] [WSREP-SST] ERROR 1193 (HY000) at line 1: Unknown system variable 'clone_ssl_cert'
2024-11-22T08:58:32.997468Z 14 [Note] [MY-000000] [WSREP] wsrep_sync_wait: thd->variables.wsrep_sync_wait= 15, mask= 1, thd->variables.wsrep_on= 1
2024-11-22T08:58:33.004851Z 14 [Note] [MY-000000] [WSREP] Executing Query (DROP USER IF EXISTS 'clone_sst'@'%') with write-set (-1) and exec_mode: local in TO Isolation mode
2024-11-22T08:58:33.004913Z 14 [Note] [MY-000000] [WSREP] wsrep: initiating TOI for write set (-1)
2024-11-22T08:58:33.005813Z 14 [Note] [MY-000000] [WSREP] Query (DROP USER IF EXISTS 'clone_sst'@'%') with write-set (3) and exec_mode: toi replicated in TO Isolation mode
2024-11-22T08:58:33.005835Z 14 [Note] [MY-000000] [WSREP] wsrep: TO isolation initiated for write set (3)
2024-11-22T08:58:33.011278Z 14 [Note] [MY-000000] [WSREP] TO END: 3: DROP USER IF EXISTS 'clone_sst'@'%'
2024-11-22T08:58:33.011306Z 14 [Note] [MY-000000] [WSREP] wsrep: completed TOI write set (3)
2024-11-22T08:58:33.011323Z 14 [Note] [MY-000000] [WSREP] Setting WSREPXid (InnoDB): f0d030d8-a8af-11ef-927b-272a1375415c:3
2024-11-22T08:58:33.011372Z 14 [Note] [MY-000000] [WSREP] Updating WSREPXid: f0d030d8-a8af-11ef-927b-272a1375415c:3
2024-11-22T08:58:33.013101Z 14 [Note] [MY-000000] [WSREP] TO END: -1
2024-11-22T08:58:33.013812Z 14 [Warning] [MY-000000] [WSREP] Toggling wsrep_on to OFF will affect sql_log_bin. Check manual for more details
2024-11-22T08:58:33.019913Z 0 [Note] [MY-000000] [WSREP-SST] Cleanup DONOR DONE.
2024-11-22T08:58:33.019989Z 0 [Note] [MY-000000] [WSREP-SST] (debug) -> PREPARE DONOR
2024-11-22T08:58:33.033988Z 15 [Note] [MY-000000] [WSREP] wsrep_sync_wait: thd->variables.wsrep_sync_wait= 15, mask= 1, thd->variables.wsrep_on= 1
2024-11-22T08:58:33.034750Z 15 [Note] [MY-000000] [WSREP] wsrep_sync_wait: thd->variables.wsrep_sync_wait= 15, mask= 1, thd->variables.wsrep_on= 1
2024-11-22T08:58:33.040082Z 0 [Note] [MY-000000] [WSREP-SST] Installing CLONE plugin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
This is probably better to run this on the donor only!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ON the other hand user is created and replicated, so it makes also sense to delete from other nodes. What it should not replicated (and is not with wsrep_on=OFF) is the plugin installation.
Need to think more about this. For the moment I keep it as it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep this open as reminder.
WSREP_SST_OPT_REMOTE_JOINER_PSWD="" | ||
|
||
echo "continue" # donor can resume updating data | ||
WSREP_SST_OPT_REMOTE_AUTH=$(echo $WSREP_SST_OPT_ADDR_LOCAL | cut -d '@' -f 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now --address
parameter is in format clone_user:clone_passwd@joiner_ip:joiner:port
Why does it not conform to the format expected by --address
in sst_common (ip:port/path
)?
We are doing the full parsing here instead of taking the advantage of
what we already have in wsrep_sst_common.sh
If it was like joiner_ip:joiner_port/clone_user:clone_passwd
wsrep_sst_common, parses this into tokens
WSREP_SST_OPT_HOST -> joiner_ip
WSREP_SST_OPT_PORT -> joiner_port
WSREP_SST_OPT_PATH -> clone_user:clone_passwd
Then we just need to split WSREP_SST_OPT_PATH here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Tusamarco , As I explained in this and other related comments, there is no reason for changing the SST request string protocol. Using the existing protocol will allow us to reuse existing parsing code and will keep the solution consistent with pxb-sst. Please change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kamil-holubicki I don't know from where you get that ip:port/path
is the current standard for galera.
The docs here https://galeracluster.com/library/documentation/scriptable-sst.htm indicate to report ready <address>:port\n
The standard in sharing this kind of URI say <credential>@<address>:<port>/<attrinbutes>
as already reported in another comment.
So I am following the standard and best practice.
Actually if we have something different in the code we should correct that and do the right thing not deviating from common standard creating a different approach.
That part will not change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, as we discussed on Slack, I think we have a bigger issue here than format of the string.
user:password is passed via command line parameter to clone-sst script, like --address user:pass@host:port
. This causes that the password is visible in the log output and in the process list output (ps). So this is a security issue.
In pxb-sst the donor needs to be informed about the user:password used by PXB to do a backup. And we send this information to the pxb-sst script via pipe.
If we look at what upstream does, they indeed use ready user:pass@host:port
. I don't like it, don't see a single reason for doing this, but OK, let it be. But then they extract user:pass
part on the server level and pass this information to the clone-sst script via pipe. Command line contains only host:port
part (like --address host:port
) and then the clone script uses the standard (the same as for pxb) way of parsing --address
value).
Again, I will not insist on SST string format if we pass credentials in a secure way. The reason is that there needs to be some part of the logic on the server side anyway to parse sst string and detect that it includes credentials.
scripts/wsrep_sst_clone.sh
Outdated
cat $CLONE_PREPARE_SQL >> /dev/stderr | ||
exit $RC | ||
fi | ||
#Before waiting for the Joiner Clone mysql we send out the message this is a SST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple occurences.
Comments format is inconsistent. Please add a space after leading hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue spans across the whole file. This particular occurrence was fixed, but others were not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are looking the wrong file version. all comments were updated.
|
||
# Report clone credentials/address to the caller | ||
wsrep_log_debug "-> ready passing string |$CLONE_USER:$CLONE_PSWD]$JOINER_CLONE_HOST:$JOINER_CLONE_PORT|" | ||
echo "ready $CLONE_USER:$CLONE_PSWD]$JOINER_CLONE_HOST:$JOINER_CLONE_PORT" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just pass it as $JOINER_CLONE_HOST:$JOINER_CLONE_PORT/$CLONE_USER:$CLONE_PSWD
This way it will be a lot easier to parse it on donor side (already in sst_common). Please see other related comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see other comments on same topic.
|
||
wsrep_log_debug "-> WAIT DIR DONOR MESSAGE DONE" | ||
|
||
if [[ ! `cat $WSREP_SST_OPT_DATA/XST_FILE.txt` =~ "SST@" ]];then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still didn't solve the problem of not full message in file
My proposal is to send from Donor such a string:
SST@${WSREP_SST_OPT_GTID}<EOF>
and then wait in a loop above until we receive <EOF>
substring in a file
then we can test the content of the file.
BTW. In the case of IST, shouldn't Donor signal by sending something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we have 2 messages.
SST or IST, there is NO other. So if not SST is IST, or as you fear a broken one. No need to add code to check the obvious (IST)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kamil-holubicki, the check is already there because the reg expression checks for the message SST
terminated by @
that's it. We do not need another terminator like <EOF>
. The only relevant part of the message is SST
after that the GTID:POS received is a plus to resolve the position in case the recovery position is not able to recover it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is synchronization issue here.
We are waiting in while [ ! -s $WSREP_SST_OPT_DATA/XST_FILE.txt ];do
loop. Once the file is not empty, we exit the loop. Note that the content of the file may be SS
(not all data written to the file)
Then we go to if [[ !
cat $WSREP_SST_OPT_DATA/XST_FILE.txt =~ "SST@" ]];then
. This if
will detect that no full SST is needed, as the file contains only SS
About IST only. I don't see any place where donor sends the info about IST. It sends info only if full SST is needed.
The point is, that we have to wait in the loop until we receive the whole message, then we can parse this message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever message is not SST@
script should try IST and exit, that is the correct behaviour.
If you can build a test case where only one "s" is sent (from the original action in the DONOR section) I will consider to implement a more sophisticated check on the string.
At the moment in all test done, also using qdisk to generate network interference, never report that as an issue.
About IST as you know you have the bypass
instructions from galera on donor side and the joiner exit from SST script on bypass
.
So not really need to do anything else.
wsrep_log_debug "-> WAIT DIR DONOR MESSAGE DONE" | ||
|
||
if [[ ! `cat $WSREP_SST_OPT_DATA/XST_FILE.txt` =~ "SST@" ]];then | ||
wsrep_log_info "DONOR SAY IST" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything being sent by Donor in case of IST. We should send something like
IST<EOF>
. This will allow the Joiner to immediately leave SST script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever is not SST you need to exit. That's it.
no need to check for other cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where the Donor sends the information that it is not SST? Without sending it, the joiner will wait in the while loop above, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Donor send a message on bypass
in the same way it does for SST using netcat. So joiner will get the message exit the loop and exit the SST process given is not an SST message.
@@ -1540,7 +1542,7 @@ static bool is_sst_request_valid(const std::string &msg) { | |||
Instead of this we will just allow alpha-num + a few special characters | |||
(colon, slash, dot, underscore, square brackets, hyphen). */ | |||
std::string data = msg.substr(method_len + 1, data_len); | |||
static const std::regex allowed_chars_regex("[\\w:/.[\\]-]+"); | |||
static const std::regex allowed_chars_regex("[\\w:/.[\\]@-]+"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need this change if we pass the address in the form ip:port/path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think that having the path as you mention is not ok at least in standard cases.
If you check the Glaera documentation they say:
To signal that the node is ready to receive the state transfer, print the following string to standard output:
ready <address>:port\n. Use the IP address and port at which the node is waiting for the state snapshot. For example:
ready 192.168.1.1:4444
https://galeracluster.com/library/documentation/scriptable-sst.html
So using the standard uri for db connection host:port\termination which in this case evolves to user:pw@host:port\n The use of uri/path was probably added for XB and other attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said in another comment, I inisist on using existing protocol, there is no reason for changing it.
The standard URI for the db connection is good thing, however, does not apply here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said above, no.
standard is clear also in codership documentation they follow the <address>:<ip>\n
If in any part of the code we are deviationg from what is the real standard we should correct that.
Few minor refactoring to accomodate comments on PR
Adding preview POC for the Clone method for SST.
The script is based on the one from codership but then need to refactor almost in full to have it working with PXC.
birdeye flow