fix(bootstrap): wait core tables are ready before copying #183
, config = Opts
},
%% Create (or copy) the mnesia table and wait for it:
ok = create_table(MetaSpec),
ok = mria_mnesia:copy_table(?schema, Storage),
%% Ensure replicas are available before starting copy:
ok = mria_mnesia:wait_for_tables([?schema]),
I assume this function does not return anything other than `ok`?
If that's the case, maybe add a comment here to document what it means for the node boot sequence if this wait takes a very long time (or never returns `ok`).
It can return `{error, ...}` when mnesia is stopped while the bootstrap is in progress.
This calls `mria_mnesia:wait_for_tables/1`, which (unlike `mnesia:wait_for_tables/2`) logs quite extensive diagnostic information every 30 seconds if the wait is taking too long, so this should be enough? A comment detailing why this was needed won't hurt, in addition to the commit message, I guess.
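For readers unfamiliar with this pattern, here is a minimal sketch of a wait loop that reports progress periodically and can return either `ok` or `{error, Reason}`. It is not Mria's actual implementation: the module name `wait_sketch`, the injectable `WaitFun` argument, and the log message are assumptions made for illustration. Only `mnesia:wait_for_tables/2` and its documented return values (`ok`, `{timeout, Tabs}`, `{error, Reason}`) come from OTP.

```erlang
-module(wait_sketch).
-export([wait_for_tables/1, wait_for_tables/2]).

%% Assumed reporting interval, mirroring the 30 seconds mentioned above.
-define(LOG_INTERVAL_MS, 30000).

%% Wait using the stock OTP call; requires mnesia to be running.
wait_for_tables(Tables) ->
    wait_for_tables(Tables, fun mnesia:wait_for_tables/2).

%% WaitFun is injectable (an assumption of this sketch) so that the loop
%% can be exercised without a running mnesia instance.
wait_for_tables(Tables, WaitFun) ->
    case WaitFun(Tables, ?LOG_INTERVAL_MS) of
        ok ->
            ok;
        {timeout, Pending} ->
            %% Report which tables are still missing, then keep waiting.
            logger:warning("Still waiting for tables: ~p", [Pending]),
            wait_for_tables(Pending, WaitFun);
        {error, Reason} ->
            %% E.g. mnesia was stopped while the wait was in progress.
            {error, Reason}
    end.
```

A caller that writes `ok = wait_sketch:wait_for_tables(Tabs)` crashes with a `badmatch` on the error branch, which is the behaviour the question above is probing.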
In specific circumstances `mria_mnesia:copy_table/2` may fail with `{system_limit, '$mria_rlog_sync', {Node, none_active}}` error, which crashes the node.

Consider the following scenario:
1. Node `N1` starts up and bootstraps Mria.
2. Node `N2` starts up and bootstraps Mria.
3. Node `N2` joins cluster consisting of node `N1`.
4. Node `N2` runs `mria_mnesia:join_cluster/1` and starts Mria again.
5. At the exact same time node `N1` decides to restart for some reason.
6. During bootstrap, node `N2` tries to copy `$mria_rlog_sync` table.
7. Mnesia sees there's nowhere to copy from and aborts the operation.
8. Mria fails to start.

While unlikely, in practice this might be achieved when the operator performs unusual maintenance operations, e.g. simultaneously requests version upgrade and scales the cluster up.
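The ordering idea behind the fix (do not attempt a copy until the wait has succeeded) can be illustrated with a small step runner. This is a sketch under stated assumptions, not the code in this PR: the module `bootstrap_sketch`, the function `run/1`, and the step names are hypothetical. The actual change uses plain `ok = ...` matches, which crash on failure instead of returning an error tuple.

```erlang
-module(bootstrap_sketch).
-export([run/1]).

%% Run named steps in order and stop at the first one that does not
%% return ok, so that later steps (e.g. a table copy) are never attempted
%% after an earlier step (e.g. a wait) has failed.
-spec run([{atom(), fun(() -> term())}]) -> ok | {error, {atom(), term()}}.
run([]) ->
    ok;
run([{Name, Step} | Rest]) ->
    case Step() of
        ok ->
            run(Rest);
        Other ->
            %% Report which step failed and what it returned.
            {error, {Name, Other}}
    end.
```

For example, `bootstrap_sketch:run([{wait, fun() -> {error, stopped} end}, {copy, fun() -> ok end}])` returns `{error, {wait, {error, stopped}}}` without ever calling the `copy` step.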
Silence "expression updates a literal" compiler lint recently introduced in erlang/otp#8069.
Force-pushed from 13f9607 to 6084346.
Fixes EMQX-13309.