7244/multi db connection improvements #7286
Conversation
I think this is probably the more controversial / hazardous part of this PR. It is required for stability that max_shared_pool_size >= max_client_connections, otherwise not every client connection can get a slot. We generally recommend max_client_connections == max_shared_pool_size so that you will not make more outbound connections per worker than you receive in inbound client connections, which means there is no slack in max_shared_pool_size in most setups. If we change the semantics of max_shared_pool_size, we might risk a lot of setups becoming unstable unless they set max_shared_pool_size >= max_client_connections + # reserved maintenance connections
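To make the invariant concrete, here is a minimal, hypothetical sanity check (not part of the PR); GetMaxSharedPoolSize() and MaxClientConnections are assumed here to be the accessor/variable behind the two GUCs:

/*
 * Hypothetical sanity check illustrating the invariant above: every inbound
 * client connection needs an outbound slot, so the shared pool should be at
 * least as large as the client connection limit. GetMaxSharedPoolSize() and
 * MaxClientConnections are assumed names for the existing accessors behind
 * citus.max_shared_pool_size and citus.max_client_connections.
 */
static void
WarnIfSharedPoolSmallerThanClientLimit(void)
{
	int sharedPoolSize = GetMaxSharedPoolSize();
	int clientConnectionLimit = MaxClientConnections;

	if (clientConnectionLimit > 0 && sharedPoolSize < clientConnectionLimit)
	{
		ereport(WARNING, (errmsg("citus.max_shared_pool_size (%d) is smaller "
								 "than citus.max_client_connections (%d); "
								 "some client connections may not get an "
								 "outbound connection slot",
								 sharedPoolSize, clientConnectionLimit)));
	}
}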
Could this part be a separate PR? It probably doesn't require much discussion and #7254 is merged now.
I see your point, but it is still a more predictable way to manage a cluster, I think. Currently, DBAs just have to keep in mind that there are some operations that require additional connections, so
I'd have to move those points as well, then:
If it is okay, then no problem 🙂
Perhaps, if we were starting from scratch. In practice, due to a large user / customer base, and platforms that need to support a large number of version combinations, we try to avoid breaking backwards compatibility and/or introducing custom upgrade steps whenever possible. An alternative option could be to have 2 separate pool settings, e.g. max_maintenance_pool_size & max_shared_pool_size. The former could default to some small number (e.g. 5), to keep the total number of outbound connections predictable.
Yup
Sounds reasonable, and shouldn't be too hard to implement...
Force-pushed 06cccd8 to 2e1e780 (compare)
@marcoslot, done, now it is a separate pool 🙂
int currentConnectionsCount;
if (maintenanceConnection)
{
	currentConnectionsLimit = GetMaxMaintenanceSharedPoolSize();
@marcoslot, not sure about this part: the current implementation uses a common MaxMaintenanceSharedPoolSize for both remote and local connections, because it is simpler. Is it necessary to introduce a separate limit for local maintenance connections?
Another effect of this is that when GetLocalSharedPoolSize() == DISABLE_REMOTE_CONNECTIONS_FOR_LOCAL_QUERIES, remote connections are disabled both for maintenance and regular connections.
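For readers following along, a sketch of the limit selection being discussed, reconstructed from the diff fragment above (the else branch and the GetMaxSharedPoolSize() call are assumptions about the surrounding code):

/* reconstructed sketch of the branch above, not the exact PR code */
int currentConnectionsLimit;

if (maintenanceConnection)
{
	/* one common cap for both remote and local maintenance connections */
	currentConnectionsLimit = GetMaxMaintenanceSharedPoolSize();
}
else
{
	/* regular connections keep the existing cap (assumed accessor) */
	currentConnectionsLimit = GetMaxSharedPoolSize();
}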
@onderkalaci, kind reminder about this PR 🙂
Well, that's awkward...
I'll try to take a look at this soon, thank you for your work on this.
Could you run
FYI I pushed a commit to your branch to reindent it.
Overall I quite like the idea and implementation. The build is failing though, and thus the tests are not actually run in CI. So I don't know if this completely breaks some tests.
I left a few small comments now. Once the tests are passing I'll take another close look.
PG_FUNCTION_INFO_V1(citus_remote_connection_stats);
unnecessary diff
/*
 * Given that citus.shared_max_pool_size can be defined per database, we
 * should keep track of shared connections per database.
 */
Removing this comment indeed seems like the correct thing: Based on this comment throttling was originally implemented per database on purpose, but I don't think that makes much sense. max_connections is at the server level. And also citus.max_shared_pool_size is PGC_SIGHUP, so it cannot be set at the database level.
bool counterIncremented = false;
SharedConnStatsHashKey connKey;

strlcpy(connKey.hostname, hostname, MAX_NODE_LENGTH);
if (strlen(hostname) > MAX_NODE_LENGTH)
{
	ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					errmsg("hostname exceeds the maximum length of %d",
						   MAX_NODE_LENGTH)));
}
Why remove this check?
It moved to PrepareWorkerNodeHashKey
}
else

if (IsLoggableLevel(DEBUG4))
nit: this if check shouldn't be necessary. ereport already does something like that internally.
I thought it was there to prevent building the expensive errmsg. Maybe it is better to leave it this way just in case, to avoid a performance regression?
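As an illustration of the trade-off being discussed, a sketch where the guard pays off because a value is computed before the ereport() call; hostname, port, and GetSharedPoolUsage() are hypothetical names used only for this example:

/*
 * Sketch of the guard under discussion. ereport() itself skips message
 * construction for levels that will not be logged, but anything computed
 * before the call (the hypothetical GetSharedPoolUsage() below) is only
 * worth paying for when DEBUG4 will actually be emitted.
 */
if (IsLoggableLevel(DEBUG4))
{
	int usedConnections = GetSharedPoolUsage(hostname, port);	/* hypothetical */

	ereport(DEBUG4, (errmsg("shared pool usage for %s:%d is %d",
							hostname, port, usedConnections)));
}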
uint32 hash = string_hash(entry->workerNodeKey.hostname, NAMEDATALEN);
hash = hash_combine(hash, hash_uint32(entry->workerNodeKey.port));
nit: it would be good to call SharedConnectionHashHash here instead of re-implementing it. Same for the Compare function.
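A conceptual sketch of the suggestion, assuming the existing SharedConnectionHashHash accepts this key type (the keysize argument is an assumption as well):

/*
 * Conceptual sketch: delegate to the existing hash helper instead of
 * repeating string_hash() + hash_combine() here. Assumes workerNodeKey has
 * the key type that SharedConnectionHashHash already hashes.
 */
uint32 hash = SharedConnectionHashHash(&entry->workerNodeKey,
									   sizeof(entry->workerNodeKey));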
@JelteF Great, thanks! I'll address your comments shortly. I'm also concerned about the test coverage: as I mentioned, I only covered the basic scenario. Maybe there are some use cases that should also be covered? Especially distributed deadlock detection, which I am not sure how to test with the existing tool set.
The build error on PG16 seems to be because you included
This at least needs some changes to make the PG16 build and test pass (I think I pressed approve by accident before). General idea and implementation still seem good.
Force-pushed 3417599 to 889d5bd (compare)
@JelteF, fixed the PG16 build and addressed the review. The
I also experienced some flakiness with
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #7286 +/- ##
==========================================
- Coverage 89.66% 81.44% -8.22%
==========================================
Files 283 282 -1
Lines 60525 59424 -1101
Branches 7540 7360 -180
==========================================
- Hits 54268 48399 -5869
- Misses 4102 8297 +4195
- Partials 2155 2728 +573
Force-pushed bb48581 to 2e5b010 (compare)
Tried to fix the
I haven't looked at it in more detail yet, but at least the flaky test detection is failing because this test randomly fails:
+++ /__w/citus/citus/src/test/regress/results/multi_maintenance_multiple_databases.out.modified 2024-02-02 13:40:41.496394020 +0000
@@ -277,21 +277,21 @@
(SELECT setting::int FROM pg_settings WHERE name = 'port')),
$statement$
SELECT groupid, gid
FROM pg_dist_transaction
WHERE gid LIKE 'citus_0_1234_4_0_%'
OR gid LIKE 'citus_0_should_be_forgotten_%'
$statement$) AS t(groupid integer, gid text)
WHERE datname LIKE 'db%';
pg_dist_transaction_after_recovery_coordinator_test
-----------------------------------------------------
- t
+ f
(1 row)
SELECT count(*) = 0 AS cached_connections_after_recovery_coordinator_test
FROM pg_stat_activity
WHERE state = 'idle'
AND now() - backend_start > '5 seconds'::interval;
cached_connections_after_recovery_coordinator_test
----------------------------------------------------
@@ -2021,6 +2033,20 @@ RegisterCitusConfigVariables(void)
		GUC_STANDARD,
		NULL, NULL, NULL);

	DefineCustomIntVariable(
		"citus.max_databases_per_worker_tracked",
Tried to fix the check_gucs_are_alphabetically_sorted and got a bit confused: it seems to be failing on citus.enable_repartitioned_insert_select even on main, on my machine. But it somehow passes on your CI 🤔

ivan@ivan-pc:~/IdeaProjects/citus-fork$ ./ci/check_gucs_are_alphabetically_sorted.sh
sort: gucs.out:40: disorder: "citus.enable_repartitioned_insert_select",
Looking at your changes, both new GUCs that you're introducing are not at the correct place alphabetically (both need to be moved up). I expect that if you fix their location in the list, the error will go away.
I agree that it's weird that it reports citus.enable_repartitioned_insert_select though. Sounds like a reporting bug. @onurctirtir do you know more about this? (you changed stuff about this script recently)
Ok, figured that out: I didn't have LC_COLLATE=C, so sort on my machine treated the underscore character differently.
The easiest fix is to force it for the sort explicitly in check_gucs_are_alphabetically_sorted.sh:

grep -P "^[\t][\t]\"citus\.[a-zA-Z_0-9]+\"" RegisterCitusConfigVariables_func_def.out > gucs.out
LC_COLLATE=C sort -c gucs.out
Ah interesting, could you make a separate PR to fix that?
Certainly 🙂
But I am not sure that the easiest fix would be the best one: maybe it's better to force it via ./configure or something like that, so every dev setup is as close to CI as possible.
Well, in test runs we already set the collation etc. to the expected settings. But these are simple shell scripts where we don't have this infrastructure. And normally that doesn't matter, but for sorting it indeed does.
So, to clarify: if you open a PR where you only add LC_COLLATE=C to the sort command, then I'd be happy to accept that. (If not, I'll probably do it myself in the near future, if I don't forget.)
Yes, I'll add it tomorrow, don't worry :)
Got distracted trying to reproduce the flakiness :)
@JelteF, is there a way I can download the results of failed tests? I tried to reproduce locally, but with no luck. My tests take twice as long, though, so my theory is that it is reproducible on weaker machines.
Yes, if you go to the overview and scroll to the bottom you will see all the artifacts produced and also the diff of the failures inline: https://github.com/citusdata/citus/actions/runs/7812230917/attempts/2#summary-21309102210
The flaky tests you might be able to reproduce locally by running the test in question multiple times (this runs it 200 times):
Yep, that's what I tried :)
Force-pushed 1901be2 to a48f274 (compare)
@JelteF, reminder about this PR, just in case 🙂
I'll try to make some time soon to do a full review of this; I think it's pretty close to being mergeable. For now, could you look at the check-multi tests? They are timing out in CI.
I still would like to add more test coverage, especially for deadlock detection, but I'm not sure how to do this with the existing toolkit.
I think the python test framework that we have would be a good fit for doing that.
Not sure when I'll be able to get back to this, so I'll just leave a note here: tests are timing out because of low
The timeout is basically implemented, along with a fix of the condition variables for the new pool, which I had missed before, but there are still issues related to #7203 and test coverage that should be improved.
- WIP test_multiple_databases_distributed_deadlock_detection
- Investigate citus_local_tables
@JelteF, I think the PR is finally ready for review 🎉
@JelteF, kind reminder, just in case 🙂
This PR introduces the following changes:
1. Makes citus.max_shared_pool_size cluster-wide. Introduces a new citus.max_databases_per_worker_tracked GUC to adapt shared_connection_stats to this change.
2. Adds a separate maintenance connection pool, capped by citus.max_maintenance_shared_pool_size, which is equal to 5 by default.
3. Makes transaction recovery and distributed deadlock detection respect citus.max_shared_pool_size.
4. Adds a citus.shared_pool_size_maintenance_quota GUC to reserve a fraction of citus.max_shared_pool_size exclusively for maintenance operations. So, by default, regular and maintenance operations have isolated fractions of the pool and do not compete with each other.
5. Adds the possibility to assign one database as dedicated for maintenance via a new citus.maintenance_management_database GUC. I'll integrate it later with #7254 if it is merged.
6. Disables connection caching for all maintenance backends, except the one for citus.maintenance_management_database.
7. Makes distributed deadlock detection be invoked only on citus.maintenance_management_database, when it is set.

When citus.maintenance_management_database is not set (the default), all maintenance daemons perform distributed deadlock detection, but they never cache connections and respect the citus.shared_pool_size_maintenance_quota or citus.max_shared_pool_size.

Logic related to citus.main_db will be introduced in a separate PR.

This is a draft implementation to check if I am moving in the right direction, so only the base scenario is covered by tests.
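For orientation, a rough sketch of how one of the new GUCs might be registered inside RegisterCitusConfigVariables(); the backing variable name, the bounds, and the description string are assumptions, while the GUC name, its default of 5, and the DefineCustomIntVariable() API come from the PR and PostgreSQL:

/*
 * Sketch only: registration of the maintenance pool GUC described above.
 * Variable name, min/max bounds, and description are assumptions; the GUC
 * name and its default of 5 come from the PR description.
 */
static int MaxMaintenanceSharedPoolSize = 5;

static void
RegisterMaintenancePoolGuc(void)
{
	DefineCustomIntVariable(
		"citus.max_maintenance_shared_pool_size",
		gettext_noop("Sets the maximum number of shared connections reserved "
					 "for maintenance operations per worker."),
		NULL,
		&MaxMaintenanceSharedPoolSize,
		5, -1, INT_MAX,
		PGC_SIGHUP,
		GUC_STANDARD,
		NULL, NULL, NULL);
}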
I also haven't found an example of a distributed deadlock detection test, so I'd appreciate some advice on this, as well as on how to better address @onderkalaci's concern:
cc @marcocitus
closes #7244