Privateclusterfile #2204
base: main
Conversation
This lets us avoid opening many connections, each using its own cluster file. If each instance has its private cluster file, they can run out of sync when the cluster changes. If they share a cluster file, there can be races as multiple threads write to the same file.
Instead of reading the updated connection string from fdbcli's cluster file, ask the cluster. This is meant to make it less likely to have races in case the file is unexpectedly being written to.
Thought about it a bit more, here's an idea for how to let the connection string flow cleanly. General idea: the coordinators decide on the cluster file content; the authoritative data is in the coordinated state. The operator can suggest a change by running the "coordinators" command, so the operator should just distribute it. There is already proven code in the FDB client that keeps a cluster file in sync with the coordinated state. Currently, the operator overwrites the cluster file frequently. Proposal: when we see a cluster UID for the first time, we write its seed connection string to a file, call OpenDatabase() on it immediately, and never touch the file again. The FDB client library will keep it up to date. To run fdbcli, we make a temporary copy of the cluster file (FDB updates it atomically in the file system) and delete the copy when done. We watch the authoritative connection string and publish it as the seed connection string to the clients like today. I'll think about it a bit more and maybe update the PR next week (or make a second one for discussion); I think this is cleaner.
First of all, thanks for taking the time to work on this! We have plans to cut a new operator release and drop support for older FDB releases; that will help us because we'll be able to make use of the mgmt API, and then a lot of the fdbcli interactions will go away.
For some historical background: we initially started with random cluster files, but there were some issues in previous versions of the Go bindings where the tracked clusters were not deleted when the DB was closed, which caused a memory leak. So we have to ensure we are closing the DB properly.
if err != nil {
	return fdb.Database{}, err
}
database, err := fdb.OpenDatabase(clusterFile)
You could use OpenWithConnectionString to bypass the other cache on the Go SDK side; this was briefly discussed with @johscheuer here
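For reference, a minimal sketch of that suggestion, assuming a version of the FoundationDB Go bindings that exposes `OpenWithConnectionString` (the API version and connection string below are placeholder values, not taken from the operator):

```go
package main

import (
	"log"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

func main() {
	// Assumed API version for illustration.
	fdb.MustAPIVersion(710)

	// Placeholder seed connection string; in the operator this would be the
	// connection string already held in memory for the cluster.
	connStr := "docker:docker@127.0.0.1:4500"

	// Opens a database directly from the connection string, so no cluster
	// file exists on disk and the library's file-name-keyed cache is bypassed.
	db, err := fdb.OpenWithConnectionString(connStr)
	if err != nil {
		log.Fatal(err)
	}
	_ = db
}
```

Note this sidesteps the file races entirely for the library's own connections, but fdbcli would still need a file on disk.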
Description
FoundationDB Operator creates cluster files in the temp directory so that fdbcli and the FoundationDB client library can connect to the cluster. The cluster file name is deterministically derived from the cluster UID.
Three different actors write to and read from this file without synchronization:
While the cluster controller is single-threaded, the library may be writing in the background. Since the introduction of the backup and restore controllers, there can be multiple active instances of the admin client at the same time. The cluster controller holds an instance of the admin client while the subreconcilers create other instances, though I think it is inactive during reconciliation, so this last point is not an actual problem.
If a race occurs (e.g. the reader finds the cluster file in a transient state where the file is truncated or temporarily overwritten with the now-invalid seed connection string, or a writer overwrites a legitimately newer file), things can go wrong. For example, when changing coordinators, we read back the file after fdbcli exits, assuming it contains the new connection string.
To prevent this class of problems, the proposed change creates a unique cluster file for each instance of the admin client and deletes it when done. It instantiates a singleton database instance with its private cluster file for each cluster UID and keeps it around. To be extra safe, the new connection string is read from the cluster instead of from the cluster file after changing coordinators, to get an authoritative result.
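The per-UID singleton could look roughly like this. This is an illustrative sketch only: the generic cache stands in for the operator's real database handle type, and all names (`databaseCache`, `get`, `openFn`) are made up for the example. The essential property is that the open function runs at most once per cluster UID.

```go
package main

import (
	"sync"
)

// databaseCache hands out one shared handle per cluster UID, so repeated
// admin-client constructions don't pile up database objects in the FDB
// client library's internal cache.
type databaseCache[T any] struct {
	mu   sync.Mutex
	open map[string]T
}

func newDatabaseCache[T any]() *databaseCache[T] {
	return &databaseCache[T]{open: map[string]T{}}
}

// get returns the cached handle for uid, calling openFn only on first use.
// The mutex serializes concurrent callers (e.g. subreconcilers and the
// backup/restore controllers) so openFn never runs twice for the same UID.
func (c *databaseCache[T]) get(uid string, openFn func() (T, error)) (T, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if db, ok := c.open[uid]; ok {
		return db, nil
	}
	db, err := openFn()
	if err != nil {
		var zero T
		return zero, err
	}
	c.open[uid] = db
	return db, nil
}
```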
Type of change
Discussion
We experienced an incident where the operator, and subsequently all clients, lost contact with the cluster, and I believe this was due to such a race. We have seen connection strings running out of sync before in a less catastrophic way. Our setup is a bit unusual, and maybe that exposes us more than others: our cluster runs in three_data_hall mode, which means one "cluster"'s coordinators occasionally change unexpectedly because we change coordinators in another data hall; the three "clusters" are a single FDB cluster with a shared coordinator set, after all. And we are running coordinators on dedicated servers, using a custom class, so the servers actually shut down after losing the coordinator role.
The singleton database may be a bit confusing, because the FDB client library internally keeps a singleton map like this too, keyed by cluster file name. We need our own because with unique cluster file names, we'd create more and more databases in the library's cache that never get cleaned up, leading to a memory leak. Alternative approaches could be to keep a single cluster file for the library and only use temporary files for fdbcli invocations, or to ask the cluster (via the library) for the current connection string each time, to write a guaranteed-correct cluster file for fdbcli. These are more involved code changes, please advise.
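Asking the cluster itself for the authoritative connection string can be done by reading the `\xff\xff/connection_string` special key. A hedged sketch, assuming the FoundationDB Go bindings (`getConnectionString` is a made-up helper name, not operator code):

```go
package main

import (
	"fmt"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

// getConnectionString asks the connected cluster for its current connection
// string via the special key space, rather than trusting the on-disk file.
func getConnectionString(db fdb.Database) (string, error) {
	val, err := db.ReadTransact(func(tr fdb.ReadTransaction) (interface{}, error) {
		// \xff\xff/connection_string resolves to the cluster's current
		// connection string as known by the coordinated state.
		return tr.Get(fdb.Key("\xff\xff/connection_string")).Get()
	})
	if err != nil {
		return "", err
	}
	return string(val.([]byte)), nil
}

func main() {
	fdb.MustAPIVersion(710) // assumed API version
	db := fdb.MustOpenDefault()
	cs, err := getConnectionString(db)
	if err != nil {
		panic(err)
	}
	fmt.Println(cs)
}
```

The returned string could then be written to a fresh temporary file for each fdbcli invocation, making the on-disk file a pure output that nothing reads back.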
Testing
Unit tests pass. Still working on my local e2e test setup.
Does this change look reasonable? Then I can try it in our staging systems for a while.
Documentation
n/a
Follow-up
n/a