Tow/implement new cassandra exception #7353

tillyow · 2024-10-15T15:03:44Z

General

Before this PR:
We were not catching TimedOutExceptions.

After this PR:

==COMMIT_MSG==
New Exception classes:

CassandraTExceptions()
CassandraTimedOutException()

Amended how UnavailableException (a native Cassandra Thrift exception) is caught: it is now mapped to InsufficientConsistencyException via CassandraTExceptions().

==COMMIT_MSG==
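A minimal sketch of the mapping described in the commit message, for readers unfamiliar with the pattern. It is not the PR's implementation: every nested class is a hypothetical stand-in for the real Thrift and AtlasDB types (so the snippet compiles without either dependency), and the messages and class hierarchy are assumptions. TException subtypes without a specific mapping fall back to a general dependency exception, as tillyow describes later in the thread.

```java
/**
 * Sketch of centralised Thrift exception mapping, in the spirit of CassandraTExceptions.
 * ASSUMPTION: all nested classes are hypothetical stand-ins; the real classes,
 * constructors and hierarchy may differ.
 */
public final class CassandraTExceptionsSketch {
    private CassandraTExceptionsSketch() {}

    // Stand-in for org.apache.thrift.TException, a checked exception.
    static class TException extends Exception {
        TException(String message) {
            super(message);
        }
    }

    // Stand-ins for the Cassandra Thrift subtypes discussed in the PR.
    static class UnavailableException extends TException {
        UnavailableException() {
            super("not enough replicas are available");
        }
    }

    static class TimedOutException extends TException {
        TimedOutException() {
            super("request timed out");
        }
    }

    // Stand-in for the general, unchecked dependency exception.
    static class AtlasDbDependencyException extends RuntimeException {
        AtlasDbDependencyException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical hierarchy: specific errors extend the general one to keep them "logically connected".
    static class InsufficientConsistencyException extends AtlasDbDependencyException {
        InsufficientConsistencyException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static class CassandraTimedOutException extends AtlasDbDependencyException {
        CassandraTimedOutException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Maps a checked Thrift exception to an unchecked one, keeping the original as the cause. */
    static AtlasDbDependencyException wrap(TException e) {
        if (e instanceof UnavailableException) {
            return new InsufficientConsistencyException("Cassandra replicas unavailable", e);
        }
        if (e instanceof TimedOutException) {
            return new CassandraTimedOutException("Cassandra request timed out", e);
        }
        // Anything without a specific mapping becomes the general dependency exception.
        return new AtlasDbDependencyException("Unmapped Cassandra Thrift exception", e);
    }

    public static void main(String[] args) {
        AtlasDbDependencyException mapped = wrap(new TimedOutException());
        System.out.println(mapped.getClass().getSimpleName()); // prints CassandraTimedOutException
    }
}
```

Keeping the Thrift exception as the cause means the original error is still visible in stack traces after the type has been translated.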

Priority:
P2
Concerns / possible downsides (what feedback would you like?):
None
Is documentation needed?:
No

Compatibility

Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?:
No
Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?:
No
The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.):

Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?:

Does this PR need a schema migration?

Testing and Correctness

What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?:

What was existing testing like? What have you done to improve it?:

If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.:

If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?:

Execution

How would I tell this PR works in production? (Metrics, logs, etc.):

Has the safety of all log arguments been decided correctly?:

Will this change significantly affect our spending on metrics or logs?:

How would I tell that this PR does not work in production? (monitors, etc.):

If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?:

If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC):

Scale

Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.:

Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?:

Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?:

Development Process

Where should we start reviewing?:

If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?:

Please tag any other people who should be aware of this PR:
@jeremyk-91
@raiju

changelog-app · 2024-10-15T15:03:49Z

Generate changelog in `changelog/@unreleased`

What do the change types mean?

feature: A new feature of the service.
improvement: An incremental improvement in the functionality or operation of the service.
fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
break: Has the potential to break consumers of this service's API, inclusive of both Palantir services and external consumers of the service's API (e.g. customer-written software or integrations).
deprecation: Advertises the intention to remove service functionality without any change to the operation of the service itself.
manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration, performing database surgery, ...) at the time of upgrade for it to succeed.
migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?

❗The break and manual task changelog types will result in a major release!
🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
✨ All others will result in a minor version release.

Type

Description

New Exception classes:

CassandraTExceptions()
CassandraTimedOutException()

Amended how UnavailableException (a native Cassandra Thrift exception) is caught: it is now mapped to InsufficientConsistencyException via CassandraTExceptions().

Check the box to generate changelog(s)

Generate changelog entry


tillyow · 2024-10-23T15:50:12Z

I have changed the integration test assertions in OneNodeDownDeleteTest.deleteAllTimestampsThrows() and deletingThrows(), as the thrown exception has changed. I searched our clients and found none catching this exception type, so this is a safe change.


jeremyk-91

@mdaudali probably has more context on this - but is there a reason we wanted to wrap the rest of the exceptions (other than the TimedOut) in RuntimeExceptions / switch to the unchecked paradigm?

jeremyk-91 · 2024-10-24T14:56:23Z

.palantir/revapi.yml

+      new: "parameter void com.palantir.atlasdb.keyvalue.api.InsufficientConsistencyException::<init>(java.lang.Throwable,\
+        \ com.palantir.logsafe.Arg<?>[])"
+      justification: "Not a break as I have handle the implementation in the same\
+        \ PR"


Note: this would be true for truly internal APIs. However, this exception is actually used in the internal backup and restore product's tests. It's probably not a major issue, but we will want to make sure that we fix their tests and bump their Atlas version accordingly.

I did a Sourcegraph check and couldn't find any usages. If not through Sourcegraph, how would I find usages in the internal backup and restore product's tests?

changelog/@unreleased/pr-7353.v2.yml

...sdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/WrappingQueryRunner.java

...sandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraTimedOutException.java

tillyow · 2024-10-25T10:47:38Z

@mdaudali probably has more context on this - but is there a reason we wanted to wrap the rest of the exceptions (other than the TimedOut) in RuntimeExceptions / switch to the unchecked paradigm?

From what you've said, I think you're asking why we throw an AtlasDbDependencyException (which extends RuntimeException) for all TExceptions that don't map to either InsufficientConsistencyException or TimedOutException? The decision was that, since Cassandra Thrift exceptions are in reality a dependency issue, it made sense to throw either the specific error (e.g. InsufficientConsistencyException or TimedOutException) or the more general AtlasDbDependencyException, while keeping them logically connected.

I think we used an unchecked (runtime) exception so that we wouldn't have to change multiple method signatures to methodName() throws X, and since AtlasDbDependencyException was already used in the code, there seemed to be a precedent for runtime exceptions. However, there is a design argument that Thrift exceptions are operational errors coming from the database due to client usage. Because the next part of the work would be to adjust alta to catch and proactively react to these errors, I think ease of implementation won out over the checked versus unchecked question?
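The checked versus unchecked trade-off above can be illustrated with a small sketch. All names here (`ThriftCall`, `runMapped`, `DependencyException`, and the stand-in `TException`) are hypothetical and are not AtlasDB or Thrift APIs. Wrapping the checked exception once, where the Thrift call is made, means callers such as `readValue()` need no `throws TException` clause; the cost is that the compiler no longer forces callers to handle the failure.

```java
/**
 * Sketch: converting a checked Thrift exception into an unchecked one at a single boundary.
 * ASSUMPTION: every name below is a hypothetical stand-in, not an AtlasDB or Thrift API.
 */
public final class UncheckedThriftCalls {
    private UncheckedThriftCalls() {}

    // Stand-in for org.apache.thrift.TException (checked).
    static class TException extends Exception {
        TException(String message) {
            super(message);
        }
    }

    // Stand-in for an unchecked dependency exception such as AtlasDbDependencyException.
    static class DependencyException extends RuntimeException {
        DependencyException(Throwable cause) {
            super(cause);
        }
    }

    // A Thrift call that may fail with the checked exception.
    @FunctionalInterface
    interface ThriftCall<T> {
        T call() throws TException;
    }

    // The only place that needs to know about the checked exception.
    static <T> T runMapped(ThriftCall<T> call) {
        try {
            return call.call();
        } catch (TException e) {
            throw new DependencyException(e);
        }
    }

    // Callers need no "throws TException" clause.
    static String readValue() {
        return runMapped(() -> "value");
    }

    static String failingCall() {
        return runMapped(() -> {
            throw new TException("node down");
        });
    }
}
```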

mdaudali · 2024-10-30T14:07:12Z

@mdaudali probably has more context on this - but is there a reason we wanted to wrap the rest of the exceptions (other than the TimedOut) in RuntimeExceptions / switch to the unchecked paradigm?

To be upfront: I have not reviewed this PR yet - other than giving the guidance on wrapping the TimedOutException and the idea of having the CassandraTExceptions class to do it in.

but is there a reason we wanted to wrap the rest of the exceptions (other than the TimedOut) in RuntimeExceptions / switch to the unchecked paradigm?

Not specifically [haven't reviewed it yet] (but also, IIRC, and I will check once better: don't we already wrap a bunch of the TExceptions in RuntimeExceptions via Throwables#somethingsomethingwrap?). For TExceptions in particular, I thought it useful to handle them all centrally in the CassandraTExceptions class (or whatever it's called), so that other exceptions get wrapped without a separate exception handler for each individual type (though I haven't checked how we handle the unmapped types).

Ah, on reviewing the PR, I realise that it is changing a bunch of checked exception places too.

tillyow · 2024-10-30T15:49:48Z

@mdaudali this PR is just a cleaner version of #7285, where we discussed the wrapping and copying of Throwables.wrapsomething(). I created a new PR to separate out the AtlasDbDependency changes from the CassandraTException changes.

mdaudali

I started reviewing, but this PR is doing a few too many things (and there's a fair bit to unpack here). Let's break it down to make it easier to review, please.

An example split:

1. Implement CassandraTimedOutException.
2. Implement CassandraTExceptions, and add tests.
3. Modify the relevant unchecked exception locations. (Be careful here: you don't always need CassandraTExceptions if you're providing a specific message for a given exception. For example, if you're explicitly catching UnavailableException to provide the relevant exception message, you don't need to call CassandraTExceptions.)
4. Consider whether we need to modify any of the checked exception locations (and ping me with your thoughts on that).
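The note about explicit catches can be sketched as follows. This is a hypothetical illustration, not code from the PR: the classes, the `Outcome` enum, `thriftDelete`, and the error message are all invented stand-ins. A call site that catches UnavailableException and already knows what to say throws InsufficientConsistencyException directly; only the remaining TExceptions go through the shared mapper.

```java
/**
 * Sketch: an explicit catch with a tailored message, falling back to a generic mapper.
 * ASSUMPTION: all classes, messages and the Outcome enum are hypothetical stand-ins.
 */
public final class SpecificCatchSketch {
    private SpecificCatchSketch() {}

    static class TException extends Exception {
        TException(String message) {
            super(message);
        }
    }

    static class UnavailableException extends TException {
        UnavailableException() {
            super("unavailable");
        }
    }

    static class InsufficientConsistencyException extends RuntimeException {
        InsufficientConsistencyException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static class DependencyException extends RuntimeException {
        DependencyException(Throwable cause) {
            super(cause);
        }
    }

    enum Outcome { OK, UNAVAILABLE, OTHER_FAILURE }

    // Stand-in for a shared mapper such as CassandraTExceptions.
    static RuntimeException mapGeneric(TException e) {
        return new DependencyException(e);
    }

    // Simulated Thrift delete; it keeps its checked signature.
    static void thriftDelete(Outcome outcome) throws TException {
        switch (outcome) {
            case UNAVAILABLE:
                throw new UnavailableException();
            case OTHER_FAILURE:
                throw new TException("other failure");
            default:
                break; // success: nothing to do
        }
    }

    static void delete(Outcome outcome) {
        try {
            thriftDelete(outcome);
        } catch (UnavailableException e) {
            // The caller knows what went wrong, so it throws directly with a tailored message.
            throw new InsufficientConsistencyException("Deleting requires all Cassandra nodes to be available", e);
        } catch (TException e) {
            // Everything else goes through the shared mapper.
            throw mapGeneric(e);
        }
    }
}
```

The more specific catch has to come first, since UnavailableException is a subtype of TException. In this sketch `thriftDelete` keeps its `throws TException` clause and the remapping happens at the caller, which is one of the two options raised in the later review comment on `getRawClientWithKeyspaceSet()`.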

mdaudali · 2024-11-02T15:27:59Z

...ra-multinode-tests/src/test/java/com/palantir/cassandra/multinode/OneNodeDownDeleteTest.java

@@ -36,13 +36,13 @@ void testSetup(CassandraKeyValueService kvs) {

    @Test
    public void deletingThrows() {
-        assertThrowsInsufficientConsistencyExceptionAndDoesNotChangeCassandraSchema(
+        assertThrowsAtlasDbDependencyExceptionAndDoesNotChangeCassandraSchema(


This seems odd - why do we need to change this? We should still be throwing InsufficientConsistencyExceptions here.

mdaudali · 2024-11-02T15:29:43Z

...-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClientFactory.java

@@ -137,8 +137,14 @@ private CassandraClient instrumentClient(Client rawClient) {
        return client;
    }

-    private Cassandra.Client getRawClientWithKeyspaceSet() throws TException {
-        Client ret = getRawClientWithTimedCreation();
+    private Cassandra.Client getRawClientWithKeyspaceSet() {


I don't think we should be changing the signature here. This should be an invisible refactor, and making this method no longer throw the checked exception changes its signature.

Please double-check whether you actually want to remap the exception here or at the caller (and let me know what you think).

tillyow added 4 commits October 15, 2024 14:57

Added CassandraTException

bf9b64b

Adding missedTExceptions

fcfcdd4

spotlessApply

73922f5

Added UnavailableException catches

5ca86f7

tillyow added the do not merge label Oct 15, 2024

tillyow requested a review from mdaudali October 15, 2024 15:03

svc-changelog and others added 3 commits October 15, 2024 15:05

Add generated changelog entries

e49373c

Ammended the CassadraException to include args as the other PR will implement

a64466c

Merge branch 'tow/implement-new-cassandra-exception' of https://github.com/palantir/atlasdb into tow/implement-new-cassandra-exception

6907c90

tillyow changed the base branch from develop to tow/atlasdb-dependency-log-safe October 16, 2024 10:43

tillyow changed the base branch from tow/atlasdb-dependency-log-safe to develop October 16, 2024 10:44

tillyow added 10 commits October 18, 2024 12:35

Amended compileTime

4ece409

explainer for compile time suppression

d0d672f

Merge branch 'develop' into tow/implement-new-cassandra-exception

cb31c35

typo

a7cc461

Api break confirmation not a break as break has been fixed via same PR

d269d60

spoteless

b0ee926

Added a catch for a TEXception

86d55f3

change exception caught

fda40b5

Change to catch AtlasDbDependency excpetion

49fb7e1

Exception now throws AtlasDbException so adapting the integration test

bebe35e

tillyow added 3 commits October 23, 2024 16:50

Merge branch 'develop' into tow/implement-new-cassandra-exception

fd5737b

test changed for new exception thrown

2245e01

Merge branch 'tow/implement-new-cassandra-exception' of https://github.com/palantir/atlasdb into tow/implement-new-cassandra-exception

97c2bde

jeremyk-91 reviewed Oct 24, 2024


tillyow added 2 commits October 25, 2024 11:56

added requested changes:

87149e6

Change the compaction wording

185f04e

tillyow requested a review from jeremyk-91 October 25, 2024 10:59

spotlessApply

cd6fbfa

mdaudali reviewed Nov 2, 2024

