Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

[Named min timestamp leases] Metrics #7400

Closed
wants to merge 5 commits into from
Closed

Conversation

ergo14
Copy link
Contributor

@ergo14 ergo14 commented Oct 24, 2024

Metrics

Comment on lines 177 to 168
long wereNotFullyFulfilled = requests.keySet().stream()
.filter(timestampName -> {
int requestedTimestamps = requests.get(timestampName);
long returnedTimestamps = responses.get(timestampName).getFreshTimestamps().getCount();
if (returnedTimestamps >= requestedTimestamps) {
return false;
}
telemetryTimestampShortFallConsumer.accept(requestedTimestamps - returnedTimestamps);
return true;
})
.count();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path will be moved and factored as part of the previous PR reviews, so just review the logic here.

I'll move things later as appropriate

if (returnedTimestamps >= requestedTimestamps) {
return false;
}
telemetryTimestampShortFallConsumer.accept(requestedTimestamps - returnedTimestamps);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only want to report shortfall > 0

An alternative is to report shortfall >= 0, or any possible shortfall (or overflow)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realistically, we care about the count here rather than the histogram. But, it may be needed eventually

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a histogram?

Copy link
Contributor

@jkozlowski jkozlowski Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will struggle with interpreting these. If you care about magnitude, just log it. But I'd just log a binary based on wasFullyFulfilled > 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if that turns out to be false and we need more, we'll do that later, but at this point I'm struggling to think why I'd want to see the exact number.

Copy link
Contributor Author

@ergo14 ergo14 Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a histogram?

More data? Most appropriate would be a meter then? I don't usually like counters because of how they are reported but can use that as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, just a counter

@@ -30,4 +30,10 @@ namespaces:
type: timer
docs: observed call duration during leader election

timestampLease:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will generate a class named the same as the other metrics in this PR. I think that's fine

import javax.annotation.concurrent.GuardedBy;

final class NamedMinTimestampTrackerImpl implements NamedMinTimestampTracker {
private final String timestampName;
private final AtomicInteger numLocksHeld;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Write path: this is modified in lock and unlock calls which are already synchronized
  • Read path: this will be ran in some telemetry wc thread, and the read seems to be a volatile read only - I can't imagine a less perf-impacting read (other than a final field) if we are to have memory visibility. I guess we could have a variable with no synchronization and hope for the best, the metrics don't need to be 100% sure

import javax.annotation.concurrent.GuardedBy;

final class NamedMinTimestampTrackerImpl implements NamedMinTimestampTracker {
private final String timestampName;
private final AtomicInteger numLocksHeld;
private final AtomicLong approximateMinLocked;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I only update when there are calls for getMinimumTimestamp as calling getFirstKey is O(tree height) in the tree map iirc


@GuardedBy("this")
private final SortedMap<Long, UUID> holdersByTimestamp = new TreeMap<>();

NamedMinTimestampTrackerImpl(String timestampName) {
private NamedMinTimestampTrackerImpl(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also considered adding meters for lock/unlock as generally those give more signal/granularity than a size metric

But, looking at Meter, updating it involves more CASs etc

@@ -46,6 +64,7 @@ public synchronized void lock(long timestamp, UUID requestId) {
SafeArg.of("requestId", requestId),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to note is: the code path is now different than before as this impacts the immutable timestamp. I can special case.

Of course, with all concurrency comments on this PR, I may be too paranoid here

type: gauge
tags:
- name
approximateSmallestLeased:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name to make sure any operator/dev knows this is best-effort and does not over-index

@ergo14 ergo14 changed the title [Named min timestamp leases] Timestamp partial fill metric [Named min timestamp leases] Metrics Oct 24, 2024
@@ -61,24 +63,38 @@ public final class TimestampLeaseAcquirerImpl implements TimestampLeaseAcquirer
private final NamespacedTimestampLeaseService delegate;
private final Unlocker unlocker;
private final Supplier<UUID> uuidSupplier;
private final Consumer<Long> telemetryTimestampShortFallConsumer;

private final Retryer<RequestAndResponse> retryer = RetryerBuilder.<RequestAndResponse>newBuilder()
.retryIfResult(requestAndResponse -> !wasRequestFullyFulfilled(requestAndResponse))
.retryIfException()
.withStopStrategy(StopStrategies.stopAfterAttempt(3))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some backoff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline that we won't be adding this now

});
long wereNotFullyFulfilled = requests.keySet().stream()
.filter(timestampName -> {
int requestedTimestamps = requests.get(timestampName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we validate somewhere that these are non-zero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I can add a preconditions before I start the whole retry block?

}

@Override
public synchronized Optional<Long> getMinimumTimestamp() {
if (holdersByTimestamp.isEmpty()) {
return Optional.empty();
}
return Optional.of(holdersByTimestamp.firstKey());
Long minimum = holdersByTimestamp.firstKey();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just report it in the same way we do right now in the immutable timestamp case? Just have it be done by Atlas and in the consumer application. Doing this in timelock breaks the usecase here of being able to write dashboard for a particular consumer (or at least maybe makes it more snowflakey/different than how we do all other metrics?)

}

@Override
public synchronized Optional<Long> getMinimumTimestamp() {
if (holdersByTimestamp.isEmpty()) {
return Optional.empty();
}
return Optional.of(holdersByTimestamp.firstKey());
Long minimum = holdersByTimestamp.firstKey();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in general, I'm less bought in to the idea of this metric right now / care about it the least for the initial milestone, so at least split this out into a separate PR, so we can ship the counter first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will spin out a PR only for the shortfall thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ergo14
Copy link
Contributor Author

ergo14 commented Oct 25, 2024

Setting to draft - need to:

  • Spin to 2 PRs
  • One for the size of the lock collection on TimeLock
  • One where I move the metric for the value of the named timestamp to the client side

@ergo14 ergo14 marked this pull request as draft October 25, 2024 15:41
@ergo14 ergo14 force-pushed the aa-nmtl-wiring branch 2 times, most recently from 68962f6 to 69ff52c Compare October 25, 2024 15:54
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants