Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Measure Hydration/Transformation/Indexing Times #153

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

tpendragon
Copy link
Contributor

@tpendragon tpendragon commented Oct 21, 2024

Tracks index metrics for Hydrator, Transformer, and Indexer.

This stores the index times and acknowledged records in the database when a fresh process is started, and ends the measurement the first time acknowledgements empty out after polling is hit.

The dashboard has a page to display these metrics along with our lower and upper bound of times.

Closes #115
Screen Shot 2024-11-27 at 11 09 26 AM

Copy link

github-actions bot commented Oct 21, 2024

Container Scanning Status: ✅ Success


ghcr.io/pulibrary/dpul-collections:pr-153 (debian 12.6)
=======================================================
Total: 0 (HIGH: 0, CRITICAL: 0)

@hackartisan
Copy link
Member

This is looking good. Do you think we shouldn't use a span, then?

@hackartisan
Copy link
Member

@tpendragon
Copy link
Contributor Author

Like using https://hexdocs.pm/opentelemetry_api/OpenTelemetry.Tracer.html#start_span/2

I looked at that, but I couldn't figure out how to wrap something around a process that happens over a long period of time. Like I guess I could spawn a task that spins and waits...but I think just sending an event for the duration is just as good.

I'm not sure this is useful still - the full hydration test definitely
takes more than a second to run, but since we're just measuring what it
takes to produce those last messages it's less than a second. Maybe the
differential isn't a big deal in production.
@tpendragon tpendragon changed the title Proof of concept index metrics tracker. Measure Hydration/Transformation/Indexing Times Nov 27, 2024
@tpendragon tpendragon marked this pull request as ready for review November 27, 2024 19:10
Copy link
Member

@hackartisan hackartisan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very good and cool, the IndexMetricsTracker genserver is a bit hard for me to follow; maybe we could look at it together and then I could make some suggestions around organization and documentation that might help me or someone looking at it in the future?

@@ -0,0 +1,129 @@
defmodule DpulCollections.IndexingPipeline.DashboardPage do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should be in DpulCollectionsWeb somewhere, is there a particular reason it's here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it here because it didn't really seem like it was part of the app, but happy to move it.

lib/dpul_collections/index_metrics_tracker.ex Show resolved Hide resolved
def handle_call({:poll_started, source}, _, state) do
if get_in(state, [source.processor_marker_key(), :start_time]) != nil &&
get_in(state, [source.processor_marker_key(), :end_time]) == nil do
state = put_in(state, [source.processor_marker_key(), :request_end], true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what :request_end is, not entirely sure what state should / could look like

:indexer
end

defp handle_ack_received([:database_producer, :ack, :done], _measurements, metadata, _config) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is split from the others?

def mount(_params, _session, socket) do
socket =
assign(socket,
hydration_times: IndexMetricsTracker.index_times(HydrationProducerSource),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could rename the index_times function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about cache_times?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, I tried index_durations - it's the index part that's confusing?

|> put_in([:acked_count], old_acked_count + new_acked_count)
end

defp handle_ack_received([:database_producer, :ack, :done], _measurements, metadata, _config) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this one recurse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I should probably call that function something different

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Measure time to hydrate, transform, index
2 participants