Autoscale background threads for tracer auto batching #382
Conversation
- controlled by LANGCHAIN_TRACING_SAMPLING_RATE
- if a POST was allowed then the matching PATCH is allowed too
- applied on both single and batch tracing endpoints
Co-authored-by: William FH <[email protected]>
- if on, starts a background thread that batches inserts/updates
- only applies to insert/updates w/ trace_id and dotted_order
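For context, the gating described above might look roughly like this on the client side. This is a minimal sketch under assumptions: enqueue_run, the dict-based run, and TracingQueueItem's fields are illustrative, not the actual SDK internals.

    # Minimal sketch only -- names and structure are assumptions,
    # not the actual langsmith SDK internals.
    import queue
    from dataclasses import dataclass
    from typing import Any, Dict, Optional

    @dataclass
    class TracingQueueItem:
        priority: str         # dotted_order doubles as a sortable key
        action: str           # "create" or "update"
        item: Dict[str, Any]

    def enqueue_run(
        tracing_queue: Optional[queue.Queue], run: Dict[str, Any], action: str
    ) -> bool:
        """Route a run to the background batch queue when possible.

        Only runs carrying both trace_id and dotted_order can be batched;
        everything else falls back to the single-run endpoint.
        """
        if tracing_queue is not None and run.get("trace_id") and run.get("dotted_order"):
            tracing_queue.put(TracingQueueItem(run["dotted_order"], action, run))
            return True
        return False  # caller should send this run synchronously instead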
python/langsmith/client.py
Outdated
    ) -> List[TracingQueueItem]:
        next_batch: List[TracingQueueItem] = []
        try:
-           while item := tracing_queue.get(block=True, timeout=0.25):
+           while item := tracing_queue.get(block=block, timeout=0.25):
This logic is pretty confusing, summarizing for my own understanding:
- We check the queue to see if anything is there. If not: wait 250 ms more, and if still nothing is there, return all the accumulated items. If so: move to step two.
- We pop an item off the queue and add it to the batch.
- If we hit the limit on batch size, return.
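Put together, that summary corresponds to a drain helper along these lines. This is a reconstruction from the diff above, not the merged code; the exact signature, the limit handling, and the Any item type are assumptions.

    import queue
    from typing import Any, List, Optional

    def _tracing_thread_drain_queue(
        tracing_queue: queue.Queue, limit: Optional[int] = None, block: bool = True
    ) -> List[Any]:
        """Accumulate items until the queue stays empty for 250 ms or limit is hit."""
        next_batch: List[Any] = []
        try:
            # Wait up to 250 ms for each item; queue.Empty ends the accumulation.
            while item := tracing_queue.get(block=block, timeout=0.25):
                next_batch.append(item)
                if limit and len(next_batch) >= limit:
                    break  # full batch, send it now
        except queue.Empty:
            pass  # nothing arrived within the timeout; return what we have
        return next_batch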
Is there a degenerate case where you only get a run added every 200ms that could lead to something like 200ms * 100 = 20s run delays?
How can we make this debuggable?
I can make it so only the first call blocks
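i.e. something like this sketch of that suggestion (an illustration, not the committed code):

    import queue
    from typing import Any, List

    def drain_first_call_blocks(tracing_queue: queue.Queue) -> List[Any]:
        """Block up to 250 ms for the first item only, then take whatever is
        already queued without waiting any further."""
        next_batch: List[Any] = []
        try:
            next_batch.append(tracing_queue.get(block=True, timeout=0.25))
            while item := tracing_queue.get(block=False):  # no further waiting
                next_batch.append(item)
        except queue.Empty:
            pass
        return next_batch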
There is a reason why I made it like this: if we special-case the first item as you suggest, it ends up creating a number of very small batches whenever requests into the queue are ramping up
Pushed a commit which is a compromise
FYI, after some testing this compromise loses a lot of the effectiveness of the batching behavior, i.e. it ends up creating too many small batches. I think a delay is needed
makes sense to me
python/langsmith/client.py
Outdated
        # 1 for this func, 1 for getrefcount, 1 for _get_data_type_cached
    ):
        if next_batch := _tracing_thread_drain_queue(tracing_queue, 100):
            print("im looping", tracing_queue.qsize())
:)
python/langsmith/client.py
Outdated
    for thread in sub_threads:
        if not thread.is_alive():
            sub_threads.remove(thread)
    if tracing_queue.qsize() > 1000:
nit: let's put this in a variable...
I'd rather not put in a var something I really don't want to ever use again
sorry - in a constant that has a fixed name like _QUEUE_SIZE_TO_SPAWN_NEW_THREAD or something silly like that
done
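For illustration, the resolved check presumably looks something like the sketch below, using the constant name floated above. The thread cap _MAX_SUB_THREADS is a hypothetical stand-in for the max size requested further down; the merged names and values may differ.

    import queue
    import threading
    from typing import Callable, List

    _QUEUE_SIZE_TO_SPAWN_NEW_THREAD = 1000  # scale up once the backlog exceeds this
    _MAX_SUB_THREADS = 4                    # hypothetical cap on consumer threads

    def _maybe_spawn_consumer(
        tracing_queue: queue.Queue,
        sub_threads: List[threading.Thread],
        target: Callable[[], None],
    ) -> None:
        # Prune finished threads (iterate over a copy so removal is safe),
        # then add one if the queue is backing up and we are under the cap.
        for thread in list(sub_threads):
            if not thread.is_alive():
                sub_threads.remove(thread)
        if (
            tracing_queue.qsize() > _QUEUE_SIZE_TO_SPAWN_NEW_THREAD
            and len(sub_threads) < _MAX_SUB_THREADS
        ):
            new_thread = threading.Thread(target=target, daemon=True)
            sub_threads.append(new_thread)
            new_thread.start()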
    client = client_ref()
    if client is None:
        return
    tracing_queue = client.tracing_queue
    assert tracing_queue is not None

    sub_threads: List[threading.Thread] = []
Can we have a max size here?
done
            )
            sub_threads.append(new_thread)
            new_thread.start()
        if next_batch := _tracing_thread_drain_queue(tracing_queue):
How does this get assigned to the right thread?
Not sure I understand the comment. This function runs all in the same thread, so anything in this function is happening in this one thread
ah! So the main thread also drains the queue and handles requests alongside the subthreads, which it spawns to do the same thing?
It's weird that we have a main thread and subthreads doing the same action in different places, but I guess that works
yea, because I think most uses of this don't need the other threads; can change if we think it's better
So theoretically, could you round-robin dequeue records across all the different threads as they're accumulated? Nothing wrong with that, just thinking through how this works
There is a single queue shared by all consumer threads; increasing the number of threads just increases the rate at which we draw down the single queue
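In other words, this is the standard shared-queue fan-out: queue.Queue is thread-safe, so each item is delivered to exactly one of the consumers calling get on it. A self-contained toy (not SDK code):

    import queue
    import threading

    q: queue.Queue = queue.Queue()

    def consumer(name: str) -> None:
        # All consumers drain the same shared queue; each item is
        # handled by exactly one thread.
        while True:
            try:
                item = q.get(block=True, timeout=0.25)
            except queue.Empty:
                return  # queue stayed empty for 250 ms; shut down
            print(f"{name} handled {item}")

    for i in range(200):
        q.put(i)

    threads = [threading.Thread(target=consumer, args=(f"t{n}",)) for n in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()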
I think this makes sense, but I'm not 100% confident that the queue timeout will behave as desired here.
And then there's a number of magic numbers. You do a good job of commenting for most of them, but it still could be nice to make them variables
python/langsmith/client.py
Outdated
    ) -> List[TracingQueueItem]:
        next_batch: List[TracingQueueItem] = []
        try:
-           while item := tracing_queue.get(block=True, timeout=0.25):
+           while item := tracing_queue.get(block=block, timeout=0.25):
Seems we are always setting block=False: is there a reason for it being configurable then? And then it looks like timeout is ignored in this situation? (I'm not 100% sure here) https://docs.python.org/3/library/queue.html#queue.Queue.get
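For reference, the linked docs confirm this: when block=False, get either returns an item immediately or raises queue.Empty, and timeout is ignored. A quick check:

    import queue

    q: queue.Queue = queue.Queue()
    q.put("a")

    print(q.get(block=False))          # "a" -- returned immediately
    try:
        q.get(block=False, timeout=5)  # timeout is ignored when block=False
    except queue.Empty:
        print("raised immediately; did not wait 5 seconds")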
block defaults to True both in the Python lib and in our code
Ah, I missed the other calls - yup, looks good
- if on, starts a background thread that batches inserts/updates
- only applies to insert/updates w/ trace_id and dotted_order
- after release, bump sdk version here langchain-ai/langchain#16305

Co-authored-by: William Fu-Hinthorn <[email protected]>
Force-pushed from 265dff2 to fe02f82