
Job events performance #241

Open · wants to merge 8 commits into main
Conversation

benthomasson (Collaborator)

Improves the performance of processing job events from 200 per second to 4000 per second.

benthomasson (Collaborator, Author) commented Nov 5, 2022

This simulates 100 concurrent jobs, each with 1000 job events.

time ./job_events_cannon.py --websocket-address ws://localhost:8080/api/ws2 --job-id ca73cc33-4ede-465e-bd82-6ccaaa2915d6 -c 1000 --stdout test2 --workers 100 --verbose

Events per second: 4238.371749744774

All job events were successfully recorded:


eda_server=# select count(*) from job_instance_event;
 count  
--------
 100000


benthomasson marked this pull request as ready for review November 6, 2022 00:54
Comment on lines +241 to +242
await bulk_job_instance_events.get()
for i in range(bulk_job_instance_events.qsize())
Contributor

Shouldn't this reference the queue attribute?

Suggested change
-await bulk_job_instance_events.get()
-for i in range(bulk_job_instance_events.qsize())
+await bulk_job_instance_events.queue.get()
+for i in range(bulk_job_instance_events.queue.qsize())

Collaborator Author

Yes, I'll take a look at it.

Comment on lines +255 to +256
await bulk_job_instance_hosts.get()
for i in range(bulk_job_instance_hosts.qsize())
Contributor

Shouldn't this also reference the queue attribute?

Suggested change
-await bulk_job_instance_hosts.get()
-for i in range(bulk_job_instance_hosts.qsize())
+await bulk_job_instance_hosts.queue.get()
+for i in range(bulk_job_instance_hosts.queue.qsize())



bulk_job_instance_events = Batcher(insert_job_instance_events)
bulk_job_instance_hosts = Batcher(insert_job_instance_hosts)
Contributor

Global dependency. These objects are only used in the scope of the websocket handler; it's unclear why they have to be global, and it looks completely unnecessary.
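One way to avoid the module-level globals, sketched below with FastAPI's startup hook and app.state (illustrative wiring, not code from this PR; Batcher and the insert_* callbacks stand in for the names used here):

import asyncio
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.on_event("startup")
async def create_batchers() -> None:
    # Build the batchers at application startup and keep them on app.state,
    # so handlers read them from the application scope, not a global.
    app.state.bulk_job_instance_events = Batcher(insert_job_instance_events)
    app.state.bulk_job_instance_hosts = Batcher(insert_job_instance_hosts)

@app.websocket("/api/ws2")
async def websocket_handler(websocket: WebSocket) -> None:
    # The handler reaches the batcher through the app, with no global import.
    batcher = websocket.app.state.bulk_job_instance_events
    ...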


def start(self, get_db_session_factory):
    self.get_db_session_factory = get_db_session_factory
    _ = asyncio.get_event_loop()
Contributor

Why is this call needed?

  1. The get_event_loop function is deprecated for creating a new loop. See https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop
  2. If it was used to create an event loop, the event loop should be created by calling asyncio.run, not here.
  3. If it's used to get an existing running loop, it doesn't make sense (and in general for that you should use asyncio.get_running_loop(); see the sketch below).
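A minimal sketch of point 3, assuming start() is always called from within an already-running loop (illustrative, not this PR's code):

import asyncio

def start(self, get_db_session_factory):
    self.get_db_session_factory = get_db_session_factory
    # get_running_loop() raises RuntimeError when no loop is running,
    # surfacing misuse early instead of silently creating a fresh loop.
    loop = asyncio.get_running_loop()
    loop.create_task(self._batcher())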


async def _batcher(self):
    while True:
        timeout = time.time() + self.batch_timeout
Contributor

Use the monotonic timer time.monotonic() for calculating time intervals.
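For instance (a small illustrative helper, not code from this PR):

import time

def deadline_in(interval: float) -> float:
    # time.time() follows the wall clock and can jump backwards or forwards
    # when the system clock is adjusted (NTP sync, manual changes);
    # time.monotonic() only ever moves forward, so it is the right clock
    # for intervals and deadlines.
    return time.monotonic() + interval

def expired(deadline: float) -> bool:
    return time.monotonic() >= deadline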

async with db_session_factory() as db:
    query = insert(models.job_instance_events).values(
        [
            await bulk_job_instance_events.get()
Contributor

Because you're consuming a fixed, already-known number of queue entries, you can use queue.get_nowait() here.
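A sketch of that drain step as a standalone helper (illustrative, not this PR's code):

import asyncio

def drain(queue: asyncio.Queue) -> list:
    # qsize() is read once; because this is the only consumer and there is
    # no await between calls, get_nowait() can never raise QueueEmpty here.
    return [queue.get_nowait() for _ in range(queue.qsize())]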

Comment on lines +57 to +62
bulk_job_instance_events.start(
    app.dependency_overrides[get_db_session_factory]
)
bulk_job_instance_hosts.start(
    app.dependency_overrides[get_db_session_factory]
)
Contributor

Suggested change
-bulk_job_instance_events.start(
-    app.dependency_overrides[get_db_session_factory]
-)
-bulk_job_instance_hosts.start(
-    app.dependency_overrides[get_db_session_factory]
-)
+# FIXME: This is a hack. The database initialization should be refactored
+# if the database session factory must be accessed outside of the request scope.
+bulk_job_instance_events.start(
+    app.dependency_overrides[get_db_session_factory]()
+)
+bulk_job_instance_hosts.start(
+    app.dependency_overrides[get_db_session_factory]()
+)

The bulk processor should depend on a factory, not on a wrapper used for the FastAPI DI mechanism. At the moment there is a limitation: the DatabaseProvider is not accessible outside of the request context. This should be refactored; see the sketch below.
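A sketch of that shape, with the batcher typed against a plain session-factory callable (names and types are illustrative, not this PR's code):

from contextlib import AbstractAsyncContextManager
from typing import Callable

# The batcher depends only on "a callable that yields an async session
# context manager" and knows nothing about FastAPI's dependency_overrides.
SessionFactory = Callable[[], AbstractAsyncContextManager]

class Batcher:
    def start(self, session_factory: SessionFactory) -> None:
        self.session_factory = session_factory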

        _ = asyncio.get_event_loop()
        asyncio.create_task(self._batcher())

    async def _batcher(self):
Contributor

The logic here looks unintuitive and probably not performant either. It runs a loop that checks the queue size at fixed intervals and then passes the queue to the function that consumes messages from it. The queue size is unbounded, which may lead to a back-pressure problem.

While this somehow works, because the queue size is known and queue access doesn't need synchronization, I'd rewrite it in a different manner:

  1. Use a bounded queue size to avoid uncontrolled growth of the queue.
  2. The processor_fn callback should expect a full batch: a sequence (e.g. a list) of items to be processed.
  3. The Batcher consumes the queue with a timeout:
# PSEUDOCODE #

while True:
    try:
        item = await asyncio.wait_for(queue.get(), timeout - elapsed_time)
    except asyncio.TimeoutError:
        # flush the accumulated batch
        continue
    batch.append(item)
    if len(batch) >= max_batch_size:
        # flush the accumulated batch
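Expanded into a self-contained coroutine (illustrative names; flushing on both size and timeout as the points above describe):

import asyncio
import time

async def run_batcher(queue, processor_fn, max_batch_size, batch_timeout):
    """Collect queue items into batches; flush on size or timeout."""
    batch = []
    deadline = time.monotonic() + batch_timeout
    while True:
        try:
            item = await asyncio.wait_for(
                queue.get(), timeout=max(0.0, deadline - time.monotonic())
            )
        except asyncio.TimeoutError:
            if batch:
                await processor_fn(batch)   # flush a partial batch on timeout
                batch = []
            deadline = time.monotonic() + batch_timeout
            continue
        batch.append(item)
        if len(batch) >= max_batch_size:
            await processor_fn(batch)       # flush a full batch immediately
            batch = []
            deadline = time.monotonic() + batch_timeout

Creating the queue with asyncio.Queue(maxsize=...) then provides the back pressure asked for in point 1: producers await in put() whenever the database falls behind.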
