I noticed the parser queue for the 2020 historical reingest slowing down, along with parser exits (shown by dots on the "app max run time" Grafana graph). `docker ps -a` showed exited parser containers, and all of them had the same pattern: the last URL parsed was identical, and when the parser tried to forward the message on, it crashed because RabbitMQ had closed the connection after processing took over 30 minutes.
I scaled the hist-fetcher service to zero, then the parser service to zero, and extracted two stories from the parser-in queue using `./run-qutil.sh dump_archives parser-in` (run by docker exec'ing into an importer container). I moved the resulting archive to the data directory to preserve it.
I'm able to reproduce the hang with the attached warc file by sourcing my development venv and running:
./bin/run-parser.sh --test-file-prefix test-warcs/parser-hang-2024-11-19 --rabbitmq-url x
BUT this is not a thread-safe solution (there is only one ITIMER_REAL timer per process).
The libc `timer_create` call MIGHT be able to create multiple CLOCK_REALTIME timers per process, but it is not exposed in Python, AND the signal would need to be delivered to the thread that issued the call; from what I can tell, Python processes all signals in the main thread. The only multi-threaded app in story-indexer is tqfetcher.py; rss-fetcher handles this by doing all fetches in a subprocess with its own SIGALRM.
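To make the limitation concrete, here is a minimal sketch (not the story-indexer code; `run_with_timeout` and `Timeout` are hypothetical names) of the single-timer SIGALRM watchdog pattern being discussed. It only works in the main thread, and because there is one ITIMER_REAL timer per process, concurrent use from multiple threads would silently clobber each other's deadlines:

```python
import signal
import time


class Timeout(Exception):
    """Raised when the watchdog timer fires."""


def _on_alarm(signum, frame):
    # Runs in the main thread; raising here aborts whatever
    # the main thread was blocked on (e.g. a hung parse).
    raise Timeout()


def run_with_timeout(func, seconds, *args, **kwargs):
    """Run func(*args, **kwargs), raising Timeout after `seconds`.

    NOT thread-safe: signal.signal() may only be called from the
    main thread, and there is a single ITIMER_REAL timer per process.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func(*args, **kwargs)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
        signal.signal(signal.SIGALRM, old_handler)


# Example: a "parse" that hangs is interrupted after 0.2 seconds.
try:
    run_with_timeout(time.sleep, 0.2, 10)
except Timeout:
    print("parser hung; gave up")
```

The rss-fetcher workaround mentioned above sidesteps this by moving the work into a subprocess, where each worker is its own process with its own timer (and a hung worker can simply be killed).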