fetcher issues from a power outage #319

Open
philbudne opened this issue Aug 1, 2024 · 0 comments


To avoid queuing excessive work (and possibly exhausting available disk when running a stack on a system without a large array), message producers check the length of each possible (fanout) destination -in queue.

When just such a stack (processing old RSS files) came up after a power outage, AND the Internet was not reachable, the entire contents of the fetcher-in queue quickly ended up in the -retry queue (for retry in an hour), and ANOTHER batch was loaded into the (now empty) -in queue. Had the Internet outage lasted longer, the queues could have grown to REALLY unreasonable lengths.

If Producer.check_output_queues considered the sum of the fetcher-in and fetcher-retry queue lengths, this could be avoided.
The downside is that items in the -retry queue won't go back to the -in queue for an hour, so there might be an idle hour before work restarts.
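
A minimal sketch of the summed check, not the actual Producer.check_output_queues code. The function name `queues_have_room`, the `queue_length` callable, and the `max_queue_len` threshold are all hypothetical, and the `-in`/`-retry` suffixes are taken from the queue names above:

```python
from typing import Callable, Iterable


def queues_have_room(
    destinations: Iterable[str],
    queue_length: Callable[[str], int],
    max_queue_len: int,
    count_retry: bool = True,
) -> bool:
    """Return True if every destination is below max_queue_len.

    destinations: base names such as "fetcher" (hypothetical convention).
    queue_length: callable returning the current length of a named queue
        (an assumption; stands in for however lengths are really obtained).
    count_retry: when True, add the -retry queue length to the -in length.
    """
    for dest in destinations:
        total = queue_length(f"{dest}-in")
        if count_retry:
            # Work parked for retry still has to be processed eventually,
            # so count it against the limit as well.
            total += queue_length(f"{dest}-retry")
        if total >= max_queue_len:
            return False
    return True
```

With an empty fetcher-in queue and a full fetcher-retry queue (the situation described above), the summed check holds off the producer, while an -in only check would let another batch through.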

This only REALLY applies when old page content (URLs from old CSV or RSS files) is being fetched from the Internet.
For "current day" workloads, an Internet outage means there won't be new work from the rss-fetcher, BUT if, for some reason, the fetcher (or the entire indexer stack) is not running, the problem WILL occur.

Another take on this would be to have the fetcher detect that ALL fetch requests (to all domains) are failing, and to slow down request processing. The "to all domains" is the tricky part: when the queue contains JUST requests that are being retried because their server can't be reached, that could look like "the Internet is down". I suppose a test for "Is the Internet reachable" could include trying to fetch some well-known pages, or pinging some well-known IPs (e.g., 8.8.8.8).
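
A rough sketch of how the two pieces might fit together, under stated assumptions. `OutageDetector`, `internet_reachable`, the window size, and the `min_domains` threshold are hypothetical names and values, not existing fetcher code. Since ICMP ping usually needs raw-socket privileges, the probe below swaps in a plain TCP connect to port 53 of a well-known IP; that substitution is also an assumption:

```python
import socket
from collections import deque
from typing import Callable, Deque, Iterable, Tuple


class OutageDetector:
    """Track recent fetch outcomes; flag when everything is failing."""

    def __init__(self, window: int = 100, min_domains: int = 5) -> None:
        # Only the most recent `window` results are kept.
        self.recent: Deque[Tuple[str, bool]] = deque(maxlen=window)
        self.min_domains = min_domains

    def record(self, domain: str, ok: bool) -> None:
        self.recent.append((domain, ok))

    def all_failing(self) -> bool:
        # Any recent success means the Internet is (probably) up.
        if any(ok for _, ok in self.recent):
            return False
        # Require failures across several distinct domains, so one dead
        # server does not look like a network outage.
        domains = {domain for domain, _ in self.recent}
        return len(domains) >= self.min_domains


def internet_reachable(
    probes: Iterable[Tuple[str, int]] = (("8.8.8.8", 53),),
    timeout: float = 3.0,
    connect: Callable = socket.create_connection,
) -> bool:
    """Return True if a TCP connection to any probe address succeeds."""
    for address in probes:
        try:
            conn = connect(address, timeout=timeout)
        except OSError:
            continue
        conn.close()
        return True
    return False
```

The idea would be to call `internet_reachable()` only when `all_failing()` is true, and slow down request processing only if the probe also fails. That covers the tricky case above: a queue full of retries for dead servers trips `all_failing()`, but the probe still succeeds.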
