DRAFT: Add scheduler worker to create recurring processors #427

Draft
wants to merge 27 commits into
base: master


dale-wahl
Member

The main worker is backend/workers/scheduler.py, which mostly imitates the queue_dataset() endpoint in webtool/views/api_tool.py. It creates a new dataset, queues a job for it, and then links the two. This PR also adds a new database table that can be used to link multiple datasets to each other, since otherwise they are linked only by a job, and that job is deleted when it finishes. I also added some framework to allow this scheduler to update queries.
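To make the idea of the linking table concrete, here is a minimal sketch of what such a table and its helpers could look like. It is not the schema from this PR: the table name `scheduled_datasets`, its columns, and the helper functions are all hypothetical, and SQLite is used only so the sketch runs standalone.

```python
import sqlite3


def create_link_table(conn):
    # Hypothetical table: one row per dataset created by a scheduler.
    # The row outlives the job, so datasets stay grouped after the job is deleted.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS scheduled_datasets (
            scheduler_id TEXT NOT NULL,
            dataset_key  TEXT NOT NULL,
            created_at   INTEGER NOT NULL,
            PRIMARY KEY (scheduler_id, dataset_key)
        )
        """
    )


def link_dataset(conn, scheduler_id, dataset_key, created_at):
    # Record that this dataset was created by this scheduler.
    # INSERT OR IGNORE is SQLite syntax; it makes repeated calls harmless.
    conn.execute(
        "INSERT OR IGNORE INTO scheduled_datasets VALUES (?, ?, ?)",
        (scheduler_id, dataset_key, created_at),
    )


def datasets_for_scheduler(conn, scheduler_id):
    # Return the keys of all datasets a scheduler created, oldest first.
    rows = conn.execute(
        "SELECT dataset_key FROM scheduled_datasets "
        "WHERE scheduler_id = ? ORDER BY created_at",
        (scheduler_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

With a table like this, a scheduler view could list every dataset a scheduler produced by calling `datasets_for_scheduler()` with the scheduler's identifier, without relying on job records that no longer exist.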

Currently the manager runs this scheduler processor at the desired interval. This seems fine to me, but I also toyed with the idea of having one scheduler job that keeps running and manages all scheduled jobs itself. That might be preferable, depending on how interactive we want the scheduler to be.

I have not developed the frontend yet. I was thinking of something relatively simple: list the main scheduler job with a button to display each dataset it created, plus buttons to cancel the scheduler and to update the interval. Possibly also a way to update the query, if desired.

There is one thing with queries that I am not yet sure how to address. A user may want to rerun the exact same query over and over to detect changes (keep querying the same date range and see whether records are edited, deleted, etc.), but a user may just as reasonably want a rolling query (run the same search terms each week for the previous week). I need to sort out how we might support both and let the user choose. Right now the exact same query is run every time; for many processors you can leave out the date and accomplish nearly the same thing as the rolling case.
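One possible way to support both cases is to store a mode alongside the scheduled query and resolve the date range at each run. The sketch below is only an illustration of that idea, not code from this PR: the function `resolve_query`, the mode names `"fixed"` and `"rolling"`, and the `min_date`/`max_date` parameter names are assumptions.

```python
from datetime import datetime, timedelta, timezone


def resolve_query(query, mode, interval_seconds, now):
    """Return the query parameters to use for one scheduled run.

    "fixed" reruns the stored query unchanged, so the same date range is
    queried every time. "rolling" replaces the date range with the window
    covering the last interval, ending at `now`.
    """
    resolved = dict(query)  # copy so the stored query is never mutated
    if mode == "rolling":
        window_start = now - timedelta(seconds=interval_seconds)
        resolved["min_date"] = int(window_start.timestamp())
        resolved["max_date"] = int(now.timestamp())
    elif mode != "fixed":
        raise ValueError(f"Unknown schedule mode: {mode}")
    return resolved


if __name__ == "__main__":
    one_week = 7 * 24 * 60 * 60
    stored = {"query": "example terms", "min_date": 0, "max_date": 10}
    run_time = datetime.now(timezone.utc)
    print(resolve_query(stored, "fixed", one_week, run_time))
    print(resolve_query(stored, "rolling", one_week, run_time))
```

Keeping the stored query untouched and deriving the dates per run would let a user switch between the two modes later without losing the original date range.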

To-do
- fix up and add options to scheduler view (e.g. delete/change)
- add scheduler view to navigator
- tie jobs to datasets? (either in scheduler view or, perhaps, filter dataset view)
- more testing...