Refactor in-memory schedulers to postgresql table #3358

Closed
jpbruinsslot opened this issue Aug 13, 2024 · 1 comment
Labels
mula, scalability, tech-debt

Comments

@jpbruinsslot
Contributor

jpbruinsslot commented Aug 13, 2024

Current situation

The running schedulers (boefje, normalizer, and report) can be accessed from the REST API. Their detailed information is kept in memory in an application-wide dict:

```python
self.schedulers: dict[
    str,
    schedulers.Scheduler
    | schedulers.BoefjeScheduler
    | schedulers.NormalizerScheduler
    | schedulers.ReportScheduler,
] = {}
```

To find out which queues are available, a client can call the /queues endpoint (the task runner does this to determine which queues to pop from):

```python
def list(self) -> Any:
    return [models.Queue(**s.queue.dict(include_pq=False)) for s in self.schedulers.copy().values()]
```

This iterates over all running schedulers and constructs a queue representation for each one. The /schedulers endpoint works the same way:

```python
def list(self) -> Any:
    return [models.Scheduler(**s.dict()) for s in self.schedulers.values()]
```

Currently these endpoints offer no filtering, so a task runner has to poll the scheduler for the full list of schedulers and iterate over it to find the ones to pop from.

Suggested changes

  • consolidate the /queues and /schedulers endpoints, since they are interchangeable
  • move scheduler configuration and settings to a PostgreSQL table
  • implement filtering of available schedulers in the REST API
  • change the /pop endpoint to support popping multiple tasks at once (batches), and add more filtering options (e.g. popping tasks for multiple organisations)
  • optionally leverage ETag (entity tag) or Last-Modified headers on the scheduler endpoints
  • NEW: create one BoefjeScheduler, NormalizerScheduler and ReportScheduler for all organisations instead of individual schedulers for every organisation, and one message queue for all scan profile mutations and raw file creation (Combine all schedulers for all organisations #3838)
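A minimal sketch of what a scheduler table with server-side filtering could look like. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs without a database server; the table name `schedulers`, its columns, and the helpers `insert_scheduler` and `list_schedulers` are hypothetical and not taken from the mula codebase.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema. In PostgreSQL, created_at would be a TIMESTAMPTZ column
# compared natively; SQLite has no timestamp type, so this sketch stores
# fixed-width ISO 8601 UTC strings, whose string order equals time order.
SCHEMA = """
CREATE TABLE schedulers (
    id           TEXT PRIMARY KEY,
    type         TEXT NOT NULL,
    organisation TEXT NOT NULL,
    created_at   TEXT NOT NULL
)
"""


def to_utc_iso(dt: datetime) -> str:
    """Render a timezone-aware datetime as a fixed-width UTC ISO 8601 string."""
    return dt.astimezone(timezone.utc).isoformat(timespec="microseconds")


def insert_scheduler(conn: sqlite3.Connection, id_: str, type_: str,
                     organisation: str, created_at: datetime) -> None:
    """Store one scheduler definition (hypothetical helper)."""
    conn.execute(
        "INSERT INTO schedulers (id, type, organisation, created_at) VALUES (?, ?, ?, ?)",
        (id_, type_, organisation, to_utc_iso(created_at)),
    )


def list_schedulers(conn: sqlite3.Connection, type_=None, organisation=None,
                    created_since=None) -> list[str]:
    """Return IDs of schedulers matching the optional filters (combined with AND)."""
    clauses, params = [], []
    if type_ is not None:
        clauses.append("type = ?")
        params.append(type_)
    if organisation is not None:
        clauses.append("organisation = ?")
        params.append(organisation)
    if created_since is not None:
        clauses.append("created_at >= ?")
        params.append(to_utc_iso(created_since))
    # Only fixed clause strings are interpolated; all values go through placeholders.
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    rows = conn.execute(f"SELECT id FROM schedulers{where} ORDER BY id", params).fetchall()
    return [row[0] for row in rows]
```

Each keyword argument would map onto an optional query parameter of a filtered endpoint (for example a hypothetical `GET /schedulers?type=boefje&created_since=...`), so the filtering happens in the database instead of in the task runner.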

New Functionality

  • faster overview and querying of all available schedulers without iterating over the in-memory schedulers
  • REST API filtering options allow for targeted retrieval of schedulers (e.g. filtering by created_at to retrieve schedulers that have been created since a specific timestamp)
  • faster start-up: running schedulers for already defined schedulers can be created from the database
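Start-up could then read the stored definitions and instantiate the matching scheduler class for each row. A minimal sketch, assuming a hypothetical `SchedulerDefinition` row type, a hypothetical `enabled` column, and stub classes that stand in for mula's BoefjeScheduler, NormalizerScheduler and ReportScheduler (the real constructors take more arguments and start threads):

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerDefinition:
    """Hypothetical row from the schedulers table."""
    id: str
    type: str
    organisation: str
    enabled: bool = True


class StubScheduler:
    """Stand-in for a running scheduler; real schedulers also own a queue."""

    def __init__(self, definition: SchedulerDefinition) -> None:
        self.definition = definition


class StubBoefjeScheduler(StubScheduler): ...
class StubNormalizerScheduler(StubScheduler): ...
class StubReportScheduler(StubScheduler): ...


# Maps the stored 'type' column to the class to instantiate.
SCHEDULER_CLASSES: dict[str, type[StubScheduler]] = {
    "boefje": StubBoefjeScheduler,
    "normalizer": StubNormalizerScheduler,
    "report": StubReportScheduler,
}


def rehydrate(definitions: Iterable[SchedulerDefinition]) -> dict[str, StubScheduler]:
    """Create a running scheduler for every enabled stored definition, keyed by ID."""
    running: dict[str, StubScheduler] = {}
    for d in definitions:
        if not d.enabled:
            continue  # disabled schedulers stay in the table but are not started
        cls = SCHEDULER_CLASSES.get(d.type)
        if cls is None:
            raise ValueError(f"unknown scheduler type: {d.type!r}")
        running[d.id] = cls(d)
    return running
```

The resulting dict has the same shape as the current application-wide `self.schedulers`, so the rest of the application would not need to know whether a scheduler was rebuilt from the database or created fresh.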

Considerations

  • Because of the way organisations are currently referenced in OpenKAT, we are still bound to a check at start-up to determine which organisations are available in the katalogus
  • Additionally, we are still bound to periodically checking the katalogus for new or removed organisations. This can be optimized by sending a signal (either REST or AMQP) to the scheduler to create a scheduler for a new organisation.
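The periodic check amounts to diffing two sets of organisation IDs, while a signal only has to handle a single organisation. A minimal sketch, assuming schedulers are keyed by organisation ID and that the IDs have already been fetched from the katalogus; `reconcile_organisations` and `handle_organisation_created` are hypothetical helpers:

```python
from collections.abc import Callable, Iterable


def reconcile_organisations(
    katalogus_orgs: Iterable[str], scheduler_orgs: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Diff katalogus organisations against organisations that have schedulers.

    Returns (to_create, to_remove), each sorted for deterministic output.
    """
    katalogus = set(katalogus_orgs)
    existing = set(scheduler_orgs)
    to_create = sorted(katalogus - existing)  # in the katalogus, no scheduler yet
    to_remove = sorted(existing - katalogus)  # scheduler exists, organisation removed
    return to_create, to_remove


def handle_organisation_created(
    org_id: str, schedulers: dict[str, object], create: Callable[[str], object]
) -> None:
    """Signal-driven path: create a scheduler for one new organisation.

    Idempotent, so a duplicate signal does not create a second scheduler.
    """
    if org_id not in schedulers:
        schedulers[org_id] = create(org_id)
```

With a signal, the full diff would only be needed as an occasional safety net (for example after missed messages) rather than on every polling interval.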

jpbruinsslot added the mula and tech-debt labels Aug 13, 2024
jpbruinsslot self-assigned this Aug 13, 2024
underdarknl added this to KAT Oct 3, 2024
github-project-automation bot moved this to Incoming features / Need assessment in KAT Oct 3, 2024
underdarknl moved this from Incoming features / Need assessment to Approved features / Need refinement in KAT Oct 3, 2024
jpbruinsslot moved this from Approved features / Need refinement to Backlog / To do in KAT Oct 28, 2024
jpbruinsslot moved this from Backlog / To do to In Progress in KAT Oct 31, 2024
@jpbruinsslot
Contributor Author

superseded by #3838

github-project-automation bot moved this from Blocked to Done in KAT Jan 6, 2025
Projects
Status: Done
2 participants