Standardize handling of storage and execution time quotas #1969
Conversation
- Check if both quotas are over before starting a crawl and during the crawl; stop the crawl gracefully if over
- Migrate existing stopped_quota_reached and skipped_quota_reached crawl states to indicate which quota they relate to
- Use states that reflect both the action (stopped, skipped) and which quota was over, checking the storage quota first
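For illustration, here is a minimal, self-contained sketch of the check order described above (storage quota first, then execution minutes). The OrgQuotaUsage type and helper functions are hypothetical stand-ins rather than the actual Browsertrix code; only the four state strings come from this PR.

```python
# Illustrative sketch: storage quota is checked before execution minutes,
# and the resulting state records both the action and the quota that was over.
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrgQuotaUsage:
    storage_used: int
    storage_quota: int       # 0 means no quota set
    exec_mins_used: int
    exec_mins_quota: int     # 0 means no quota set


def storage_quota_reached(org: OrgQuotaUsage) -> bool:
    return bool(org.storage_quota) and org.storage_used >= org.storage_quota


def exec_mins_quota_reached(org: OrgQuotaUsage) -> bool:
    return bool(org.exec_mins_quota) and org.exec_mins_used >= org.exec_mins_quota


def state_if_skipped(org: OrgQuotaUsage) -> Optional[str]:
    """Before starting a crawl: skip it and record which quota was over."""
    if storage_quota_reached(org):
        return "skipped_storage_quota_reached"
    if exec_mins_quota_reached(org):
        return "skipped_time_quota_reached"
    return None


def state_if_stopped(org: OrgQuotaUsage) -> Optional[str]:
    """During a crawl: stop it gracefully and record which quota was over."""
    if storage_quota_reached(org):
        return "stopped_storage_quota_reached"
    if exec_mins_quota_reached(org):
        return "stopped_time_quota_reached"
    return None
```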
Frontend looks good!
Looks good! The quota lookup itself could use some optimization, will open a separate PR for that
Ah, we also need to add a migration for lastCrawlState on the workflows.
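A rough sketch of what such a migration could look like with pymongo is below. The database name, collection names (crawls, crawl_configs), field names, and connection string are assumptions, and mapping every old value to the storage variant is a purely illustrative choice, since the old states did not record which quota was actually hit.

```python
# Purely illustrative migration sketch; names and the state mapping are assumptions,
# not the actual Browsertrix migration.
from pymongo import MongoClient

STATE_MAP = {
    "stopped_quota_reached": "stopped_storage_quota_reached",
    "skipped_quota_reached": "skipped_storage_quota_reached",
}


def migrate_quota_states(db) -> None:
    for old_state, new_state in STATE_MAP.items():
        # Update the crawl documents themselves.
        db.crawls.update_many({"state": old_state}, {"$set": {"state": new_state}})
        # Update the denormalized lastCrawlState stored on workflows (crawl configs).
        db.crawl_configs.update_many(
            {"lastCrawlState": old_state},
            {"$set": {"lastCrawlState": new_state}},
        )


if __name__ == "__main__":
    client = MongoClient("mongodb://localhost:27017")  # assumed local instance
    migrate_quota_states(client.get_database("browsertrix"))
```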
- Instead of looking up storage and exec minute quotas from oid and loading an org each time, load the org once and then check quotas on the org object; in many cases the org was already available and was looked up again
- Storage and exec quota checks become sync
- Rename can_run_crawl() to the more generic can_write_data(), which optionally also checks exec minutes
- Typing: get_org_by_id() always returns an org or throws; adjust methods accordingly (don't check for None, catch the exception)
- Typing: fix typo in BaseOperator, catch type errors in operator 'org_ops'
- Operator quota check: use up-to-date 'status.size' for the current job, and ignore the current job in the all-jobs list to avoid double-counting
- Follow-up to #1969
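A hedged sketch of that refactor follows, assuming a simplified Organization model. The field names and the can_write_data() signature shown here are illustrative, not the actual Browsertrix definitions; the point is that the quotas are read off an org object already in hand, so the check is synchronous and needs no extra lookup.

```python
# Sketch of a sync quota check on an already loaded org object.
from dataclasses import dataclass


@dataclass
class Organization:
    bytes_stored: int
    storage_quota: int        # 0 means no storage quota
    exec_mins_used: int
    exec_mins_quota: int      # 0 means no execution-minutes quota

    def storage_quota_reached(self) -> bool:
        return bool(self.storage_quota) and self.bytes_stored >= self.storage_quota

    def exec_mins_quota_reached(self) -> bool:
        return bool(self.exec_mins_quota) and self.exec_mins_used >= self.exec_mins_quota


def can_write_data(org: Organization, include_exec_time: bool = False) -> bool:
    """Stand-in for the renamed can_run_crawl(): sync check against the org in hand."""
    if org.storage_quota_reached():
        return False
    if include_exec_time and org.exec_mins_quota_reached():
        return False
    return True


# Example: an org at its storage quota cannot write data.
org = Organization(bytes_stored=10**9, storage_quota=10**9, exec_mins_used=0, exec_mins_quota=0)
assert can_write_data(org) is False
```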
Fixes #1968
Changes:
- stopped_quota_reached and skipped_quota_reached crawl states are migrated to new values that indicate which quota was reached
- Skipped crawls are given the skipped_storage_quota_reached or skipped_time_quota_reached state, and stopped crawls the stopped_storage_quota_reached or stopped_time_quota_reached state, as appropriate

To run the nightly tests, build the local backend and then run:
python -m pytest backend/test_nightly/test_storage_quota.py
python -m pytest backend/test_nightly/test_execution_minutes_quota.py