-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Table update tracker anya #2764
Table update tracker anya #2764
Conversation
… was successfully sent to LAPIS - then echo $last_modified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool stuff, amazing this works, just some refactoring comments 😀
@@ -252,16 +256,30 @@ class SubmissionController( | |||
fun getReleasedData( | |||
@PathVariable @Valid organism: Organism, | |||
@RequestParam compression: CompressionFormat?, | |||
@RequestHeader(value = HttpHeaders.IF_MODIFIED_SINCE, required = false) ifModifiedSince: Long?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pass a long we are kind of violating HTTP specs which have a specific (annoying) format for this particular header. We might want to choose our own header field if we are not spec compliant to not confuse people.
We probably want extra millisecond precision here just to make sure we don't miss an update.
To make a potential cache race condition less impactful, we could always request all data after time X, say 1hr, then we are at most that much out of date, but this is best handled in client I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah true - I see that the expected input is: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since#syntax, so we could check if there is a way to convert a string to a sql.Timestamp object and then compare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or yeah we just make our own header field with unix timestamp - that is easiest
backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt
Show resolved
Hide resolved
exit $exit_code | ||
fi | ||
|
||
last_modified=$(grep '^last-modified:' "$new_input_header" | awk '{print $2}') | ||
echo "$last_modified" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should just use a variable here I think rather than using echo and subshell
Maybe we should turn the bash script into a Python script given the complexity. There are now multiple places where we return things or not. It's hard to wrap head round in bash I think. We can add a rewrite to the todos though in case this here works ok for now. |
transaction { | ||
lastTime = UpdateTrackerTable | ||
.selectAll() // Select all rows | ||
.mapNotNull { it[UpdateTrackerTable.lastTimeUpdatedDbColumn] } // Extract non-null datetime values | ||
.maxOrNull()?.toTimestamp() // Find the maximum value | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we had the agreement that we don't put much logic in controller methods (especially database access). I would prefer if this was in a separate model/service class.
…t/loculus into table-update-tracker-anya
…s-project/loculus into table-update-tracker-anya
* Add option to get-released-data to check last update time. * make name better * Make or short-circuiting * Only work with table_update_tracker - return lastTime in LAST_MODIFIED header * Remove unused functions * add back missing imports - somehow format removed this??? * Update bash scripts * Attempt to fix bash * 2nd little fix * echo value of last_modified not name * Do not update SILO if loculus responds with 304. * Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified * save last snapshot to file * Update with loculus data anyways if longer than 1hour since last update * fix * Improve silo import scripts etc * add back BACKEND_BASE_URL * Log headers * Regularize * Better logs * Try to fix error * return 0 and never null in header * Fix read non-existent file bug * Better wrapper logging * fix delete after update snapshot * Make grep case insensitive * add back removed exit 0 * Improvement to logging * Add test * Add full process to test * Update documentation --------- Co-authored-by: Cornelius Roemer <[email protected]>
co-authored with @anna-parker and Claude Sonnet try now it might work? Remove views fixup Change to a conditional update for useNewerProcessingPipelineIfPossible, add last_updated_db as a view. (#2741) Co-authored-by: Anna (Anya) Parker <[email protected]> Still return current newest version Return version from correct place Don't issue delete statement unless there's something to delete fix ktlint Adjust useNewerProcessingPipelineIfPossible to only update table when there are changes Table update tracker anya (#2764) * Add option to get-released-data to check last update time. * make name better * Make or short-circuiting * Only work with table_update_tracker - return lastTime in LAST_MODIFIED header * Remove unused functions * add back missing imports - somehow format removed this??? * Update bash scripts * Attempt to fix bash * 2nd little fix * echo value of last_modified not name * Do not update SILO if loculus responds with 304. * Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified * save last snapshot to file * Update with loculus data anyways if longer than 1hour since last update * fix * Improve silo import scripts etc * add back BACKEND_BASE_URL * Log headers * Regularize * Better logs * Try to fix error * return 0 and never null in header * Fix read non-existent file bug * Better wrapper logging * fix delete after update snapshot * Make grep case insensitive * add back removed exit 0 * Improvement to logging * Add test * Add full process to test * Update documentation --------- Co-authored-by: Cornelius Roemer <[email protected]> CR edits to table-update-tracker (#2767) * Add option to get-released-data to check last update time. * make name better * Make or short-circuiting * Only work with table_update_tracker - return lastTime in LAST_MODIFIED header * Remove unused functions * add back missing imports - somehow format removed this??? * Update bash scripts * Attempt to fix bash * 2nd little fix * echo value of last_modified not name * Do not update SILO if loculus responds with 304. * Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified * save last snapshot to file * Update with loculus data anyways if longer than 1hour since last update * fix * Improve silo import scripts etc * add back BACKEND_BASE_URL * Log headers * Regularize * Better logs * Try to fix error * return 0 and never null in header * Fix read non-existent file bug * Better wrapper logging * fix delete after update snapshot * Make grep case insensitive * add back removed exit 0 * Improvement to logging * Add test * Add full process to test * Update documentation * Use etag instead of modified-since for spec compliance * ktlin fix * Move logic into model * Fix etag formatting * Fix etag yet again --------- Co-authored-by: Anna (Anya) Parker <[email protected]> refactor(backend): streamline control flow refactor(backend): extract function Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Update backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Enable etag for /extract-unprocessed-data Use etag in prepro to skip preprocessing match case refactor Try fix fix logic fix! i'm stupid Force refresh every hour Add response code annotations Update preprocessing/nextclade/src/loculus_preprocessing/prepro.py Co-authored-by: Anna (Anya) Parker <[email protected]> Add etag to dummy preprocessing # Conflicts: # backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt # backend/src/main/kotlin/org/loculus/backend/controller/SubmissionControllerDescriptions.kt # backend/src/main/kotlin/org/loculus/backend/model/ReleasedDataModel.kt # backend/src/main/kotlin/org/loculus/backend/service/submission/SubmissionDatabaseService.kt # backend/src/main/kotlin/org/loculus/backend/service/submission/UpdateTrackerTable.kt # backend/src/main/resources/db/migration/V1.2__add_table_update_tracker.sql # backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt # kubernetes/loculus/silo_import_job.sh
co-authored with @anna-parker and Claude Sonnet try now it might work? Remove views fixup Change to a conditional update for useNewerProcessingPipelineIfPossible, add last_updated_db as a view. (#2741) Co-authored-by: Anna (Anya) Parker <[email protected]> Still return current newest version Return version from correct place Don't issue delete statement unless there's something to delete fix ktlint Adjust useNewerProcessingPipelineIfPossible to only update table when there are changes Table update tracker anya (#2764) * Add option to get-released-data to check last update time. * make name better * Make or short-circuiting * Only work with table_update_tracker - return lastTime in LAST_MODIFIED header * Remove unused functions * add back missing imports - somehow format removed this??? * Update bash scripts * Attempt to fix bash * 2nd little fix * echo value of last_modified not name * Do not update SILO if loculus responds with 304. * Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified * save last snapshot to file * Update with loculus data anyways if longer than 1hour since last update * fix * Improve silo import scripts etc * add back BACKEND_BASE_URL * Log headers * Regularize * Better logs * Try to fix error * return 0 and never null in header * Fix read non-existent file bug * Better wrapper logging * fix delete after update snapshot * Make grep case insensitive * add back removed exit 0 * Improvement to logging * Add test * Add full process to test * Update documentation --------- Co-authored-by: Cornelius Roemer <[email protected]> CR edits to table-update-tracker (#2767) * Add option to get-released-data to check last update time. * make name better * Make or short-circuiting * Only work with table_update_tracker - return lastTime in LAST_MODIFIED header * Remove unused functions * add back missing imports - somehow format removed this??? * Update bash scripts * Attempt to fix bash * 2nd little fix * echo value of last_modified not name * Do not update SILO if loculus responds with 304. * Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified * save last snapshot to file * Update with loculus data anyways if longer than 1hour since last update * fix * Improve silo import scripts etc * add back BACKEND_BASE_URL * Log headers * Regularize * Better logs * Try to fix error * return 0 and never null in header * Fix read non-existent file bug * Better wrapper logging * fix delete after update snapshot * Make grep case insensitive * add back removed exit 0 * Improvement to logging * Add test * Add full process to test * Update documentation * Use etag instead of modified-since for spec compliance * ktlin fix * Move logic into model * Fix etag formatting * Fix etag yet again --------- Co-authored-by: Anna (Anya) Parker <[email protected]> refactor(backend): streamline control flow refactor(backend): extract function Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Update backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Enable etag for /extract-unprocessed-data Use etag in prepro to skip preprocessing match case refactor Try fix fix logic fix! i'm stupid Force refresh every hour Add response code annotations Update preprocessing/nextclade/src/loculus_preprocessing/prepro.py Co-authored-by: Anna (Anya) Parker <[email protected]> Add etag to dummy preprocessing # Conflicts: # backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt # backend/src/main/kotlin/org/loculus/backend/controller/SubmissionControllerDescriptions.kt # backend/src/main/kotlin/org/loculus/backend/model/ReleasedDataModel.kt # backend/src/main/kotlin/org/loculus/backend/service/submission/SubmissionDatabaseService.kt # backend/src/main/kotlin/org/loculus/backend/service/submission/UpdateTrackerTable.kt # backend/src/main/resources/db/migration/V1.2__add_table_update_tracker.sql # backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt # kubernetes/loculus/silo_import_job.sh
co-authored with @anna-parker and Claude Sonnet try now it might work? Remove views fixup Change to a conditional update for useNewerProcessingPipelineIfPossible, add last_updated_db as a view. (#2741) Co-authored-by: Anna (Anya) Parker <[email protected]> Still return current newest version Return version from correct place Don't issue delete statement unless there's something to delete fix ktlint Adjust useNewerProcessingPipelineIfPossible to only update table when there are changes Table update tracker anya (#2764) * Add option to get-released-data to check last update time. * make name better * Make or short-circuiting * Only work with table_update_tracker - return lastTime in LAST_MODIFIED header * Remove unused functions * add back missing imports - somehow format removed this??? * Update bash scripts * Attempt to fix bash * 2nd little fix * echo value of last_modified not name * Do not update SILO if loculus responds with 304. * Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified * save last snapshot to file * Update with loculus data anyways if longer than 1hour since last update * fix * Improve silo import scripts etc * add back BACKEND_BASE_URL * Log headers * Regularize * Better logs * Try to fix error * return 0 and never null in header * Fix read non-existent file bug * Better wrapper logging * fix delete after update snapshot * Make grep case insensitive * add back removed exit 0 * Improvement to logging * Add test * Add full process to test * Update documentation --------- Co-authored-by: Cornelius Roemer <[email protected]> CR edits to table-update-tracker (#2767) * Add option to get-released-data to check last update time. * make name better * Make or short-circuiting * Only work with table_update_tracker - return lastTime in LAST_MODIFIED header * Remove unused functions * add back missing imports - somehow format removed this??? * Update bash scripts * Attempt to fix bash * 2nd little fix * echo value of last_modified not name * Do not update SILO if loculus responds with 304. * Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified * save last snapshot to file * Update with loculus data anyways if longer than 1hour since last update * fix * Improve silo import scripts etc * add back BACKEND_BASE_URL * Log headers * Regularize * Better logs * Try to fix error * return 0 and never null in header * Fix read non-existent file bug * Better wrapper logging * fix delete after update snapshot * Make grep case insensitive * add back removed exit 0 * Improvement to logging * Add test * Add full process to test * Update documentation * Use etag instead of modified-since for spec compliance * ktlin fix * Move logic into model * Fix etag formatting * Fix etag yet again --------- Co-authored-by: Anna (Anya) Parker <[email protected]> refactor(backend): streamline control flow refactor(backend): extract function Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Update backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Enable etag for /extract-unprocessed-data Use etag in prepro to skip preprocessing match case refactor Try fix fix logic fix! i'm stupid Force refresh every hour Add response code annotations Update preprocessing/nextclade/src/loculus_preprocessing/prepro.py Co-authored-by: Anna (Anya) Parker <[email protected]> Add etag to dummy preprocessing # Conflicts: # backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt # backend/src/main/kotlin/org/loculus/backend/controller/SubmissionControllerDescriptions.kt # backend/src/main/kotlin/org/loculus/backend/model/ReleasedDataModel.kt # backend/src/main/kotlin/org/loculus/backend/service/submission/SubmissionDatabaseService.kt # backend/src/main/kotlin/org/loculus/backend/service/submission/UpdateTrackerTable.kt # backend/src/main/resources/db/migration/V1.2__add_table_update_tracker.sql # backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt # kubernetes/loculus/silo_import_job.sh
co-authored with @anna-parker and Claude Sonnet try now it might work? Remove views fixup Change to a conditional update for useNewerProcessingPipelineIfPossible, add last_updated_db as a view. (#2741) Co-authored-by: Anna (Anya) Parker <[email protected]> Still return current newest version Return version from correct place Don't issue delete statement unless there's something to delete fix ktlint Adjust useNewerProcessingPipelineIfPossible to only update table when there are changes Table update tracker anya (#2764) * Add option to get-released-data to check last update time. * make name better * Make or short-circuiting * Only work with table_update_tracker - return lastTime in LAST_MODIFIED header * Remove unused functions * add back missing imports - somehow format removed this??? * Update bash scripts * Attempt to fix bash * 2nd little fix * echo value of last_modified not name * Do not update SILO if loculus responds with 304. * Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified * save last snapshot to file * Update with loculus data anyways if longer than 1hour since last update * fix * Improve silo import scripts etc * add back BACKEND_BASE_URL * Log headers * Regularize * Better logs * Try to fix error * return 0 and never null in header * Fix read non-existent file bug * Better wrapper logging * fix delete after update snapshot * Make grep case insensitive * add back removed exit 0 * Improvement to logging * Add test * Add full process to test * Update documentation --------- Co-authored-by: Cornelius Roemer <[email protected]> CR edits to table-update-tracker (#2767) * Add option to get-released-data to check last update time. * make name better * Make or short-circuiting * Only work with table_update_tracker - return lastTime in LAST_MODIFIED header * Remove unused functions * add back missing imports - somehow format removed this??? * Update bash scripts * Attempt to fix bash * 2nd little fix * echo value of last_modified not name * Do not update SILO if loculus responds with 304. * Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified * save last snapshot to file * Update with loculus data anyways if longer than 1hour since last update * fix * Improve silo import scripts etc * add back BACKEND_BASE_URL * Log headers * Regularize * Better logs * Try to fix error * return 0 and never null in header * Fix read non-existent file bug * Better wrapper logging * fix delete after update snapshot * Make grep case insensitive * add back removed exit 0 * Improvement to logging * Add test * Add full process to test * Update documentation * Use etag instead of modified-since for spec compliance * ktlin fix * Move logic into model * Fix etag formatting * Fix etag yet again --------- Co-authored-by: Anna (Anya) Parker <[email protected]> refactor(backend): streamline control flow refactor(backend): extract function Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Update backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt Co-authored-by: Fabian Engelniederhammer <[email protected]> Enable etag for /extract-unprocessed-data Use etag in prepro to skip preprocessing match case refactor Try fix fix logic fix! i'm stupid Force refresh every hour Add response code annotations Update preprocessing/nextclade/src/loculus_preprocessing/prepro.py Co-authored-by: Anna (Anya) Parker <[email protected]> Add etag to dummy preprocessing # Conflicts: # backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt # backend/src/main/kotlin/org/loculus/backend/controller/SubmissionControllerDescriptions.kt # backend/src/main/kotlin/org/loculus/backend/model/ReleasedDataModel.kt # backend/src/main/kotlin/org/loculus/backend/service/submission/SubmissionDatabaseService.kt # backend/src/main/kotlin/org/loculus/backend/service/submission/UpdateTrackerTable.kt # backend/src/main/resources/db/migration/V1.2__add_table_update_tracker.sql # backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt # kubernetes/loculus/silo_import_job.sh
related to: #2092
preview URL: https://table-update-tracker-anya.loculus.org/
Summary
last-modified-if
header toget-released-data
, this accepts a Long = unix timestamp. If the database was modified since the value inlast-modified-if
get-released-data
will respond with all released data andlast-modified
header, wherelast-modified
is the time of the last database update. If the value inlast-modified-if
is equal or greater to the last time the database was modifiedget-released-data
will respond with 304: NOT_MODIFIED and nolast-modified
header.silo_import_job.sh
andsilo_import_wrapper.sh
to use thelast-modified-if
header. This significantly reduces the load on the database and backend, times are from after preprocessing has finished (CPU is in mCPU):Screenshot
PR Checklist