Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table update tracker anya #2764

Merged
merged 33 commits into from
Sep 11, 2024

Conversation

anna-parker
Copy link
Contributor

@anna-parker anna-parker commented Sep 11, 2024

related to: #2092

preview URL: https://table-update-tracker-anya.loculus.org/

Summary

  1. Adds UpdateTrackerTable to database with last time a table in the database was modified.
  2. Adds optional last-modified-if header to get-released-data, this accepts a Long = unix timestamp. If the database was modified since the value in last-modified-if get-released-data will respond with all released data and last-modified header, where last-modified is the time of the last database update. If the value in last-modified-if is equal or greater to the last time the database was modified get-released-data will respond with 304: NOT_MODIFIED and no last-modified header.
  3. Updates silo_import_job.sh and silo_import_wrapper.sh to use the last-modified-if header. This significantly reduces the load on the database and backend, times are from after preprocessing has finished (CPU is in mCPU):
kubectl top pod --sum --containers --all-namespaces | egrep "backend|s-database" | egrep "main|tracker-anya" | sort -hk4                                           bs-mbpas-0092: Wed Sep 11 19:51:30 2024

prev-table-update-tracker-anya   loculus-backend-c859b6f4b-gj5k5                                   backend                              19m          421Mi
prev-table-update-tracker-anya   loculus-database-67f7d88557-9mf6v                                 database                             101m         326Mi
prev-main                        loculus-database-645647c886-kh5vf                                 database                             113m         234Mi
prev-main                        loculus-backend-55f5cdb6cb-p7p4f                                  backend                              300m         483Mi

Screenshot

PR Checklist

@anna-parker anna-parker added the preview Triggers a deployment to argocd label Sep 11, 2024
Copy link
Contributor

@corneliusroemer corneliusroemer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff, amazing this works, just some refactoring comments 😀

@@ -252,16 +256,30 @@ class SubmissionController(
fun getReleasedData(
@PathVariable @Valid organism: Organism,
@RequestParam compression: CompressionFormat?,
@RequestHeader(value = HttpHeaders.IF_MODIFIED_SINCE, required = false) ifModifiedSince: Long?,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we pass a long we are kind of violating HTTP specs which have a specific (annoying) format for this particular header. We might want to choose our own header field if we are not spec compliant to not confuse people.

We probably want extra millisecond precision here just to make sure we don't miss an update.

To make a potential cache race condition less impactful, we could always request all data after time X, say 1hr, then we are at most that much out of date, but this is best handled in client I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah true - I see that the expected input is: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since#syntax, so we could check if there is a way to convert a string to a sql.Timestamp object and then compare.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or yeah we just make our own header field with unix timestamp - that is easiest

kubernetes/loculus/silo_import_job.sh Outdated Show resolved Hide resolved
kubernetes/loculus/silo_import_wrapper.sh Outdated Show resolved Hide resolved
exit $exit_code
fi

last_modified=$(grep '^last-modified:' "$new_input_header" | awk '{print $2}')
echo "$last_modified"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should just use a variable here I think rather than using echo and subshell

@corneliusroemer corneliusroemer changed the base branch from main to table-update-tracker September 11, 2024 08:58
@corneliusroemer
Copy link
Contributor

Maybe we should turn the bash script into a Python script given the complexity. There are now multiple places where we return things or not. It's hard to wrap head round in bash I think.

We can add a rewrite to the todos though in case this here works ok for now.

Comment on lines +269 to +274
transaction {
lastTime = UpdateTrackerTable
.selectAll() // Select all rows
.mapNotNull { it[UpdateTrackerTable.lastTimeUpdatedDbColumn] } // Extract non-null datetime values
.maxOrNull()?.toTimestamp() // Find the maximum value
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we had the agreement that we don't put much logic in controller methods (especially database access). I would prefer if this was in a separate model/service class.

@anna-parker anna-parker marked this pull request as ready for review September 11, 2024 17:55
@corneliusroemer corneliusroemer merged commit 573f7c9 into table-update-tracker Sep 11, 2024
13 checks passed
@corneliusroemer corneliusroemer deleted the table-update-tracker-anya branch September 11, 2024 20:27
corneliusroemer added a commit that referenced this pull request Sep 12, 2024
* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

---------

Co-authored-by: Cornelius Roemer <[email protected]>
corneliusroemer added a commit that referenced this pull request Sep 19, 2024
co-authored with @anna-parker and Claude Sonnet

try

now it might work?

Remove views

fixup

Change to a conditional update for useNewerProcessingPipelineIfPossible, add last_updated_db as a view. (#2741)

Co-authored-by: Anna (Anya) Parker <[email protected]>

Still return current newest version

Return version from correct place

Don't issue delete statement unless there's something to delete

fix ktlint

Adjust useNewerProcessingPipelineIfPossible to only update table when there are changes

Table update tracker anya (#2764)

* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

---------

Co-authored-by: Cornelius Roemer <[email protected]>

CR edits to table-update-tracker (#2767)

* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

* Use etag instead of modified-since for spec compliance

* ktlin fix

* Move logic into model

* Fix etag formatting

* Fix etag yet again

---------

Co-authored-by: Anna (Anya) Parker <[email protected]>

refactor(backend): streamline control flow

refactor(backend): extract function

Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Update backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Enable etag for /extract-unprocessed-data

Use etag in prepro to skip preprocessing

match case refactor

Try fix

fix logic

fix! i'm stupid

Force refresh every hour

Add response code annotations

Update preprocessing/nextclade/src/loculus_preprocessing/prepro.py

Co-authored-by: Anna (Anya) Parker <[email protected]>

Add etag to dummy preprocessing

# Conflicts:
#	backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt
#	backend/src/main/kotlin/org/loculus/backend/controller/SubmissionControllerDescriptions.kt
#	backend/src/main/kotlin/org/loculus/backend/model/ReleasedDataModel.kt
#	backend/src/main/kotlin/org/loculus/backend/service/submission/SubmissionDatabaseService.kt
#	backend/src/main/kotlin/org/loculus/backend/service/submission/UpdateTrackerTable.kt
#	backend/src/main/resources/db/migration/V1.2__add_table_update_tracker.sql
#	backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt
#	kubernetes/loculus/silo_import_job.sh
corneliusroemer added a commit that referenced this pull request Sep 20, 2024
co-authored with @anna-parker and Claude Sonnet

try

now it might work?

Remove views

fixup

Change to a conditional update for useNewerProcessingPipelineIfPossible, add last_updated_db as a view. (#2741)

Co-authored-by: Anna (Anya) Parker <[email protected]>

Still return current newest version

Return version from correct place

Don't issue delete statement unless there's something to delete

fix ktlint

Adjust useNewerProcessingPipelineIfPossible to only update table when there are changes

Table update tracker anya (#2764)

* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

---------

Co-authored-by: Cornelius Roemer <[email protected]>

CR edits to table-update-tracker (#2767)

* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

* Use etag instead of modified-since for spec compliance

* ktlin fix

* Move logic into model

* Fix etag formatting

* Fix etag yet again

---------

Co-authored-by: Anna (Anya) Parker <[email protected]>

refactor(backend): streamline control flow

refactor(backend): extract function

Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Update backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Enable etag for /extract-unprocessed-data

Use etag in prepro to skip preprocessing

match case refactor

Try fix

fix logic

fix! i'm stupid

Force refresh every hour

Add response code annotations

Update preprocessing/nextclade/src/loculus_preprocessing/prepro.py

Co-authored-by: Anna (Anya) Parker <[email protected]>

Add etag to dummy preprocessing

# Conflicts:
#	backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt
#	backend/src/main/kotlin/org/loculus/backend/controller/SubmissionControllerDescriptions.kt
#	backend/src/main/kotlin/org/loculus/backend/model/ReleasedDataModel.kt
#	backend/src/main/kotlin/org/loculus/backend/service/submission/SubmissionDatabaseService.kt
#	backend/src/main/kotlin/org/loculus/backend/service/submission/UpdateTrackerTable.kt
#	backend/src/main/resources/db/migration/V1.2__add_table_update_tracker.sql
#	backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt
#	kubernetes/loculus/silo_import_job.sh
corneliusroemer added a commit that referenced this pull request Sep 23, 2024
co-authored with @anna-parker and Claude Sonnet

try

now it might work?

Remove views

fixup

Change to a conditional update for useNewerProcessingPipelineIfPossible, add last_updated_db as a view. (#2741)

Co-authored-by: Anna (Anya) Parker <[email protected]>

Still return current newest version

Return version from correct place

Don't issue delete statement unless there's something to delete

fix ktlint

Adjust useNewerProcessingPipelineIfPossible to only update table when there are changes

Table update tracker anya (#2764)

* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

---------

Co-authored-by: Cornelius Roemer <[email protected]>

CR edits to table-update-tracker (#2767)

* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

* Use etag instead of modified-since for spec compliance

* ktlin fix

* Move logic into model

* Fix etag formatting

* Fix etag yet again

---------

Co-authored-by: Anna (Anya) Parker <[email protected]>

refactor(backend): streamline control flow

refactor(backend): extract function

Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Update backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Enable etag for /extract-unprocessed-data

Use etag in prepro to skip preprocessing

match case refactor

Try fix

fix logic

fix! i'm stupid

Force refresh every hour

Add response code annotations

Update preprocessing/nextclade/src/loculus_preprocessing/prepro.py

Co-authored-by: Anna (Anya) Parker <[email protected]>

Add etag to dummy preprocessing

# Conflicts:
#	backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt
#	backend/src/main/kotlin/org/loculus/backend/controller/SubmissionControllerDescriptions.kt
#	backend/src/main/kotlin/org/loculus/backend/model/ReleasedDataModel.kt
#	backend/src/main/kotlin/org/loculus/backend/service/submission/SubmissionDatabaseService.kt
#	backend/src/main/kotlin/org/loculus/backend/service/submission/UpdateTrackerTable.kt
#	backend/src/main/resources/db/migration/V1.2__add_table_update_tracker.sql
#	backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt
#	kubernetes/loculus/silo_import_job.sh
corneliusroemer added a commit that referenced this pull request Sep 24, 2024
co-authored with @anna-parker and Claude Sonnet

try

now it might work?

Remove views

fixup

Change to a conditional update for useNewerProcessingPipelineIfPossible, add last_updated_db as a view. (#2741)

Co-authored-by: Anna (Anya) Parker <[email protected]>

Still return current newest version

Return version from correct place

Don't issue delete statement unless there's something to delete

fix ktlint

Adjust useNewerProcessingPipelineIfPossible to only update table when there are changes

Table update tracker anya (#2764)

* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

---------

Co-authored-by: Cornelius Roemer <[email protected]>

CR edits to table-update-tracker (#2767)

* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

* Use etag instead of modified-since for spec compliance

* ktlin fix

* Move logic into model

* Fix etag formatting

* Fix etag yet again

---------

Co-authored-by: Anna (Anya) Parker <[email protected]>

refactor(backend): streamline control flow

refactor(backend): extract function

Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Update backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt

Co-authored-by: Fabian Engelniederhammer <[email protected]>

Enable etag for /extract-unprocessed-data

Use etag in prepro to skip preprocessing

match case refactor

Try fix

fix logic

fix! i'm stupid

Force refresh every hour

Add response code annotations

Update preprocessing/nextclade/src/loculus_preprocessing/prepro.py

Co-authored-by: Anna (Anya) Parker <[email protected]>

Add etag to dummy preprocessing

# Conflicts:
#	backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt
#	backend/src/main/kotlin/org/loculus/backend/controller/SubmissionControllerDescriptions.kt
#	backend/src/main/kotlin/org/loculus/backend/model/ReleasedDataModel.kt
#	backend/src/main/kotlin/org/loculus/backend/service/submission/SubmissionDatabaseService.kt
#	backend/src/main/kotlin/org/loculus/backend/service/submission/UpdateTrackerTable.kt
#	backend/src/main/resources/db/migration/V1.2__add_table_update_tracker.sql
#	backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt
#	kubernetes/loculus/silo_import_job.sh
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Triggers a deployment to argocd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants