Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workflow_index.update_time bugfix #288

Merged

Conversation

lbestatlas
Copy link
Contributor

@lbestatlas lbestatlas commented Oct 15, 2024

Pull Request type

  • [ x ] Bugfix
  • Feature
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • WHOSUSING.md
  • Other (please describe):

NOTE: Please remember to run ./gradlew spotlessApply to fix any format violations.

Changes in this PR

Issue # #237

  1. When running the postgres v13 flyway upgrade we have found some workflows without update_time in the json_data column, which causes update_time to be null when the data is back filled in the v13 script. The subsequent application of the NOT NULL constraint fails.
  2. When running old and new versions of conductor concurrently during rollout, workflows executed in the old version have failures indexing workflows as workflow_index.update_time is NULL
  3. The new code does not set the update_time in the case of an update only an insert, so this seems incorrect if the intention was to prevent out of sequence updates to the workflow_index.
  4. The back-fill script on the workflow_index can potentially lock up the table for minutes in environments where there is a large number of workflows indexed.
  5. Also, the back-fill script time format incorrectly removes precision from the milliseconds component.

To fix these issues this PR make the following changes:

  1. Replaces the v13 flyway migration script with V13.1 that adds update_time with a default of 'epoch' if the column doesn't exist, and sets the default of 'epoch' if the column does exist. Adding a default in postgres is inexpensive, as it does not update existing rows. The default means the constraint can be added without running the back-fill and stops the issues when have old and new versions running at the same time.

  2. Fix the PostgresIndexDao.indexWorkflow to also set the update_time during an UPDATE. This will ensure updateTime accurately reflects the most recent write.

  3. Address the issues with the back-fill script logic, to ensure it doesn't try to insert NULL values and correctly formats the timestamp.

  4. Make the back-fill script for update_time optional, so that expensive data migrations do not need to be run at start up.

Alternatives considered

I considered removing the not null constraint. However, this meant more complications dealing with null values in the consuming code. So given a default is inexpensive to add this seemed preferable.

I also considered dropping the back-fill script entirely, as it doesn't appear to impact any logic. However, I decided to add the ability to disable data migrations, for expensive scripts. Flyway community edition does not support https://flywaydb.org/blog/skipExecutingMigrations so it is impossible to defer scripts from being run during the migration if using that edition.

- Remove the v13 migration
- Replace with migration script with one that adds update_time with a default of 'epoch' if the column doesn't exist, and sets the default of 'epoch' if the column does exist
- Fix the PostgresIndexDao.indexWorkflow to also set the update_time during an UPDATE
- Make the backfill script for update_time separate an optional and fix the timestamp to include milliseconds
@lbestatlas lbestatlas changed the title Address items raised in issue 237 workflow_index.update_time bugfix Oct 15, 2024
@lbestatlas lbestatlas marked this pull request as draft October 15, 2024 06:07
@lbestatlas lbestatlas marked this pull request as ready for review October 15, 2024 06:08
@lbestatlas
Copy link
Contributor Author

@v1r3n I have made 2 updates to the PR

  1. to fix the build as I had not applied spotless
  2. a change to the backfill script, as I identified further issues with this when I re-tested the script

@@ -0,0 +1,4 @@
-- Optional back-fill script to populate updateTime historically.
UPDATE workflow_index
SET update_time = to_timestamp(json_data->>'updateTime', 'YYYY-MM-DD"T"HH24:MI:SS.MS')::timestamp AT TIME ZONE '00:00'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further testing found that the previous scripts formatting was not interpreting hours and time zones correctly.
Postgres does not interpret the timezone Z from the json->>'updateTime', it has to be set explicitly. Also, the T has to be explicitly skipped by enclosing it in double quotes.

@v1r3n v1r3n merged commit 6e77cc5 into conductor-oss:main Oct 24, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants