DBFS file artefact upload happens each terraform apply even if file(s) have not changed #261

Open
jjgriff93 opened this issue Apr 18, 2023 · 8 comments
Labels
bug Something isn't working Data Transformation Data Transformation Epic

Comments

@jjgriff93
Member

Describe the bug
When re-running Terraform with no changes, this happens each time:

      # databricks_dbfs_file.dbfs_artifact_upload["../../transform/pipelines/FlowEHR-Data-Forge/patients/artifacts/patients-0.0.1-py3-none-any.whl"] must be replaced
    -/+ resource "databricks_dbfs_file" "dbfs_artifact_upload" {
          ~ dbfs_path = "dbfs:/pipelines/patients/artifacts/patients-0.0.1-py3-none-any.whl" -> (known after apply)
          ~ file_size = 23249 -> (known after apply)
          ~ id        = "/pipelines/patients/artifacts/patients-0.0.1-py3-none-any.whl" -> (known after apply)
          ~ md5       = "2f69276a61ce82475cc8d49533d4066d" -> "different" # forces replacement
            # (2 unchanged attributes hidden)
        }

    Plan: 1 to add, 0 to change, 1 to destroy.

Expectation

Terraform should plan no changes if the file(s) have not changed.

@jjgriff93 jjgriff93 added bug Something isn't working Data Transformation Data Transformation Epic labels Apr 18, 2023
@tanya-borisova
Member

Building the wheel file several times from the same source code produces files with different MD5 hashes, and the MD5 hash is what Terraform uses to determine whether the file has changed. The wheel is rebuilt each time infrastructure-transform is deployed, so it is re-uploaded each time. As this isn't actually breaking anything, IMO this bug should be closed as working as intended.
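To illustrate how identical source can yield different hashes: a wheel is a zip archive, and zip entries carry a modification timestamp. The sketch below is only an assumption about the likely cause here (embedded timestamps), not a confirmed diagnosis of this project's build. The entry name, payload and timestamps are made up for the example.

```python
import hashlib
import io
import zipfile


def build_archive(date_time):
    """Build an in-memory zip with a fixed payload and the given entry timestamp.

    The file name and contents are hypothetical stand-ins for a wheel's contents.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("patients/__init__.py", date_time=date_time)
        zf.writestr(info, "VERSION = '0.0.1'\n")
    return buf.getvalue()


def md5_hex(data):
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


# Same content, timestamps five minutes apart: the archive bytes differ.
first = md5_hex(build_archive((2023, 4, 18, 10, 0, 0)))
second = md5_hex(build_archive((2023, 4, 18, 10, 5, 0)))

# Same content and same timestamp: the archive bytes are identical.
repeat = md5_hex(build_archive((2023, 4, 18, 10, 0, 0)))

print("timestamps differ -> hashes differ:", first != second)
print("timestamps equal  -> hashes equal: ", first == repeat)
```

If timestamps are indeed the source of the difference, pinning them at build time should make the hash stable across rebuilds.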

@damoodamoo
Member

I'd like to keep this open until we have a plan for #197.

If, for instance, we had to incur downtime whenever the data code is updated, we would need to limit such updates as much as possible; otherwise deployments become very difficult and our ability to deploy regularly is limited.

@tanya-borisova
Member

As it stands, we are not incurring downtime when updating the data code, and I don't see us doing so in the future, so I don't think these two issues are related or will become related.

@damoodamoo
Member

damoodamoo commented May 15, 2023

From #197:

Known mitigation is to terminate the cluster and restart the data pipeline job.

If not this - how will we ensure that new data code is picked up in a running cluster?

@tanya-borisova
Member

We are discussing issue #197 now so I will reply there...

@jjgriff93
Member Author

Agree it’s not breaking anything and should be treated as low priority, but I believe it should remain open so there can at least be a small investigation into whether there’s an easy fix. It isn’t by design that the file changes every time; it’s a by-product of the mechanism we’re using to build and upload artefacts. Terraform should report no changes and take no action if in reality nothing has changed. However, if investigation shows the fix is not a simple one, I'm happy for it to be closed as wontfix.

@jjgriff93
Member Author

@tanya-borisova I believe you fixed this?

@tanya-borisova
Member

The fix is to set SOURCE_DATE_EPOCH when building the wheel:

SOURCE_DATE_EPOCH=315532800 python3 -m build

Because this has been added to the Data-Pot (see https://github.com/UCLH-Foundry/FlowEHR-Data-Pot/pull/14/files), I believe this should be considered fixed.
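For anyone invoking the build from a Python script rather than a shell, a rough equivalent of the command above might look like the following. This is a sketch, not what the linked PR does: `reproducible_build_env` and `build_wheel` are hypothetical helper names, and it assumes the `build` package is installed in the current interpreter. The value 315532800 corresponds to 1980-01-01 00:00:00 UTC, which is the earliest date the zip format can store.

```python
import os
import subprocess
import sys
from datetime import datetime, timezone

# Value taken from the command in this thread.
SOURCE_DATE_EPOCH = 315532800


def reproducible_build_env(base_env=None):
    """Return a copy of the environment with SOURCE_DATE_EPOCH pinned.

    Hypothetical helper; base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SOURCE_DATE_EPOCH"] = str(SOURCE_DATE_EPOCH)
    return env


def build_wheel(project_dir):
    """Run `python -m build` in project_dir with the pinned timestamp.

    Hypothetical helper; assumes the `build` package is available.
    """
    return subprocess.run(
        [sys.executable, "-m", "build"],
        cwd=project_dir,
        env=reproducible_build_env(),
        check=True,
    )


# Show which calendar date the pinned epoch maps to.
epoch_as_utc = datetime.fromtimestamp(SOURCE_DATE_EPOCH, tz=timezone.utc)
print(epoch_as_utc.isoformat())
```

Calling `build_wheel("path/to/pipeline")` twice on unchanged source and comparing the MD5 of the two wheels would be a quick way to confirm the hash is now stable.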
