
#2213 Explicitly disable control metrics at the end of the job. #2212

Merged

Changes from 2 commits
@@ -24,6 +24,7 @@ import za.co.absa.enceladus.utils.config.ConfigReader
import za.co.absa.enceladus.utils.modules.SourcePhase
import za.co.absa.enceladus.utils.types.{Defaults, DefaultsByFormat}
import za.co.absa.enceladus.utils.udf.UDFLibrary
import za.co.absa.atum.AtumImplicits._

object StandardizationAndConformanceJob extends StandardizationAndConformanceExecution {
private val jobName = "Enceladus Standardization&Conformance"
@@ -60,6 +61,7 @@ object StandardizationAndConformanceJob extends StandardizationAndConformanceExe
// post processing deliberately rereads the output ... same as above
runPostProcessing(SourcePhase.Conformance, preparationResult, cmd)
} finally {
spark.disableControlMeasuresTracking()
Contributor

I see this in StandardizationAndConformanceJob.scala. What about StandardizationJob.scala and DynamicConformanceJob.scala?

Maybe add an implicit spark context to the finishJob function below?

Contributor Author

I'm fixing only the Pramen use case, which always runs Standardization+Conformance.
When running them individually, the logic is more complicated. There is already selective disabling of Atum measurements; I decided not to touch it and to go only with an absolutely safe solution.

But you are welcome to refactor the logic in a separate PR.

Contributor

I get it. But there is no ticket connected to it, no tracking, and IMHO it is a partial solution. No approval from me, but I'm happy if any other codeowners approve.

Contributor Author

The solution is not partial. Again, when running Standardization and Conformance separately, it works fine.

The purpose of this PR is not to refactor the existing code. It is just a critical fix for the DB2 pipelines.

Collaborator

This is philosophically complicated. 🤔
On one hand this project is to be sunset, so putting too much effort into the cleanest solution seems like wasted time.
On the other hand there are questions:

  1. Pramen is the future, perhaps even for Unify. Will this be part of main Pramen or just a "special edition"?
  2. So Pramen in integration actually calls Enceladus' main function? That's wild.
  3. It feels weird that the control measurement is disabled in only one case. I understand your reasoning, why bother with the others, but still. At the very least a comment would be nice, explaining why it's here.
  4. It's also architecturally incorrect that Atum is enabled on one level (StandardizationExecution or ConformanceExecution) but disabled only on the main level (and selectively at that). But the counter-argument is the same as for the previous point.
  5. Why is the disabling done in the finally section? Is it expected that the job can fail and still recover and do some Atum measurements?
  6. If the answer to the previous one is yes, why can't this switching-off be done in the "envelope" of the Enceladus call in Pramen?

Contributor Author
@yruslan Mar 19, 2024

  1. As part of the 'pramen-extras' module there is an 'Enceladus sink'. It is OSS since Enceladus is also OSS, but it will be sunset: the sink will go away if/when Enceladus is no longer used. Pramen won't be used in Unify, at least not in the near future. This PR is just an Enceladus patch for a transient solution and not part of something long-term.
  2. Yes. It also does it by reflection, so it doesn't have a compile-time dependency on Enceladus.
  3. The reasoning is:
    a) if I modify too much, I'd need to test all cases, and I prefer to avoid it;
    b) this is a support branch, so a minimal change that patches only what is needed is, I think, okay.
  4. I don't know what to say.
  5. Yes, failing jobs should not fail the full Pramen pipeline.
  6. Because 'pramen-extras' does not have a compile-time dependency on Enceladus, nor on Atum. Enceladus builds are Spark-specific; Pramen builds are not.
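
The disable-in-finally pattern discussed in points 5 and 6 above can be sketched generically. The sketch below is illustrative only: `ControlTracking` and `runJob` are hypothetical stand-ins for Atum's control-measure state and the Enceladus job wrapper, not real Atum or Enceladus APIs.

```scala
// Hypothetical stand-in for Atum's control-measure tracking state.
object ControlTracking {
  var enabled: Boolean = false
  def enable(): Unit = enabled = true
  def disable(): Unit = enabled = false
}

// Runs a job body with tracking enabled; returns true on success, false on
// failure. A failing job must not kill the surrounding pipeline, and the
// `finally` block guarantees tracking is switched off either way.
def runJob(work: () => Unit): Boolean = {
  ControlTracking.enable()
  try {
    work()
    true
  } catch {
    case _: Exception => false // swallow the failure; the pipeline continues
  } finally {
    // Runs on both success and failure, mirroring the PR's placement of
    // spark.disableControlMeasuresTracking() in the finally section.
    ControlTracking.disable()
  }
}
```

Whether the job body completes or throws, tracking ends up disabled, which is exactly the guarantee the `finally` placement buys.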

yruslan marked this conversation as resolved.
finishJob(cmd)
}
}