#2213 Explicitly disable control metrics at the end of the job. #2212
Conversation
This is needed when the main method can be invoked more than once externally.
LGTM, just read the code
@@ -60,6 +61,7 @@ object StandardizationAndConformanceJob extends StandardizationAndConformanceExe
       // post processing deliberately rereads the output ... same as above
       runPostProcessing(SourcePhase.Conformance, preparationResult, cmd)
     } finally {
+      spark.disableControlMeasuresTracking()
I see this in StandardizationAndConformanceJob.scala. What about StandardizationJob.scala and DynamicConformanceJob.scala?
Maybe add an implicit spark context to the finishJob function below?
I'm fixing only the Pramen use case, which always runs Standardization+Conformance.
When running them individually, the logic is more complicated: there is already selective disabling of Atum measurements. I decided not to touch it and go only with an absolutely safe, proven solution.
But you are welcome to refactor the logic in a separate PR.
I get it. But there is no ticket connected with it, no tracking, and, IMHO, it is a partial solution. No approval from me, but I am happy if any other codeowners approve.
The solution is not partial. Again, when running Standardization and Conformance separately, it works fine.
The purpose of this PR is not to refactor the existing code; it is just a critical fix for DB2 pipelines.
This is philosophically complicated. 🤔
On one hand, this project is to be sunset, so putting too much effort into the cleanest solution seems like wasted time.
On the other hand, there are questions:
- Pramen is the future, perhaps even for Unify. Will this be part of main Pramen or just a "special edition"?
- So Pramen in integration actually calls Enceladus' main function? That's wild.
- It feels weird that the control measurement is disabled in only one case. I understand your reasoning (why bother with the others), but still. At the very least a comment explaining why it's here would be nice.
- It's also architecturally incorrect that Atum is enabled on one level (StandardizationExecution or ConformanceExecution) but disabled only on the main level (and selectively, at that). But the counter-argument is the same as in the previous point.
- Why is the disabling done in the finally section? Is it expected that the job can fail, yet still recover and do some Atum measurements?
- If the answer to the previous one is yes, why can't this switching-off be done in the "envelope" of the Enceladus call in Pramen?
- As part of the 'pramen-extras' module there is an 'Enceladus sink'. It is OSS since Enceladus is also OSS, but it will be sunset: this sink will go away if/when Enceladus is no longer used. Pramen won't be used in Unify, at least not in the near future. This PR is just an Enceladus patch for a transient solution, not part of something long-term.
- Yes. It also does it by reflection, so it doesn't have a compile-time dependency on Enceladus.
- The reasoning is: a) if I modify too much, I'd need to test all cases, and I prefer to avoid that; b) this is a support branch, and a minimal change that patches only what is needed is, I think, okay.
- I don't know what to say.
- Yes, failing jobs should not fail the full Pramen pipeline.
- Because 'pramen-extras' has a compile-time dependency on neither Enceladus nor Atum. Enceladus builds are Spark-specific; Pramen builds are not.
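The reflection-based invocation mentioned above can be sketched roughly as follows. This is an illustrative sketch of the general technique, not Pramen's actual code; `DemoJob` and `ReflectiveRunner` are hypothetical names.

```scala
// DemoJob stands in for an Enceladus job object; it is NOT real Enceladus code.
object DemoJob {
  var invocations = 0
  def main(args: Array[String]): Unit = invocations += 1
}

// Invokes a job's main entry point by class name, so the caller needs no
// compile-time dependency on the module that defines the job.
object ReflectiveRunner {
  def runMain(className: String, args: Array[String]): Unit = {
    val clazz = Class.forName(className)
    val mainMethod = clazz.getMethod("main", classOf[Array[String]])
    // main is static, so the receiver is null; the args array is one argument
    mainMethod.invoke(null, args.asInstanceOf[AnyRef])
  }
}
```

With this approach the same main can be invoked repeatedly within one JVM, which is exactly why leftover state such as still-enabled control measures tracking becomes a problem.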
- code reviewed
- pulled
- built
- run
Consider the suggestion though, please
…_conformance/StandardizationAndConformanceJob.scala Co-authored-by: David Benedeki <[email protected]>
The PR fixes running multiple Standardization+Conformance jobs from a single Spark session via Pramen. Without this patch, the second and subsequent pipelines fail with a "control metrics already initialized" exception.
This was tested end to end.
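The failure mode and the fix can be illustrated with a minimal, self-contained simulation. The tracker below is a toy stand-in for Atum's control measures tracking, not Atum itself, and all names here are hypothetical:

```scala
// Toy tracker that, like Atum's control measures tracking, refuses to be
// initialized twice within the same session.
object ToyControlTracker {
  private var enabled = false
  def enable(): Unit = {
    if (enabled) throw new IllegalStateException("control metrics already initialized")
    enabled = true
  }
  def disable(): Unit = enabled = false
}

// Job entry point in the style of this PR: tracking is switched off in
// `finally`, so a second invocation from the same JVM / Spark session starts
// clean even if the previous run threw.
object ToyJob {
  var completedRuns = 0
  def run(fail: Boolean = false): Unit = {
    ToyControlTracker.enable()
    try {
      if (fail) throw new RuntimeException("job failed")
      completedRuns += 1 // stands in for Standardization + Conformance work
    } finally {
      ToyControlTracker.disable() // without this line the next run() throws
    }
  }
}
```

Without the `finally` block, the second pipeline in the same session would hit the "already initialized" exception described above, even when the first run failed partway through.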