refine UT framework to promote GPU evaluation, Part 1 #10851
Conversation
Hi @jlowe, can you please review the PR?
@@ -1485,6 +1485,13 @@ val GPU_COREDUMP_PIPE_PATTERN = conf("spark.rapids.gpu.coreDump.pipePattern")
     .stringConf
     .createWithDefault(false.toString)

+  val FOLDABLE_NON_LIT_ALLOWED = conf("spark.rapids.sql.test.isFoldableNonLitAllowed")
I'm not sure we need this at all. The foldable check was put in place a long time ago, when each expression had to have explicit support for scalar values. We refactored the code a while ago and removed that. The fact that it works for testing means the refactor was correct, and I think we can just delete the check entirely.
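For context, a hypothetical sketch of the kind of guard being discussed (the names and wiring are illustrative, not the actual GpuOverrides code):

import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}

// Hypothetical sketch: reject an expression that is foldable (can be
// constant-folded) but is not already a Literal node, unless the
// test-only config allows it.
def tagFoldableCheck(
    expr: Expression,
    isFoldableNonLitAllowed: Boolean,
    willNotWorkOnGpu: String => Unit): Unit = {
  if (expr.foldable && !expr.isInstanceOf[Literal] && !isFoldableNonLitAllowed) {
    willNotWorkOnGpu("foldable expressions that are not literals are not supported")
  }
}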
Should I do it in this PR or in a separate one?
I am fine with doing it later as this is an internal config, and it might be a large-ish change. But at the same time I don't want to forget about it.
This will be tracked in #10887.
@@ -121,6 +121,12 @@ object RapidsSQLTestsBaseTrait {
       "org.apache.spark.sql.rapids.ExecutionPlanCaptureCallback")
     .set("spark.sql.warehouse.dir", warehouse)
     .set("spark.sql.cache.serializer", "com.nvidia.spark.ParquetCachedBatchSerializer")
+    .set("spark.sql.session.timeZone", "UTC")
+    .set("spark.rapids.sql.explain", "ALL")
This will make the tests very verbose. Intentional, or debug code accidentally left in?
Both are intentional.
- For "spark.sql.session.timeZone": we found that many expressions won't take the GPU path unless the timezone is set to UTC. Since we want to run as many test cases as possible on the GPU, we'd better use the UTC timezone.
- For "spark.rapids.sql.explain": by default, tests/src/test/resources/log4j2.properties already sets the console appender's log level to error, so we won't see verbose logs even though spark.rapids.sql.explain is set to ALL here. The benefit of setting it to ALL is that when we want to check and debug test cases, we only need to change log4j2.properties, instead of changing log4j2.properties and also adding .set("spark.rapids.sql.explain", "ALL") in code (see the logging sketch below).
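For reference, a minimal sketch of the console-appender setup being described; the actual tests/src/test/resources/log4j2.properties may differ:

# Sketch only. With the console appender gated at error, the plan
# explanations produced by spark.rapids.sql.explain=ALL are suppressed;
# lowering this single level is enough to surface them when debugging.
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{HH:mm:ss} %p %c: %m%n
rootLogger.level = error
rootLogger.appenderRef.console.ref = console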
Since we want to test as many test cases as possible in GPU, we'd better use UTC timezone
I'm not questioning that we need to control the timezone for these tests; the discussion is about how to control it. Normally we don't want hardcoding like this, because then the test cannot check whether we're doing the right thing in any timezone other than UTC, and as I said, we do have some timezone support with more on the way. Hardcoding this precludes using these tests to verify we're doing the right thing in any other timezone. Having the test environment setup script control it makes the test reusable.
@@ -121,6 +121,12 @@ object RapidsSQLTestsBaseTrait {
       "org.apache.spark.sql.rapids.ExecutionPlanCaptureCallback")
     .set("spark.sql.warehouse.dir", warehouse)
     .set("spark.sql.cache.serializer", "com.nvidia.spark.ParquetCachedBatchSerializer")
+    .set("spark.sql.session.timeZone", "UTC")
Do we really want to force UTC in the tests? We have partial support for timezones and it's likely more will be added later. Typically what we've done in the past is leave it up to the environment setup to specify the timezone desired to be tested (e.g.: CI scripts will set the TZ environment variable or other JVM settings to setup the timezone) rather than have the tests smash the timezone directly. That way the tests can be run across multiple timezones by changing the environment before running the tests.
Comment applies to a couple of other places below where the timezone is being smashed with UTC.
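As a sketch of that approach (illustrative only, not code from this PR), the suite could let the environment drive the timezone with a UTC default:

import org.apache.spark.SparkConf

// Hypothetical sketch: pick up the timezone from the environment (e.g. the
// TZ variable a CI script sets) instead of hardcoding "UTC" in the suite,
// so the same tests can be re-run under other timezones.
def withSessionTimeZone(conf: SparkConf): SparkConf = {
  val testTimeZone = sys.env.getOrElse("TZ", "UTC")
  conf.set("spark.sql.session.timeZone", testTimeZone)
}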
The consideration for spark.sql.session.timeZone is explained in the previous comment.
@GaryShen2008 told me that we're already suggesting our customers set the timezone to UTC to facilitate GPU execution, so at least we're not testing a fictitious scenario.
We should enable non-UTC test cases later, but as a first step, setting UTC here lets us test a fixed configuration that we support now. It can help us detect failures in the UTC case. Without this UTC setting, test cases may pass only because of fallback to the CPU; currently we haven't been able to detect wrong fallbacks in the UTs.
I'm not sure why we're not setting UTC in the environment of the test setup, which lets us reuse these tests. If we're going to leave this as-is, there needs to be a TODO comment linking to a tracking issue to fix this.
I opened a TODO issue for this: #10874. I will reference it in a code comment.
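The resulting annotation might look something like this (illustrative wording only):

// TODO: hardcoding UTC precludes verifying behavior in other timezones; the
// test environment should control this instead. Tracked by issue #10874.
.set("spark.sql.session.timeZone", "UTC")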
class RapidsJsonExpressionsSuite extends JsonExpressionsSuite with RapidsTestsTrait {
  override def beforeAll(): Unit = {
    super.beforeAll()
    SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", "true")
Here and elsewhere: use the compile-time constant.
-SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", "true")
+SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", true.toString)
Hi @gerashegalov, I will change it accordingly. However, it seems this is not generally enforced? One anti-pattern example: https://github.com/NVIDIA/spark-rapids/blob/branch-24.06/tests/src/test/scala/com/nvidia/spark/rapids/ApproximatePercentileSuite.scala#L108
Fair; we can add an issue to enforce it via scalastyle.
  override def afterAll(): Unit = {
    super.afterAll()
    SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonTuple")
    SQLConf.get.unsetConf("spark.rapids.sql.expression.GetJsonObject")
    SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonToStructs")
    SQLConf.get.unsetConf("spark.rapids.sql.expression.StructsToJson")
  }
Consider extracting this into a base trait and reusing the conf handling across unit tests, e.g. spark-rapids/tests/src/test/scala/com/nvidia/spark/rapids/SparkQueryCompareTestSuite.scala, lines 158 to 161 (at 56de325):

  override def afterAll(): Unit = {
    super.afterAll()
    TrampolineUtil.cleanupAnyExistingSession()
  }
Will do.
Looks OK to me, but should get feedback from @revans2 and @gerashegalov as well.
  override def afterAll(): Unit = {
    SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonTuple")
    SQLConf.get.unsetConf("spark.rapids.sql.expression.GetJsonObject")
    SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonToStructs")
    SQLConf.get.unsetConf("spark.rapids.sql.expression.StructsToJson")
    super.afterAll()
  }
This is not how I thought my previous comment would be addressed. Shouldn't beforeAll and afterAll largely share the same implementation as the other unit tests based on SparkQueryCompareTestSuite, especially the cleanup afterwards?

  override def afterAll(): Unit = {
    super.afterAll()
    TrampolineUtil.cleanupAnyExistingSession()
  }

And we should just make sure that we reuse it for the new UTs.
RapidsJsonConfTrait is an extra trait to inject more JSON-related configs, and it will call the beforeAll/afterAll methods of its parent traits (because of trait linearization).
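To illustrate the linearization point, a sketch of how such a trait composes (illustrative; the actual RapidsJsonConfTrait in the PR may differ):

import org.apache.spark.sql.internal.SQLConf
import org.scalatest.{BeforeAndAfterAll, Suite}

// Illustrative sketch: because this trait is mixed in after the base traits,
// its super.beforeAll()/super.afterAll() calls chain into the parents'
// implementations under Scala trait linearization.
trait RapidsJsonConfTrait extends BeforeAndAfterAll { this: Suite =>
  private val jsonExprConfs = Seq(
    "spark.rapids.sql.expression.JsonTuple",
    "spark.rapids.sql.expression.GetJsonObject",
    "spark.rapids.sql.expression.JsonToStructs",
    "spark.rapids.sql.expression.StructsToJson")

  override def beforeAll(): Unit = {
    super.beforeAll() // the parents' setup runs first
    jsonExprConfs.foreach(SQLConf.get.setConfString(_, true.toString))
  }

  override def afterAll(): Unit = {
    jsonExprConfs.foreach(SQLConf.get.unsetConf)
    super.afterAll() // the parents' cleanup (e.g. session teardown) runs last
  }
}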
I am talking about the fact that we already have code cleaning out the SparkSession via TrampolineUtil.cleanupAnyExistingSession(). Instead of adding special logic for this one test, we could reuse the more general cleanup code.
As discussed with Gera offline, we'll use #10886 to track this.
LGTM. The comment about conf cleanup between tests is optional.
So far all of the comments on this PR have been addressed. I'll close this PR and move to #10861, as it is a superset of this PR.
Contributes to #10850.

Changes in this PR: