-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-50157][SQL] Using SQLConf provided by SparkSession first. #48693
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if you can provide some concrete examples or numbers with this, @beliefer .
In fact, SQLConf.get
and spark.sessionState.conf
are functionally equivalent. But spark.sessionState.conf
is faster than SQLConf.get
because the latter requires visit thread local. So we can see the usage scenarios:
- The code rely on
SQLConfHelper
in Spark we can't findSparkSession
in context. - All the places in Spark rely on
spark.sessionState.conf
if existsSparkSession
.
It was hard to provide some perf number for me, the perf improvement looks very inconspicuous after all.
@@ -231,7 +231,7 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd | |||
createQueryTest("no from clause", | |||
"SELECT 1, +1, -1") | |||
|
|||
if (!conf.ansiEnabled) { | |||
if (!spark.sessionState.conf.ansiEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kindly revert all modifications made to this test suite, as the test code is inconsequential to the performance evaluation.
@dongjoon-hyun Please take a look again. Thanks. |
How much does it get improved? |
The benchmark added into PR's description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR technically proposed to avoid the use of SQLConfHelper.conf
method, right?
If that method is useless, can we remove the method? Or, when do we need to use that method still?
AFAIK, we need |
Does the benchmark result say |
|
In general I think it's better to use the |
fc15ef3
to
db65e8b
Compare
@cloud-fan |
db65e8b
to
d41be9d
Compare
@dongjoon-hyun @HyukjinKwon The GA failure seems unrelated. Please take a look again. |
@@ -33,6 +33,8 @@ import org.apache.spark.util.Utils | |||
*/ | |||
case class CoalesceShufflePartitions(session: SparkSession) extends AQEShuffleReadRule { | |||
|
|||
override def conf: SQLConf = session.sessionState.conf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change it to override val conf: SQLConf = session.sessionState.conf
in cases where conf
needs to be called multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the test results, there are other issues with changing to val
. It seems we need to revert the last commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for misleading
1764de8
to
d02ef58
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM
d02ef58
to
79a9445
Compare
@LuciferYang @dongjoon-hyun @cloud-fan @HyukjinKwon Thank you all. |
### What changes were proposed in this pull request? This PR proposes to use `SQLConf` provided by `SparkSession` first. ### Why are the changes needed? `SQLConf` provided by `SparkSession` have better perf than `SQLConf.get`. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? GA tests. The benchmark test. ``` object SQLConfBenchmark extends SqlBasedBenchmark { override def runBenchmarkSuite(mainArgs: Array[String]): Unit = { runBenchmark("Get SQLConf") { val iters = 1000 val benchmark = new Benchmark("Benchmark SQLConf", iters, output = output) benchmark.addCase("SQLConf.get") { _ => for (_ <- 1 to iters) { val conf = SQLConf.get } } benchmark.addCase("sessionState.conf") { _ => for (_ <- 1 to iters) { val conf = spark.sessionState.conf } } benchmark.run() } } } ``` The benchmark output. ``` Benchmark SQLConf: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ SQLConf.get 0 0 1 22.9 43.6 1.0X sessionState.conf 0 0 0 1377.4 0.7 60.1X ``` ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes apache#48693 from beliefer/SPARK-50157. Authored-by: beliefer <[email protected]> Signed-off-by: beliefer <[email protected]>
What changes were proposed in this pull request?
This PR proposes to use
SQLConf
provided bySparkSession
first.Why are the changes needed?
SQLConf
provided bySparkSession
have better perf thanSQLConf.get
.Does this PR introduce any user-facing change?
'No'.
How was this patch tested?
GA tests.
The benchmark test.
The benchmark output.
Was this patch authored or co-authored using generative AI tooling?
'No'.