Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-50157][SQL] Using SQLConf provided by SparkSession first. #48693

Closed
wants to merge 3 commits into from

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Oct 29, 2024

What changes were proposed in this pull request?

This PR proposes to use SQLConf provided by SparkSession first.

Why are the changes needed?

SQLConf provided by SparkSession have better perf than SQLConf.get.

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

GA tests.
The benchmark test.

object SQLConfBenchmark extends SqlBasedBenchmark {
  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("Get SQLConf") {
      val iters = 1000
      val benchmark = new Benchmark("Benchmark SQLConf", iters, output = output)
      benchmark.addCase("SQLConf.get") { _ =>
        for (_ <- 1 to iters) {
          val conf = SQLConf.get
        }
      }

      benchmark.addCase("sessionState.conf") { _ =>
        for (_ <- 1 to iters) {
          val conf = spark.sessionState.conf
        }
      }

      benchmark.run()
    }
  }
}

The benchmark output.

Benchmark SQLConf:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQLConf.get                                           0              0           1         22.9          43.6       1.0X
sessionState.conf                                     0              0           0       1377.4           0.7      60.1X

Was this patch authored or co-authored using generative AI tooling?

'No'.

@github-actions github-actions bot added the SQL label Oct 29, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you can provide some concrete examples or numbers with this, @beliefer .

In fact, SQLConf.get and spark.sessionState.conf are functionally equivalent. But spark.sessionState.conf is faster than SQLConf.get because the latter requires visit thread local. So we can see the usage scenarios:

  1. The code rely on SQLConfHelper in Spark we can't find SparkSession in context.
  2. All the places in Spark rely on spark.sessionState.conf if exists SparkSession.

It was hard to provide some perf number for me, the perf improvement looks very inconspicuous after all.

@@ -231,7 +231,7 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
createQueryTest("no from clause",
"SELECT 1, +1, -1")

if (!conf.ansiEnabled) {
if (!spark.sessionState.conf.ansiEnabled) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kindly revert all modifications made to this test suite, as the test code is inconsequential to the performance evaluation.

@beliefer
Copy link
Contributor Author

beliefer commented Nov 1, 2024

@dongjoon-hyun Please take a look again. Thanks.

@HyukjinKwon
Copy link
Member

How much does it get improved?

@beliefer
Copy link
Contributor Author

beliefer commented Nov 1, 2024

How much does it get improved?

The benchmark added into PR's description.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR technically proposed to avoid the use of SQLConfHelper.conf method, right?

If that method is useless, can we remove the method? Or, when do we need to use that method still?

@beliefer
Copy link
Contributor Author

beliefer commented Nov 3, 2024

If that method is useless, can we remove the method? Or, when do we need to use that method still?

AFAIK, we need SQLConfHelper.conf in Spark if we can't find SparkSession in context.

@HyukjinKwon
Copy link
Member

Does the benchmark result say session.conf is slower than SQLConf.get?

@beliefer
Copy link
Contributor Author

beliefer commented Nov 4, 2024

Does the benchmark result say session.conf is slower than SQLConf.get?

session.conf is quicker than SQLConf.get.

@cloud-fan
Copy link
Contributor

cloud-fan commented Nov 11, 2024

In general I think it's better to use the SQLConf in SparkSession if it's available. Maybe we should just override def conf in those rules that have SparkSession available

@beliefer
Copy link
Contributor Author

@cloud-fan override def conf finished, do you have any other comments?

@beliefer
Copy link
Contributor Author

beliefer commented Dec 5, 2024

@dongjoon-hyun @HyukjinKwon The GA failure seems unrelated. Please take a look again.

@@ -33,6 +33,8 @@ import org.apache.spark.util.Utils
*/
case class CoalesceShufflePartitions(session: SparkSession) extends AQEShuffleReadRule {

override def conf: SQLConf = session.sessionState.conf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change it to override val conf: SQLConf = session.sessionState.conf in cases where conf needs to be called multiple times?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the test results, there are other issues with changing to val. It seems we need to revert the last commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for misleading

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

@beliefer beliefer closed this in 819bac9 Dec 13, 2024
@beliefer
Copy link
Contributor Author

@LuciferYang @dongjoon-hyun @cloud-fan @HyukjinKwon Thank you all.

ericm-db pushed a commit to ericm-db/spark that referenced this pull request Dec 17, 2024
### What changes were proposed in this pull request?
This PR proposes to use `SQLConf` provided by `SparkSession` first.

### Why are the changes needed?
`SQLConf` provided by `SparkSession` have better perf than `SQLConf.get`.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
GA tests.
The benchmark test.
```
object SQLConfBenchmark extends SqlBasedBenchmark {
  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("Get SQLConf") {
      val iters = 1000
      val benchmark = new Benchmark("Benchmark SQLConf", iters, output = output)
      benchmark.addCase("SQLConf.get") { _ =>
        for (_ <- 1 to iters) {
          val conf = SQLConf.get
        }
      }

      benchmark.addCase("sessionState.conf") { _ =>
        for (_ <- 1 to iters) {
          val conf = spark.sessionState.conf
        }
      }

      benchmark.run()
    }
  }
}
```
The benchmark output.
```
Benchmark SQLConf:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQLConf.get                                           0              0           1         22.9          43.6       1.0X
sessionState.conf                                     0              0           0       1377.4           0.7      60.1X
```

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes apache#48693 from beliefer/SPARK-50157.

Authored-by: beliefer <[email protected]>
Signed-off-by: beliefer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants