
default value for SPARK_VERSION and spark_session #170

Open
aagumin opened this issue Nov 1, 2023 · 1 comment
Labels
enhancement New feature or request

Comments


aagumin commented Nov 1, 2023

Hi!
Why can't specifying the PySpark version via an environment variable be optional? Like this:

import os
import pyspark

os.environ["SPARK_VERSION"] = str(pyspark.__version__)

or

import os
from functools import lru_cache

import pyspark

@lru_cache(maxsize=None)
def _get_spark_version() -> str:
    # Prefer an explicitly set SPARK_VERSION environment variable;
    # otherwise fall back to the version of the installed pyspark package.
    spark_version = os.environ.get("SPARK_VERSION")
    if not spark_version:
        spark_version = str(pyspark.__version__)
    return _extract_major_minor_versions(spark_version)
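The fallback order the snippet proposes can be sketched without PySpark installed. `resolve_spark_version` is a hypothetical helper for illustration, not pydeequ API; the runtime version is passed in explicitly to stand in for `pyspark.__version__`:

```python
import os

# Hypothetical helper mirroring the proposed lookup order: prefer the
# SPARK_VERSION environment variable, else fall back to the runtime version.
def resolve_spark_version(runtime_version: str) -> str:
    env_version = os.environ.get("SPARK_VERSION")
    return env_version if env_version else runtime_version

os.environ.pop("SPARK_VERSION", None)
print(resolve_spark_version("3.5.0"))  # no env var -> runtime version

os.environ["SPARK_VERSION"] = "3.3.1"
print(resolve_spark_version("3.5.0"))  # env var set -> env var wins
```

An explicitly exported `SPARK_VERSION` would keep taking precedence, so existing setups would be unaffected.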

And a default Spark session could be resolved automatically as well, like this:

class VerificationResult:
    """The results returned from the VerificationSuite.

    :param spark_session: optional SparkSession; if omitted, the active session is used
    :param verificationRun: the verification result of run()
    """

    def __init__(self, spark_session: Optional[SparkSession], verificationRun):
        self._spark_session = self._setup_spark_session(spark_session)
        self.verificationRun = verificationRun

    def _setup_spark_session(self, session=None):
        # Prefer an explicitly passed session, then any active session.
        if session:
            return session
        potential_session = SparkSession.getActiveSession()
        if potential_session:
            return potential_session
        msg = "Spark session not found, init with `VerificationResult(my_session, ...)`"
        raise AttributeError(msg)
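The lookup order the class above proposes can be shown as a pure-Python sketch, with `get_active` standing in for `SparkSession.getActiveSession()`; the names are illustrative, not pydeequ API:

```python
# Sketch of the proposed session resolution: explicit argument first,
# then the ambient/active session, otherwise a clear error.
def resolve_session(explicit, get_active):
    if explicit is not None:
        return explicit
    active = get_active()
    if active is not None:
        return active
    raise AttributeError(
        "Spark session not found, init with `VerificationResult(my_session, ...)`"
    )

print(resolve_session("explicit", lambda: "active"))  # explicit wins
print(resolve_session(None, lambda: "active"))        # falls back to active
```

Raising only when neither source yields a session keeps the current explicit-argument behavior intact.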

Maybe I could do this and open a PR?

@aagumin aagumin changed the title default value to SPARK_VERSION default value to SPARK_VERSION and spark_session Nov 1, 2023
@aagumin aagumin changed the title default value to SPARK_VERSION and spark_session default value for SPARK_VERSION and spark_session Nov 1, 2023
chenliu0831 (Contributor) commented Nov 3, 2023

There were some earlier efforts to detect the runtime Spark version implicitly/automatically, but they were reverted because they caused issues (#111). You could take a look at what happened with the last attempt.

We would welcome ideas and a PR to simplify this.
