
Handle the large_pex case when running from an unzipped pex #120

Merged: 6 commits merged into master on Mar 13, 2024

Conversation

jcuquemelle (Contributor)

This happens when we run python code from an unzipped large pex, and the code that is run will itself try to rebuild a large (zipped) pex (e.g. by launching a spark job that will call `cluster_pack.upload_env`).

The `detect_archive_name` function must be aware that it is currently running from an unzipped pex in order to correctly retrieve the original zipped pex name.

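For illustration, a minimal sketch of the detection this PR performs (the helper name `find_sibling_zipped_pex` is hypothetical, not part of cluster_pack; it mirrors the approach of looking for the original `.pex.zip` next to the running pex file):

```python
import glob
import os
from typing import Optional


def find_sibling_zipped_pex(pex_file: str) -> Optional[str]:
    """Look for the original zipped archive next to an unzipped pex.

    When code runs from an unzipped pex, the original .pex.zip is
    expected to sit in the same directory as the running pex file.
    """
    candidates = glob.glob(f"{os.path.dirname(pex_file)}/*.pex.zip")
    return candidates[0] if len(candidates) == 1 else None
```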
Comment on lines 372 to 378
def build_package_path(name: str = env_name,
extension: Optional[str] = packer.extension()) -> str:
path = (f"{get_default_fs()}/user/{getpass.getuser()}"
f"/envs/{name}")
if extension is None:
return path
return f"{path}.{extension}"
Contributor

Define this function outside `detect_archive_names` for better readability.

Contributor Author

Done
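For context, the helper under discussion can be sketched standalone like this (`get_default_fs` is stubbed out for illustration only; the real one resolves the configured default filesystem, e.g. an `hdfs://` root):

```python
import getpass
from typing import Optional


def get_default_fs() -> str:
    # stub for illustration only; cluster_pack resolves the real
    # default filesystem from the environment/configuration
    return "hdfs://root"


def build_package_path(name: str, extension: Optional[str]) -> str:
    path = f"{get_default_fs()}/user/{getpass.getuser()}/envs/{name}"
    if extension is None:
        return path
    return f"{path}.{extension}"
```

With an extension, `build_package_path("myenv", "pex")` yields `.../envs/myenv.pex`; with `extension=None` (the large-pex case) the name is used verbatim, which is how a `*.pex.zip` basename can be passed through unchanged.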

else:
if "".join(os.path.splitext(package_path)[1]) != f".{packer.extension()}":
raise ValueError(f"{package_path} has the wrong extension"
f", .{packer.extension()} is expected")

# we are actually building or reusing a large pex and we have the information from the
# allow_large_pex flag
if (packer.extension() == PEX_PACKER.extension()
Contributor

`packer.extension()` is used in multiple places; put it in a variable.

Contributor Author
Done

pex_files = glob.glob(f"{os.path.dirname(pex_file)}/*.pex.zip")
assert len(pex_files) == 1, \
f"Expected to find single zipped PEX in same dir as {pex_file}, got {pex_files}"
package_path = build_package_path(os.path.basename(pex_files[0]), None)
Contributor

So, if I understand correctly, this should be the path on HDFS of the large pex created previously in the workflow?

Contributor Author

No, this is the path on the local filesystem of the zip (the current executable is the `main.py` inside the unzipped pex).

if extension is None:
return path
return f"{path}.{extension}"

if not package_path:
Contributor

If `package_path` is provided, I feel like it can still be changed in the other `if` branch; is that expected?

Contributor Author

Yes: if `package_path` is provided, we can add a `.zip` at the end when `large_pex` is true. That was the previous behavior. The code path added here covers the case where `large_pex` is None but we detect that we are running from an unzipped pex; the previous behavior is unchanged.
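The behavior described can be sketched as follows (function and parameter names, and the `None` fallback, are illustrative, not cluster_pack's actual API):

```python
from typing import Optional


def resolve_package_path(package_path: Optional[str],
                         allow_large_pex: Optional[bool],
                         sibling_zip: Optional[str]) -> Optional[str]:
    if package_path:
        # previous behavior, unchanged: an explicit path only gains
        # ".zip" when allow_large_pex is set
        return f"{package_path}.zip" if allow_large_pex else package_path
    if allow_large_pex is None and sibling_zip:
        # new code path: large_pex was not requested, but we detected we
        # are running from an unzipped pex next to its original archive
        return sibling_zip
    # fall through to the usual default-path construction
    return None
```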

@jcuquemelle jcuquemelle merged commit c30b55f into master Mar 13, 2024
9 checks passed
jcuquemelle added a commit that referenced this pull request Sep 2, 2024
* Handle the large_pex case when running from an unzipped pex

This happens when we run python code from a large zipped pex, and the code
that is run will itself try and rebuild a large (zipped) pex (e.g. by
launching a spark job that will call `cluster_pack.upload_env`)

The detect_archive_name function must be aware that it is currently running
from an unzipped pex in order to correctly retrieve the original zipped pex
name

* fix lint

* fix lib versions for tests

* fix glob mock return value

* Fix dependency version for python < 3.8

* Address comments