
Handle the large_pex case when running from an unzipped pex #120

Merged: 6 commits merged into master on Mar 13, 2024

Conversation

jcuquemelle (Contributor)

This happens when we run python code from an unzipped large pex, and the code that is run will itself try to rebuild a large (zipped) pex (e.g. by launching a spark job that will call `cluster_pack.upload_env`).

The `detect_archive_name` function must be aware that it is currently running from an unzipped pex in order to correctly retrieve the original zipped pex name.

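For illustration, a minimal sketch of the detection this PR performs (the helper name `find_sibling_zipped_pex` is hypothetical, not part of cluster_pack; it mirrors the approach of looking for the original `.pex.zip` next to the running pex file):

```python
import glob
import os
from typing import Optional


def find_sibling_zipped_pex(pex_file: str) -> Optional[str]:
    """Look for the original zipped archive next to an unzipped pex.

    When code runs from an unzipped pex, the original .pex.zip is
    expected to sit in the same directory as the running pex file.
    """
    candidates = glob.glob(f"{os.path.dirname(pex_file)}/*.pex.zip")
    return candidates[0] if len(candidates) == 1 else None
```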
Comment on lines 372 to 378
def build_package_path(name: str = env_name,
extension: Optional[str] = packer.extension()) -> str:
path = (f"{get_default_fs()}/user/{getpass.getuser()}"
f"/envs/{name}")
if extension is None:
return path
return f"{path}.{extension}"
Contributor

Define this function outside `detect_archive_names` for better readability.

Contributor Author

Done
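For context, the helper under discussion can be sketched standalone like this (`get_default_fs` is stubbed out for illustration only; the real one resolves the configured default filesystem, e.g. an `hdfs://` root):

```python
import getpass
from typing import Optional


def get_default_fs() -> str:
    # stub for illustration only; cluster_pack resolves the real
    # default filesystem from the environment/configuration
    return "hdfs://root"


def build_package_path(name: str, extension: Optional[str]) -> str:
    path = f"{get_default_fs()}/user/{getpass.getuser()}/envs/{name}"
    if extension is None:
        return path
    return f"{path}.{extension}"
```

With an extension, `build_package_path("myenv", "pex")` yields `.../envs/myenv.pex`; with `extension=None` (the large-pex case) the name is used verbatim, which is how a `*.pex.zip` basename can be passed through unchanged.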

else:
if "".join(os.path.splitext(package_path)[1]) != f".{packer.extension()}":
raise ValueError(f"{package_path} has the wrong extension"
f", .{packer.extension()} is expected")

# we are actually building or reusing a large pex and we have the information from the
# allow_large_pex flag
if (packer.extension() == PEX_PACKER.extension()
Contributor

`packer.extension()` is used in multiple places; put it in a variable.

Contributor Author
Done

pex_files = glob.glob(f"{os.path.dirname(pex_file)}/*.pex.zip")
assert len(pex_files) == 1, \
f"Expected to find single zipped PEX in same dir as {pex_file}, got {pex_files}"
package_path = build_package_path(os.path.basename(pex_files[0]), None)
Contributor

So, if I understand correctly, this should be the path on HDFS of the large pex created previously in the workflow?

Contributor Author

No, this is the path on the local filesystem of the zip (the current executable is the `main.py` inside the unzipped pex).

if extension is None:
return path
return f"{path}.{extension}"

if not package_path:
Contributor

If `package_path` is provided, I feel like it can still be changed in the other `if` branch; is that expected?

Contributor Author

Yes: if `package_path` is provided, we can add a `.zip` at the end when `large_pex` is true. That was the previous behavior. The code path added here covers the case where `large_pex` is None but we detect that we are running from an unzipped pex; the previous behavior is unchanged.
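The behavior described can be sketched as follows (function and parameter names, and the `None` fallback, are illustrative, not cluster_pack's actual API):

```python
from typing import Optional


def resolve_package_path(package_path: Optional[str],
                         allow_large_pex: Optional[bool],
                         sibling_zip: Optional[str]) -> Optional[str]:
    if package_path:
        # previous behavior, unchanged: an explicit path only gains
        # ".zip" when allow_large_pex is set
        return f"{package_path}.zip" if allow_large_pex else package_path
    if allow_large_pex is None and sibling_zip:
        # new code path: large_pex was not requested, but we detected we
        # are running from an unzipped pex next to its original archive
        return sibling_zip
    # fall through to the usual default-path construction
    return None
```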

@jcuquemelle jcuquemelle merged commit c30b55f into master Mar 13, 2024
9 checks passed
jcuquemelle added a commit that referenced this pull request Sep 2, 2024
* Handle the large_pex case when running from an unzipped pex

This happens when we run python code from a large zipped pex, and the code
that is run will itself try and rebuild a large (zipped) pex (e.g. by
launching a spark job that will call `cluster_pack.upload_env`)

The detect_archive_name function must be aware that it is currently running
from an unzipped pex in order to correctly retrieve the original zipped pex
name

* fix lint

* fix lib versions for tests

* fix glob mock return value

* Fix dependency version for python < 3.8

* Address comments