Handle the large_pex case when running from an unzipped pex #120
Conversation
This happens when we run python code from a large zipped pex, and the code that is run will itself try to rebuild a large (zipped) pex (e.g. by launching a spark job that will call `cluster_pack.upload_env`). The `detect_archive_name` function must be aware that it is currently running from an unzipped pex in order to correctly retrieve the original zipped pex name.
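The recovery step described above can be sketched in isolation. `find_original_zipped_pex` is a hypothetical helper name; the assumption (consistent with the diff further down in this conversation) is that the unzipped pex keeps the original `*.pex.zip` archive in the same directory as the running entry point:

```python
import glob
import os


def find_original_zipped_pex(pex_file: str) -> str:
    """Hypothetical helper: recover the original zipped pex name when
    running from an unzipped pex, assuming the *.pex.zip archive sits
    in the same directory as the unzipped entry point."""
    pex_files = glob.glob(f"{os.path.dirname(pex_file)}/*.pex.zip")
    assert len(pex_files) == 1, \
        f"Expected a single zipped PEX next to {pex_file}, got {pex_files}"
    return os.path.basename(pex_files[0])
```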
cluster_pack/packaging.py
Outdated
def build_package_path(name: str = env_name,
                       extension: Optional[str] = packer.extension()) -> str:
    path = (f"{get_default_fs()}/user/{getpass.getuser()}"
            f"/envs/{name}")
    if extension is None:
        return path
    return f"{path}.{extension}"
Define this function outside detect_archive_names for better readability
Done
cluster_pack/packaging.py
Outdated
else:
    if "".join(os.path.splitext(package_path)[1]) != f".{packer.extension()}":
        raise ValueError(f"{package_path} has the wrong extension"
                         f", .{packer.extension()} is expected")

# we are actually building or reusing a large pex and we have the information from the
# allow_large_pex flag
if (packer.extension() == PEX_PACKER.extension()
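The validation branch above can be exercised on its own; this is a minimal sketch with the expected extension passed in explicitly (the real code reads it from `packer.extension()`, and the function name here is hypothetical):

```python
import os


def check_extension(package_path: str, expected_extension: str) -> None:
    # Reject a user-provided package_path whose extension does not
    # match what the packer will produce (e.g. ".pex").
    if "".join(os.path.splitext(package_path)[1]) != f".{expected_extension}":
        raise ValueError(f"{package_path} has the wrong extension"
                         f", .{expected_extension} is expected")
```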
packer.extension() is used in multiple places, put it in a var
Done
cluster_pack/packaging.py
Outdated
pex_files = glob.glob(f"{os.path.dirname(pex_file)}/*.pex.zip")
assert len(pex_files) == 1, \
    f"Expected to find single zipped PEX in same dir as {pex_file}, got {pex_files}"
package_path = build_package_path(os.path.basename(pex_files[0]), None)
So if I understand correctly, this should be the path on HDFS of the large pex created previously in the workflow?
No, this is the path on the local filesystem of the zip (the current executable is the main.py in the unzipped pex)
if extension is None:
    return path
return f"{path}.{extension}"

if not package_path:
If package_path is provided, I feel like it can still be changed in the other if branch; is that expected?
Yes, if package_path is provided, we can add a .zip at the end when large_pex is true. This was the previous behavior. The code path added here is for the case when large_pex is None but we detect we are running from an unzipped pex; we don't change the previous behavior.
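The behavior described in this reply can be condensed into a small sketch (the function name and flag handling are hypothetical simplifications of the real packaging code):

```python
def resolve_user_package_path(package_path: str, allow_large_pex: bool) -> str:
    # Previous behavior, unchanged by this PR: a user-provided path keeps
    # its name, but gains a .zip suffix when a large pex is requested.
    if allow_large_pex and not package_path.endswith(".zip"):
        return f"{package_path}.zip"
    return package_path
```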
* Handle the large_pex case when running from an unzipped pex

  This happens when we run python code from a large zipped pex, and the code that is run will itself try to rebuild a large (zipped) pex (e.g. by launching a spark job that will call `cluster_pack.upload_env`). The detect_archive_name function must be aware that it is currently running from an unzipped pex in order to correctly retrieve the original zipped pex name.
* fix lint
* fix lib versions for tests
* fix glob mock return value
* Fix dependency version for python < 3.8
* Address comments