Solver should report exact package hash that was used to install a package #5102
/assign @Gregory-Pereira
/priority important-soon
I am not too familiar with solver, so pardon my questions as I get up to speed. When you say …
Check TensorFlow wheels published on PyPI as an example - https://pypi.org/project/tensorflow/2.7.0/#files There are macOS, Windows and manylinux builds specific to some Python versions (e.g. Python 3.7, 3.8, 3.9). As of now we point users to tensorflow==2.7.0 from PyPI and provide all the artifact hashes (so that pip picks the right build on the client side). In an ideal scenario, Thoth should give back just one hash pointing to the specific artifact that should be used to install tensorflow==2.7.0. That can be, for example, tensorflow-2.7.0-cp39-cp39-manylinux2010_x86_64.whl if users run Linux and use Python 3.9 (and x86_64 arch).
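To make the idea concrete, here is a minimal sketch of picking a single artifact for a runtime environment. `pick_artifact` is a hypothetical helper (not part of thoth-solver), the matching is deliberately naive (real wheel-tag compatibility involves manylinux aliases, abi3, tag ordering, etc.), and apart from the manylinux one named above, the filenames are illustrative:

```python
def pick_artifact(filenames, py_tag, platform_tag):
    """Pick the one wheel matching the given Python and platform tags.

    Naive sketch: falls back to the sdist (.tar.gz) when no wheel
    matches, and returns None when nothing is suitable.
    """
    for fn in filenames:
        if fn.endswith(".whl") and py_tag in fn and platform_tag in fn:
            return fn
    for fn in filenames:
        if fn.endswith(".tar.gz"):
            return fn
    return None

artifacts = [
    "tensorflow-2.7.0-cp37-cp37m-win_amd64.whl",
    "tensorflow-2.7.0-cp39-cp39-macosx_10_11_x86_64.whl",
    "tensorflow-2.7.0-cp39-cp39-manylinux2010_x86_64.whl",
]
print(pick_artifact(artifacts, "cp39", "manylinux2010_x86_64"))
# → tensorflow-2.7.0-cp39-cp39-manylinux2010_x86_64.whl
```

With the matching artifact known, Thoth could return only that artifact's hash instead of the full list.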
So we would save a bunch of these hashes that correspond to a specific version of a package, the OS and Python version on the Thoth server / API side? I guess what's confusing me about this is how that fits in with the rest of the resolver. For instance, my understanding is that solver allows you to pass in some package versions and constraints, e.g.:

```
cycler>=0.10
kiwisolver==1.2
matplotlib<=3.2.1
numpy==1.18.5
```

(note this example is completely made up; I don't know if there is a solution for these dependencies/versions). It will then recursively resolve all dependencies and transitive dependencies that would work for these rules. So I understand how that would work if we are looking at a specified package version, but what about …
We already have them on the Thoth server side (Thoth is a cloud/server-side resolver). The thing is that we miss the OS + Python version linkage.
The resolver is using temporal difference learning (so no "recursive tries" per se). We use this "solver" component to aggregate information about packages for the resolver itself - so solver will just get the corresponding hashes more accurately, which are subsequently used by the server-side resolver.
So for each dependency, as it is getting installed, I am able to grab its SHA256. However, the way pip does its hashes is per file in said package, and not all files in a package may have a SHA. I stuck with selinon as one of my examples; when in a pipenv shell I ran … Located in this folder there were two folders related to selinon.
I thought there would be a way to grab a single hash for a package, but I'm not sure I am looking in the right place; maybe this would be located somewhere on PyPI, but I haven't found it yet. Maybe I will need to save all the individual hashes, or import some other library or package to use, such as … I plan to build this out on the result object:
It will have this format as an object with a key of `<package_name>--<system_platform>-<system_release>-<platform_architecture>` and a value of the package SHA. Let me know if I am missing or misunderstanding anything.
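For illustration, the proposed key could be assembled from the standard library's `platform` module; `result_key` is a hypothetical helper, and the exact field spellings (lower-casing, separators) are assumptions here:

```python
import platform

def result_key(package_name: str) -> str:
    """Build the proposed lookup key, e.g.
    'pyroaring--darwin-21.2.0-x86_64' on a macOS 12 host."""
    return "{}--{}-{}-{}".format(
        package_name,
        platform.system().lower(),  # e.g. 'linux', 'darwin'
        platform.release(),         # kernel release, e.g. '21.2.0'
        platform.machine(),         # e.g. 'x86_64', 'aarch64'
    )

print(result_key("pyroaring"))
```

Note that `platform.release()` reports the kernel release, which on Linux changes with every kernel update; a coarser field (distro + version) may dedupe better server-side.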
Nice research. Sadly, these hashes will not be part of the artifacts, as the artifact hash is computed based on the artifact content, which makes it a chicken-and-egg problem. As of now, we obtain all the artifact hashes in this function: solver/thoth/solver/python/python.py, lines 212 to 222 in 28601d1.
Ideally, thoth-solver could perform pip install for each artifact with its hash pinned:
here: solver/thoth/solver/python/python.py, line 97 in 28601d1
A brute-force approach would try all the artifacts, and pip should report that an artifact is not suitable for the runtime environment. That fact can become part of the report. If the artifact is installable, thoth-solver can report its dependencies.
So, quick update. First, the only way I could successfully use the hashes when installing a pip package was to stuff them into a requirements file (I am using …). Second, of the list of SHA package hashes, sometimes multiple can actually work, like if the PyPI package provides …:

```json
"thoth-wheels": {
    "pyroaring-0.3.3-darwin-21.2.0-x86_64": [
        {
            "pyroaring-0.3.3-cp39-cp39-macosx_10_14_x86_64.whl": "399730714584ec47b05978cc00b737478a10e2a6a8fed94d886fd0b25c522b05"
        },
        {
            "pyroaring-0.3.3.tar.gz": "232bf4cbdd7a1dad885171d9d7e59da5324b3d70c15a96a240f1319b870b46b7"
        }
    ]
}
```

Is this acceptable? Or should I try to resolve it to only one artifact, and if so, what criteria should be used? For context on the next two points, these are the packages and respective versions I have been using to test:

```
selinon == 1.0.0
pyperclip == 1.8.2
pyroaring == 0.3.3
pytorch == 1.0.2
tensorflow == 2.7.0
```

With the brute-force solution to testing which SHAs work in which environment, I am running into issues for bigger packages (selinon and tensorflow). I ran my local feature branch version of thoth-solver today in the background for … Also, when testing, I encountered a potential issue. This was specifically for the …
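On the "multiple hashes can work" question: one simple tie-breaking criterion (a suggestion, not project policy) is to prefer a wheel over the sdist, since the wheel is environment-specific while the `.tar.gz` installs almost anywhere and so carries little environment information. `prefer_binary` is a hypothetical helper:

```python
def prefer_binary(working_artifacts):
    """Collapse several successfully-installable artifacts (typically
    a platform wheel plus the sdist) to one, preferring the wheel."""
    wheels = [a for a in working_artifacts if a.endswith(".whl")]
    return wheels[0] if wheels else working_artifacts[0]

print(prefer_binary([
    "pyroaring-0.3.3-cp39-cp39-macosx_10_14_x86_64.whl",
    "pyroaring-0.3.3.tar.gz",
]))
# → pyroaring-0.3.3-cp39-cp39-macosx_10_14_x86_64.whl
```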
Are the hashes available in the package index warehouse useful for this problem at all? See https://pypi.org/pypi/tensorflow/json.
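For reference, the warehouse's per-release endpoint (`https://pypi.org/pypi/<project>/<version>/json`) lists each file of that release with its digests under the `urls` key. A sketch of extracting them; the function name is made up and the sample payload below is trimmed with an obviously fake digest:

```python
def artifact_digests(release_json: dict) -> dict:
    """Map artifact filename -> sha256 from a parsed warehouse
    per-release JSON payload."""
    return {f["filename"]: f["digests"]["sha256"]
            for f in release_json["urls"]}

# In production this dict would come from
# urllib.request.urlopen("https://pypi.org/pypi/tensorflow/2.7.0/json");
# here a trimmed, made-up payload stands in:
sample = {
    "urls": [
        {"filename": "tensorflow-2.7.0-cp39-cp39-manylinux2010_x86_64.whl",
         "digests": {"sha256": "0" * 64}},
    ]
}
print(artifact_digests(sample))
```

This gives the same filename-to-hash mapping the solver already stores, so by itself it does not answer which artifact fits which environment.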
Moin all, any progress on this? Is #5110 (comment) the blocker? @fridex, could you work on it?
So I am not sure if this is a valid solution to address Frido's comment, but I was thinking about adding the … I also looked a bit into what Kevin was saying about the PyPI package index warehouse. I am not certain this would be useful to us, because we already store the hash of every artifact per release; however, we are attempting to ascertain which artifact is best for a given package, package version, and environment information (OS, distro, etc.) and persist it on the Thoth side. We could take a pretty decent guess from the warehouse JSON, for instance that for release …
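One way to make such a guess from the warehouse data alone is to read the tags encoded in each wheel filename (PEP 427: `name-version(-build)?-pytag-abitag-platformtag.whl`) and compare them against the target environment. A naive stdlib-only sketch; `wheel_tags` is a hypothetical helper and does not split compressed tag sets like `py2.py3`:

```python
def wheel_tags(filename: str):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)
    per the PEP 427 naming convention."""
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    return tuple(parts[-3:])

print(wheel_tags("pyroaring-0.3.3-cp39-cp39-macosx_10_14_x86_64.whl"))
# → ('cp39', 'cp39', 'macosx_10_14_x86_64')
```

Matching these tags to a recorded environment would let the server pick a likely artifact without installing anything, at the cost of re-implementing pip's compatibility rules.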
@fridex is this something to move forward? /sig stack-guidance |
I was told that Thoth-Station is making a priority of stabilizing the system before introducing new changes, and so this might hang for a bit. |
Based on the history so far, my understanding is that this is …
Is your feature request related to a problem? Please describe.
Currently, Thoth provides all the artifact hashes in the lockfile that were found on the index and it lets the pip installation procedure pick the suitable artifact. Instead, Thoth should point to an exact Python artifact that should be used during the installation process to make sure proper auditing is done.
Describe the solution you'd like