-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: parse_wheel_filename
permits non-normalized versions.
#873
Comments
Does packaging have any sort of deprecation process for this sort of change? i.e. a public function that throws an error where it didn't used to. Pip is migrating to use I'm still learning how pip migrates to new packaging versions, I guess we could patch packaging to throw a warning instead of an exception for an appropriate deprecation period. |
@notatallshaw You may be interested in this branch I am working on to provide a better API here around creating/parsing filenames, including "strict" and "non-strict" parsing: main...di:packaging:filename-api From a cursory glance, it appears that most build backends are producing wheels with normalized filenames, so I wouldn't expect this to be too disruptive, but I could also be convinced to leave |
We can raise a warning for a while if an analysis of PyPI suggests it would be disruptive. |
I don't want to hold up packaging if this is a clear win for most use cases. But a bit of background: Currently pip is using a custom regex to parse wheel filenames: https://github.com/pypa/pip/blob/25.0.1/src/pip/_internal/models/wheel.py#L21 There was a use case pip explicitly supported where These two use cases suggest the two scenarios are users hand rolling their own wheel filenames on private indexes / filesystems and perhaps an edge case in In general when pip is consuming wheel filenames I would rather it be non-strict in cases where the filename can be unambiguously normalized. But I also suppose we could lengthen the depreciation cycle and flag all issues to users (outside the explicitly supported |
Taking a look at all filenames uploaded in the last month: COPY (
SELECT
fe.time AS timestamp,
fe.additional->>'filename' AS filename
FROM
file_events fe
WHERE
fe.time >= NOW() - INTERVAL '1 month'
AND fe.tag = 'file:add'
AND fe.additional->>'filename' LIKE '%.whl'
ORDER BY fe.time
) TO STDOUT WITH (FORMAT CSV, HEADER); Analyzing with: import csv
from packaging.utils import InvalidWheelFilename, parse_wheel_filename
total_filenames = 0
exception_count = 0
with open("out.csv") as file:
reader = csv.DictReader(file) # Assumes CSV has a header
for row in reader:
total_filenames += 1
filename = row["filename"]
try:
parse_wheel_filename(filename)
except InvalidWheelFilename:
exception_count += 1
percentage_exceptions = (exception_count / total_filenames) * 100
print(f"Total filenames processed: {total_filenames}")
print(f"Total exceptions: {exception_count}")
print(f"Percentage of exceptions: {percentage_exceptions:.2f}%") With this branch, we get:
The invalid filenames in question are:
So it's really only two projects that are producing wheels with non-normalized versions, not 24 projects. I can expand the analysis to a larger timeframe if anyone would like? |
Also, seems reasonable to add a |
While helpful information, my experience supporting pip is that it's users who are not uploading to PyPI that are caught by changes to more strictly comply with the spec.
I think this would make it easier, and I definitely like the look of the filename API you linked earlier. All that said, please don't consider it a show stopper, looking at all this code has made me think I need to re-review this and check additional scenarios before migrating to |
If the migration hasn't started yet, I would recommend waiting until we can get a better API for parsing filenames into this library. Note that |
We started the depreciation in pip 24.3 due to be complete in pip 25.1 (the next release), I'll discuss with the other pip maintainers, thanks.
I've not looked at sdists in general yet, that seemed like a much bigger can of worms. The only place I think we're using |
What API are you thinking of? |
The one I mentioned here:
|
I'm also OK waiting on that (or something like that) to land instead of merging #874, we can add this check to PyPI in the interim instead. |
https://packaging.python.org/en/latest/specifications/binary-distribution-format/ says:
Currently,
parse_wheel_filename
will raiseInvalidWheelFilename
for some invalid filenames, but https://packaging.python.org/en/latest/specifications/version-specifiers/#normalization has a long list of version normalizations thatparse_wheel_filename
does not enforce:The text was updated successfully, but these errors were encountered: