Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode character in README causes UnicodeEncodeError for 0.9.0b1 #149

Closed
woutdenolf opened this issue Sep 13, 2024 · 28 comments
Closed

Unicode character in README causes UnicodeEncodeError for 0.9.0b1 #149

woutdenolf opened this issue Sep 13, 2024 · 28 comments

Comments

@woutdenolf
Copy link

woutdenolf commented Sep 13, 2024

Python 3.7.17 (still supported):

pip install -U --pre meson-python meson ninja cython
pip install fabio --no-build-isolation --no-cache

This installs pyproject-metadata-0.9.0b1 which gives the following error

Traceback (most recent call last):
File ".../python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
  main()
File ".../python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
  json_out['return_val'] = hook(**hook_input['kwargs'])
File ".../python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 152, in prepare_metadata_for_build_wheel
  whl_basename = backend.build_wheel(metadata_directory, config_settings)
File ".../python3.7/site-packages/mesonpy/__init__.py", line 1020, in wrapper
  return func(*args, **kwargs)
File ".../python3.7/site-packages/mesonpy/__init__.py", line 1074, in build_wheel
  return project.wheel(out).name
File ".../python3.7/site-packages/mesonpy/__init__.py", line 925, in wheel
  return builder.build(directory)
File ".../python3.7/site-packages/mesonpy/__init__.py", line 463, in build
  self._wheel_write_metadata(whl)
File ".../python3.7/site-packages/mesonpy/__init__.py", line 451, in _wheel_write_metadata
  whl.writestr(f'{self._distinfo_dir}/METADATA', bytes(self._metadata.as_rfc822()))
File ".../python3.7/email/message.py", line 164, in __bytes__
  return self.as_bytes()
File ".../python3.7/email/message.py", line 178, in as_bytes
  g.flatten(self, unixfrom=unixfrom)
File ".../python3.7/email/generator.py", line 116, in flatten
  self._write(msg)
File ".../python3.7/email/generator.py", line 181, in _write
  self._dispatch(msg)
File ".../python3.7/email/generator.py", line 214, in _dispatch
  meth(msg)
File ".../python3.7/email/generator.py", line 432, in _handle_text
  super(BytesGenerator,self)._handle_text(msg)
File ".../python3.7/email/generator.py", line 249, in _handle_text
  self._write_lines(payload)
File ".../python3.7/email/generator.py", line 155, in _write_lines
  self.write(line)
File ".../python3.7/email/generator.py", line 406, in write
  self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 94: ordinal not in range(128)
[end of output]

An "ø" character in the README.rst file of the fabio project causes the UnicodeEncodeError error.

This works fine:

pip install "pyproject-metadata==0.8.0"
pip install fabio --no-build-isolation --no-cache
@woutdenolf
Copy link
Author

The difference between 0.8.0 and 0.9.0b1 is the message class which is RFC822Message in 0.9.0b1.

@woutdenolf
Copy link
Author

Perhaps #142 ?

@woutdenolf
Copy link
Author

woutdenolf commented Sep 13, 2024

Also happens for python 3.8.19, 3.9.19 and 3.10.14 so not caused by old email.message or something

pip install --pre fabio --no-cache --no-binary fabio

@henryiii
Copy link
Collaborator

henryiii commented Sep 13, 2024

@pradyunsg or @FFY00, could you yank 0.9.0b1? I realize I'm not a maintainer on the PyPI package.

We can use https://github.com/shacklettbp/madrona/blob/3fab172cb0ad41e0b035871885c2bc4c459c520d/py/madrona-py-build.py#L34C5-L35C38 instead of compat32, but then new lines aren't supported in headers...

@dnicolodi
Copy link
Contributor

I thought pre-releases are meant for testing and that they are expected to contain bugs. If you want production quality releases, maybe do not use pre-releases? Otherwise you are kind of asking for known bugs...

@henryiii
Copy link
Collaborator

This wasn't a known bug. I would't make a prerelease with a known bug. But you are opting in to helping us test untested code. ;)

@dnicolodi
Copy link
Contributor

I meant to say that it is known that very likely pre-releases contain bugs, not that the specific bugs are know

@henryiii
Copy link
Collaborator

It just sounded like you were discouraging reporting on bugs in beta releases, when this is exactly what I wanted from the prerelease, bug reports. :)

@dnicolodi
Copy link
Contributor

I was objecting to requesting to yank the pre-release because it contains a bug. I think it works against the goal of having the pre-release in the first place. Of course the bug report is well accepted! Thank you for testing and reporting back @woutdenolf ! I'm sorry if I made it sound like your report was not appreciated. It was not my intention.

@henryiii
Copy link
Collaborator

henryiii commented Sep 13, 2024

Ah, yes. Didn't know it was about the yanking. I like yanking anything with a known bug of this size (no unicode); I agree it's not critically required, though. You can still install it after yanking, so backend authors could still play with it, though I plan to make a b2 with a fix soon.

@woutdenolf
Copy link
Author

In the CI of all our projects we use --pre precisely to catch upstream bugs like this before they go in production. We always struggle however when this happens. Essentially we have four options:

  • remove --pre from the CI of all our affected projects until the upstream project (pyproject-metadata in this case) creates a new (pre-)release with a fix
  • live with a broken CI until the upstream project creates a new (pre-)release with a fix
  • modify the dependencies in all our broken projects to include pyproject-metadata!=0.9.0b1 even when the project does not directly depend on it and remove it when the upstream project creates a new (pre-)release with a fix
  • ask the buggy pre-release to be yanked

As you can see the last option would be easier for downstream projects. But I understand you want the buggy pre-release to stay so other projects can report other bugs until you deployed a fix, after which yanking or not does not matter anymore. So up to you. Thanks for the fast response!

@eli-schwartz
Copy link

There are two types of projects:

  • the type where a beta version changes tons of stuff across a huge usage surface, and 500 different projects are depending on and using it 50 different ways, of which 30 projects are broken because of two bugs
  • the type where there is mostly just one way to use the project and a couple new features were added, and most projects using it are going to hit the issue

In the former case, I would say that yanking a beta is very unproductive. It's perhaps unfortunate for the handful of projects with the issue, but a good balance here may be to have two CI jobs, one of which tests pre-releases, and then just... resign yourself to a red X-mark in your CI statuses for a short while?

In the latter case, yanking usually makes sense because the bug is kind of a showstopper and projects which cannot build due to one bug aren't going to find a different bug at the same time; yanking the release keeps things less noisy for pre-release CI and means that those projects are more likely to notice it's still broken once you release another beta with a fix.

The question is always, which one do you think your project is? :D

My gut feeling is that in this case the answer is "probably" number 2.

@henryiii
Copy link
Collaborator

Can you verify if 0.9.0 beta 2 fixes this?

@henryiii
Copy link
Collaborator

I think it depends entirely on the breakage: if it's something that was supposed to break then you should not yank a beta. That's a perfect chance for people to adapt to the change, and it's great they found it early. But if it was not supposed to break, then you should yank it, because you don't want people to adapt to the change, it wasn't intentional!

@eli-schwartz
Copy link

eli-schwartz commented Sep 13, 2024

I should probably have also added "and depending on how easy it is to roll out a new beta". Depending on the project it could be something that is only practical to do once a week.

I don't think people will "adapt" to a runtime crash either way.

@henryiii
Copy link
Collaborator

henryiii commented Sep 13, 2024

(Also, yes, it can depend on how likely projects using it are affected by the change, too; this was any project using unicode, which I think was large enough to be problematic).

And yes, how easy it is to ship. A pure Python project is pretty easy. I'm not worrying about getting signoff from a majority of the maintainers for beta releases like I will for the full release.

You could adapt by removing the unicode, I suppose. :P

@henryiii
Copy link
Collaborator

henryiii commented Sep 13, 2024

Testing it locally:

uv venv
uv pip install -U --pre meson-python meson ninja cython pip
. .venv/bin/activate.fish
pip install fabio --no-build-isolation --no-cache --no-binary fabio

And... .as_bytes() is still breaking. str(...) would work, but bytes(...) calls s.encode('ascii', 'surrogateescape').

Traceback:
      Traceback (most recent call last):
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 152, in prepare_metadata_for_build_wheel
          whl_basename = backend.build_wheel(metadata_directory, config_settings)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/mesonpy/__init__.py", line 1020, in wrapper
          return func(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/mesonpy/__init__.py", line 1074, in build_wheel
          return project.wheel(out).name
                 ^^^^^^^^^^^^^^^^^^
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/mesonpy/__init__.py", line 925, in wheel
          return builder.build(directory)
                 ^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/mesonpy/__init__.py", line 463, in build
          self._wheel_write_metadata(whl)
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/mesonpy/__init__.py", line 451, in _wheel_write_metadata
          whl.writestr(f'{self._distinfo_dir}/METADATA', bytes(self._metadata.as_rfc822()))
                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/message.py", line 194, in __bytes__
          return self.as_bytes()
                 ^^^^^^^^^^^^^^^
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/message.py", line 208, in as_bytes
          g.flatten(self, unixfrom=unixfrom)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 117, in flatten
          self._write(msg)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 182, in _write
          self._dispatch(msg)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 219, in _dispatch
          meth(msg)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 446, in _handle_text
          super(BytesGenerator,self)._handle_text(msg)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 263, in _handle_text
          self._write_lines(payload)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 156, in _write_lines
          self.write(line)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 420, in write
          self._fp.write(s.encode('ascii', 'surrogateescape'))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 94: ordinal not in range(128

@henryiii
Copy link
Collaborator

henryiii commented Sep 13, 2024

What I don't understand is why I can't reproduce this in our testsuite with a small test. Copied in pyproject.toml, copying, and README.rst and got the error to show up that way. 🤦.

Ahh, it's this character that's causing the issue: 🕵️. Ahh, it's only the payload that's causing issues (now). If there's a unicode character in the payload, then str() works, but bytes() doesn't.

@woutdenolf
Copy link
Author

Can you verify if 0.9.0 beta 2 fixes this?

As you figured out it doesn't. I appreciate the swift attempt at a fix though!

The question is always, which one do you think your project is? :D
My gut feeling is that in this case the answer is "probably" number 2

Not 500 but a few dozen projects depend on fabio I'm guessing. Not many of them have CI with --pre, after all we're talking about scientific data processing and we tend to live dangerously ;-).

Ahh, it's this character that's causing the issue: 🕵️

I thought it was this character but I might be mistaken: ø.

@henryiii
Copy link
Collaborator

In beta 1 is was ø. Now it's 🕵️, since I fixed unicode handling for bytes and str for header values, but only for str for payloads, and the readme is the payload. :)

@dnicolodi
Copy link
Contributor

@dnicolodi
Copy link
Contributor

But elsewhere it does something completely different. But setuptools being the stratification of hacks it is, it is difficult to understand which code is actually executed...

@henryiii
Copy link
Collaborator

henryiii commented Sep 13, 2024

As far as I can tell, the email module doesn't support anything other than hardcoded ascii for its BinaryGenerator when it's dealing with payloads. I don't see how to get around https://github.com/python/cpython/blob/84e3f6078b5bab7d2d6413de6b089013d4aa4aa0/Lib/email/generator.py#L419-L420. The docs for utf8=True say it is for header contents.

It gets used here: https://github.com/python/cpython/blob/84e3f6078b5bab7d2d6413de6b089013d4aa4aa0/Lib/email/message.py#L196-L209

Otherwise, this almost works perfectly with EmailMessage:

class MetadataPolicy(email.policy.EmailPolicy):
    utf8 = True
    mangle_from_ = False
    max_line_length = 0

    def header_store_parse(self, name: str, value: str) -> tuple[str, str]:
        size = len(name) + 2
        value = value.replace('\n', '\n' + ' ' * size)
        return (name, value)

@henryiii
Copy link
Collaborator

Seems to be fine now with 0.9.0b3? 🤞

@henryiii
Copy link
Collaborator

Please close if you can verify. But my little command sequence above passes, at least. :)

@woutdenolf
Copy link
Author

woutdenolf commented Sep 13, 2024

I can confirm 0.9.0b3 solved this issue. You made my day @henryiii ! Nothing better than seeing a wall of red turn green before the weekend. Much appreciated!

@henryiii
Copy link
Collaborator

Thanks for testing pre-releases!

@pradyunsg
Copy link
Member

could you yank 0.9.0b1?

Done.

Screenshot 2024-09-15 at 09 46 58

I've also invited you to the project @henryiii.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants