Unicode character in README causes UnicodeEncodeError for 0.9.0b1 #149

woutdenolf · 2024-09-13T13:01:21Z

Python 3.7.17 (still supported):

pip install -U --pre meson-python meson ninja cython
pip install fabio --no-build-isolation --no-cache

This installs pyproject-metadata-0.9.0b1 which gives the following error

Traceback (most recent call last):
File ".../python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
  main()
File ".../python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
  json_out['return_val'] = hook(**hook_input['kwargs'])
File ".../python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 152, in prepare_metadata_for_build_wheel
  whl_basename = backend.build_wheel(metadata_directory, config_settings)
File ".../python3.7/site-packages/mesonpy/__init__.py", line 1020, in wrapper
  return func(*args, **kwargs)
File ".../python3.7/site-packages/mesonpy/__init__.py", line 1074, in build_wheel
  return project.wheel(out).name
File ".../python3.7/site-packages/mesonpy/__init__.py", line 925, in wheel
  return builder.build(directory)
File ".../python3.7/site-packages/mesonpy/__init__.py", line 463, in build
  self._wheel_write_metadata(whl)
File ".../python3.7/site-packages/mesonpy/__init__.py", line 451, in _wheel_write_metadata
  whl.writestr(f'{self._distinfo_dir}/METADATA', bytes(self._metadata.as_rfc822()))
File ".../python3.7/email/message.py", line 164, in __bytes__
  return self.as_bytes()
File ".../python3.7/email/message.py", line 178, in as_bytes
  g.flatten(self, unixfrom=unixfrom)
File ".../python3.7/email/generator.py", line 116, in flatten
  self._write(msg)
File ".../python3.7/email/generator.py", line 181, in _write
  self._dispatch(msg)
File ".../python3.7/email/generator.py", line 214, in _dispatch
  meth(msg)
File ".../python3.7/email/generator.py", line 432, in _handle_text
  super(BytesGenerator,self)._handle_text(msg)
File ".../python3.7/email/generator.py", line 249, in _handle_text
  self._write_lines(payload)
File ".../python3.7/email/generator.py", line 155, in _write_lines
  self.write(line)
File ".../python3.7/email/generator.py", line 406, in write
  self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 94: ordinal not in range(128)
[end of output]

An "ø" character in the README.rst file of the fabio project causes the UnicodeEncodeError error.

This works fine:

pip install "pyproject-metadata==0.8.0"
pip install fabio --no-build-isolation --no-cache

The text was updated successfully, but these errors were encountered:

Signed-off-by: Henry Schreiner <[email protected]>

woutdenolf · 2024-09-13T13:03:57Z

The difference between 0.8.0 and 0.9.0b1 is the message class which is RFC822Message in 0.9.0b1.

woutdenolf · 2024-09-13T13:07:06Z

Perhaps #142 ?

woutdenolf · 2024-09-13T13:12:50Z

Also happens for python 3.8.19, 3.9.19 and 3.10.14 so not caused by old email.message or something

pip install --pre fabio --no-cache --no-binary fabio

henryiii · 2024-09-13T13:12:53Z

@pradyunsg or @FFY00, could you yank 0.9.0b1? I realize I'm not a maintainer on the PyPI package.

We can use https://github.com/shacklettbp/madrona/blob/3fab172cb0ad41e0b035871885c2bc4c459c520d/py/madrona-py-build.py#L34C5-L35C38 instead of compat32, but then new lines aren't supported in headers...

dnicolodi · 2024-09-13T13:32:21Z

I thought pre-releases are meant for testing and that they are expected to contain bugs. If you want production quality releases, maybe do not use pre-releases? Otherwise you are kind of asking for known bugs...

henryiii · 2024-09-13T13:38:14Z

This wasn't a known bug. I would't make a prerelease with a known bug. But you are opting in to helping us test untested code. ;)

dnicolodi · 2024-09-13T13:44:52Z

I meant to say that it is known that very likely pre-releases contain bugs, not that the specific bugs are know

henryiii · 2024-09-13T13:50:05Z

It just sounded like you were discouraging reporting on bugs in beta releases, when this is exactly what I wanted from the prerelease, bug reports. :)

dnicolodi · 2024-09-13T13:52:46Z

I was objecting to requesting to yank the pre-release because it contains a bug. I think it works against the goal of having the pre-release in the first place. Of course the bug report is well accepted! Thank you for testing and reporting back @woutdenolf ! I'm sorry if I made it sound like your report was not appreciated. It was not my intention.

henryiii · 2024-09-13T13:54:42Z

Ah, yes. Didn't know it was about the yanking. I like yanking anything with a known bug of this size (no unicode); I agree it's not critically required, though. You can still install it after yanking, so backend authors could still play with it, though I plan to make a b2 with a fix soon.

woutdenolf · 2024-09-13T16:01:23Z

In the CI of all our projects we use --pre precisely to catch upstream bugs like this before they go in production. We always struggle however when this happens. Essentially we have four options:

remove --pre from the CI of all our affected projects until the upstream project (pyproject-metadata in this case) creates a new (pre-)release with a fix
live with a broken CI until the upstream project creates a new (pre-)release with a fix
modify the dependencies in all our broken projects to include pyproject-metadata!=0.9.0b1 even when the project does not directly depend on it and remove it when the upstream project creates a new (pre-)release with a fix
ask the buggy pre-release to be yanked

As you can see the last option would be easier for downstream projects. But I understand you want the buggy pre-release to stay so other projects can report other bugs until you deployed a fix, after which yanking or not does not matter anymore. So up to you. Thanks for the fast response!

eli-schwartz · 2024-09-13T18:15:30Z

There are two types of projects:

the type where a beta version changes tons of stuff across a huge usage surface, and 500 different projects are depending on and using it 50 different ways, of which 30 projects are broken because of two bugs
the type where there is mostly just one way to use the project and a couple new features were added, and most projects using it are going to hit the issue

In the former case, I would say that yanking a beta is very unproductive. It's perhaps unfortunate for the handful of projects with the issue, but a good balance here may be to have two CI jobs, one of which tests pre-releases, and then just... resign yourself to a red X-mark in your CI statuses for a short while?

In the latter case, yanking usually makes sense because the bug is kind of a showstopper and projects which cannot build due to one bug aren't going to find a different bug at the same time; yanking the release keeps things less noisy for pre-release CI and means that those projects are more likely to notice it's still broken once you release another beta with a fix.

The question is always, which one do you think your project is? :D

My gut feeling is that in this case the answer is "probably" number 2.

henryiii · 2024-09-13T18:45:01Z

Can you verify if 0.9.0 beta 2 fixes this?

henryiii · 2024-09-13T18:46:20Z

I think it depends entirely on the breakage: if it's something that was supposed to break then you should not yank a beta. That's a perfect chance for people to adapt to the change, and it's great they found it early. But if it was not supposed to break, then you should yank it, because you don't want people to adapt to the change, it wasn't intentional!

eli-schwartz · 2024-09-13T18:50:10Z

I should probably have also added "and depending on how easy it is to roll out a new beta". Depending on the project it could be something that is only practical to do once a week.

I don't think people will "adapt" to a runtime crash either way.

henryiii · 2024-09-13T18:51:27Z

(Also, yes, it can depend on how likely projects using it are affected by the change, too; this was any project using unicode, which I think was large enough to be problematic).

And yes, how easy it is to ship. A pure Python project is pretty easy. I'm not worrying about getting signoff from a majority of the maintainers for beta releases like I will for the full release.

You could adapt by removing the unicode, I suppose. :P

henryiii · 2024-09-13T18:58:14Z

Testing it locally:

uv venv
uv pip install -U --pre meson-python meson ninja cython pip
. .venv/bin/activate.fish
pip install fabio --no-build-isolation --no-cache --no-binary fabio

And... .as_bytes() is still breaking. str(...) would work, but bytes(...) calls s.encode('ascii', 'surrogateescape').

Traceback:

      Traceback (most recent call last):
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 152, in prepare_metadata_for_build_wheel
          whl_basename = backend.build_wheel(metadata_directory, config_settings)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/mesonpy/__init__.py", line 1020, in wrapper
          return func(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/mesonpy/__init__.py", line 1074, in build_wheel
          return project.wheel(out).name
                 ^^^^^^^^^^^^^^^^^^
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/mesonpy/__init__.py", line 925, in wheel
          return builder.build(directory)
                 ^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/mesonpy/__init__.py", line 463, in build
          self._wheel_write_metadata(whl)
        File "/Users/henryschreiner/git/software/pybind11/tmp/.venv/lib/python3.12/site-packages/mesonpy/__init__.py", line 451, in _wheel_write_metadata
          whl.writestr(f'{self._distinfo_dir}/METADATA', bytes(self._metadata.as_rfc822()))
                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/message.py", line 194, in __bytes__
          return self.as_bytes()
                 ^^^^^^^^^^^^^^^
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/message.py", line 208, in as_bytes
          g.flatten(self, unixfrom=unixfrom)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 117, in flatten
          self._write(msg)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 182, in _write
          self._dispatch(msg)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 219, in _dispatch
          meth(msg)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 446, in _handle_text
          super(BytesGenerator,self)._handle_text(msg)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 263, in _handle_text
          self._write_lines(payload)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 156, in _write_lines
          self.write(line)
        File "/usr/local/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/email/generator.py", line 420, in write
          self._fp.write(s.encode('ascii', 'surrogateescape'))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 94: ordinal not in range(128

henryiii · 2024-09-13T19:23:01Z

~~What I don't understand is why I can't reproduce this in our testsuite with a small test. Copied in pyproject.toml, copying, and README.rst and got the error to show up that way.~~ 🤦.

Ahh, it's this character that's causing the issue: 🕵️. Ahh, it's only the payload that's causing issues (now). If there's a unicode character in the payload, then str() works, but bytes() doesn't.

woutdenolf · 2024-09-13T19:25:06Z

Can you verify if 0.9.0 beta 2 fixes this?

As you figured out it doesn't. I appreciate the swift attempt at a fix though!

The question is always, which one do you think your project is? :D
My gut feeling is that in this case the answer is "probably" number 2

Not 500 but a few dozen projects depend on fabio I'm guessing. Not many of them have CI with --pre, after all we're talking about scientific data processing and we tend to live dangerously ;-).

Ahh, it's this character that's causing the issue: 🕵️

I thought it was this character but I might be mistaken: ø.

henryiii · 2024-09-13T19:26:16Z

In beta 1 is was ø. Now it's 🕵️, since I fixed unicode handling for bytes and str for header values, but only for str for payloads, and the readme is the payload. :)

dnicolodi · 2024-09-13T19:27:57Z

For what is worth, this https://github.com/pypa/setuptools/blob/56fc3111ba97470d81bf98b1be997dcf791d4a5a/setuptools/command/bdist_wheel.py#L596-L602 is what setuptools does.

dnicolodi · 2024-09-13T19:34:40Z

But elsewhere it does something completely different. But setuptools being the stratification of hacks it is, it is difficult to understand which code is actually executed...

henryiii · 2024-09-13T19:55:38Z

As far as I can tell, the email module doesn't support anything other than hardcoded ascii for its BinaryGenerator when it's dealing with payloads. I don't see how to get around https://github.com/python/cpython/blob/84e3f6078b5bab7d2d6413de6b089013d4aa4aa0/Lib/email/generator.py#L419-L420. The docs for utf8=True say it is for header contents.

It gets used here: https://github.com/python/cpython/blob/84e3f6078b5bab7d2d6413de6b089013d4aa4aa0/Lib/email/message.py#L196-L209

Otherwise, this almost works perfectly with EmailMessage:

class MetadataPolicy(email.policy.EmailPolicy):
    utf8 = True
    mangle_from_ = False
    max_line_length = 0

    def header_store_parse(self, name: str, value: str) -> tuple[str, str]:
        size = len(name) + 2
        value = value.replace('\n', '\n' + ' ' * size)
        return (name, value)

henryiii · 2024-09-13T21:10:46Z

Seems to be fine now with 0.9.0b3? 🤞

henryiii · 2024-09-13T21:18:03Z

Please close if you can verify. But my little command sequence above passes, at least. :)

woutdenolf · 2024-09-13T21:18:15Z

I can confirm 0.9.0b3 solved this issue. You made my day @henryiii ! Nothing better than seeing a wall of red turn green before the weekend. Much appreciated!

henryiii · 2024-09-13T21:21:19Z

Thanks for testing pre-releases!

pradyunsg · 2024-09-15T08:48:04Z

could you yank 0.9.0b1?

Done.

I've also invited you to the project @henryiii.

woutdenolf mentioned this issue Sep 13, 2024

pip install --pre fabio fails for 2024.09 silx-kit/fabio#588

Closed

henryiii referenced this issue Sep 13, 2024

refactor: use email.message.EmailMessage (#142)

5966114

Signed-off-by: Henry Schreiner <[email protected]>

dnicolodi mentioned this issue Sep 13, 2024

Regressions against pyproject-metadata==0.9.0b1 (prerelease) mesonbuild/meson-python#667

Closed

henryiii mentioned this issue Sep 13, 2024

fix: add unicode and join multilines #150

Merged

henryiii mentioned this issue Sep 13, 2024

chore: prepare for 0.9.0b2 #151

Merged

henryiii mentioned this issue Sep 13, 2024

fix: handle bytes() if payload has unicode #152

Merged

woutdenolf closed this as completed Sep 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unicode character in README causes UnicodeEncodeError for 0.9.0b1 #149

Unicode character in README causes UnicodeEncodeError for 0.9.0b1 #149

woutdenolf commented Sep 13, 2024 •

edited

Loading

woutdenolf commented Sep 13, 2024

woutdenolf commented Sep 13, 2024

woutdenolf commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024 •

edited

Loading

dnicolodi commented Sep 13, 2024

henryiii commented Sep 13, 2024

dnicolodi commented Sep 13, 2024

henryiii commented Sep 13, 2024

dnicolodi commented Sep 13, 2024

henryiii commented Sep 13, 2024 •

edited

Loading

woutdenolf commented Sep 13, 2024

eli-schwartz commented Sep 13, 2024

henryiii commented Sep 13, 2024

henryiii commented Sep 13, 2024

eli-schwartz commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024 •

edited

Loading

woutdenolf commented Sep 13, 2024

henryiii commented Sep 13, 2024

dnicolodi commented Sep 13, 2024

dnicolodi commented Sep 13, 2024

henryiii commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024

henryiii commented Sep 13, 2024

woutdenolf commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024

pradyunsg commented Sep 15, 2024

Unicode character in README causes UnicodeEncodeError for 0.9.0b1 #149

Unicode character in README causes UnicodeEncodeError for 0.9.0b1 #149

Comments

woutdenolf commented Sep 13, 2024 • edited Loading

woutdenolf commented Sep 13, 2024

woutdenolf commented Sep 13, 2024

woutdenolf commented Sep 13, 2024 • edited Loading

henryiii commented Sep 13, 2024 • edited Loading

dnicolodi commented Sep 13, 2024

henryiii commented Sep 13, 2024

dnicolodi commented Sep 13, 2024

henryiii commented Sep 13, 2024

dnicolodi commented Sep 13, 2024

henryiii commented Sep 13, 2024 • edited Loading

woutdenolf commented Sep 13, 2024

eli-schwartz commented Sep 13, 2024

henryiii commented Sep 13, 2024

henryiii commented Sep 13, 2024

eli-schwartz commented Sep 13, 2024 • edited Loading

henryiii commented Sep 13, 2024 • edited Loading

henryiii commented Sep 13, 2024 • edited Loading

henryiii commented Sep 13, 2024 • edited Loading

woutdenolf commented Sep 13, 2024

henryiii commented Sep 13, 2024

dnicolodi commented Sep 13, 2024

dnicolodi commented Sep 13, 2024

henryiii commented Sep 13, 2024 • edited Loading

henryiii commented Sep 13, 2024

henryiii commented Sep 13, 2024

woutdenolf commented Sep 13, 2024 • edited Loading

henryiii commented Sep 13, 2024

pradyunsg commented Sep 15, 2024

woutdenolf commented Sep 13, 2024 •

edited

Loading

woutdenolf commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024 •

edited

Loading

eli-schwartz commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024 •

edited

Loading

henryiii commented Sep 13, 2024 •

edited

Loading

woutdenolf commented Sep 13, 2024 •

edited

Loading