Unexpected nested exception with b64decode #129505

Open
hannob opened this issue Jan 31, 2025 · 5 comments
Labels: docs (Documentation in the Doc dir), stdlib (Python modules in the Lib dir)

Comments

hannob commented Jan 31, 2025

Bug report

Bug description:

The b64decode function of the base64 standard library module can behave in unexpected ways with invalid inputs.

Take this example:

#!/usr/bin/python3
import base64
base64.b64decode('a\xd6==')

According to the docs, I would expect that invalid characters in the input cause a binascii.Error exception. However, this example gives me the following:

Traceback (most recent call last):
  File "/usr/lib/python3.12/base64.py", line 37, in _bytes_from_decode_data
    return s.encode('ascii')
           ^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'ascii' codec can't encode character '\xd6' in position 1: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/./test", line 3, in <module>
    base64.b64decode('a\xd6==')
  File "/usr/lib/python3.12/base64.py", line 83, in b64decode
    s = _bytes_from_decode_data(s)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/base64.py", line 39, in _bytes_from_decode_data
    raise ValueError('string argument should contain only ASCII characters')
ValueError: string argument should contain only ASCII characters

So not only do I get an exception that I don't expect according to the docs, it appears there is a bug in the internal exception handling causing an exception (ValueError) during another exception (UnicodeEncodeError).

CPython versions tested on:

3.13

Operating systems tested on:

Linux

hannob added the type-bug label on Jan 31, 2025
Member

encukou commented Jan 31, 2025

This is the intended error; the error message tells you exactly what's going on. (And the earlier UnicodeEncodeError says where in the string the problem is.)

The docs already say the function needs an “encoded bytes-like object or ASCII string”. Perhaps the validate section should be clarified. Do you want to send a PR?
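
For reference, the two-part traceback in the report is what Python prints when an exception is raised inside an `except` block: the earlier exception is kept as implicit context rather than lost. The sketch below inspects that chain directly. `describe_failure` is a hypothetical helper written for this illustration, not part of `base64`.

```python
import base64
import binascii


def describe_failure(data):
    """Hypothetical helper: summarise the exception raised by a failed decode."""
    try:
        base64.b64decode(data)
    except ValueError as exc:
        # __context__ holds the exception that was being handled when this
        # one was raised; here that is the failed ASCII encode of the str.
        return type(exc), type(exc.__context__), isinstance(exc, binascii.Error)
    return None


# A str containing a non-ASCII character, as in the original report.
print(describe_failure('a\xd6=='))
```

The last element of the tuple shows that the exception is a plain `ValueError` and not a `binascii.Error`, which is the gap in the documentation that the report points out.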

encukou added the docs and stdlib labels and removed the type-bug label on Jan 31, 2025
Author

hannob commented Feb 1, 2025

I don't think the validate section is the issue (note this happens with validate=False).

If this is intended behavior, I'd add something to the last sentence, like this:

"May assert or raise a ValueError if the length of altchars is not 2 or the input is a string and contains non-ascii characters."

Shall I send a PR for that?

Author

hannob commented Feb 1, 2025

Or maybe better, as we shouldn't expect an assert for invalid input chars (at least I hope so):

"May assert or raise a ValueError if the length of altchars is not 2. May also raise a ValueError if the input is a string with non-ascii characters."

Member

vadmium commented Feb 1, 2025

This report is similar to Issue #105193.

I don’t think b64decode('\xD6') is a valid call, even though b64decode(b'\xD6') would be valid (non-alphabet character guaranteed to be discarded in that case).
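
A small sketch of the bytes case described here, assuming a slightly longer input (`b'YQ\xd6=='`, chosen for this illustration) so that something remains to decode once the stray byte is dropped:

```python
import base64
import binascii

# Bytes input: the non-alphabet byte 0xD6 is discarded by default,
# so the remaining b'YQ==' decodes normally.
lenient = base64.b64decode(b'YQ\xd6==')
print(lenient)

# With validate=True the same byte is rejected with binascii.Error.
try:
    base64.b64decode(b'YQ\xd6==', validate=True)
    strict_error = None
except binascii.Error as exc:
    strict_error = exc
print(type(strict_error).__name__)
```

In both modes the bytes input stays within the documented outcomes (successful decoding or `binascii.Error`), unlike the non-ASCII `str` input from the report.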

I don’t see much value in the documentation suggesting a possible behaviour with a non-ASCII text argument, which I interpret as a programming error. Is the documentation perhaps not clear enough about what arguments provide supported behaviour (binascii.Error exception or successful decoding) and what is unsupported (wrong data type, wrong text characters, etc)?

Member

encukou commented Feb 3, 2025

IMO, documenting the ValueError would be good: a wrong data type is more of a programming error than not validating string input.
An economical way to word it would be “the input string contains non-ascii characters.”
I'd also add “ASCII” to the validate argument doc: “these non-alphabet ASCII characters”

Do you want to send a PR?
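
For callers who pass unvalidated strings, one possible way to handle both failure modes is sketched below. It relies on `binascii.Error` being a subclass of `ValueError`. `decode_untrusted` is a hypothetical helper name, and returning `None` on failure is an assumption made for the example.

```python
import base64
import binascii


def decode_untrusted(text):
    """Hypothetical helper: decode Base64 text, returning None if it is invalid."""
    try:
        return base64.b64decode(text, validate=True)
    except ValueError:
        # binascii.Error is a subclass of ValueError, so this one clause
        # covers malformed Base64 as well as non-ASCII str input.
        return None


print(decode_untrusted('YQ=='))
print(decode_untrusted('a\xd6=='))
```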
