Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed assertion in _PyUnicode_Equal from calculate_suggestions with non-string candidate #129573

Open
devdanzin opened this issue Feb 2, 2025 · 9 comments
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@devdanzin
Copy link
Contributor

devdanzin commented Feb 2, 2025

Crash report

What happened?

The interpreter will abort if runpy._run_module_code is called with invalid values:

import runpy
runpy._run_module_code("A", {0: ""}, "")

Another way to trigger:

class Parent:
    def __dir__(self):
        return [0]

class WithNonStringAttrs(Parent):
    blech = None

WithNonStringAttrs().bluch

Abort message:

python: Objects/unicodeobject.c:10799: _PyUnicode_Equal: Assertion `PyUnicode_Check(str2)' failed.
Aborted (core dumped)

The suggestions machinery is the source of the issue, with calculate_suggestions calling _PyUnicode_Equal(name, item) for an item that might not be unicode.

Only aborts in 3.12. In 3.13 and main it results in an error in the traceback machinery:

Exception ignored in the internal traceback machinery:
Traceback (most recent call last):
  File "/home/danzin/projects/upstream_cpython/Lib/traceback.py", line 139, in _print_exception_bltin
    return print_exception(exc, limit=BUILTIN_EXCEPTION_LIMIT, file=file, colorize=colorize)
  File "/home/danzin/projects/upstream_cpython/Lib/traceback.py", line 129, in print_exception
    te = TracebackException(type(value), value, tb, limit=limit, compact=True)
  File "/home/danzin/projects/upstream_cpython/Lib/traceback.py", line 1090, in __init__
    suggestion = _compute_suggestion_error(exc_value, exc_traceback, wrong_name)
  File "/home/danzin/projects/upstream_cpython/Lib/traceback.py", line 1531, in _compute_suggestion_error
    return _suggestions._generate_suggestions(d, wrong_name)
TypeError: all elements in 'candidates' must be strings
[...]

Will submit a PR shortly.

Found using fusil by @vstinner.

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.12.8+ (heads/3.12:580d7810946, Feb 2 2025, 01:56:47) [GCC 13.3.0]

Linked PRs

@devdanzin devdanzin added the type-crash A hard crash of the interpreter, possibly with a core dump label Feb 2, 2025
@picnixz picnixz added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.12 bugs and security fixes labels Feb 2, 2025
@ZeroIntensity
Copy link
Member

@picnixz, could you take a second look at this? I think this issue is more serious than it looks, and I want a core dev opinion before applying security labels.

  • _PyUnicode_Equal should be crashing here, but instead, it's doing something much worse: it's accessing arbitrary memory, via unicode_compare_eq, which is secretely passing in the reproducer. This means we're prone to buffer overflow attacks!
  • This code affects 3.11+, including main, but we sidestep the issue on 3.13+ by incidentally having a few checks beforehand that prevent a non-string from making it to that piece of code. @devdanzin, I'll close your PR, then please open a new one for main so we can backport accordingly.

I'm not certain on how accessible the attack is, but it's not making me particularly comfortable. There's some other ways to trigger this, for example:

globals()[1] = "a"
b

My main concern would be things that do some sort of runtime introspection or construction, but I'm definitely doing a good deal of speculation. Imagine a library that wants to expose JSON responses as objects, they might design an object like this:

class DynamicJSON:
    def __init__(self, response):
        self.response = response
        for k, v in response.items():
            setattr(self, str(k), v)

    def __dir__(self):
        return list(self.response.keys())

Someone could in theory hijack the API response, and then wait for a developer downstream to mistype and get a NameError. (I do understand this is far-fetched, but my point is that we probably shouldn't leave invisible buffer overflows on 3.11.)

TL;DR Do you think this is worth backporting to 3.11?

@ZeroIntensity ZeroIntensity added 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Feb 2, 2025
@vstinner
Copy link
Member

vstinner commented Feb 2, 2025

I want a core dev opinion before applying security labels.

I don't think that it deserves to be handled a security vulnerability: an attacker requires to be able to execute arbitrary Python code, so it's already game over.

@ZeroIntensity
Copy link
Member

Right--but my point is that this sort of thing could reasonably slip past reviews for supply chain attacks.

If you were to look at the reproducer (especially the DynamicJSON example I gave), would that scream "security vulnerability" to you? (It certainly doesn't to me!) I'm worried that we're giving attackers a way to slip invisible buffer overflows into codebases, but I'll definitely yield to you.

@vstinner
Copy link
Member

vstinner commented Feb 2, 2025

Right--but my point is that this sort of thing could reasonably slip past reviews for supply chain attacks.

IMO there are simpler ways to inject malwares.

@ZeroIntensity
Copy link
Member

Ok, I'll leave it up to you. Is there any downside to backporting the fix?

@picnixz
Copy link
Member

picnixz commented Feb 2, 2025

There's some other ways to trigger this, for example:

I think this is something we already had in another issue (I think @ncoghlan was involved in this issue but I don't exactly remember which one it was). I think we decided that messing up with globals like this is at your own risk though and didn't treat it as a security issue.

it's accessing arbitrary memory, via unicode_compare_eq

As I'm on mobile it's quite hard to check the relevant code part but IIUC, you mean we are accessing other part of the memory because we are not crashing in a release build (there is no issue with DEBUG builds right? just that release builds may access arbitrary memory because we didn't abort in time).

If this is the case, it's indeed a possible security issue but we need to take into account how likely it can be exploited. As I don't have time (nor the resources) to audit this, I'll cc some other security experts: @gpshead @sethmlarson (@alex if you can have a look at it, it would be appreciated, thanks in advance).


Now, even though it could be a hard-to-exploit vulnerability, I still think we can afford a backport up to 3.9 though I'll wait for "practical" experts' opinion (this kind of security issues is not my area of expertise though I have some skills as part of my curriculum).

@ZeroIntensity
Copy link
Member

you mean we are accessing other part of the memory because we are not crashing in a release build (there is no issue with DEBUG builds right? just that release builds may access arbitrary memory because we didn't abort in time).

Yeah, PyUnicode_LENGTH and PyUnicode_DATA are called on something that can be any arbitrary Python object.

Now, even though it could be a hard-to-exploit vulnerability, I still think we can afford a backport up to 3.9 though I'll wait for "practical" experts' opinion

Note that this doesn't affect <3.11, the relevant code there uses functions that do check if the argument is a unicode object.

I'm inclined to follow Victor's opinion here--if he thinks the chance of a vulnerability is too small, then it's probably fine to keep the fix on 3.12+

@alex
Copy link
Member

alex commented Feb 2, 2025

In Rust parlance we'd say that this code is "unsound" -- there's not a vulnerability to be directly exploited, but depending on the code written by the user, it may be possible to produce a vulnerability.

The question here is whether there is non-pathological Python code that might make this exploitable by a user? I'm not sure if there is.

@picnixz
Copy link
Member

picnixz commented Feb 2, 2025

I can't think of a non-pathological path that could be exploited. Maybe we could make the interpreter segfault though but again, I don't have my laptop to perform some experiments (in this case I'm just interested in making a segfault / memory access violation, which would make it more of a DoS attack).

For now, let's treat it as a regular bug and fix it on 3.12+. If I have time next week, I'll see if we can make it more catastrophic but I don't see an obvious way to do it (and if Alex doesn't see it either then I doubt I'd be able to find one).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

5 participants