Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace astor with ast.unparse() and Black #205

Closed
wants to merge 4 commits into from

Conversation

akaihola
Copy link

@akaihola akaihola commented Feb 9, 2025

Astor isn't yet compatible with Python 3.14 and the changes in the ast standard library module (see "Deprecated since" notes in the docs). A fix (see berkerpeksag/astor#217) is planned for Astor 0.9 (see berkerpeksag/astor#229). On the other hand, ast.unparse() has been in the standard library since Python 3.9, and Flynt has already dropped support for Python 3.8. Flynt could simply migrate from Astor to ast.unparse() and get Python 3.14 support through that.

This PR

  • changes ast_to_string() to use ast.unparse() instead of astor.to_source()
  • post-processes the result using Black to get double quotes instead of single quotes
  • adds Black as a dependency

To do before merge:

  • investigate the insert_constant_str.py test failure
  • fix the insert_constant_str.py test failure
  • investigate and fix the parentheses test failure
  • investigate and fix the test_chain_fmt_3 test failure

Next steps:

  • fix Python AST deprecations #195 and support for Python 3.14 by replacing references to ast.Str and ast.Num with ast.Constant plus any additional required logic

See also:

@akaihola akaihola marked this pull request as draft February 9, 2025 20:15
@akaihola
Copy link
Author

akaihola commented Feb 9, 2025

The following test failures appeared after these changes (on Python 3.13; click on titles to show details):

`ast.unparse()` doesn't remove parentheses
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=0, call_transforms=0, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    def test_ifexpr(state: State):
        s_in = """'%s' % (a if c else b)"""
        s_expected = """f'{a if c else b}'"""
    
        s_out, count = code_editor.fstringify_code_by_line(s_in, state)
>       assert s_out == s_expected
E       assert "f'{(a if c else b)}'" == "f'{a if c else b}'"
E         - f'{a if c else b}'
E         + f'{(a if c else b)}'
E         ?    +             +
After `ast.unparse()` and Black, count from `fstringify_code_by_line` goes from zero to one
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=4, call_transforms=3, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    def test_chain_fmt_3(state: State):
        s_in = """a = "Hello {}".format(d["a{}".format( d["a{}".format(key) ]) ] )"""
        s_out, count = code_editor.fstringify_code_by_line(s_in, state)
    
>       assert count == 0
E       assert 1 == 0
Leading whitespace before a closing triple quote is eaten by Black 25.1.0 [UGLY WORKAROUND BELOW]
My best guess is that Black interprets the triple quoted string in this test sample as a docstring and strips trailing whitespace:
def f():
    return (
        """
        some {}
        lorem ipsum
        """.format("text")
    )

Test run:

filename = 'insert_constant_str.py'
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=2, call_transforms=1, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    @pytest.mark.parametrize("filename", samples)
    def test_fstringify(filename, state):
        # this skips "string_in_string.py" for python >=3.12.
        # the behavior on these python versions differs. In fact, 3.12 behavior is preferable.
        # When only supported versions are 3.12 and up, expected output should be modified.
        if filename == "string_in_string.py" and sys.version_info > (3, 11):
            return
    
        out, expected = try_on_file(
            filename,
            partial(fstringify_code_by_line, state=state),
        )
>       assert out == expected
E       assert 'def f():\n    return (\n        """\n        some text\n        lorem ipsum\n"""\n    )\n' == 'def f():\n    return (\n        """\n        some text\n        lorem ipsum\n        """\n    )\n'
E           def f():
E               return (
E                   """
E                   some text
E                   lorem ipsum
E         -         """
E         + """
E               )
No escaping of a closing double quote inside triple quotes [TEST FIXED]
This was probably a quirk in Astor and the new behavior is correct. Test should be corrected:
filename = 'multiline_keep.py'
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=2, call_transforms=1, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    @pytest.mark.parametrize("filename", samples)
    def test_fstringify(filename, state):
        # this skips "string_in_string.py" for python >=3.12.
        # the behavior on these python versions differs. In fact, 3.12 behavior is preferable.
        # When only supported versions are 3.12 and up, expected output should be modified.
        if filename == "string_in_string.py" and sys.version_info > (3, 11):
            return
    
        out, expected = try_on_file(
            filename,
            partial(fstringify_code_by_line, state=state),
        )
>       assert out == expected
E       assert 'length, quotes = 15, 20\ntext = f"""\n# This is an example configuration\n# Hash lines are comments\n\n[flake8]\nmax-line-length={length}\ninline-quotes="{quotes}"\n"""' == 'length, quotes = 15, 20\ntext = f"""\n# This is an example configuration\n# Hash lines are comments\n\n[flake8]\nmax-line-length={length}\ninline-quotes="{quotes}\\"\n"""'
E           length, quotes = 15, 20
E           text = f"""
E           # This is an example configuration
E           # Hash lines are comments
E           
E           [flake8]
E           max-line-length={length}
E         - inline-quotes="{quotes}\"
E         ?                        -
E         + inline-quotes="{quotes}"
E           """
Single quotes turned into double quotes [TEST CASES FIXED]
New behavior is probably better here as well, and tests could be corrected:
pycode_with_2_concats = '"""\nThis module will be transformed... into something far greater.\n"""\n\na = "Hello"\n\nmsg = a + " World"\n\nmsg2 = "Finally, " + a + " World"\n\n\nprint(msg)\n'

    @pytest.mark.skipif(sys.version_info < (3, 8), reason="requires python3.8 or higher")
    def test_find_victims_primitives(pycode_with_2_concats: str):
        tree = ast.parse(pycode_with_2_concats)
    
        ch = ConcatHound()
        ch.visit(tree)
    
        assert len(ch.victims) == 2
    
        v1, v2 = ch.victims
>       assert str(v1) == "a + ' World'"
E       assert 'a + " World"' == "a + ' World'"
E         - a + ' World'
E         ?     ^      ^
E         + a + " World"
E         ?     ^      ^
pycode_with_2_concats = '"""\nThis module will be transformed... into something far greater.\n"""\n\na = "Hello"\n\nmsg = a + " World"\n\nmsg2 = "Finally, " + a + " World"\n\n\nprint(msg)\n'
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=0, call_transforms=0, invalid_conversions=0, concat_candidates=2, concat_changes=0, join_candidates=0, join_changes=0)

    @pytest.mark.skipif(sys.version_info < (3, 8), reason="requires python3.8 or higher")
    def test_find_victims_api(pycode_with_2_concats: str, state: State):
        gen = concat_candidates(pycode_with_2_concats, state)
        lst = list(gen)
    
        assert len(lst) == 2
    
        v1, v2 = lst
>       assert str(v1) == "a + ' World'"
E       assert 'a + " World"' == "a + ' World'"
E         - a + ' World'
E         ?     ^      ^
E         + a + " World"
E         ?     ^      ^
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=0, call_transforms=0, invalid_conversions=0, concat_candidates=1, concat_changes=0, join_candidates=0, join_changes=0)

    @pytest.mark.skipif(sys.version_info < (3, 8), reason="requires python3.8 or higher")
    def test_find_victims_parens(state: State):
        txt_in = """print('blah' + (thing - 1))"""
        gen = concat_candidates(txt_in, state)
        lst = list(gen)
    
        assert len(lst) == 1
    
        v1 = lst[0]
>       assert str(v1) == """'blah' + (thing - 1)"""
E       assert '"blah" + (thing - 1)' == "'blah' + (thing - 1)"
E         - 'blah' + (thing - 1)
E         ? ^    ^
E         + "blah" + (thing - 1)
E         ? ^    ^
    def test_concats_fstring():
    
        txt = """print(f'blah{thing}' + 'blah' + otherThing + f"is {x:d}")"""
        expected = """print(f'blah{thing}blah{otherThing}is {x:d}')"""
    
        new, changed = transform_concat_from_str(txt)
    
        assert changed
>       assert new == expected
E       assert 'print(f"blah{thing}blah{otherThing}is {x:d}")' == "print(f'blah{thing}blah{otherThing}is {x:d}')"
E         - print(f'blah{thing}blah{otherThing}is {x:d}')
E         ?        ^                                   ^
E         + print(f"blah{thing}blah{otherThing}is {x:d}")
E         ?        ^                                   ^
    def test_string_in_string_x3():
    
        txt = """'blah' + blah.blah('more' + vars.foo('other' + b))"""
    
        new, changed = transform_concat_from_str(txt)
    
        assert changed
>       assert "'blah' +" in new
E       assert "'blah' +" in '"blah" + blah.blah(f"more{vars.foo(f\'other{b}\')}")'

test/test_str_concat/test_transformer.py:108: AssertionError
    def test_embedded_fstr():
    
        txt = """print(f"{f'blah{var}' + abc}blah")"""
        expected = """print(f'blah{var}{abc}blah')"""
    
        new, changed = transform_concat_from_str(txt)
    
        assert changed
>       assert new == expected
E       assert 'print(f"blah{var}{abc}blah")' == "print(f'blah{var}{abc}blah')"
E         - print(f'blah{var}{abc}blah')
E         ?        ^                  ^
E         + print(f"blah{var}{abc}blah")
E         ?        ^                  ^

@akaihola
Copy link
Author

akaihola commented Feb 9, 2025

@ikamensh can you figure out why count suddenly is 1 instead of 0 in this test?

state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=4, call_transforms=3, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    def test_chain_fmt_3(state: State):
        s_in = """a = "Hello {}".format(d["a{}".format( d["a{}".format(key) ]) ] )"""
        s_out, count = code_editor.fstringify_code_by_line(s_in, state)
    
>       assert count == 0
E       assert 1 == 0

s_out looks correct when I add an assertion for it in the test:

a = f"Hello {d[f'a{d[f'a{key}']}']}"

@akaihola
Copy link
Author

This test failure occurs with Black 25.1.0 but not with earlier versions:

filename = 'insert_constant_str.py'
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=2, call_transforms=1, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    @pytest.mark.parametrize("filename", samples)
    def test_fstringify(filename, state):
        # this skips "string_in_string.py" for python >=3.12.
        # the behavior on these python versions differs. In fact, 3.12 behavior is preferable.
        # When only supported versions are 3.12 and up, expected output should be modified.
        if filename == "string_in_string.py" and sys.version_info > (3, 11):
            return
    
        out, expected = try_on_file(
            filename,
            partial(fstringify_code_by_line, state=state),
        )
>       assert out == expected
E       assert 'def f():\n    return (\n        """\n        some text\n        lorem ipsum\n"""\n    )\n' == 'def f():\n    return (\n        """\n        some text\n        lorem ipsum\n        """\n    )\n'
E         
E           def f():
E               return (
E                   """
E                   some text
E                   lorem ipsum
E         -         """
E         + """
E               )

test/integration/test_files.py:25: AssertionError

The breaking PR was psf/black#4558. The commits are still available in the GitHub UI, but unfortunately unfetchable by Git since the original black25 branch from @JelleZijlstra is gone. Probably the breaking change is in commit 8117db06. Based on the change log entry, I'd say the breaking change corresponds to either of these issues:

@akaihola
Copy link
Author

The issue with Black 25.1.0 seems to be that it interprets a plain string literal to be a docstring.

In actual Python source files, of course, a plain string literal (as opposed to e.g. a call argument, part of an expression, or an annotation) really is a docstring, so Black is kind of doing the right thing here.

So the problem is caused by the fact that Flynt's ast_to_string() indeed receives plain string literals. Feeding them into black.format_str() causes trailing whitespace to be eaten.

To reproduce the difference between behavior of Black versions 24.10.0 and 25.1.0 caused by psf/black#4558:

Plain string literal (as fed to black.format_str() by Flynt):

24.10.0

$ echo '"trailing spaces:    "' | uvx --with black==24.10.0 black --diff --stdin-filename=test.py -
All done! ✨ 🍰 ✨
1 file would be left unchanged.

25.1.0

$ echo '"trailing spaces:    "' | uvx --with black==25.1.0 black -q --diff --stdin-filename=test.py -
--- STDIN       2025-02-10 17:52:20.735183+00:00
+++ STDOUT	2025-02-10 17:52:20.737355+00:00
@@ -1 +1 @@
-"trailing spaces:    "
+"trailing spaces:"

Same string literal, always formatted correctly when in context:

$ echo 'print("trailing spaces:    ")' | uvx --with black==24.10.0 black --diff --stdin-filename=test.py -
All done! ✨ 🍰 ✨
1 file would be left unchanged.
$ echo 'print("trailing spaces:    ")' | uvx --with black==25.1.0 black --diff --stdin-filename=test.py -
All done! ✨ 🍰 ✨
1 file would be left unchanged.
$ echo 's = "trailing spaces:    "' | uvx --with black==24.10.0 black --diff --stdin-filename=test.py -
All done! ✨ 🍰 ✨
1 file would be left unchanged.
$ echo 's = "trailing spaces:    "' | uvx --with black==25.1.0 black --diff --stdin-filename=test.py -
All done! ✨ 🍰 ✨
1 file would be left unchanged.

@akaihola
Copy link
Author

Here's an ugly work-around to the Black 25.1.0 problem. It doesn't cause test failures elsewhere.

def ast_to_string(node: ast.AST) -> str:
    # ast.unparse() favors single quotes, use Black to turn them into double quotes
    code = ast.unparse(node)
    # to avoid plain literal strings to be stripped of trailing whitespace,
    # we add a dummy assignment
    is_plain_constant = isinstance(node, ast.Constant)
    if is_plain_constant:
        code = f"_ = {code}"
    formatted = format_str(code, mode=Mode()).rstrip()
    # remove the dummy assignment if present
    return formatted[4:] if is_plain_constant else formatted

@akaihola
Copy link
Author

akaihola commented Feb 10, 2025

As for the parentheses test failure, I think

f'{(a if c else b)}'

is just what ast.unparse() does, and there's no easy way around it. Could we @ikamensh just accept this change in behavior?

Here's the difference to Astor:

>>> import ast
>>> import astor
>>> node = ast.JoinedStr(
...     values=[ast.FormattedValue(value=ast.IfExp(test=ast.Name(id='c'), body=ast.Name(id='a'), orelse=ast.Name(id='b')), conversion=-1)]
... )
>>> print(ast.unparse(node))
f'{(a if c else b)}'
>>> print(astor.to_source(node))
f"""{a if c else b}"""

@ikamensh
Copy link
Owner

count from fstringify_code_by_line goes from zero to one

Count of changes means now this code is changed, lets compare out vs code_in and see what and how is changed

@ikamensh
Copy link
Owner

is just what ast.unparse() does, and there's no easy way around it. Could we @ikamensh just accept this change in behavior?

I think we can have any hacks around black internally, but I don't want flynt to produce code one would want to edit (manually?) further. extra parenthesis and single quotes are in this group. Looking at astor PR, I think it has good chances to be in time for the py3.14 release, I don't see a point to change behavior now to something suboptimal if no action will be needed on normal release schedule. This is ofcourse specific to replacing astor with ast + black part of #195 - fixing ast nodes remains fully motivated.

I'm further wondering about using black now. Sorry for not thinking it through before this PR came around. If its behavior is untrivially dependent on the version, then I see this problem: flynt will produce different output depending on black version. If we fix black version, it might conflict with what user might be already using. I think for 95% of the cases its no big deal, but it will be an annoying edge case going forward. I like astor because typical user will not have astor in their python installation. Black on the other hand is very popular, if maybe being slowly replaced by ruff.

all in all, for this to be worth it it should be possible to:

  • not reduce output code simplicity, e.g. no extra parens
  • have no known, significant behavior differences depending on black version
  • not to reduce standard compliance of output code (single quote means black/ruff must run after flynt to have standard code)

@akaihola
Copy link
Author

I agree with everything above, especially since the parentheses problem is really hard to solve without modifying ast.unparse(). I'm closing this PR and looking at helping with berkerpeksag/astor#217 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python AST deprecations
2 participants