Replace astor with ast.unparse() and Black #205

akaihola · 2025-02-09T20:15:03Z

Astor isn't yet compatible with Python 3.14 and the changes in the ast standard library module (see "Deprecated since" notes in the docs). A fix (see berkerpeksag/astor#217) is planned for Astor 0.9 (see berkerpeksag/astor#229). On the other hand, ast.unparse() has been in the standard library since Python 3.9, and Flynt has already dropped support for Python 3.8. Flynt could simply migrate from Astor to ast.unparse() and get Python 3.14 support through that.

This PR

changes ast_to_string() to use ast.unparse() instead of astor.to_source()
post-processes the result using Black to get double quotes instead of single quotes
adds Black as a dependency

To do before merge:

investigate the insert_constant_str.py test failure
fix the insert_constant_str.py test failure
investigate and fix the parentheses test failure
investigate and fix the test_chain_fmt_3 test failure

Next steps:

fix Python AST deprecations #195 and support for Python 3.14 by replacing references to ast.Str and ast.Num with ast.Constant plus any additional required logic

See also:

Python AST deprecations #195

akaihola · 2025-02-09T20:28:07Z

The following test failures appeared after these changes (on Python 3.13; click on titles to show details):

`ast.unparse()` doesn't remove parentheses

state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=0, call_transforms=0, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    def test_ifexpr(state: State):
        s_in = """'%s' % (a if c else b)"""
        s_expected = """f'{a if c else b}'"""
    
        s_out, count = code_editor.fstringify_code_by_line(s_in, state)
>       assert s_out == s_expected
E       assert "f'{(a if c else b)}'" == "f'{a if c else b}'"
E         - f'{a if c else b}'
E         + f'{(a if c else b)}'
E         ?    +             +

After `ast.unparse()` and Black, count from `fstringify_code_by_line` goes from zero to one

state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=4, call_transforms=3, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    def test_chain_fmt_3(state: State):
        s_in = """a = "Hello {}".format(d["a{}".format( d["a{}".format(key) ]) ] )"""
        s_out, count = code_editor.fstringify_code_by_line(s_in, state)
    
>       assert count == 0
E       assert 1 == 0

Leading whitespace before a closing triple quote is eaten by Black 25.1.0 [UGLY WORKAROUND BELOW]

My best guess is that Black interprets the triple quoted string in this test sample as a docstring and strips trailing whitespace:

def f():
    return (
        """
        some {}
        lorem ipsum
        """.format("text")
    )

Test run:

filename = 'insert_constant_str.py'
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=2, call_transforms=1, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    @pytest.mark.parametrize("filename", samples)
    def test_fstringify(filename, state):
        # this skips "string_in_string.py" for python >=3.12.
        # the behavior on these python versions differs. In fact, 3.12 behavior is preferable.
        # When only supported versions are 3.12 and up, expected output should be modified.
        if filename == "string_in_string.py" and sys.version_info > (3, 11):
            return
    
        out, expected = try_on_file(
            filename,
            partial(fstringify_code_by_line, state=state),
        )
>       assert out == expected
E       assert 'def f():\n    return (\n        """\n        some text\n        lorem ipsum\n"""\n    )\n' == 'def f():\n    return (\n        """\n        some text\n        lorem ipsum\n        """\n    )\n'
E           def f():
E               return (
E                   """
E                   some text
E                   lorem ipsum
E         -         """
E         + """
E               )

No escaping of a closing double quote inside triple quotes [TEST FIXED]

This was probably a quirk in Astor and the new behavior is correct. Test should be corrected:

filename = 'multiline_keep.py'
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=2, call_transforms=1, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    @pytest.mark.parametrize("filename", samples)
    def test_fstringify(filename, state):
        # this skips "string_in_string.py" for python >=3.12.
        # the behavior on these python versions differs. In fact, 3.12 behavior is preferable.
        # When only supported versions are 3.12 and up, expected output should be modified.
        if filename == "string_in_string.py" and sys.version_info > (3, 11):
            return
    
        out, expected = try_on_file(
            filename,
            partial(fstringify_code_by_line, state=state),
        )
>       assert out == expected
E       assert 'length, quotes = 15, 20\ntext = f"""\n# This is an example configuration\n# Hash lines are comments\n\n[flake8]\nmax-line-length={length}\ninline-quotes="{quotes}"\n"""' == 'length, quotes = 15, 20\ntext = f"""\n# This is an example configuration\n# Hash lines are comments\n\n[flake8]\nmax-line-length={length}\ninline-quotes="{quotes}\\"\n"""'
E           length, quotes = 15, 20
E           text = f"""
E           # This is an example configuration
E           # Hash lines are comments
E           
E           [flake8]
E           max-line-length={length}
E         - inline-quotes="{quotes}\"
E         ?                        -
E         + inline-quotes="{quotes}"
E           """

Single quotes turned into double quotes [TEST CASES FIXED]

New behavior is probably better here as well, and tests could be corrected:

pycode_with_2_concats = '"""\nThis module will be transformed... into something far greater.\n"""\n\na = "Hello"\n\nmsg = a + " World"\n\nmsg2 = "Finally, " + a + " World"\n\n\nprint(msg)\n'

    @pytest.mark.skipif(sys.version_info < (3, 8), reason="requires python3.8 or higher")
    def test_find_victims_primitives(pycode_with_2_concats: str):
        tree = ast.parse(pycode_with_2_concats)
    
        ch = ConcatHound()
        ch.visit(tree)
    
        assert len(ch.victims) == 2
    
        v1, v2 = ch.victims
>       assert str(v1) == "a + ' World'"
E       assert 'a + " World"' == "a + ' World'"
E         - a + ' World'
E         ?     ^      ^
E         + a + " World"
E         ?     ^      ^

pycode_with_2_concats = '"""\nThis module will be transformed... into something far greater.\n"""\n\na = "Hello"\n\nmsg = a + " World"\n\nmsg2 = "Finally, " + a + " World"\n\n\nprint(msg)\n'
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=0, call_transforms=0, invalid_conversions=0, concat_candidates=2, concat_changes=0, join_candidates=0, join_changes=0)

    @pytest.mark.skipif(sys.version_info < (3, 8), reason="requires python3.8 or higher")
    def test_find_victims_api(pycode_with_2_concats: str, state: State):
        gen = concat_candidates(pycode_with_2_concats, state)
        lst = list(gen)
    
        assert len(lst) == 2
    
        v1, v2 = lst
>       assert str(v1) == "a + ' World'"
E       assert 'a + " World"' == "a + ' World'"
E         - a + ' World'
E         ?     ^      ^
E         + a + " World"
E         ?     ^      ^

state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=0, call_transforms=0, invalid_conversions=0, concat_candidates=1, concat_changes=0, join_candidates=0, join_changes=0)

    @pytest.mark.skipif(sys.version_info < (3, 8), reason="requires python3.8 or higher")
    def test_find_victims_parens(state: State):
        txt_in = """print('blah' + (thing - 1))"""
        gen = concat_candidates(txt_in, state)
        lst = list(gen)
    
        assert len(lst) == 1
    
        v1 = lst[0]
>       assert str(v1) == """'blah' + (thing - 1)"""
E       assert '"blah" + (thing - 1)' == "'blah' + (thing - 1)"
E         - 'blah' + (thing - 1)
E         ? ^    ^
E         + "blah" + (thing - 1)
E         ? ^    ^

    def test_concats_fstring():
    
        txt = """print(f'blah{thing}' + 'blah' + otherThing + f"is {x:d}")"""
        expected = """print(f'blah{thing}blah{otherThing}is {x:d}')"""
    
        new, changed = transform_concat_from_str(txt)
    
        assert changed
>       assert new == expected
E       assert 'print(f"blah{thing}blah{otherThing}is {x:d}")' == "print(f'blah{thing}blah{otherThing}is {x:d}')"
E         - print(f'blah{thing}blah{otherThing}is {x:d}')
E         ?        ^                                   ^
E         + print(f"blah{thing}blah{otherThing}is {x:d}")
E         ?        ^                                   ^

    def test_string_in_string_x3():
    
        txt = """'blah' + blah.blah('more' + vars.foo('other' + b))"""
    
        new, changed = transform_concat_from_str(txt)
    
        assert changed
>       assert "'blah' +" in new
E       assert "'blah' +" in '"blah" + blah.blah(f"more{vars.foo(f\'other{b}\')}")'

test/test_str_concat/test_transformer.py:108: AssertionError

    def test_embedded_fstr():
    
        txt = """print(f"{f'blah{var}' + abc}blah")"""
        expected = """print(f'blah{var}{abc}blah')"""
    
        new, changed = transform_concat_from_str(txt)
    
        assert changed
>       assert new == expected
E       assert 'print(f"blah{var}{abc}blah")' == "print(f'blah{var}{abc}blah')"
E         - print(f'blah{var}{abc}blah')
E         ?        ^                  ^
E         + print(f"blah{var}{abc}blah")
E         ?        ^                  ^

akaihola · 2025-02-09T21:11:18Z

@ikamensh can you figure out why count suddenly is 1 instead of 0 in this test?

state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=4, call_transforms=3, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    def test_chain_fmt_3(state: State):
        s_in = """a = "Hello {}".format(d["a{}".format( d["a{}".format(key) ]) ] )"""
        s_out, count = code_editor.fstringify_code_by_line(s_in, state)
    
>       assert count == 0
E       assert 1 == 0

s_out looks correct when I add an assertion for it in the test:

a = f"Hello {d[f'a{d[f'a{key}']}']}"

akaihola · 2025-02-10T17:38:10Z

This test failure occurs with Black 25.1.0 but not with earlier versions:

filename = 'insert_constant_str.py'
state = State(quiet=False, aggressive=False, dry_run=False, stdout=False, multiline=True, len_limit=None, transform_percent=Tr...=2, call_transforms=1, invalid_conversions=0, concat_candidates=0, concat_changes=0, join_candidates=0, join_changes=0)

    @pytest.mark.parametrize("filename", samples)
    def test_fstringify(filename, state):
        # this skips "string_in_string.py" for python >=3.12.
        # the behavior on these python versions differs. In fact, 3.12 behavior is preferable.
        # When only supported versions are 3.12 and up, expected output should be modified.
        if filename == "string_in_string.py" and sys.version_info > (3, 11):
            return
    
        out, expected = try_on_file(
            filename,
            partial(fstringify_code_by_line, state=state),
        )
>       assert out == expected
E       assert 'def f():\n    return (\n        """\n        some text\n        lorem ipsum\n"""\n    )\n' == 'def f():\n    return (\n        """\n        some text\n        lorem ipsum\n        """\n    )\n'
E         
E           def f():
E               return (
E                   """
E                   some text
E                   lorem ipsum
E         -         """
E         + """
E               )

test/integration/test_files.py:25: AssertionError

The breaking PR was psf/black#4558. The commits are still available in the GitHub UI, but unfortunately unfetchable by Git since the original black25 branch from @JelleZijlstra is gone. Probably the breaking change is in commit 8117db06. Based on the change log entry, I'd say the breaking change corresponds to either of these issues:

akaihola · 2025-02-10T17:59:31Z

The issue with Black 25.1.0 seems to be that it interprets a plain string literal to be a docstring.

In actual Python source files, of course, a plain string literal (as opposed to e.g. a call argument, part of an expression, or an annotation) really is a docstring, so Black is kind of doing the right thing here.

So the problem is caused by the fact that Flynt's ast_to_string() indeed receives plain string literals. Feeding them into black.format_str() causes trailing whitespace to be eaten.

To reproduce the difference between behavior of Black versions 24.10.0 and 25.1.0 caused by psf/black#4558:

Plain string literal (as fed to `black.format_str()` by Flynt):

24.10.0

$ echo '"trailing spaces:    "' | uvx --with black==24.10.0 black --diff --stdin-filename=test.py -
All done! ✨ 🍰 ✨
1 file would be left unchanged.

25.1.0

$ echo '"trailing spaces:    "' | uvx --with black==25.1.0 black -q --diff --stdin-filename=test.py -
--- STDIN       2025-02-10 17:52:20.735183+00:00
+++ STDOUT	2025-02-10 17:52:20.737355+00:00
@@ -1 +1 @@
-"trailing spaces:    "
+"trailing spaces:"

Same string literal, always formatted correctly when in context:

$ echo 'print("trailing spaces:    ")' | uvx --with black==24.10.0 black --diff --stdin-filename=test.py -
All done! ✨ 🍰 ✨
1 file would be left unchanged.
$ echo 'print("trailing spaces:    ")' | uvx --with black==25.1.0 black --diff --stdin-filename=test.py -
All done! ✨ 🍰 ✨
1 file would be left unchanged.
$ echo 's = "trailing spaces:    "' | uvx --with black==24.10.0 black --diff --stdin-filename=test.py -
All done! ✨ 🍰 ✨
1 file would be left unchanged.
$ echo 's = "trailing spaces:    "' | uvx --with black==25.1.0 black --diff --stdin-filename=test.py -
All done! ✨ 🍰 ✨
1 file would be left unchanged.

akaihola · 2025-02-10T18:04:42Z

Here's an ugly work-around to the Black 25.1.0 problem. It doesn't cause test failures elsewhere.

def ast_to_string(node: ast.AST) -> str:
    # ast.unparse() favors single quotes, use Black to turn them into double quotes
    code = ast.unparse(node)
    # to avoid plain literal strings to be stripped of trailing whitespace,
    # we add a dummy assignment
    is_plain_constant = isinstance(node, ast.Constant)
    if is_plain_constant:
        code = f"_ = {code}"
    formatted = format_str(code, mode=Mode()).rstrip()
    # remove the dummy assignment if present
    return formatted[4:] if is_plain_constant else formatted

akaihola · 2025-02-10T18:18:21Z

As for the parentheses test failure, I think

f'{(a if c else b)}'

is just what ast.unparse() does, and there's no easy way around it. Could we @ikamensh just accept this change in behavior?

Here's the difference to Astor:

>>> import ast
>>> import astor
>>> node = ast.JoinedStr(
...     values=[ast.FormattedValue(value=ast.IfExp(test=ast.Name(id='c'), body=ast.Name(id='a'), orelse=ast.Name(id='b')), conversion=-1)]
... )
>>> print(ast.unparse(node))
f'{(a if c else b)}'
>>> print(astor.to_source(node))
f"""{a if c else b}"""

ikamensh · 2025-02-10T19:19:42Z

count from fstringify_code_by_line goes from zero to one

Count of changes means now this code is changed, lets compare out vs code_in and see what and how is changed

ikamensh · 2025-02-10T20:03:16Z

is just what ast.unparse() does, and there's no easy way around it. Could we @ikamensh just accept this change in behavior?

I think we can have any hacks around black internally, but I don't want flynt to produce code one would want to edit (manually?) further. extra parenthesis and single quotes are in this group. Looking at astor PR, I think it has good chances to be in time for the py3.14 release, I don't see a point to change behavior now to something suboptimal if no action will be needed on normal release schedule. This is ofcourse specific to replacing astor with ast + black part of #195 - fixing ast nodes remains fully motivated.

I'm further wondering about using black now. Sorry for not thinking it through before this PR came around. If its behavior is untrivially dependent on the version, then I see this problem: flynt will produce different output depending on black version. If we fix black version, it might conflict with what user might be already using. I think for 95% of the cases its no big deal, but it will be an annoying edge case going forward. I like astor because typical user will not have astor in their python installation. Black on the other hand is very popular, if maybe being slowly replaced by ruff.

all in all, for this to be worth it it should be possible to:

not reduce output code simplicity, e.g. no extra parens
have no known, significant behavior differences depending on black version
not to reduce standard compliance of output code (single quote means black/ruff must run after flynt to have standard code)

akaihola · 2025-02-10T20:58:07Z

I agree with everything above, especially since the parentheses problem is really hard to solve without modifying ast.unparse(). I'm closing this PR and looking at helping with berkerpeksag/astor#217 instead.

akaihola added 2 commits February 9, 2025 21:59

refactor: Replace astor with ast.unparse for string conversion

0be7e35

feat: enforce double quotes using Black

e0403b1

akaihola marked this pull request as draft February 9, 2025 20:15

akaihola added 2 commits February 9, 2025 23:00

test fix: unnecessary escaping of embedded closing quote

fcf176a

test fix: expectations for double quotes

137359a

akaihola mentioned this pull request Feb 10, 2025

Retire astor, as ast.unparse exists, and migrate it into API-compatible shims of ast for older versions of python berkerpeksag/astor#204

Open

akaihola closed this Feb 10, 2025

akaihola mentioned this pull request Feb 11, 2025

Python AST deprecations #195

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace astor with ast.unparse() and Black #205

Replace astor with ast.unparse() and Black #205

akaihola commented Feb 9, 2025 •

edited

Loading

akaihola commented Feb 9, 2025 •

edited

Loading

akaihola commented Feb 9, 2025

akaihola commented Feb 10, 2025

akaihola commented Feb 10, 2025

akaihola commented Feb 10, 2025

akaihola commented Feb 10, 2025 •

edited

Loading

ikamensh commented Feb 10, 2025

ikamensh commented Feb 10, 2025

akaihola commented Feb 10, 2025

Replace astor with ast.unparse() and Black #205

Replace astor with ast.unparse() and Black #205

Conversation

akaihola commented Feb 9, 2025 • edited Loading

akaihola commented Feb 9, 2025 • edited Loading

akaihola commented Feb 9, 2025

akaihola commented Feb 10, 2025

akaihola commented Feb 10, 2025

Plain string literal (as fed to black.format_str() by Flynt):

24.10.0

25.1.0

Same string literal, always formatted correctly when in context:

akaihola commented Feb 10, 2025

akaihola commented Feb 10, 2025 • edited Loading

ikamensh commented Feb 10, 2025

ikamensh commented Feb 10, 2025

akaihola commented Feb 10, 2025

akaihola commented Feb 9, 2025 •

edited

Loading

akaihola commented Feb 9, 2025 •

edited

Loading

Plain string literal (as fed to `black.format_str()` by Flynt):

akaihola commented Feb 10, 2025 •

edited

Loading