Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Python code with some ruff rules SIM #13485

Closed

Conversation

cclauss
Copy link
Contributor

@cclauss cclauss commented Jul 30, 2024

https://docs.astral.sh/ruff/rules/#flake8-simplify-sim

Related to

% ruff check --select=SIM --ignore=SIM105,SIM108,SIM110,SIM112,SIM114,SIM910 --fix --unsafe-fixes

Found 224 errors (139 fixed, 85 remaining).

% ruff rule SIM118

in-dict-keys (SIM118)

Derived from the flake8-simplify linter.

Fix is always available.

What it does

Checks for key-existence checks against dict.keys() calls.

Why is this bad?

When checking for the existence of a key in a given dictionary, using
key in dict is more readable and efficient than key in dict.keys(),
while having the same semantics.

Example

key in foo.keys()

Use instead:

key in foo

Fix safety

Given key in obj.keys(), obj could be a dictionary, or it could be
another type that defines a .keys() method. In the latter case, removing
the .keys() attribute could lead to a runtime error. The fix is marked
as safe when the type of obj is known to be a dictionary; otherwise, it
is marked as unsafe.

References

@cclauss cclauss force-pushed the simplify-with-ruff-rules-sim branch from d627982 to aa0a4ce Compare July 30, 2024 08:55
@cclauss cclauss changed the title Siimplify Python code with some ruff rules SIM Simplify Python code with some ruff rules SIM Jul 30, 2024
Copy link
Member

@eli-schwartz eli-schwartz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these changes appear to be about inlining conditions onto a single line, which isn't immediately obvious is always superior. And it's not even mentioned in the PR description, let alone the commit message.

In general, the SIM rules are very opinionated stylistic things that have a good reason to not be enabled by default (or for flake8, even installed by default). I would rather have better justifications for using such rules...

unittests/linuxliketests.py Show resolved Hide resolved
mesonbuild/backend/ninjabackend.py Outdated Show resolved Hide resolved
mesonbuild/backend/ninjabackend.py Outdated Show resolved Hide resolved
mesonbuild/compilers/mixins/clike.py Outdated Show resolved Hide resolved
mesonbuild/compilers/mixins/visualstudio.py Outdated Show resolved Hide resolved
mesonbuild/compilers/mixins/visualstudio.py Outdated Show resolved Hide resolved
run_project_tests.py Outdated Show resolved Hide resolved
run_project_tests.py Show resolved Hide resolved
run_project_tests.py Outdated Show resolved Hide resolved
unittests/linuxliketests.py Show resolved Hide resolved
unittests/machinefiletests.py Outdated Show resolved Hide resolved
@dcbaker
Copy link
Member

dcbaker commented Jul 30, 2024

I'm in favor of universally getting rid of bad uses of .keys() as well as replacing loops with any/all, but I agree wtih @eli-schwartz that not all of these changes (especially always using ternaries) is a blanket improvement. I'd also really like to see any of these that we're going to turn on have a lint that is permanently on in Meson, either through pylint or flake8

@cclauss cclauss force-pushed the simplify-with-ruff-rules-sim branch from aa0a4ce to 7ccadc6 Compare July 30, 2024 19:33
@cclauss
Copy link
Contributor Author

cclauss commented Jul 30, 2024

Removed ruff rule SIM108

if-else-block-instead-of-if-exp (SIM108)

Derived from the flake8-simplify linter.

Fix is sometimes available.

What it does

Check for if-else-blocks that can be replaced with a ternary operator.

Why is this bad?

if-else-blocks that assign a value to a variable in both branches can
be expressed more concisely by using a ternary operator.

Example

if foo:
    bar = x
else:
    bar = y

Use instead:

bar = x if foo else y

References

@cclauss cclauss force-pushed the simplify-with-ruff-rules-sim branch from 7ccadc6 to 39be303 Compare July 30, 2024 22:11
@cclauss
Copy link
Contributor Author

cclauss commented Jul 30, 2024

I'd also really like to see any of these that we're going to turn on have a lint that is permanently on in Meson, either through pylint or flake8

The mods in this PR are code transformations that can happen once or they can be linted for on an ongoing basis. The latter is more complex. flake8 plus https://pypi.org/project/flake8_simplify could be configured to lint the changes in this PR. There are no automated fixers with flake8 as there are with ruff but flake can lint code to catch regressions.

@@ -87,7 +87,7 @@ def __native_headers(self, state: ModuleState, args: T.Tuple[T.List[mesonlib.Fil
'@INPUT@',
])

prefix = classes[0] if not package else package
prefix = package if package else classes[0]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prefix = package if package else classes[0]
prefix = package or classes[0]

@cclauss
Copy link
Contributor Author

cclauss commented Jul 30, 2024

I would rather have better justifications for using such rules...

Some of these rules like SIM118 (see commit message) are shorter and more efficient. Others are just transforms that make the intent easier to understand with fewer lines of code (-77 lines in this PR). The first line of the git commit contains a link to the list of all SIM rules and each of those is a link to a full page description of that rule. The git commit also demonstrates ruff rule SIM118 command which can provide useful on-the-fly docs.

As we saw with rule SIM108, some folks like certain transforms while others do not. I have grown quite used to reading ternary-if and now just write them by habit. Turning rules on or off is a choice for each project.

https://docs.astral.sh/ruff

Ruff can be used to replace Flake8 (plus dozens of plugins), Black, isort, pydocstyle, pyupgrade, autoflake, and more, all while executing tens or hundreds of times faster than any individual tool.

pylint lints with ruff and says in https://pylint.readthedocs.io/en/stable/#advised-linters-alongside-pylint

Advised linters alongside pylint

Projects that you might want to use alongside pylint include ruff (really fast, with builtin auto-fix and a large number of checks taken from popular linters, but implemented in rust)

@eli-schwartz
Copy link
Member

Thanks for pointing all that out. All this information was also my starting premise.

My question was not what does the PR do, but rather, whether each choice should be made... there's a reason we didn't simply install flake8-simplify.

P.S. I also know what ruff is! I package it for a linux distro, submit bug reports for it, and use it with meson for a couple things already. I don't need copious pages of copy-pasting snippets of ruff documentation to tell me what it does. And until ruff supports all the flake8 rules I'm not eager to switch. (Ruff initially didn't implement many rules because they suggested using black instead. Meson does not, and will not, use black because black produces terrible code. Even the rules that ruff has slowly added support for are mostly in preview.)

Comment on lines +81 to +82
if os.path.isabs(libname) and os.path.exists(libname):
return [libname]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the intent easier to understand with fewer lines of code (-77 lines in this PR).

Simply reducing the number of lines is not the same as making the intent easier to understand. For an example, python supports rewriting this:

Suggested change
if os.path.isabs(libname) and os.path.exists(libname):
return [libname]
if os.path.isabs(libname) and os.path.exists(libname): return [libname]

But most people wouldn't do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not swim upstream against PEP8.

Comment on lines 443 to +444
referenced_vars = set()
optnames = [x.name for x in BUILTIN_DIR_OPTIONS.keys()]
optnames = [x.name for x in BUILTIN_DIR_OPTIONS]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python also supports this:

Suggested change
referenced_vars = set()
optnames = [x.name for x in BUILTIN_DIR_OPTIONS.keys()]
optnames = [x.name for x in BUILTIN_DIR_OPTIONS]
referenced_vars = set(); optnames = [x.name for x in BUILTIN_DIR_OPTIONS]

It, too, is fewer lines. It, too, is widely considered less readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8

@eli-schwartz
Copy link
Member

I'm trying to point out a general principle here, that quoting dogma is not the same as stating a reason. But I suspect the disconnect here may be part and parcel of the principle that:

because an automated linter includes the selectable option to say you CAN do it, therefore you SHOULD do it

if substr in s:
return True
return False
return any(substr in s for s in strlist)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC these any() approaches are much slower. It's an interesting quirk of a low-level function whose body is basically entirely defined by either a short-circuiting for loop or a short-circuiting any().

Copy link
Contributor Author

@cclauss cclauss Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ruff Why is this bad? section says any() is faster but perhaps you have a benchmark that proves it wrong.

% ruff rule SIM110

reimplemented-builtin (SIM110)

Why is this bad?

Using a builtin function is more concise and readable. Builtins are also
more efficient than for loops.


I am OK to turn off ruff rule SIM110 if you think that is best. I find that when I read the code out loud, the use of any() makes the code read the way I think. Others might disagree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really unfortunate because I think that any()/all() is MUCH more readable than the for loop, and the vertical space saving is very nice. However, any seems to be about 1/3 as fast (Python 3.11.9, NixOS):

loop = 'abcdef'.split()
found = 'f'
nfound = 'g'


def test1():
    for x in loop:
        if x == found:
            return True
    return False


def test2():
    return any(x == found for x in loop)


def test3():
    for x in loop:
        if x == nfound:
            return True
    return False


def test4():
    return any(x == nfound for x in loop)

if __name__ == "__main__":
    import timeit

    print('for loop (found)    :', timeit.timeit(test1))
    print('for loop (not found):', timeit.timeit(test3))
    print('any() (found)       :', timeit.timeit(test2))
    print('any() (not found)   :', timeit.timeit(test4))
for loop (found)    : 0.051076093994197436
for loop (not found): 0.04388196699437685
any() (found)       : 0.15422860698890872
any() (not found)   : 0.15568504799739458

Copy link
Contributor Author

@cclauss cclauss Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Python 3.13 release candidate 1 (in Docker) on M1 Mac:

for loop (found)    : 0.09455837499990594
for loop (not found): 0.09435533400028362
any() (found)       : 0.27777291600068565
any() (not found)   : 0.27852720800001407

Should we drop SIM110 ?

@cclauss
Copy link
Contributor Author

cclauss commented Jul 31, 2024

until ruff supports all the flake8 rules I'm not eager to switch.

I am not advocating that you switch. Ruff is not yet at v1. flake8 and pylint continue to be adequate. This PR is about looking at a few simplifying code transformations. If a rule does not improve the code then we should drop it. If none of the rules improve the code then we should close the PR.

(Ruff initially didn't implement many rules because they suggested using black instead. Meson does not, and will not, use black because black produces terrible code. Even the rules that ruff has slowly added support for are mostly in preview.)

I recognize that you wish to format code by hand so I am not suggesting that you drop flake8 or pylint. I am not aware of any flake8 rules that ruff does not support except the Exxx rules in preview that are better handled by a code formatter. I don't think that the Ruff team spends time convincing people to use psf/black anymore given that they now have ruff format.

quoting dogma is not the same as stating a reason.

This is computer science, not religion or politics. The Why is this bad? section for each rule is a curated explanation. If the explanation does not justify the change then we should drop that rule.

because an automated linter includes the selectable option to say you CAN do it, therefore you SHOULD do it

I agree 100%. The initial commit --ignored five SIM rules and through our discussions, we have added a sixth.

@cclauss cclauss force-pushed the simplify-with-ruff-rules-sim branch from 39be303 to 96dbb2f Compare August 2, 2024 12:44
@cclauss
Copy link
Contributor Author

cclauss commented Aug 2, 2024

Removed all instances of SIM110 because it is less efficient than the code it releases.

Added a pull request to amend the Ruff docs at https://github.com/astral-sh/docs/pulls

@cclauss cclauss closed this Aug 16, 2024
@cclauss cclauss deleted the simplify-with-ruff-rules-sim branch August 16, 2024 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants