Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rule proposal: require explicit strict= argument for itertools.batched #498

Open
tjkuson opened this issue Nov 16, 2024 · 6 comments
Open

Comments

@tjkuson
Copy link

tjkuson commented Nov 16, 2024

As of Python 3.13, itertools.batched has a strict parameter that defaults to False. By default, the batches might not be of the same size, which may cause subtle bugs. Whenstrict=True, it raises ValueError if the final batch is not the same size as the rest.

This seems analogous to the existing rule B905 which requires an explicit strict= parameter for zip. Hence, I think it makes sense if there is a similar rule for itertools.batched.

I volunteer to implement the rule, if accepted!

@JelleZijlstra
Copy link
Collaborator

What sort of bugs do you see this catching?

At work we have a helper function equivalent to batched() that's heavily used for processing large amounts of data in chunks. It's always absolutely expected that some of the batches won't be of the same size.

@tjkuson
Copy link
Author

tjkuson commented Nov 17, 2024

That's understandable! However, I don't think it is obvious to an inexperienced user or someone who hasn't read the documentation that not all batches in itertools.batched(iterable, n) will be of size n. It also isn't obvious to me what a contributor intended or expects (without strict=) from itertools.batched when reviewing code.

Off the top of my head, some examples of the uniformity of batch sizes being important include

  • when the elements of a batched are unpacked, expecting n elements,
  • when the user expects to operate on pairs,
  • coordinate systems, or
  • statistics.

IMO, asking users to use strict=False for instances where not all the batches have to be uniform isn't a huge problem considering it's consistent with zip and that itertools.batched hasn't been in the standard library for that long.

@cooperlees
Copy link
Collaborator

I'm a little confused of the total value of this rule too. I get the goal, but feel the loose nature of this lint will lead to it possible being noisy ... or am I misunderstanding things here?

I feel if people need these assertions you make their unit tests should enforce those and hilight the non equal batch sizes becoming a problem. I'd need more wins here I feel.

Is the main goal to ensure people to always set strict= (being more explicit)?

@FozzieHi
Copy link
Contributor

FozzieHi commented Nov 19, 2024

I do see both sides of the argument, but I think on balance there's enough benefits here to create a rule. However, I think it should be opt-in like B905 and not on by default like B028, then users can decide whether they think it's beneficial to their project (I do believe that's what you intended when creating this issue, but I'm just noting it explicitly here). I think these kinds of rules do get users thinking more about how they want their implementation to work, and in this particular case would encourage them to explicitly set strict either to true or false depending on that use case. Obviously if it gets too noisy then they could simply not opt-in to the rule for that project.

@tjkuson
Copy link
Author

tjkuson commented Nov 19, 2024

Is the main goal to ensure people to always set strict= (being more explicit)?

Yes, essentially. I had in mind B905 except for itertools.batched.

@cooperlees
Copy link
Collaborator

I was unaware we wanted this to be optional / opt in. Based on that, I'd accept it the PR as the next B9XX rule.

FWIW, I am sure 99% of our opt in rules do not get used by our users tho.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants