
[red-knot] Use itertools to clean up SymbolState::merge #15702

Open · wants to merge 2 commits into base: main
Conversation

dcreager (Member)

merge_join_by handles the "merge two sorted iterators" bit, and izip handles iterating through the bindings/definitions along with their associated constraints.
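To illustrate the pattern (not the actual red-knot code): itertools' `merge_join_by` walks two sorted iterators in lockstep and classifies each element as left-only, right-only, or present in both. Since itertools is an external crate, here is a minimal std-only sketch of that behavior, with hypothetical names:

```rust
use std::cmp::Ordering;

// Sketch of itertools' merge_join_by output shape: each merged element is
// tagged by which input(s) it came from.
#[derive(Debug, PartialEq)]
enum EitherOrBoth<T> {
    Left(T),
    Right(T),
    Both(T, T),
}

// Hand-rolled merge of two sorted slices, mimicking merge_join_by with a
// three-way comparison. Hypothetical helper, not a red-knot API.
fn merge_join_sorted(left: &[u32], right: &[u32]) -> Vec<EitherOrBoth<u32>> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        match left[i].cmp(&right[j]) {
            Ordering::Less => {
                out.push(EitherOrBoth::Left(left[i]));
                i += 1;
            }
            Ordering::Greater => {
                out.push(EitherOrBoth::Right(right[j]));
                j += 1;
            }
            Ordering::Equal => {
                out.push(EitherOrBoth::Both(left[i], right[j]));
                i += 1;
                j += 1;
            }
        }
    }
    // Drain whatever remains of the longer input.
    out.extend(left[i..].iter().map(|&x| EitherOrBoth::Left(x)));
    out.extend(right[j..].iter().map(|&x| EitherOrBoth::Right(x)));
    out
}

fn main() {
    let merged = merge_join_sorted(&[1, 3, 4], &[2, 3, 5]);
    println!("{merged:?}");
    // [Left(1), Right(2), Both(3, 3), Left(4), Right(5)]
}
```

With the real crate, `left.iter().merge_join_by(right.iter(), |a, b| a.cmp(b))` yields the same `EitherOrBoth` classification lazily; `izip!` then pairs each merged definition with its associated constraints.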

@dcreager dcreager added internal An internal refactor or improvement red-knot Multi-file analysis & type inference labels Jan 23, 2025
carljm (Contributor) commented Jan 23, 2025

I think you need a @MichaReiser review on this PR :) Both because he knows Rust a lot better than I do, and because I'm aware he has an aversion to itertools: some of its iterators silently allocate, and arguably that's bad because allocations should be more visible than that.

carljm (Contributor) commented Jan 23, 2025

Thanks for this!

This definitely looks simpler and easier to understand; having the "merge-join-by" logic extracted as a utility nicely separates it from the red-knot-specific logic. That said, I'm not sure how much to weigh readability here, since this is core infra that shouldn't have to change often; eking out the best performance we can might be the higher priority.

On that note, CodSpeed suggests that this is a 2% regression on "cold" check (which is where I'd expect semantic indexing cost to be relevant, as opposed to "incremental" check where Salsa validation costs usually dominate): https://codspeed.io/astral-sh/ruff/branches/dcreager%2Fmerge-cleanup

I'm not sure where that regression would be coming from, exactly; CodSpeed flame graph suggests it is all coming from a salsa ingredient get_or_create method, which doesn't make a lot of sense, so maybe it's just noise? It does seem like there would be some loss of efficiency here in that we can no longer reuse one of the declarations bitsets (via union-in-place), but instead we start from zero and add all declarations? That maybe isn't likely to make much difference?

Like I said above, I'd like to hear what @MichaReiser thinks.

carljm (Contributor) commented Jan 23, 2025

Might be worth running that cold benchmark a few times locally with and without this change, just to see if we can get a sense of how stable vs noisy that regression is?

dcreager (Member, Author)

> Might be worth running that cold benchmark a few times locally with and without this change, just to see if we can get a sense of how stable vs noisy that regression is?

Can do

> we can no longer reuse one of the declarations bitsets (via union-in-place)

I can also try a version that adds back the union in-place. The union itself was OR-ing block by block, instead of bit by bit, and also had a reserve step to make sure there was at most 1 (re)allocation. Interestingly, we were using union for SymbolDeclarations, but not for SymbolBindings!
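To make the efficiency point concrete, here is a hedged sketch of what a block-wise union-in-place looks like. The type and function names are hypothetical, not the actual red-knot bitset implementation; it shows the two properties mentioned above: OR-ing whole `u64` blocks rather than individual bits, and reserving up front so there is at most one (re)allocation.

```rust
// Hypothetical in-place bitset union over u64 blocks (a sketch, not the
// red-knot SymbolDeclarations code).
fn union_in_place(dst: &mut Vec<u64>, src: &[u64]) {
    // Reserve once up front so the extend below never triggers a second
    // allocation: at most one (re)allocation total.
    if src.len() > dst.len() {
        dst.reserve(src.len() - dst.len());
    }
    // OR the overlapping blocks, 64 bits at a time.
    for (d, s) in dst.iter_mut().zip(src) {
        *d |= *s;
    }
    // Append any trailing blocks that only exist in `src`.
    if src.len() > dst.len() {
        dst.extend_from_slice(&src[dst.len()..]);
    }
}

fn main() {
    let mut a = vec![0b0101];
    union_in_place(&mut a, &[0b0011, 0b1000]);
    println!("{a:?}"); // [7, 8]
}
```

Starting a fresh bitset from zero instead means re-adding every declaration bit and potentially allocating a new buffer each merge, which is the efficiency loss being discussed.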
