-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
keep a position cache to speed up sibling iteration #287
Conversation
This is meant to be an optimization, so I'm not sure of a good way to test it. For a demonstration of the impact it has though:
|
I should also add that we ran into this problem when using the ValuesAndExpressions::ProhibitDuplicateHashKeys critic policy on files with large hashes. |
Ahead: This looks excellent and my direct urge would be to throw it in immediately. That said: I had a heck of a week and will check this tomorrow to see if i can spot any issues. Don't think i will, but due diligence and such. |
Quick question without having looked for it in the code: Does this invalidate the cache if an element's position within its parent is changed by inserting or removing something? |
No, it doesn't. The cached position is always checked before being used, so it doesn't need to be invalidated on modifications. Validating the position doesn't slow anything down, so this seemed like a safer invalidation strategy than needing to ensure every write operation cleared the cache. |
I guess it's worth pointing out that this could slow down iteration in some specific cases. If you are only iterating through a small number of siblings at the start of a long list, the new code will still end up calculating the positions of every sibling where the old code would not. I don't think this is a large concern though. Iterating through the list of siblings once is not slow, so it seems unlikely it would ever be a meaningful slowdown. I did think of an alternate strategy that could avoid this. When checking the position of a given element, it could cache (and validate/recache) the positions of the requested element, as well as its previous and next sibling. Compared to what this PR implements, it ends up doing a bit more work for each individual operation, but could be cheaper if not iterating through all children. And iterating through all siblings would still be linear, not quadratic. It also makes the code a bit more complex. I felt that on balance the routine in this PR was a better solution. |
I'm prepping for merging this, but first need to actually get a solid set of dependent tests, which i'm fiddling with over here: https://github.com/Perl-Critic/PPI/actions/workflows/test-dependents.yml I saw that it seems you adapted some of the code style conventions you saw, like shift shift shift, and forgoing postfix constructs. If you care and feel like, you could abbreviate some of them to not have the code be unnecessarily verbose. Otherwise it looks fine in execution flow. |
8329f99
to
85ea6a1
Compare
Iterating over siblings using ->next_sibling or similar methods will take quadratic time if it has to search the parent's child list every time. Add a cache of the position in the parent's child list. When the position of a child is requested, cache the position of every child of the parent. Verifying that the cache is accurate is nearly instant, so we can use that to invalidate the cache rather than invalidating on modifications to the structure. Update the __position method to use this cache, and change all of the child iterating functions to use __position.
85ea6a1
to
dbe1108
Compare
Iterating over siblings using ->next_sibling or similar methods will take quadratic time if it has to search the parent's child list every time.
Add a cache of the position in the parent's child list. When the position of a child is requested, cache the position of every child of the parent. Verifying that the cache is accurate is nearly instant, so we can use that to invalidate the cache rather than invalidating on modifications to the structure.
Update the __position method to use this cache, and change all of the child iterating functions to use __position.