Improve speed of hypdiff for large text #5
If large text is diffed, `Diff::LCS` receives a lot of "single whitespace" sequence tokens, which have a heavy impact on execution time. The effect is not linear in the number of whitespace tokens, but more like O(n^2).

The solution is to hide (most) whitespace sequence items from `Diff::LCS`. A whitespace `TextFromNode` is now "merged" into its successor `TextFromNode` if both belong to the same XML text node (i.e. originate from the same `split`).

This helps `Diff::LCS` in two ways:

< and alike

Note that specs are kept happy by first collapsing adjacent whitespace nodes (even across text nodes), and only afterwards merging a whitespace token into its successor.
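The two steps described above (collapse adjacent whitespace first, then merge each remaining whitespace token into its successor from the same text node) can be sketched roughly as follows. This is only an illustration, not the code from this pull request: `Token`, `node_id`, `collapse_whitespace` and `merge_whitespace` are hypothetical names standing in for `TextFromNode` and whatever hypdiff does internally, and keeping the first token of a whitespace run is an assumption.

```ruby
# Hypothetical stand-in for TextFromNode: a piece of text plus an
# identifier of the XML text node it was split from (assumed fields).
Token = Struct.new(:text, :node_id) do
  def whitespace?
    text.match?(/\A\s+\z/)
  end
end

# Step 1: collapse runs of adjacent whitespace tokens, even across text
# nodes, keeping only the first token of each run (an assumption).
def collapse_whitespace(tokens)
  tokens.each_with_object([]) do |token, out|
    next if token.whitespace? && out.last&.whitespace?
    out << token
  end
end

# Step 2: merge a whitespace token into its successor, but only when both
# come from the same text node. Other whitespace tokens are kept as-is.
def merge_whitespace(tokens)
  out = []
  pending = nil # whitespace token waiting for a successor
  tokens.each do |token|
    if pending
      if !token.whitespace? && token.node_id == pending.node_id
        out << Token.new(pending.text + token.text, token.node_id)
        pending = nil
        next
      end
      out << pending
      pending = nil
    end
    if token.whitespace?
      pending = token
    else
      out << token
    end
  end
  out << pending if pending
  out
end

tokens = [
  Token.new("Hello", 1), Token.new(" ", 1), Token.new("world", 1),
  Token.new(" ", 1), Token.new(" ", 2), Token.new("again", 2)
]
merged = merge_whitespace(collapse_whitespace(tokens))
p merged.map(&:text)
# => ["Hello", " world", " ", "again"]
```

In this sketch the six input tokens shrink to four, and only one standalone whitespace token is left for the diff to align: the one whose successor belongs to a different text node.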