-
-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problems with internal link cache re-checking logic (4.0.0.rc3) #695
Comments
I have been investigating this, which is primarily caused by overriding the metadata, which loses the html-proofer/lib/html_proofer/cache.rb Lines 97 to 111 in 8107f0c
I am trying to propose a fix following my investigations |
* Reveals the issue gjtorikian#695 with checking failed links * Also fix test cache for within-date checking
…torikian#695) * We cannot overwrite the cache metadata before checking if the internal link was found. * Don't recheck only if all cached metadata references to an internal link passed. * Log the re-addition of not found urls (debug level)
See PR #696 Note that there is still one main flaw even after the proposed fix, which re-checks an internal url if not all cached checks were OK: html-proofer/lib/html_proofer/cache.rb Line 104 in 287bd2d
This is not enough, since a relative URL that was OK for all the files previously checked and cached does not provide any guarantee against e.g. the same URL being OK for a new file (for instance, this could happen if a file is moved and the relative urls in it would not be updated). This is very related to #671 (comment) |
…ache * See gjtorikian#695 (comment) * The link should be re-checked as it was not checked for the given file, regardless of whether it was found OK in the cache for other locations. * This is currently failing, as we miss the logic for determining this should be re-checked.
I have created a new branch off the one of PR #696, where I could add a test for the problem mentioned in #695 (comment) above |
…ache * See gjtorikian#695 (comment) * The link should be re-checked as it was not checked for the given file, regardless of whether it was found OK in the cache for other locations. * This is currently failing, as we miss the logic for determining this should be re-checked.
…data * See gjtorikian#695 (comment) * The new combined key prevents using the metadata to store the information, and uniquely identifies what is relevant for checking internal links. * This aligns internal and external collection and cache. * The combined key needs to be serialized to a string when writing the cache to JSON and parsed when reading.
* To show the issue mentioned in gjtorikian#695 (comment) is addressed with the new approach to internal cache keys.
@gjtorikian, I took a bit of liberty (and fun) to investigate a possible rethinking of internal links cache keys, by using a combined key including the relevant "context" used for checking them. This was also largely inspired by your considerations in #671 (comment) and #671 (comment) You can find this drafted in a branch: riccardoporreca/html-proofer@feature/695-fix-internal-cache-metadata-check...riccardoporreca:feature/695-investigate-internal-cache-key-metadata The approach seems to work, although it has downsides in the JSON cache, where keys become less readable. |
* To show the issue mentioned in #695 (comment) is addressed with the new approach to internal cache keys.
Gotcha. Thanks for the explanation. I'm far less concerned about a human-readable version of the cache. |
I need to look at this one a bit more carefully, because I bungled up your branch with a force push. There are a couple of tests still failing but in general I think what you've done makes sense. |
Indeed, I did not want to fix tests before knowing if the approach would be something workable at your end. I think it is important to align all test cache files to the new structure: tests checking if a link is added because it was failed in the cache or the cache expired might pass for the wrong reason of the key not being in the cache Let me know if I can help with this... |
I just remembered I dropped this—will try to make time for it this weekend. |
* Reveals the issue #695 with checking failed links * Also fix test cache for within-date checking
Cached internal broken links are not re-checked
Running first with empty cache
Run again: the broken link is not re-checked
The text was updated successfully, but these errors were encountered: