Commit
Merge pull request #699 from gjtorikian/fix-internal-link-format
Fix internal link format
gjtorikian authored Mar 10, 2022
2 parents 58bc1d3 + 9634290 commit 85c7aa0
Showing 13 changed files with 152 additions and 56 deletions.
10 changes: 5 additions & 5 deletions CHANGELOG.md
@@ -219,7 +219,7 @@

- URLs in \<source src=xx\> not being checked. [\#589](https://github.com/gjtorikian/html-proofer/issues/589)
- Link checking failed with "441 No error" [\#584](https://github.com/gjtorikian/html-proofer/issues/584)
- htmlproofer 3.16.0 | Error: undefined method `match?' for /^javascript:/:Regexp [\#582](https://github.com/gjtorikian/html-proofer/issues/582)
- HTMLProofer runs out of memory [\#579](https://github.com/gjtorikian/html-proofer/issues/579)

**Merged pull requests:**
@@ -316,7 +316,7 @@

**Closed issues:**

- Error: undefined method `xpath' for nil:NilClass [\#544](https://github.com/gjtorikian/html-proofer/issues/544)

**Merged pull requests:**

@@ -550,7 +550,7 @@
- Possible regression: htmlParseEntityRef: expecting ';' for protocol relative URLs [\#447](https://github.com/gjtorikian/html-proofer/issues/447)
- HEAD to GET fallback doesn't work if URL has a hash and HEAD causes a timeout [\#441](https://github.com/gjtorikian/html-proofer/issues/441)
- Allow using GET instead of HEAD [\#440](https://github.com/gjtorikian/html-proofer/issues/440)
- Error: wrong number of arguments [\#430](https://github.com/gjtorikian/html-proofer/issues/430)
- limit memory for travis builds [\#429](https://github.com/gjtorikian/html-proofer/issues/429)

**Merged pull requests:**
@@ -563,7 +563,7 @@

**Fixed bugs:**

- Error: string contains null byte [\#409](https://github.com/gjtorikian/html-proofer/issues/409)

**Closed issues:**

@@ -963,7 +963,7 @@
- Warnings for non-https anchors [\#252](https://github.com/gjtorikian/html-proofer/issues/252)
- html-proofer should eat Typhoeus exceptions [\#248](https://github.com/gjtorikian/html-proofer/issues/248)
- Incremental output [\#247](https://github.com/gjtorikian/html-proofer/issues/247)
- Error: `@shot.' is not allowed as an instance variable name [\#245](https://github.com/gjtorikian/html-proofer/issues/245)
- Don't count `?` forms with different parameters as different [\#236](https://github.com/gjtorikian/html-proofer/issues/236)

**Merged pull requests:**
46 changes: 23 additions & 23 deletions README.md
@@ -98,7 +98,7 @@ Dir.mkdir("out") unless File.exist?("out")
pipeline = HTML::Pipeline.new [
HTML::Pipeline::MarkdownFilter,
HTML::Pipeline::TableOfContentsFilter
], :gfm => true
], gfm: true

# iterate over files, and generate HTML from Markdown
Find.find("./docs") do |path|
@@ -149,7 +149,7 @@ HTMLProofer.check_links(['https://github.com', 'https://jekyllrb.com']).run
Sometimes, the information in your HTML is not the same as how your server serves content. In these cases, you can use `swap_urls` to map the URL in a file to the URL you'd like it to become. For example:

```ruby
run_proofer(file, :file, :swap_urls => { %r{^https://example.com} => 'https://website.com' })
run_proofer(file, :file, swap_urls: { %r{^https://example.com} => 'https://website.com' })
```

In this case, any link matching the regular expression `^https://example.com` will be converted to `https://website.com`.
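The substitution itself behaves like an ordinary regex replacement over each link. A minimal sketch of the idea (an illustration only, not the gem's actual implementation):

```ruby
# Illustrative sketch of what a swap_urls mapping does: each regex key
# is substituted in turn against the URL found in the file.
swaps = { %r{^https://example.com} => 'https://website.com' }

def swap(url, swaps)
  swaps.each { |pattern, replacement| url = url.sub(pattern, replacement) }
  url
end

puts swap('https://example.com/about', swaps)
# => https://website.com/about
```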
@@ -220,7 +220,7 @@ require 'html-proofer'

task :test do
sh "bundle exec jekyll build"
options = { :swap_urls => "^/BASEURL/:/" }
options = { swap_urls: "^/BASEURL/:/" }
HTMLProofer.check_directory("./_site", options).run
end
```
@@ -300,7 +300,7 @@ In addition, there are a few "namespaced" options. These are:
[Typhoeus](https://github.com/typhoeus/typhoeus) is used to make fast, parallel requests to external URLs. You can pass in any of Typhoeus' options for the external link checks with the options namespace of `:typhoeus`. For example:

``` ruby
HTMLProofer.new("out/", {:extensions => [".htm"], :typhoeus => { :verbose => true, :ssl_verifyhost => 2 } })
HTMLProofer.new("out/", {extensions: [".htm"], typhoeus: { verbose: true, ssl_verifyhost: 2 } })
```

This sets `HTMLProofer`'s extensions to use _.htm_, configures Typhoeus to be verbose, and sets specific SSL options. Check the [Typhoeus documentation](https://github.com/typhoeus/typhoeus#other-curl-options) for more information on what options it can receive.
@@ -311,13 +311,13 @@ The default value is:

``` ruby
{
:typhoeus =>
typhoeus:
{
:followlocation => true,
:connecttimeout => 10,
:timeout => 30
followlocation: true,
connecttimeout: 10,
timeout: 30
},
:hydra => { :max_concurrency => 50 }
hydra: { max_concurrency: 50 }
}
```

@@ -342,12 +342,12 @@ The `Authorization` header is being set if and only if the `base_url` is `https:
[Parallel](https://github.com/grosser/parallel) is used to speed up internal file checks. You can pass in any of its options with the options namespace `:parallel`. For example:

``` ruby
HTMLProofer.check_directories(["out/"], {:extension => ".htm", :parallel => { in_processes: 3} })
HTMLProofer.check_directories(["out/"], {extension: ".htm", parallel: { in_processes: 3} })
```

In this example, `in_processes: 3` is passed into Parallel as a configuration option.

Pass in `:parallel => { enable: false }` to disable parallel runs.
Pass in `parallel: { enable: false }` to disable parallel runs.

On the CLI, you can provide the `--parallel` argument to set the configuration. This is parsed using `JSON.parse` and mapped on top of the default configuration values so that they can be overridden.
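The parse-and-merge behavior can be sketched as follows. This is a simplified illustration; the default values shown here are assumptions for the example, not the gem's actual defaults:

```ruby
require 'json'

# Hypothetical default parallel configuration (illustrative values only).
defaults = { enable: true, in_processes: 2 }

# What a CLI argument like --parallel '{"in_processes": 3}' would provide.
cli_value = JSON.parse('{"in_processes": 3}', symbolize_names: true)

# CLI values are mapped on top of the defaults, overriding matching keys.
merged = defaults.merge(cli_value)
puts merged.inspect
```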

@@ -365,19 +365,19 @@ You can enable caching for this log file by passing in the option `:cache`, with
For example, passing the following options means "recheck links older than thirty days":

``` ruby
{ :cache => { :timeframe => '30d' } }
{ cache: { timeframe: '30d' } }
```

And the following options mean "recheck links older than two weeks":

``` ruby
{ :cache => { :timeframe => '2w' } }
{ cache: { timeframe: '2w' } }
```
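The timeframe strings follow a simple number-plus-unit pattern. A hypothetical parser (an illustration of the format, not html-proofer's actual code, and the unit table here is an assumption) might look like:

```ruby
# Hypothetical sketch: convert a timeframe string such as '30d' or '2w'
# into seconds. Units assumed: h(our), d(ay), w(eek), M(onth of 30 days).
SECONDS = { 'h' => 3600, 'd' => 86_400, 'w' => 604_800, 'M' => 2_592_000 }.freeze

def timeframe_to_seconds(timeframe)
  match = timeframe.match(/\A(\d+)([hdwM])\z/)
  raise ArgumentError, "bad timeframe: #{timeframe}" unless match

  Integer(match[1]) * SECONDS.fetch(match[2])
end

puts timeframe_to_seconds('30d') # => 2592000
puts timeframe_to_seconds('2w')  # => 1209600
```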

You can change the filename or the directory where the cache file is kept by also providing the `storage_dir` key:

``` ruby
{ :cache => { :cache_file => 'stay_cachey.json', :storage_dir => '/tmp/html-proofer-cache-money' } }
{ cache: { cache_file: 'stay_cachey.json', storage_dir: '/tmp/html-proofer-cache-money' } }
```

Links that were failures are kept in the cache and *always* rechecked. If they pass, the cache is updated to note the new timestamp.
@@ -484,9 +484,9 @@ To ignore SSL certificates, turn off Typhoeus' SSL verification:

``` ruby
HTMLProofer.check_directory("out/", {
:typhoeus => {
:ssl_verifypeer => false,
:ssl_verifyhost => 0}
typhoeus: {
ssl_verifypeer: false,
ssl_verifyhost: 0}
}).run
```

@@ -496,8 +496,8 @@ To change the User-Agent used by Typhoeus:

``` ruby
HTMLProofer.check_directory("out/", {
:typhoeus => {
:headers => { "User-Agent" => "Mozilla/5.0 (compatible; My New User-Agent)" }
typhoeus: {
headers: { "User-Agent" => "Mozilla/5.0 (compatible; My New User-Agent)" }
}}).run
```

@@ -513,9 +513,9 @@ Sometimes links fail because they don't have access to cookies. To fix this you

``` ruby
HTMLProofer.check_directory("out/", {
:typhoeus => {
:cookiefile => ".cookies",
:cookiejar => ".cookies"
typhoeus: {
cookiefile: ".cookies",
cookiejar: ".cookies"
}}).run
```

@@ -529,7 +529,7 @@ To exclude urls using regular expressions, include them between forward slashes

``` ruby
HTMLProofer.check_directories(["out/"], {
:ignore_urls => [/example.com/],
ignore_urls: [/example.com/],
}).run
```

6 changes: 3 additions & 3 deletions lib/html_proofer/cache.rb
@@ -94,15 +94,15 @@ def detect_url_changes(urls_detected, type)

# prepare to add new URLs detected
private def determine_additions(urls_detected, type)
additions = urls_detected.reject do |url, metadata|
additions = urls_detected.reject do |url, _metadata|
if @cache_log[type].include?(url)
@cache_log[type][url][:metadata] = metadata
# @cache_log[type][url][:metadata] = metadata

# if this is false, we're trying again
if type == :external
@cache_log[type][url][:found]
else
@cache_log[type][url][:metadata].none? { |m| m[:found] }
@cache_log[type][url][:metadata].all? { |m| m[:found] }
end
else
@logger.log :debug, "Adding #{url} to #{type} cache"
2 changes: 1 addition & 1 deletion lib/html_proofer/check.rb
@@ -60,7 +60,7 @@ def add_to_internal_urls(url, line)
current_path: @runner.current_path,
line: line,
base_url: base_url,
found: nil
found: false
}
@internal_urls[url_string] << metadata
end
12 changes: 9 additions & 3 deletions lib/html_proofer/url_validator/internal.rb
@@ -22,6 +22,7 @@ def validate
end

def run_internal_link_checker(links)
to_add = []
links.each_pair do |link, matched_files|
matched_files.each do |metadata|
url = HTMLProofer::Attribute::Url.new(@runner, link, base_url: metadata[:base_url])
@@ -31,20 +31,25 @@ def run_internal_link_checker(links)

unless file_exists?(url)
@failed_checks << Failure.new(@runner.current_path, 'Links > Internal', "internally linking to #{url}, which does not exist", line: metadata[:line], status: nil, content: nil)
@cache.add_internal(url.to_s, metadata, false)
to_add << [url, metadata, false]
next
end

unless hash_exists?(url)
@failed_checks << Failure.new(@runner.current_path, 'Links > Internal', "internally linking to #{url}; the file exists, but the hash '#{url.hash}' does not", line: metadata[:line], status: nil, content: nil)
@cache.add_internal(url.to_s, metadata, false)
to_add << [url, metadata, false]
next
end

@cache.add_internal(url.to_s, metadata, true)
to_add << [url, metadata, true]
end
end

# adding directly to the cache above results in an endless loop
to_add.each do |(url, metadata, exists)|
@cache.add_internal(url.to_s, metadata, exists)
end

@failed_checks
end

29 changes: 22 additions & 7 deletions spec/html-proofer/cache_spec.rb
@@ -253,10 +253,10 @@ def read_cache(cache_filename)
context 'internal links' do
context 'dates' do
let(:cache_filename) { File.join(version, '.within_date_internal.json') }
let(:test_file) { File.join(FIXTURES_DIR, 'links', 'root_link', 'root_link.html') }
let(:test_file) { File.join(FIXTURES_DIR, 'links', 'working_root_link_internal.html') }
let(:new_time) { Time.local(2015, 10, 27, 12, 0, 0) }

it 'does not write file if timestamp is within date' do
new_time = Time.local(2015, 10, 27, 12, 0, 0)
Timecop.freeze(new_time) do
expect_any_instance_of(HTMLProofer::Cache).to receive(:write)

@@ -268,12 +268,11 @@ def read_cache(cache_filename)
end

it 'does write file if timestamp is not within date' do
new_time = Time.local(2015, 10, 27, 12, 0, 0)
Timecop.freeze(new_time) do
expect_any_instance_of(HTMLProofer::Cache).to receive(:write)

# we expect an add since we are mocking outside the timeframe
expect_any_instance_of(HTMLProofer::Cache).to receive(:add_internal).with('/', { :base_url => '', current_path: test_file, :found => nil, :line => 5, :source => test_file }, true)
expect_any_instance_of(HTMLProofer::Cache).to receive(:add_internal).with('/broken_root_link_internal.html', { base_url: '', found: true, line: 5, source: test_file }, true)

run_proofer(test_file, :file, disable_external: true, cache: { timeframe: '4d', cache_file: cache_filename }.merge(default_cache_options))
end
Expand All @@ -290,7 +289,7 @@ def read_cache(cache_filename)
Timecop.freeze(new_time) do
root_link = File.join(FIXTURES_DIR, 'links', 'root_link', 'root_link_with_another_link.html')

expect_any_instance_of(HTMLProofer::Cache).to receive(:add_internal).once.with('/', { :base_url => '', current_path: root_link, found: nil, :line => 5, :source => root_link }, true).and_call_original
expect_any_instance_of(HTMLProofer::Cache).to receive(:add_internal).with('/', { base_url: '', current_path: root_link, found: false, line: 5, source: root_link }, true).and_call_original

expect_any_instance_of(HTMLProofer::Cache).to receive(:write).once

@@ -302,7 +301,7 @@ def read_cache(cache_filename)
Timecop.freeze(new_time) do
expect_any_instance_of(HTMLProofer::Cache).to receive(:write)
root_link = File.join(FIXTURES_DIR, 'links', 'broken_internal_link.html')
expect_any_instance_of(HTMLProofer::Cache).to receive(:add_internal).once.with('#noHash', { :base_url => '', :current_path => root_link, found: nil, :line => 5, :source => root_link }, false)
expect_any_instance_of(HTMLProofer::Cache).to receive(:add_internal).once.with('#noHash', { base_url: '', current_path: root_link, found: false, line: 5, source: root_link }, false)

run_proofer(root_link, :file, disable_external: true, cache: { timeframe: '30d', cache_file: cache_filename }.merge(default_cache_options))
end
@@ -340,7 +339,7 @@ def read_cache(cache_filename)
end
end

it 'does recheck failures, regardless of cache' do
it 'does recheck external failures, regardless of cache' do
Timecop.freeze(new_time) do
cache_filename = File.join(version, '.recheck_failure.json')

@@ -354,6 +353,22 @@ def read_cache(cache_filename)
end
end

it 'does recheck internal failures, regardless of cache' do
cache_filename = File.join(version, '.broken_internal.json')
test_path = File.join(FIXTURES_DIR, 'cache', 'example_site')
test_file = File.join(test_path, 'index.html')

Timecop.freeze(new_time) do
expect_any_instance_of(HTMLProofer::Cache).to receive(:write)

# we expect the same link to be re-added, even though we are within the time frame,
# because `index.html` contains a failure
expect_any_instance_of(HTMLProofer::Cache).to receive(:add_internal).with('/missing.html', { base_url: '', current_path: test_file, found: false, line: 6, source: test_path }, false)

run_proofer(test_path, :directory, disable_external: true, cache: { timeframe: '30d', cache_file: cache_filename }.merge(default_cache_options))
end
end

it 'does recheck failures, regardless of external-only cache' do
Timecop.freeze(new_time) do
cache_filename = File.join(version, '.recheck_external_failure.json')
9 changes: 9 additions & 0 deletions spec/html-proofer/fixtures/cache/example_site/index.html
@@ -0,0 +1,9 @@

<!doctype html>
<html>
<head><title>Example</title></head>
<body>
<a href="/missing.html">Missing internal link</a>
</body>
</html>

18 changes: 18 additions & 0 deletions spec/html-proofer/fixtures/cache/version_2/.broken_internal.json
@@ -0,0 +1,18 @@
{
"version": 2,
"external": {},
"internal": {
"/": {
"time": "2015-10-20 12:00:00 -0700",
"metadata": [
{
"source": "spec/html-proofer/fixtures/cache/root_link.html",
"line": 11,
"base_url": "",
"found": false
}
]
}
}
}

@@ -1 +1 @@
{"version":2,"internal":{"/somewhere.html":{"time":"2022-01-06 12:00:00 -0500","metadata":[{"source":"spec/html-proofer/fixtures/cache/internal_and_external_example.html","current_path":"spec/html-proofer/fixtures/cache/internal_and_external_example.html","line":11,"base_url":"","found":null}]}},"external":{"https://github.com/gjtorikian/html-proofer":{"time":"2022-01-06 12:00:00 -0500","found":true,"status_code":200,"message":"OK","metadata":[{"filename":"spec/html-proofer/fixtures/cache/internal_and_external_example.html","line":7}]}}}
{"version":2,"internal":{"/somewhere.html":{"time":"2022-01-06 12:00:00 -0500","metadata":[{"source":"spec/html-proofer/fixtures/cache/internal_and_external_example.html","current_path":"spec/html-proofer/fixtures/cache/internal_and_external_example.html","line":11,"base_url":"","found":false},{"source":"spec/html-proofer/fixtures/cache/internal_and_external_example.html","current_path":"spec/html-proofer/fixtures/cache/internal_and_external_example.html","line":11,"base_url":"","found":false}]}},"external":{"https://github.com/gjtorikian/html-proofer":{"time":"2022-01-06 12:00:00 -0500","found":true,"status_code":200,"message":"OK","metadata":[{"filename":"spec/html-proofer/fixtures/cache/internal_and_external_example.html","line":7}]}}}
2 changes: 1 addition & 1 deletion spec/html-proofer/fixtures/cache/version_2/.runner.json
@@ -1 +1 @@
{"version":2,"internal":{},"external":{"https://www.github.com/":{"time":"2022-01-03 14:06:05 -0500","found":true,"status_code":200,"message":"OK","metadata":[{"filename":"spec/html-proofer/fixtures/links/_site/folder.html/index.html","line":4}]}}}
{"version":2,"internal":{},"external":{"https://www.github.com":{"time":"2022-02-17 17:25:09 -0500","found":true,"status_code":200,"message":"OK","metadata":[{"filename":"spec/html-proofer/fixtures/links/_site/folder.html/index.html","line":4}]}}}
Expand Up @@ -2,14 +2,14 @@
"version": 2,
"external": {},
"internal": {
"/": {
"/broken_root_link_internal.html": {
"time": "2015-10-20 12:00:00 -0700",
"metadata": [
{
"source": "spec/html-proofer/fixtures/cache/root_link.html",
"line": 11,
"source": "spec/html-proofer/fixtures/links/working_root_link_internal.html",
"line": 5,
"base_url": "",
"found": false
"found": true
}
]
}