How to find all matches of a link in a document using Nokogiri #3420
-
Currently this is the code I’m using, and it does not seem to catch all matches.
If possible, I’d like to ignore case as well. Thanks in advance!
Replies: 3 comments 1 reply
-
@willowlight Your results are going to depend on your input HTML. If possible, it would be great to be able to reproduce what you're seeing in a complete running example in a single file like this:

#!/usr/bin/env ruby
require "bundler/inline"
gemfile do
  source "https://rubygems.org"
  gem "nokogiri"
end
html = <<~HTML
<html><body>
<a href="http://example.com">nope</a>
<a href="https://mylink.com">yes</a>
</body></html>
HTML
doc = Nokogiri::HTML(html)
doc.xpath('//a[@href="https://mylink.com"]').each do |node|
  puts node.text
end
# >> yes

Case-insensitive XPath searches are not supported in XPath 1.0, which is the version implemented by libxml2. If you're dealing with ASCII strings, you might be able to write a complicated XPath expression to do this, but a custom XPath function handler, as in the following example, is another option:

#!/usr/bin/env ruby
require "bundler/inline"
gemfile do
  source "https://rubygems.org"
  gem "nokogiri"
end
html = <<~HTML
<html><body>
<a href="http://example.com">nope</a>
<a href="https://mylink.com">yes</a>
<a href="https://MyLink.com">yes also</a>
</body></html>
HTML
class CustomXPathFunctions
  # `nodes` holds the @href attribute nodes; it is empty when an <a> has no href.
  def case_insensitive_compare(nodes, value)
    nodes.any? { |node| node.text.downcase == value.downcase }
  end
end
doc = Nokogiri::HTML(html)
results = doc.xpath('//a[nokogiri:case_insensitive_compare(@href, "https://mylink.com")]',
                    CustomXPathFunctions.new)
results.each do |node|
  puts node.text
end
# >> yes
# >> yes also

I hope this is helpful. LMK if you have more questions.
-
@flavorjones Thanks! My script goes through thousands of HTML pages. I will isolate one where the issue happens and get back to you.
-
Here's an update... it was definitely a case issue. Using the provided