Node#visible_text use scrub to replace invalid UTF-8 sequences #76

Merged (1 commit) on Sep 28, 2024

@reedrolemodel (Contributor) commented Aug 27, 2024

Some pages cause an `invalid byte sequence in UTF-8` exception to be raised when calling `text.to_s.gsub(/\A[[:space:]&&[^\u00a0]]+/, '')`. Adding `scrub` prevents this.

Specific context:
It seems an `&nbsp;` HTML entity gets interpreted as "\xA0" (byte 160), which on its own is not a valid UTF-8 byte sequence. Using charlock_holmes, the encoding of the entire page is reported as ISO-8859-1 with 54% confidence.
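
For illustration, here is a minimal sketch of the failure mode and of what `scrub` does to it. The sample string and the variable names are made up for this example and do not come from the driver; the regular expression is the one quoted above. On the Ruby version in the report below (3.3.5), the first `gsub` is expected to raise, so the sketch rescues the error rather than relying on it.

```ruby
# A UTF-8 tagged string that starts with a lone 0xA0 byte (hypothetical input).
raw = "\xA0 Cookie policy".dup.force_encoding(Encoding::UTF_8)
leading_space = /\A[[:space:]&&[^\u00a0]]+/

# Matching a regexp against a string with a broken encoding may raise
# ArgumentError; capture the message instead of letting it propagate.
error_message =
  begin
    raw.gsub(leading_space, '')
    nil
  rescue ArgumentError => e
    e.message
  end

# String#scrub replaces invalid bytes with U+FFFD by default,
# so the result is valid UTF-8 and safe to pass to gsub.
scrubbed = raw.scrub
stripped = scrubbed.gsub(leading_space, '')

# Passing a replacement string drops the invalid bytes instead.
removed = raw.scrub('')
```

Note that with the default replacement the string now starts with U+FFFD, which is not whitespace, so the leading-space pattern strips nothing; `scrub('')` would be needed to discard the byte entirely.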

@mhenrixon

I came here to report the issue! I'm glad to discover it already has a PR.

  1) Static pages GET /de/cookies renders a cookie policy
     Failure/Error:
       text.to_s.gsub(/\A[[:space:]&&[^\u00a0]]+/, '')
           .gsub(/[[:space:]&&[^\u00a0]]+\z/, '')
           .gsub(/\n+/, "\n")
           .tr("\u00a0", ' ')

     ArgumentError:
       invalid byte sequence in UTF-8

     [Screenshot Image]: tmp/capybara/screenshots/failures_r_spec_example_groups_static_pages_get_de_cookies_renders_a_cookie_policy_91.png


     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-playwright-driver-0.5.2/lib/capybara/playwright/node.rb:134:in `gsub'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-playwright-driver-0.5.2/lib/capybara/playwright/node.rb:134:in `block in visible_text'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-playwright-driver-0.5.2/lib/capybara/playwright/node.rb:83:in `assert_element_not_stale'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-playwright-driver-0.5.2/lib/capybara/playwright/node.rb:120:in `visible_text'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/node/element.rb:60:in `block in text'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/node/base.rb:77:in `synchronize'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/node/element.rb:60:in `text'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/queries/selector_query.rb:603:in `matches_text_regexp'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/queries/selector_query.rb:607:in `matches_text_regexp?'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/queries/selector_query.rb:554:in `matches_text_filter?'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/queries/selector_query.rb:452:in `matches_system_filters?'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/queries/selector_query.rb:122:in `matches_filters?'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/result.rb:32:in `block in initialize'
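
As a rough sketch of how the chain in the failure output behaves once `scrub` is applied first: the helper name `normalize_visible_text` is hypothetical, and the exact place where the merged commit calls `scrub` is an assumption here, not something taken from the diff.

```ruby
# Hypothetical helper mirroring the gsub/tr chain from the failure output,
# with scrub applied before any regexp touches the string.
def normalize_visible_text(text)
  text.to_s
      .scrub                                    # replace invalid bytes with U+FFFD
      .gsub(/\A[[:space:]&&[^\u00a0]]+/, '')    # strip leading whitespace except NBSP
      .gsub(/[[:space:]&&[^\u00a0]]+\z/, '')    # strip trailing whitespace except NBSP
      .gsub(/\n+/, "\n")                        # collapse runs of newlines
      .tr("\u00a0", ' ')                        # turn NBSP into a plain space
end

# Valid input: whitespace is trimmed and NBSP becomes a regular space.
clean = normalize_visible_text("  hello\n\n\nworld\u00a0!  ")

# Invalid input: the lone 0xA0 byte no longer raises; it becomes U+FFFD.
broken = "\xA0hi \n".dup.force_encoding(Encoding::UTF_8)
repaired = normalize_visible_text(broken)
```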

@YusukeIwaki YusukeIwaki merged commit 0d8b37f into YusukeIwaki:main Sep 28, 2024
10 of 11 checks passed
3 participants