Unicode pages do not work anymore on 0.5.0 #71
Comments
@taganaka @tmaier, for some reason, if I use the code below in 0.5.0, non-English Unicode characters show properly:

    def doc
      return @doc if @doc
      @doc = Nokogiri::HTML(@body) if @body && html? rescue nil
    end

However, with the following version they do not. I'm not sure what problem this function is intended to solve. Any suggestion is appreciated, as I'd like to use 0.5.0 on my server without monkey-patching the gem. Thanks a lot.

    def doc
      return @doc if @doc
      @body ||= ''
      @body = @body.encode('utf-8', 'binary', invalid: :replace,
                           undef: :replace, replace: '')
      @doc = Nokogiri::HTML(@body.toutf8, nil, 'utf-8') if @body && html?
    end
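A likely explanation, sketched in plain Ruby below: `encode('utf-8', 'binary', ...)` tells Ruby the source is ASCII-8BIT, where every byte above 0x7F has no UTF-8 equivalent. Each such byte raises an undefined-conversion condition, and `undef: :replace, replace: ''` silently deletes it, so every multi-byte character disappears and only ASCII survives. The helper name `strip_like_050` is hypothetical and only wraps the call from the snippet above; Nokogiri is left out so the demo runs on its own.

```ruby
# Hypothetical helper: the same conversion call as in the 0.5.0 snippet.
# The source is declared 'binary' (ASCII-8BIT), so any byte >= 0x80 is
# "undefined" for UTF-8 and is replaced with the empty string.
def strip_like_050(body)
  body.encode('utf-8', 'binary', invalid: :replace,
              undef: :replace, replace: '')
end

# Raw HTTP bodies usually arrive as binary strings; String#b produces
# such a copy of a UTF-8 literal for this demo.
puts strip_like_050("<title>Café</title>".b) # prints <title>Caf</title>
puts strip_like_050("<h1>日本語</h1>".b)     # prints <h1></h1>
```

This matches the symptom in the thread: the `<title>` text keeps only its English (ASCII) characters.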
Text inside <title> appears correctly in 0.4.0.
Text inside <title> is gone in 0.5.0; only the English text remains.
I'll take a look at this soon. Thanks for reporting.
Thank you.
Hi taganaka, please let me know whether you've had a chance to look into this.
I am trying to upgrade to 0.5.1 and see the same issue.
I don't think this project is maintained anymore.
Well, it is. I'm still happily using it. Happy to accept PRs too.
On Thu, Jan 5, 2017 at 06:59 nengine ***@***.***> wrote:
I don't think this project is maintained anymore.
OK, great. I hadn't seen any activity for nearly two years, so I assumed it was no longer maintained.
I was able to crawl Unicode pages in 0.4.0, but after upgrading to 0.5.0 only the English characters remain in a crawled page. Please let me know if there are any settings I need to change.
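One possible workaround, sketched below under the assumption that the crawled pages really are UTF-8: keep the raw bytes, relabel them as UTF-8 with `force_encoding`, and use `scrub` to drop only byte sequences that are genuinely invalid, instead of converting from `'binary'`. The helper `utf8_body` is hypothetical and not part of the gem.

```ruby
# Hypothetical helper (not part of the gem). Assumes the raw body holds
# UTF-8 bytes: relabel them as UTF-8 without converting, then remove only
# invalid byte sequences, so valid multi-byte characters are preserved.
def utf8_body(raw)
  return '' if raw.nil?
  raw.dup.force_encoding(Encoding::UTF_8).scrub('')
end

puts utf8_body("<title>Café</title>".b) # prints <title>Café</title>
puts utf8_body("<h1>日本語</h1>".b)     # prints <h1>日本語</h1>
```

Inside `doc`, the result could then be parsed with `Nokogiri::HTML(utf8_body(@body), nil, 'utf-8')`. Pages served in another charset (for example Shift_JIS or ISO-8859-1) would instead need their declared charset, taken from the `Content-Type` header or a `<meta charset>` tag, to be used as the source encoding.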