Strings that contain English punctuation fail to convert #9

Open
moipa-cn opened this issue May 19, 2022 · 0 comments

```go
// Print the text of every <p> element after converting it from GBK.
e.DOM.Find("p").Each(func(i int, s *goquery.Selection) {
	text := s.Text()
	result := mahonia.NewDecoder("gbk").ConvertString(text)
	fmt.Println(result)
})
```
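This is a piece of scraping code; `text` holds a GBK-encoded string.
I found that whenever `text` contains English “” double quotes, the content inside the quotes is not converted.
The output looks something like this: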
```
我是正常的中文鈥満焐氖谴竺ā⒙躺氖切
```
The garbled characters at the end are the text that was inside the English double quotes.
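One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the page writes the quotes as HTML entities such as `&ldquo;`, the HTML parser expands them to UTF-8 bytes while copying the surrounding GBK bytes unchanged, so `text` ends up in mixed encodings. The sketch below uses the standard library's `html.UnescapeString` as a stand-in for that entity expansion. The sample input and the helper name `expandEntities` are hypothetical.

```go
package main

import (
	"fmt"
	"html"
)

// expandEntities is a hypothetical stand-in for the entity expansion an HTML
// parser performs on text nodes; bytes that are not entities are copied as-is.
func expandEntities(raw string) string {
	return html.UnescapeString(raw)
}

func main() {
	// Assumed input: the entity &ldquo; followed by 红 encoded as GBK (0xBA 0xEC).
	raw := "&ldquo;\xba\xec"
	mixed := expandEntities(raw)

	// The quote is now three UTF-8 bytes (e2 80 9c) in front of the two GBK
	// bytes, so a decoder that reads GBK byte pairs falls out of step.
	fmt.Printf("% x\n", mixed) // prints "e2 80 9c ba ec"
}
```

Read as GBK, the pair `e2 80` is 鈥, which is also the first garbled character in the output above. The leftover `9c` then pairs with the first byte of the next character, which would shift every pair after it.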
However, if I print the whole HTML page, including the div, li and other tags, the conversion works correctly.
The code looks like this:
```go
c.OnHTML("#ArtContent", func(e *colly.HTMLElement) {
	// Convert the whole response body instead of the extracted text.
	result := mahonia.NewDecoder("gbk").ConvertString(string(e.Response.Body))
	fmt.Println(result)
})
```
Here `result` is converted to Chinese completely, with no garbled characters.
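The difference between the two snippets may come down to ordering: converting the raw body happens before any HTML entity is expanded, while converting `s.Text()` happens after. The following is a minimal sketch of that ordering effect under stated assumptions. `toyGBK` is a hypothetical two-character stand-in for a real GBK decoder, `html.UnescapeString` stands in for the parser's entity handling, and the sample input is invented.

```go
package main

import (
	"fmt"
	"html"
)

// toyGBK is a hypothetical stand-in for a real GBK decoder. It knows only the
// two characters used below; any other non-ASCII byte pair becomes U+FFFD.
func toyGBK(s string) string {
	table := map[string]rune{"\xba\xec": '红', "\xc3\xa8": '猫'}
	var out []rune
	for i := 0; i < len(s); {
		if s[i] < 0x80 { // ASCII passes through unchanged
			out = append(out, rune(s[i]))
			i++
			continue
		}
		r := '\uFFFD'
		if i+1 < len(s) {
			if known, ok := table[s[i:i+2]]; ok {
				r = known
			}
		}
		out = append(out, r)
		i += 2 // GBK characters outside ASCII occupy two bytes
	}
	return string(out)
}

// decodeThenExpand converts the raw bytes first, while entities are still
// plain ASCII, and expands entities afterwards.
func decodeThenExpand(raw string, decode func(string) string) string {
	return html.UnescapeString(decode(raw))
}

// expandThenDecode mirrors the failing order: entities become UTF-8 first,
// and the decoder then sees a mix of UTF-8 and GBK bytes.
func expandThenDecode(raw string, decode func(string) string) string {
	return decode(html.UnescapeString(raw))
}

func main() {
	raw := "&ldquo;\xba\xec\xc3\xa8" // assumed sample: &ldquo; + 红猫 in GBK
	fmt.Println(decodeThenExpand(raw, toyGBK)) // readable text
	fmt.Println(expandThenDecode(raw, toyGBK)) // replacement characters only
}
```

In the setting of this issue, the `decode` argument would be `mahonia.NewDecoder("gbk").ConvertString`, and decoding first corresponds to converting `e.Response.Body` before any parsing or text extraction.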
