fix: compressed web pages cannot be parsed normally (#4)
#### What type of PR is this?

/kind bug

#### What this PR does / why we need it:

When a web page is served with `gzip` compression, the response body is decoded as garbled text, which causes parsing to fail.
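
To illustrate the failure mode, here is a minimal sketch using only the JDK. It does not use Reactor Netty or any code from this repository; the class and method names (`GzipBodyDemo`, `gzip`, `gunzip`, `decodeRaw`) are hypothetical. It shows that decoding gzip-compressed bytes directly as UTF-8 does not give back the HTML, while decompressing first does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of the plugin.
final class GzipBodyDemo {

    // Compress text the way a server would when it sends Content-Encoding: gzip.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the gzip trailer has been written.
        return buffer.toByteArray();
    }

    // Decompress first, then decode: what a client with gzip support does.
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decode the raw bytes without decompressing: the situation described above.
    static String decodeRaw(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Example</title></head></html>";
        byte[] wire = gzip(html);

        System.out.println("raw decode equals original: " + decodeRaw(wire).equals(html));
        System.out.println("gunzip equals original:     " + gunzip(wire).equals(html));
    }
}
```

In the actual change, the decompression is left to the HTTP client: `HttpClient.compress(true)` turns on gzip support in Reactor Netty, so the parser receives already-decoded HTML.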

#### How to test it?

Request the page https://www.bilibili.com/video/BV1Vu4m1M7Nr/ and check whether an error is reported.

#### Which issue(s) this PR fixes:

Fixes #3 

#### Does this PR introduce a user-facing change?
```release-note
Fix compressed websites failing to be parsed
```
LIlGG authored Jun 14, 2024
1 parent 1251e2d commit fa95355
Showing 2 changed files with 8 additions and 3 deletions.
@@ -47,7 +47,8 @@ private static boolean isProxy(ProxyConfig proxyConfig, String host) {

     private static HttpClient getHttpClient() {
         return HttpClient.create()
-            .responseTimeout(Duration.ofSeconds(10));
+            .responseTimeout(Duration.ofSeconds(10))
+            .compress(true);
     }

     record ProxyConfig(String host, int port, List<AddressConfig> hosts) {
@@ -6,6 +6,7 @@
 import org.jsoup.nodes.Document;
 import org.jsoup.parser.Parser;
 import org.jsoup.select.Elements;
+import org.springframework.util.CollectionUtils;
 import org.springframework.util.StringUtils;
 import run.halo.editor.hyperlink.dto.HyperLinkBaseDTO;

@@ -23,8 +24,11 @@ public HyperLinkBaseDTO parse(String htmlContent) {
         Elements meta = parse.getElementsByTag("meta");
         parserMetas(meta, hyperLinkBaseDTO);

-        var title = parse.getElementsByTag("title").get(0).text();
-        hyperLinkBaseDTO.setTitle(title);
+        var titles = parse.getElementsByTag("title");
+        if (!CollectionUtils.isEmpty(titles)) {
+            var title = titles.get(0).text();
+            hyperLinkBaseDTO.setTitle(title);
+        }

         Elements links = parse.getElementsByTag("link");
         parserLinks(links, hyperLinkBaseDTO);
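
The title guard matters because jsoup's `Elements` is an `ArrayList` of elements, so calling `get(0)` on an empty result throws `IndexOutOfBoundsException`. The sketch below reproduces that behavior with a plain `List<String>` standing in for `Elements`; the class and method names (`FirstTitleDemo`, `firstTitle`, `unguardedGetThrows`) are hypothetical, and the emptiness check is a stand-in for `CollectionUtils.isEmpty`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class; a List<String> stands in for jsoup's Elements.
final class FirstTitleDemo {

    // Guarded access: return the first title only when one exists.
    static Optional<String> firstTitle(List<String> titles) {
        if (titles == null || titles.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(titles.get(0));
    }

    // Unguarded access: reports whether get(0) throws on the given list.
    static boolean unguardedGetThrows(List<String> titles) {
        try {
            titles.get(0);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> noTitles = new ArrayList<>();
        System.out.println("unguarded get(0) throws: " + unguardedGetThrows(noTitles));
        System.out.println("guarded result present:  " + firstTitle(noTitles).isPresent());
    }
}
```

With the guard in place, a page without a `<title>` tag simply leaves the title unset instead of aborting the whole parse.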