Besides hanlp/hanlp-index, are there other analyzers that can be selected or tuned? #2

Open
upbit opened this issue Sep 22, 2018 · 2 comments

Comments

@upbit

upbit commented Sep 22, 2018

In hanlp-index mode, the text 全量池测试 is segmented in a very unusual way.

Example (hanlp version 1.5.3):

POST _analyze
{
  "analyzer" : "hanlp-index",
  "text": "搜索全量池测试"
}

{
  "tokens": [
    {
      "token": "搜索",
      "start_offset": 0,
      "end_offset": 2,
      "type": "vn",
      "position": 0
    },
    {
      "token": "全量池测试",
      "start_offset": 2,
      "end_offset": 7,
      "type": "nt",
      "position": 1
    },
    {
      "token": "测试",
      "start_offset": 5,
      "end_offset": 7,
      "type": "vn",
      "position": 2
    }
  ]
}

I would like it to be split into 搜索 / 全量池 / 测试, or, in max_word style, into 搜索 / 全量池测试 / 全量池 / 测试, but I don't know how to configure this.
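
To make the requested max_word-style output concrete, here is a minimal Python sketch of the behaviour being asked for. It is not how HanLP or the plugin works internally. The function name `expand_max_word` and the small `DICTIONARY` set are hypothetical and exist only for illustration. The sketch keeps each coarse token and then emits every dictionary word found inside it.

```python
def expand_max_word(tokens, dictionary):
    """Keep each coarse token, then emit every dictionary word inside it.

    Sub-words are emitted in order of start offset; a sub-word must be at
    least two characters long and must differ from the token itself.
    """
    result = []
    for token in tokens:
        result.append(token)
        n = len(token)
        for start in range(n):
            for end in range(start + 2, n + 1):
                sub = token[start:end]
                if sub != token and sub in dictionary:
                    result.append(sub)
    return result


# Hypothetical dictionary for illustration; not taken from HanLP's data files.
DICTIONARY = {"搜索", "全量池", "测试", "全量池测试"}

# Coarse segmentation corresponding to the hanlp-index output in this report.
print(expand_max_word(["搜索", "全量池测试"], DICTIONARY))
# → ['搜索', '全量池测试', '全量池', '测试']
```

Note that this only works if 全量池 is itself a dictionary entry, which is why the dictionary contents matter for the result.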

@boliza
Contributor

boliza commented Oct 23, 2018

I need to look into this as well. If you find out how to change it, please submit a PR.

@boliza
Contributor

boliza commented Oct 23, 2018

It looks like a problem with your dictionary. This is what I get:

{
  "tokens": [
    {
      "token": "搜索",
      "start_offset": 0,
      "end_offset": 2,
      "type": "vn",
      "position": 0
    },
    {
      "token": "全量池",
      "start_offset": 2,
      "end_offset": 5,
      "type": "nr",
      "position": 1
    },
    {
      "token": "测试",
      "start_offset": 5,
      "end_offset": 7,
      "type": "vn",
      "position": 2
    }
  ]
}
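
For comparing two `_analyze` responses like the ones in this thread, a small Python sketch along these lines may help. The helper names `token_summary` and `diff_tokens` are hypothetical, and the two JSON strings are abbreviated copies of the outputs posted above (the `position` field is left out for brevity).

```python
import json


def token_summary(response_text):
    """Reduce an _analyze response to (token, start, end, type) tuples."""
    data = json.loads(response_text)
    return [
        (t["token"], t["start_offset"], t["end_offset"], t["type"])
        for t in data["tokens"]
    ]


def diff_tokens(a, b):
    """Return the tokens that appear only in a and only in b."""
    only_a = [t for t in a if t not in b]
    only_b = [t for t in b if t not in a]
    return only_a, only_b


# Abbreviated copies of the two outputs in this thread.
REPORTED = """{"tokens": [
  {"token": "搜索", "start_offset": 0, "end_offset": 2, "type": "vn"},
  {"token": "全量池测试", "start_offset": 2, "end_offset": 7, "type": "nt"},
  {"token": "测试", "start_offset": 5, "end_offset": 7, "type": "vn"}
]}"""

MAINTAINER = """{"tokens": [
  {"token": "搜索", "start_offset": 0, "end_offset": 2, "type": "vn"},
  {"token": "全量池", "start_offset": 2, "end_offset": 5, "type": "nr"},
  {"token": "测试", "start_offset": 5, "end_offset": 7, "type": "vn"}
]}"""

only_reported, only_maintainer = diff_tokens(
    token_summary(REPORTED), token_summary(MAINTAINER)
)
print(only_reported)    # tokens seen only in the reporter's output
print(only_maintainer)  # tokens seen only in the maintainer's output
```

The only difference is the span starting at offset 2: the reporter's setup returns 全量池测试 tagged `nt`, while the maintainer's returns 全量池 tagged `nr`. That is consistent with the two installations using different dictionary data.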
