CJKFilterUtils


This is a Lucene filter and filter factory (see http://lucene.apache.org) that folds certain CJK characters to improve recall. Put it in your analysis chain BEFORE any ICU transform from Traditional->Simplified Han, because it converts modern Japanese Kanji to their traditional equivalents, which the Traditional->Simplified transform can then process.

Usage

  • clone the project

git clone https://github.com/solrmarc/CJKFilterUtils.git

  • run the maven installation

mvn clean install

  • put the CJKFilterUtils*.jar file found in the target directory into your Solr lib directory
  • use CJKFoldingFilterFactory in an analyzer in your schema.xml file.
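
As a rough illustration of the last step, an analyzer definition in schema.xml might look like the sketch below. The field type name `text_cjk` is hypothetical, the package prefix `edu.stanford.lucene.analysis` is an assumption that should be checked against the classes in the built jar, and the field name `cjk_test` is borrowed from the example queries later in this README. The point of the sketch is the ordering: the folding filter sits before the Traditional->Simplified ICU transform.

```xml
<!-- Sketch only: "text_cjk" is a hypothetical field type name. -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Assumed package name; verify it against the built CJKFilterUtils jar. -->
    <filter class="edu.stanford.lucene.analysis.CJKFoldingFilterFactory"/>
    <!-- The Traditional->Simplified transform comes AFTER the folding filter. -->
    <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
  </analyzer>
</fieldType>

<!-- Field name taken from the example queries; its attributes are assumptions. -->
<field name="cjk_test" type="text_cjk" indexed="true" stored="true"/>
```

The ICU tokenizer and transform factories ship in Solr's analysis-extras contrib, so those jars also need to be on the Solr classpath for a chain like this to load.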

Checking example locally

(Uses Ruby)

Install Ruby dependencies

$ bundle install

Set up Solr with CJKFilterUtils and config/schema

$ bundle exec rake setup_server

Run solr_wrapper

$ solr_wrapper

In another shell, index fixtures

$ bundle exec rake fixtures

Run some queries (these should return results):

$ curl 'http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:呂思勉两晋南北朝&wt=json'

$ curl 'http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:俞平伯红楼梦&wt=json'

$ curl 'http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:南洋&wt=json'

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request