CJKFilterUtils

CJKFilterUtils provides a Lucene filter and filter factory (see http://lucene.apache.org) that fold certain CJK characters to improve recall. Place the filter in your analysis chain BEFORE any ICU transform from Traditional to Simplified Han: it converts modern Japanese Kanji to their traditional equivalents, which the transform can then map to Simplified Han.

Usage

Clone the project:

git clone https://github.com/solrmarc/CJKFilterUtils.git

Build and install with Maven:

mvn clean install

Copy the CJKFilterUtils*.jar file from the target directory into your Solr lib directory.
Add CJKFoldingFilterFactory to the relevant analyzer chain in your schema.xml file.
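
The jar-copy step above can be sketched as a small shell helper. This is only an illustration: the function name install_cjk_jar is hypothetical, and both directory arguments are assumptions that depend on where you cloned the project and where your Solr instance loads its libraries from.

```shell
# Hypothetical helper (not part of CJKFilterUtils): copy the built jar
# into a Solr lib directory. Both paths are supplied by the caller.
install_cjk_jar() {
  target_dir="$1"    # Maven output directory, e.g. CJKFilterUtils/target
  solr_lib_dir="$2"  # your Solr lib directory (location is an assumption)

  # Create the lib directory if it does not exist yet.
  mkdir -p "$solr_lib_dir"

  # Copy every jar matching the project's naming pattern.
  cp "$target_dir"/CJKFilterUtils*.jar "$solr_lib_dir"/
}
```

For example, from the project root after the Maven build: install_cjk_jar target /path/to/solr/lib (replace the second argument with your own Solr lib directory). Solr usually needs a restart or core reload before it picks up a new jar.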

Checking example locally

(Requires Ruby and Bundler.)

Install Ruby dependencies

$ bundle install

Set up Solr with CJKFilterUtils and the config/schema

$ bundle exec rake setup_server

Run solr_wrapper

$ solr_wrapper

In another shell, index fixtures

$ bundle exec rake fixtures

Run some queries (these should return results):

$ curl 'http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:呂思勉两晋南北朝&wt=json'

$ curl 'http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:俞平伯红楼梦&wt=json'

$ curl 'http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:南洋&wt=json'

Contributing

Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Added some feature')
Push to the branch (git push origin my-new-feature)
Create a new pull request
