Pull upstream changes #1

Open
wants to merge 1 commit into
base: master
19 changes: 19 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
name: CI

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Java
      uses: actions/setup-java@v1
      with:
        java-version: 11
    - name: Build with Maven
      run: mvn clean install
16 changes: 15 additions & 1 deletion .gitignore
@@ -8,4 +8,18 @@ bin/*
build/*
ci/*
dist/*
/bin/

### IntelliJ ###
.idea/
*.iml

### Maven ###
target/
pom.xml.tag
pom.xml.releaseBackup
pom.xml.versionsBackup
pom.xml.next
release.properties
dependency-reduced-pom.xml
buildNumber.properties
.mvn/timing.properties
6 changes: 6 additions & 0 deletions .solr_wrapper
@@ -0,0 +1,6 @@
# Place any default configuration for solr_wrapper here
collection:
  dir: example/solr_configs/
  name: test
version: 6.6.1
port: 8983
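
For orientation, the nesting of this file can be checked by loading it as plain YAML. The sketch below is only an illustration using Ruby's standard `yaml` library. It inlines the settings instead of reading the file, the constant names are made up for the example, and it does not show how solr_wrapper itself loads its configuration.

```ruby
require 'yaml'

# The same settings as the .solr_wrapper file, inlined for illustration.
CONFIG_TEXT = <<~CONFIG
  # Place any default configuration for solr_wrapper here
  collection:
    dir: example/solr_configs/
    name: test
  version: 6.6.1
  port: 8983
CONFIG

# Hypothetical constant name; plain YAML parsing, nothing solr_wrapper-specific.
CONFIG = YAML.safe_load(CONFIG_TEXT)

# `dir` and `name` are nested under `collection`;
# `version` and `port` are top-level keys.
puts CONFIG['collection']['name']
puts CONFIG['collection']['dir']
puts CONFIG['version'] # a String, because "6.6.1" is not a valid number
puts CONFIG['port']    # an Integer
```

If the indentation is lost, `dir` and `name` become top-level keys and `collection` is left empty, which is why the nesting matters here.
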
13 changes: 0 additions & 13 deletions .travis.yml

This file was deleted.

4 changes: 4 additions & 0 deletions Gemfile
@@ -0,0 +1,4 @@
source 'https://rubygems.org'

gem 'rake'
gem 'solr_wrapper'
25 changes: 25 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,25 @@
GEM
  remote: https://rubygems.org/
  specs:
    faraday (0.15.2)
      multipart-post (>= 1.2, < 3)
    multipart-post (2.0.0)
    rake (12.3.1)
    retriable (3.1.2)
    ruby-progressbar (1.10.0)
    rubyzip (1.2.1)
    solr_wrapper (2.0.0)
      faraday
      retriable
      ruby-progressbar
      rubyzip

PLATFORMS
  ruby

DEPENDENCIES
  rake
  solr_wrapper

BUNDLED WITH
   1.16.2
14 changes: 11 additions & 3 deletions LICENSE
@@ -1,5 +1,13 @@
-Copyright (c) 2013. The Board of Trustees of the Leland Stanford Junior University. All rights reserved.
+Copyright 2018 The Board of Trustees of the Leland Stanford Junior University

-Redistribution and use of this distribution in source and binary forms, with or without modification, are permitted provided that: The above copyright notice and this permission notice appear in all copies and supporting documentation; The name, identifiers, and trademarks of The Board of Trustees of the Leland Stanford Junior University are not used in advertising or publicity without the express prior written permission of The Board of Trustees of the Leland Stanford Junior University; Recipients acknowledge that this distribution is made available as a research courtesy, "as is", potentially with defects, without any obligation on the part of The Board of Trustees of the Leland Stanford Junior University to provide support, services, or repair;
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at

-THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, WITH REGARD TO THIS SOFTWARE, INCLUDING WITHOUT LIMITATION ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, AND IN NO EVENT SHALL THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, TORT (INCLUDING NEGLIGENCE) OR STRICT LIABILITY, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
83 changes: 83 additions & 0 deletions README.md
@@ -0,0 +1,83 @@
# CJKFilterUtils

[![Build Status](https://travis-ci.org/sul-dlss/CJKFilterUtils.svg?branch=master)](https://travis-ci.org/sul-dlss/CJKFilterUtils) [![codecov](https://codecov.io/gh/sul-dlss/CJKFilterUtils/branch/master/graph/badge.svg)](https://codecov.io/gh/sul-dlss/CJKFilterUtils)


This is a Lucene filter and filter factory (see http://lucene.apache.org)
that folds certain CJK characters to improve recall. Put it in your
analysis chain BEFORE any ICU transform from Traditional to Simplified Han,
because it converts modern Japanese Kanji to their traditional equivalents.
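
A toy sketch of why that ordering matters, in Ruby. Both mapping tables below are made up for illustration and contain only three characters each. They are NOT the tables shipped with CJKFilterUtils or ICU, and the method names are hypothetical. The point is only that folding a Japanese form to its traditional form first gives the Traditional to Simplified step a character it has an entry for.

```ruby
# Illustrative tables only: NOT the mappings used by CJKFilterUtils or ICU.
# Modern Japanese forms mapped to traditional forms.
KANJI_TO_TRADITIONAL = { '図' => '圖', '学' => '學', '広' => '廣' }.freeze
# Traditional Han forms mapped to Simplified forms.
TRADITIONAL_TO_SIMPLIFIED = { '圖' => '图', '學' => '学', '廣' => '广' }.freeze

# Replace each character that has an entry in the table; leave the rest as-is.
def fold(text, table)
  text.each_char.map { |ch| table.fetch(ch, ch) }.join
end

# Fold Japanese forms first, then apply the Traditional -> Simplified table.
def fold_then_simplify(text)
  fold(fold(text, KANJI_TO_TRADITIONAL), TRADITIONAL_TO_SIMPLIFIED)
end

# Skip the folding step and apply only the Traditional -> Simplified table.
def simplify_only(text)
  fold(text, TRADITIONAL_TO_SIMPLIFIED)
end

japanese = '図書' # modern Japanese spelling
chinese  = '圖書' # traditional spelling

# With folding first, both spellings normalize to the same string.
puts fold_then_simplify(japanese) == fold_then_simplify(chinese)
# Without folding, the Japanese form passes through unchanged and they differ.
puts simplify_only(japanese) == simplify_only(chinese)
```

In this toy setup the first comparison is true and the second is false, which mirrors the recall problem the filter is meant to address.
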

## Usage

- clone the project

        git clone git://github.com/solrmarc/CJKFilterUtils.git

- run the maven installation

        mvn clean install

- put the `CJKFilterUtils*.jar` file found in the target directory into your Solr lib directory
- utilize the Solr CJKFoldingFilterFactory in your schema.xml file.

        <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="10000" autoGeneratePhraseQueries="false">
          <analyzer>
            <charFilter class="edu.stanford.lucene.analysis.ICUTransformCharFilterFactory" id="Traditional-Simplified" />
            <tokenizer class="solr.ICUTokenizerFactory" />
            <filter class="solr.CJKWidthFilterFactory"/>
            <filter class="edu.stanford.lucene.analysis.CJKFoldingFilterFactory"/>
            <charFilter class="edu.stanford.lucene.analysis.ICUCustomTransformCharFilterFactory" id="edu/stanford/lucene/analysis/stanford_cjk_transliterations.txt" />
            <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
            <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
            <filter class="solr.ICUFoldingFilterFactory"/>
            <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true" katakana="true" hangul="true" outputUnigrams="true" />
          </analyzer>
        </fieldType>

## Checking example locally

(Uses Ruby)

Install Ruby dependencies

```sh
$ bundle install
```

Set up Solr with CJKFilterUtils and config/schema

```sh
$ bundle exec rake setup_server
```

Run solr_wrapper

```sh
$ solr_wrapper
```

In another shell, index fixtures

```sh
$ bundle exec rake fixtures
```

Run some queries (these should return results):

```sh
$ curl "http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:呂思勉两晋南北朝&wt=json"

$ curl "http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:俞平伯红楼梦&wt=json"

$ curl "http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:南洋&wt=json"

```
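
If you prefer to build these URLs programmatically, the following Ruby sketch percent-encodes the query so it is safe to paste into any shell or HTTP client. The host, port, collection, field name, and parameters are copied from the curl commands; the `select_url` helper itself is hypothetical and not part of this project.

```ruby
require 'uri'

# Build a select URL for the local test collection.
# `select_url` is a hypothetical helper; the parameters mirror the curl examples.
def select_url(query, base: 'http://127.0.0.1:8983/solr/test/select')
  params = {
    'debugQuery' => 'on',
    'indent' => 'on',
    'q' => "cjk_test:#{query}",
    'wt' => 'json'
  }
  # encode_www_form percent-encodes the CJK characters as UTF-8 bytes,
  # so the resulting URL contains only ASCII.
  "#{base}?#{URI.encode_www_form(params)}"
end

puts select_url('南洋')
```

The printed URL can be passed to `curl` inside double quotes, the same way as the commands in this section.
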

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Added some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request
45 changes: 0 additions & 45 deletions README.rdoc

This file was deleted.

21 changes: 21 additions & 0 deletions Rakefile
@@ -0,0 +1,21 @@
require 'solr_wrapper'

desc 'Set up a local Solr using the built .jar for testing'
task :setup_server do
  `mvn clean package`
  version = `unzip -q -c target/CJKFilterUtils-*.jar META-INF/maven/edu.stanford/CJKFilterUtils/pom.properties`
            .split("\n").select{ |s| s =~ /^version/ }.first.split('=').last
  SolrWrapper.wrap do |solr|
    FileUtils.cp(
      File.join(__dir__, 'target', "CJKFilterUtils-#{version}.jar"),
      File.join(solr.instance_dir, 'contrib')
    )
    solr.with_collection(name: 'test') do
    end
  end
end

task :fixtures do
  system 'curl -X POST -H "Content-Type: application/json" "http://localhost:8983/solr/test/update/" --data-binary @example/fixtures.json'
  system 'curl http://localhost:8983/solr/test/update?stream.body=%3Ccommit/%3E'
end
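
The version lookup in `setup_server` chains several calls on the unzipped `pom.properties` text and raises `NoMethodError` if no `version` line is present. As a hedged alternative, the same step could be pulled into a small helper that returns `nil` in that case. The `pom_version` name and the sample contents below are hypothetical; only the group and artifact names are taken from the path used in the Rakefile.

```ruby
# Extract the `version` entry from the text of a Maven pom.properties file.
# Returns nil when no such entry exists. `pom_version` is a hypothetical helper.
def pom_version(properties_text)
  line = properties_text.each_line.map(&:strip).find { |l| l.start_with?('version=') }
  line && line.split('=', 2).last
end

# Made-up file contents, used only to exercise the helper.
SAMPLE_POM_PROPERTIES = <<~PROPS
  #Generated by Maven
  groupId=edu.stanford
  artifactId=CJKFilterUtils
  version=1.2.3-SNAPSHOT
PROPS

puts pom_version(SAMPLE_POM_PROPERTIES)
```

In the task, the helper would receive the output of the `unzip -q -c` command, and the caller could abort with a clear message when it gets `nil` back.
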
19 changes: 0 additions & 19 deletions build.properties

This file was deleted.
