Implement feature extractors from GESIS paper #52

maxxkia · 2017-09-01T16:42:56Z

Implement the features described in the following paper:
http://www.aclweb.org/anthology/W17-2907

- Tokens, lemmas, PoS
- Named entities
- Term filter: lemmas with PoS noun, verb, or adjective (idf-weighted)
- Keyword terms, synonyms, and hypernyms from TheSoz
- Synonyms, hypernyms, and derivational variants from WordNet
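The idf-weighted term filter can be sketched in Python as below. This is a minimal illustration, not the project's implementation: it assumes documents arrive as lists of (lemma, PoS) pairs with universal-style tags (NOUN, VERB, ADJ), uses the plain idf(t) = log(N / df(t)) with document frequencies counted over all lemmas, and weights each kept lemma by tf * idf. The function names are hypothetical, and the paper may use a different idf variant.

```python
import math
from collections import Counter

# Assumed universal-style PoS tags for the word classes kept by the filter.
CONTENT_POS = {"NOUN", "VERB", "ADJ"}


def compute_idf(tagged_docs):
    """idf(t) = log(N / df(t)), with df counted over the lemmas of each document."""
    n_docs = len(tagged_docs)
    df = Counter()
    for doc in tagged_docs:
        df.update({lemma for lemma, _pos in doc})
    return {lemma: math.log(n_docs / count) for lemma, count in df.items()}


def term_filter_features(tagged_doc, idf):
    """Keep noun/verb/adjective lemmas and weight each by tf * idf."""
    tf = Counter(lemma for lemma, pos in tagged_doc if pos in CONTENT_POS)
    # Lemmas missing from the idf table get weight 0.0 in this sketch.
    return {lemma: count * idf.get(lemma, 0.0) for lemma, count in tf.items()}


docs = [
    [("survey", "NOUN"), ("measure", "VERB"), ("the", "DET"), ("income", "NOUN")],
    [("survey", "NOUN"), ("be", "AUX"), ("long", "ADJ")],
]
idf = compute_idf(docs)
features = term_filter_features(docs[0], idf)
# "the" is dropped by the PoS filter; "survey" occurs in every document, so its idf is 0.
```

Because "survey" appears in every document, the idf weighting zeroes it out even though it passes the PoS filter, which is the intended effect of combining the two.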

- added lemma Ngram feature extractor

- implementing wordnet feature extractor

- implemented WordNet feature extractor

- implemented TheSoz feature extractor; added TheSoz feature extractor to TrainTestPipeline

- created tests for TheSozResource

- added more tests

- added more tests for conceptLabelLanguage

- refactored create query part

- added concept language to the feature extractor and meta collector; used ngramMinN and ngramMaxN as parameters for extracting token n-grams; added n-gram size parameters to TrainTestPipeline

- added stopwordRemover to the pipeline; added English stopwords file

- added caching to TheSozResource for faster access

- updated bean definition for TheSoz

- fixed a bug in the TheSoz feature extractor; the WordNet feature extractor now uses the most frequent sense of each word

- created package eu.openminted.uc.socialsciences.io.pdf.pdfx and moved the pdfx classes into it; created package eu.openminted.uc.socialsciences.io.xml and moved the XML conversion classes into it; moved the Pipeline class to the eu.openminted.uc.socialsciences.io.pdf package; updated documentation

- renamed Pipeline to PdfToXmiPipeline; updated documentation

- removed redundant parameter PARAM_RESOURCE_NAME from TheSozMetaCollector
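The ngramMinN/ngramMaxN parameters bound the sizes of the extracted n-grams. A minimal Python sketch of that extraction, assuming n-grams are contiguous token windows joined with an underscore; the function and parameter names are hypothetical stand-ins for the Java ones. The same function applies unchanged to a lemma sequence, which is all a lemma n-gram extractor needs on top of it.

```python
def extract_ngrams(tokens, ngram_min_n=1, ngram_max_n=3):
    """Return every contiguous n-gram with ngram_min_n <= n <= ngram_max_n.

    n-grams are ordered by size, then by position, and joined with "_".
    """
    if ngram_min_n < 1 or ngram_max_n < ngram_min_n:
        raise ValueError("expected 1 <= ngram_min_n <= ngram_max_n")
    ngrams = []
    for n in range(ngram_min_n, ngram_max_n + 1):
        # A sequence of length L has L - n + 1 windows of size n (none if n > L).
        for start in range(len(tokens) - n + 1):
            ngrams.append("_".join(tokens[start:start + n]))
    return ngrams


print(extract_ngrams(["social", "survey", "data"], 1, 2))
# ['social', 'survey', 'data', 'social_survey', 'survey_data']
```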

maxxkia added a commit that referenced this issue Sep 1, 2017

#52 - Implement feature extractors from GESIS paper

178422a

- added lemma Ngram feature extractor

maxxkia added this to the 2.0.0 milestone Sep 4, 2017

maxxkia added a commit that referenced this issue Sep 4, 2017

#52 - Implement feature extractors from GESIS paper

75eac5a

- implementing wordnet feature extractor

maxxkia added a commit that referenced this issue Sep 6, 2017

#52 - Implement feature extractors from GESIS paper

cf9ab3d

implemented wordnet feature extractor

maxxkia added a commit that referenced this issue Sep 18, 2017

#52 - Implement feature extractors from GESIS paper

d1d5d34

- implemented TheSoz feature extractor; added TheSoz feature extractor to TrainTestPipeline

maxxkia added a commit that referenced this issue Sep 18, 2017

#52 - Implement feature extractors from GESIS paper

d2f84bc

- created tests for TheSozResource

maxxkia added a commit that referenced this issue Sep 18, 2017

#52 - Implement feature extractors from GESIS paper

824d102

- added more tests

maxxkia added a commit that referenced this issue Sep 19, 2017

#52 - Implement feature extractors from GESIS paper

91da8f6

- added more tests for conceptLabelLanguage

maxxkia added a commit that referenced this issue Sep 19, 2017

#52 - Implement feature extractors from GESIS paper

a4034e4

- refactored create query part

maxxkia added a commit that referenced this issue Sep 19, 2017

#52 - Implement feature extractors from GESIS paper

4a4a154

- added stopwordRemover to the pipeline; added English stopwords file
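A stopword remover of this kind boils down to a set lookup. The sketch below assumes the stopword file has one word per line and that blank lines and lines starting with # can be skipped; the actual format of the English stopwords file added in this commit is not shown in the thread, and the function names are hypothetical.

```python
def load_stopwords(lines):
    """Build a lowercase stopword set from an iterable of lines (e.g. an open file)."""
    stopwords = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            stopwords.add(word)
    return stopwords


def remove_stopwords(tokens, stopwords):
    """Drop tokens whose lowercase form is in the stopword set."""
    return [token for token in tokens if token.lower() not in stopwords]


stopwords = load_stopwords(["the\n", "Of\n", "# comment\n", "\n"])
print(remove_stopwords(["The", "survey", "of", "income"], stopwords))
# ['survey', 'income']
```

Since `load_stopwords` takes any iterable of lines, an open file object can be passed directly.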

maxxkia added a commit that referenced this issue Sep 19, 2017

#52 - Implement feature extractors from GESIS paper

8ac5fca

- added caching to TheSozResource for faster access
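The caching added to TheSozResource trades memory for fewer repeated thesaurus queries. A minimal sketch of the idea in Python, assuming the underlying lookup is a function of (label, language) that returns related terms; how the real TheSozResource queries TheSoz is not described in this thread, so the class, method, and stand-in lookup below are hypothetical.

```python
class CachedConceptLookup:
    """Memoizes an expensive thesaurus lookup keyed by (label, language)."""

    def __init__(self, lookup):
        # lookup(label, language) -> iterable of related terms (e.g. a TheSoz query)
        self._lookup = lookup
        self._cache = {}
        self.misses = 0  # number of calls that reached the underlying lookup

    def related_terms(self, label, language="en"):
        key = (label, language)
        if key not in self._cache:
            self.misses += 1
            # Store an immutable tuple so callers cannot alter cached results.
            self._cache[key] = tuple(self._lookup(label, language))
        return self._cache[key]


def fake_lookup(label, language):
    # Stand-in for a real thesaurus query; returns made-up related terms.
    return ["earnings", "wages"] if label == "income" else []


thesaurus = CachedConceptLookup(fake_lookup)
thesaurus.related_terms("income")
thesaurus.related_terms("income")
print(thesaurus.misses)  # 1
```

The cache here is an unbounded dict, which is reasonable for a fixed vocabulary; a bounded cache such as `functools.lru_cache` would cap memory use at the cost of occasional repeated queries.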

maxxkia self-assigned this Sep 19, 2017

maxxkia added a commit that referenced this issue Sep 21, 2017

#52 - Implement feature extractors from GESIS paper

4d8faf9

- updated bean definition for TheSoz

maxxkia added a commit that referenced this issue Nov 8, 2017

#52 - Implement feature extractors from GESIS paper

1078947

- fixed a bug in the TheSoz feature extractor; the WordNet feature extractor now uses the most frequent sense of each word
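Using the most frequent sense avoids a full word-sense disambiguation step before expanding a lemma with its synonyms and hypernyms. The sketch below uses a made-up toy sense inventory with made-up counts instead of real WordNet data, so the entries are purely illustrative; it picks the sense with the highest count (earliest listed on ties) and emits prefixed synonym and hypernym features. In WordNet itself, senses are generally listed from most to least frequent, so the first listed sense usually serves as the most frequent one.

```python
# Made-up toy sense inventory: lemma -> senses with illustrative counts.
SENSES = {
    "survey": [
        {"count": 12, "synonyms": ["study"], "hypernyms": ["inquiry"]},
        {"count": 3, "synonyms": ["overview"], "hypernyms": ["summary"]},
    ],
}


def most_frequent_sense_features(lemma, senses=SENSES):
    """Expand a lemma with synonyms and hypernyms of its most frequent sense only."""
    candidates = senses.get(lemma, [])
    if not candidates:
        return []
    # max() returns the first maximal element, so ties go to the earliest-listed sense.
    best = max(candidates, key=lambda sense: sense["count"])
    return ([f"syn:{s}" for s in best["synonyms"]]
            + [f"hyp:{h}" for h in best["hypernyms"]])


print(most_frequent_sense_features("survey"))  # ['syn:study', 'hyp:inquiry']
```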

maxxkia added a commit that referenced this issue Nov 9, 2017

#52 - Refactor ss-io-pdf module

75c208c

- renamed Pipeline to PdfToXmiPipeline; updated documentation

maxxkia added a commit that referenced this issue Nov 9, 2017

#52 - Implement feature extractors from GESIS paper

b209cdc

- removed redundant parameter PARAM_RESOURCE_NAME from TheSozMetaCollector

