
Implement feature extractors from GESIS paper #52

Open
4 of 5 tasks
maxxkia opened this issue Sep 1, 2017 · 0 comments

maxxkia (Member) commented Sep 1, 2017

Implement the features described in the following paper:
http://www.aclweb.org/anthology/W17-2907

  • Tokens, lemmas, PoS
  • Named Entities
  • Term filter, selecting lemmas with PoS=Noun, Verb, Adjective (idf-weighted)
  • Keyword terms, synonyms and hypernyms from TheSoz
  • Synonyms, hypernyms as well as derivational variants from WordNet
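As a rough illustration of the idf-weighted term filter listed above, the following Java sketch keeps only lemmas tagged as nouns, verbs, or adjectives and weights each kept lemma by tf × idf, with idf(t) = ln(N / df(t)). The `Token` class, the coarse tags `NOUN`/`VERB`/`ADJ`, and this particular idf formula are assumptions for illustration; the paper and the project's extractor may use a different tag set or idf variant.

```java
import java.util.*;

/** Hypothetical token representation: a lemma plus a coarse PoS tag. */
final class Token {
    final String lemma;
    final String pos;

    Token(String lemma, String pos) {
        this.lemma = lemma;
        this.pos = pos;
    }
}

class TermFilter {
    // Assumed coarse tag set; the real pipeline's PoS tags may differ.
    static final Set<String> KEPT_POS = Set.of("NOUN", "VERB", "ADJ");

    /** Keeps the lemmas of tokens whose PoS is a noun, verb, or adjective. */
    static List<String> filter(List<Token> doc) {
        List<String> terms = new ArrayList<>();
        for (Token t : doc) {
            if (KEPT_POS.contains(t.pos)) {
                terms.add(t.lemma);
            }
        }
        return terms;
    }

    /** idf(t) = ln(N / df(t)), where df(t) is the number of documents containing t. */
    static Map<String, Double> idf(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String term : new HashSet<>(doc)) { // count each term once per document
                df.merge(term, 1, Integer::sum);
            }
        }
        int n = corpus.size();
        Map<String, Double> result = new HashMap<>();
        df.forEach((term, count) -> result.put(term, Math.log((double) n / count)));
        return result;
    }

    /** Feature weights for one document: each occurrence adds idf, giving tf * idf. */
    static Map<String, Double> weigh(List<String> doc, Map<String, Double> idf) {
        Map<String, Double> features = new HashMap<>();
        for (String term : doc) {
            features.merge(term, idf.getOrDefault(term, 0.0), Double::sum);
        }
        return features;
    }
}
```

A lemma that occurs in every document gets idf 0 and so contributes nothing, which is the intended effect of idf weighting.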
maxxkia added a commit that referenced this issue Sep 1, 2017
- added lemma Ngram feature extractor
maxxkia added this to the 2.0.0 milestone Sep 4, 2017
maxxkia added a commit that referenced this issue Sep 4, 2017
- implementing wordnet feature extractor
maxxkia added a commit that referenced this issue Sep 6, 2017
- implemented wordnet feature extractor
maxxkia added a commit that referenced this issue Sep 18, 2017
- implemented TheSoz feature extractor
- added TheSoz feature extractor to TrainTestPipeline
maxxkia added a commit that referenced this issue Sep 18, 2017
- created tests for TheSozResource
maxxkia added a commit that referenced this issue Sep 18, 2017
maxxkia added a commit that referenced this issue Sep 19, 2017
- added more tests for conceptLabelLanguage
maxxkia added a commit that referenced this issue Sep 19, 2017
maxxkia added a commit that referenced this issue Sep 19, 2017
- added concept language in feature extractor and meta collector
- used ngramMinN and ngramMaxN as parameters for extracting token ngrams
- added ngram size parameters to TrainTestPipeline
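A minimal sketch of how `ngramMinN` and `ngramMaxN` can bound token n-gram extraction: every contiguous n-gram with `ngramMinN <= n <= ngramMaxN` is emitted. The class and method names are hypothetical; this is a standalone illustration, not the project's actual extractor.

```java
import java.util.*;

class NgramExtractor {
    /**
     * Returns all contiguous token n-grams with minN <= n <= maxN, joined by
     * single spaces. minN/maxN play the role of ngramMinN/ngramMaxN here.
     */
    static List<String> ngrams(List<String> tokens, int minN, int maxN) {
        if (minN < 1 || maxN < minN) {
            throw new IllegalArgumentException("require 1 <= minN <= maxN");
        }
        List<String> result = new ArrayList<>();
        for (int n = minN; n <= maxN; n++) {
            // Slide a window of length n over the token sequence.
            for (int start = 0; start + n <= tokens.size(); start++) {
                result.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return result;
    }
}
```

For example, `ngrams(List.of("social", "science", "survey"), 1, 2)` yields the three unigrams followed by `social science` and `science survey`.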
maxxkia added a commit that referenced this issue Sep 19, 2017
- added stopwordRemover to the pipeline
- added English stopwords file
maxxkia added a commit that referenced this issue Sep 19, 2017
- added caching to TheSozResource for faster access.
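The caching in `TheSozResource` can be pictured as memoizing lookups so that each key queries the underlying thesaurus only once. The sketch below assumes a lookup can be modeled as a `Function<K, V>`; `CachingLookup` is a hypothetical name, and the actual resource may cache differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical memoizing wrapper around an expensive lookup (e.g. a thesaurus query). */
class CachingLookup<K, V> {
    private final Function<K, V> backend;
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    CachingLookup(Function<K, V> backend) {
        this.backend = backend;
    }

    /** Returns the cached value, computing it via the backend on first access. */
    V get(K key) {
        // computeIfAbsent invokes the backend only when no value is cached for key;
        // a null result is not stored, so such keys are recomputed on the next call.
        return cache.computeIfAbsent(key, backend);
    }
}
```

`ConcurrentHashMap` keeps concurrent lookups from several pipeline threads safe; a plain `HashMap` would suffice in a single-threaded setting.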
maxxkia self-assigned this Sep 19, 2017
maxxkia added a commit that referenced this issue Sep 21, 2017
- updated bean definition for TheSoz
maxxkia added a commit that referenced this issue Nov 8, 2017
- fixed a bug with TheSoz feature extractor
- wordnet feature extractor now uses the most frequent sense of each word
maxxkia added a commit that referenced this issue Nov 9, 2017
- created package eu.openminted.uc.socialsciences.io.pdf.pdfx and moved pdfx classes to this package
- created package eu.openminted.uc.socialsciences.io.xml and moved xml conversion classes to this package
- moved the Pipeline class to eu.openminted.uc.socialsciences.io.pdf package
- updated documentation
maxxkia added a commit that referenced this issue Nov 9, 2017
- renamed Pipeline to PdfToXmiPipeline
- updated documentation
maxxkia added a commit that referenced this issue Nov 9, 2017
- removed redundant parameter PARAM_RESOURCE_NAME from TheSozMetaCollector