Forked from andre-martins/TurboParser.
Showing 28 changed files with 68,678 additions and 1,845 deletions.
@@ -26,3 +26,10 @@
 * Introduced third-order parts (as described in our ACL 2013 short paper).
 * Windows-compatible (tested under MSVC; thanks to Afonso Mendes).
 * TurboTagger is now faster.
+
+2014-06-26 TurboParser 2.2.0 [email protected]
+New release with some additional features:
+* Implemented a Python wrapper using Cython.
+* Added a semantic parser, TurboSemanticParser (as described in our SemEval
+  2014 paper).
+* Added a tokenizer for Portuguese.
@@ -2,7 +2,7 @@
 # Process this file with autoconf to produce a configure script.

 AC_PREREQ([2.59])
-AC_INIT([TurboParser], [2.1.0], [[email protected]], [TurboParser],
+AC_INIT([TurboParser], [2.2.0], [[email protected]], [TurboParser],
         [http://www.ark.cs.cmu.edu/TurboParser/])

 AM_INIT_AUTOMAKE([1.10 -Wall no-define])
@@ -0,0 +1,26 @@
+To install the Python wrapper for TurboParser, please follow the instructions
+below. You need to have Cython (at least version 0.19) installed. To install
+Cython, type
+
+  easy_install cython
+
+Then run:
+
+  ./install_wrapper.sh
+
+This should create a file turboparser.so in a local subfolder build/<something>
+(e.g. build/lib.linux-x86_64-2.7). Create a symbolic link to this file in the
+python folder:
+
+  ln -s build/lib.linux-x86_64-2.7/turboparser.so turboparser.so
+
+To test, open a Python shell and type:
+
+  >>> import nlp_pipeline
+
+(You may need to add the library path first, e.g.
+
+  export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:<root folder>/deps/local/lib"
+
+where <root folder> is the absolute path to the folder where TurboParser is
+located.)
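As a quick sanity check of the build, a short Python script run from the python folder can confirm the wrapper is importable. This is a minimal sketch under the assumptions above (the turboparser.so symlink exists and LD_LIBRARY_PATH is set); only the nlp_pipeline module name comes from these instructions, the rest is standard library:

    import os
    import sys

    # Make the current directory (the "python" folder) importable.
    sys.path.insert(0, os.getcwd())

    try:
        import nlp_pipeline  # the wrapper entry point named above
        print('Wrapper imported from:', nlp_pipeline.__file__)
    except ImportError as error:
        # Usual culprits: LD_LIBRARY_PATH missing <root folder>/deps/local/lib,
        # or the turboparser.so symlink not created.
        print('Wrapper not importable:', error)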
@@ -0,0 +1,18 @@
+#!/bin/bash
+
+# Root folder where TurboParser is located.
+root_folder="$(cd "$(dirname "$0")"; cd ..; pwd)"
+
+# Lib folder.
+lib_folder="${root_folder}/libturboparser"
+
+# Python folder.
+python_folder="${root_folder}/python"
+
+# Build the static TurboParser library.
+cd "$lib_folder"
+make
+
+# Now use Cython to build the Python wrapper.
+cd "$python_folder"
+python setup.py build_ext
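For context, the final line invokes a Cython build script. The repository's actual setup.py is not shown in this commit view; the sketch below is only a hypothetical reconstruction of what such a script could look like, and the module name, source file name, and include/library paths are all assumptions:

    # Hypothetical setup.py sketch: a Cython extension linked against the
    # static libturboparser built by "make" in the previous step.
    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Build import cythonize

    extension = Extension(
        'turboparser',                 # would yield turboparser.so under build/
        sources=['turbo_parser.pyx'],  # assumed Cython source file
        language='c++',
        include_dirs=['../src'],             # assumed header location
        library_dirs=['../libturboparser'],  # where make put the static lib
        libraries=['turboparser'],
    )

    setup(name='turboparser', ext_modules=cythonize([extension]))

Linking the wrapper against the static library would explain why the instructions above only require adding deps/local/lib (the third-party shared libraries) to LD_LIBRARY_PATH at import time.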
@@ -0,0 +1,53 @@
+class BasicLemmatizer:
+    """A dictionary-based lemmatizer: maps (word, tag) pairs to lemmas,
+    falling back to the word itself when the pair is unknown."""
+
+    def __init__(self):
+        self.lemmas = {}
+
+    def load_lemmatizer_model(self, file_model):
+        """Load a model file of tab-separated word, tag, lemma triples."""
+        self.lemmas = {}
+        f = open(file_model)
+        for line in f:
+            fields = line.rstrip('\n').split('\t')
+            self.lemmas[(fields[0], fields[1])] = fields[2]
+        f.close()
+
+    def lemmatize_sentence(self, tokenized_sentence, tags):
+        """Return one lemma per token, given parallel lists of words and tags."""
+        lemmas = []
+        for word, tag in zip(tokenized_sentence, tags):
+            lemmas.append(self.lemmas.get((word, tag), word))
+        return lemmas
+
+    def lemmatize(self, file_test, file_prediction):
+        """Lemmatize a CoNLL-style file, writing lemmas into column 2."""
+        f = open(file_test)
+        f_out = open(file_prediction, 'w')
+        for line in f:
+            line = line.rstrip('\n')
+            # Pass empty lines and document delimiters through unchanged.
+            if (line == '' or line.startswith('#begin document')
+                    or line.startswith('#end document')):
+                f_out.write(line + '\n')
+                continue
+            fields = line.split()
+            word = fields[1]
+            tag = fields[3]
+            fields[2] = self.lemmas.get((word, tag), word)
+            f_out.write('\t'.join(fields) + '\n')
+        f.close()
+        f_out.close()
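Usage of the class is straightforward. A minimal example follows; the model path and the token/tag lists are invented for illustration, and the model file is expected to contain tab-separated word, tag, lemma triples as read by load_lemmatizer_model:

    lemmatizer = BasicLemmatizer()
    lemmatizer.load_lemmatizer_model('models/lemmas.tsv')  # hypothetical path

    tokens = ['the', 'dogs', 'barked']
    tags = ['DT', 'NNS', 'VBD']
    print(lemmatizer.lemmatize_sentence(tokens, tags))
    # Prints the model's lemma for each known (word, tag) pair
    # and the word itself for unknown pairs.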
(The remaining changed files, including a deleted file, are not shown in this view.)