preprocessing issues #1
Comments
Hi Michael, For preprocessing, we follow exactly the same procedure as Miwa & Bansal's repo. The ace2json.py script only converts their output to JSON format. I'm not exactly sure what their preprocessing script does about the "SR." issue, but this code should be fine for direct comparison with previous literature on ACE. For the missing files, could you give me an example of a file you are missing and the error message you got? Best,
In the file:
That is because in
I preprocessed the ACE2005 corpus with your code but found some issues. The first issue is that the print statement on line 93 of ace2json.py uses Python 2 syntax. I manually changed it to a print function call and ran ace2json.py, but it reports that some files are missing. I also found that in ace2005/text/CNN_IP_20030408.1600.04.txt.conll the Stanford annotator wrongly adds an extra period at lines 87~88, so the word appears as "SR.." with two periods. Can you check this issue? Thanks for your contribution.
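For anyone who wants to look for the same doubled-period problem in other files, here is a rough sketch of a check. It is not part of the repo: the function name `find_double_period_tokens` is hypothetical, and it assumes only that each line of a .conll file holds whitespace-separated columns, so it scans every column and does not rely on a specific token position.

```python
import re

# Matches a token that ends in exactly two periods, such as "SR..",
# while ignoring a plain ellipsis ("...").
DOUBLE_PERIOD = re.compile(r"[^.]\.\.$")

def find_double_period_tokens(lines):
    """Return (line_number, token) pairs for tokens ending in a doubled period.

    Hypothetical helper: it assumes whitespace-separated columns and
    checks every column, since the exact .conll layout is not known here.
    """
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            if DOUBLE_PERIOD.search(token):
                hits.append((line_no, token))
    return hits

# Made-up sample lines, only to show the behaviour.
sample = ["1\tthe\tDT", "2\tSR..\tNNP", "3\t...\t:"]
print(find_double_period_tokens(sample))
```

To run it on a real file, pass an open file handle in place of `sample`, since the function accepts any iterable of lines. Whether a reported token is actually an annotator error still needs to be checked by hand.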