Strange output format for phrase extraction. #15

Open
johnb30 opened this issue Jun 7, 2016 · 2 comments

@johnb30
Member

johnb30 commented Jun 7, 2016

Trying to run some sample data to explore the phrase extraction pieces. I'm using the following data:

{'abc123': {'meta': {'date': '20010101'},
  'sents': {0: {'content': u'At least 37 people are dead after Islamist radical group Boko Haram assaulted a town in northeastern Nigeria .',
    'parsed': u'(ROOT (S (NP (QP (IN AT ) (JJS LEAST ) (CD 37 ) ) (NNS PEOPLE ) ) (VP (VBP ARE ) (ADJP (JJ DEAD ) ) (SBAR (IN AFTER ) (S (NP (JJ ISLAMIST ) (JJ RADICAL ) (NN GROUP ) (NNP BOKO ) (NNP HARAM ) ) (VP (VBD ASSAULTED ) (NP (NP (DT A ) (NN TOWN ) ) (PP (IN IN ) (NP (JJ NORTHEASTERN ) (NNP NIGERIA ) ) ) ) ) ) ) ) (. . ) ) )'}}}}

I then run it through the do_coding routine:

event_dict_updated = petrarch2.do_coding(event_dict, None)

Which yields the following updated dictionary:

{'abc123': {'meta': {'date': '20010101',
   u'verbs': {u'nouns': [([u' PEOPLE'], [u'~PPL'], [[u'~']]),
     ([u' ISLAMIST', u' BOKO HARAM'],
      [u'NGAREBMUS'],
      [[u'~'], (u'NGAREB', [])]),
     ([u' NIGERIA'], [u'NGA'], [(u'NGA', [])])]}},
  'sents': {0: {'content': u'At least 37 people are dead after Islamist radical group Boko Haram assaulted a town in northeastern Nigeria .',
    'parsed': u'(ROOT (S (NP (QP (IN AT ) (JJS LEAST ) (CD 37 ) ) (NNS PEOPLE ) ) (VP (VBP ARE ) (ADJP (JJ DEAD ) ) (SBAR (IN AFTER ) (S (NP (JJ ISLAMIST ) (JJ RADICAL ) (NN GROUP ) (NNP BOKO ) (NNP HARAM ) ) (VP (VBD ASSAULTED ) (NP (NP (DT A ) (NN TOWN ) ) (PP (IN IN ) (NP (JJ NORTHEASTERN ) (NNP NIGERIA ) ) ) ) ) ) ) ) (. . ) ) )'}}}}

There are a couple of issues here:

  1. The nesting is wrong: nouns ends up inside verbs, which itself ends up inside meta.
  2. It's unclear what, exactly, is associated with what. For example, it isn't clear what the [[u'~'], (u'NGAREB', [])] construct refers to in the sentence.
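
To make point 1 concrete, here is a minimal sketch that lists every key path in the updated dictionary. The `key_paths` helper is hypothetical (it is not part of PETRARCH), and `updated` is a trimmed copy of the output above with the sentence text and parse omitted.

```python
def key_paths(d, prefix=()):
    """Yield the tuple of keys leading to every entry of a nested dict."""
    for key, value in d.items():
        path = prefix + (key,)
        yield path
        if isinstance(value, dict):
            # Recurse into nested dicts only; lists and tuples are leaves here.
            for sub_path in key_paths(value, path):
                yield sub_path


# Trimmed copy of the do_coding output shown above (an illustration, not real API output).
updated = {
    'abc123': {
        'meta': {
            'date': '20010101',
            'verbs': {'nouns': [([' PEOPLE'], ['~PPL'], [['~']])]},
        },
        'sents': {0: {}},
    }
}

paths = list(key_paths(updated))
for path in paths:
    print(path)
```

The printed paths include `('abc123', 'meta', 'verbs', 'nouns')`, which is the nesting in question.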

This isn't relevant to this issue, but it should also be noted that this sentence doesn't code an event even though PETR is clearly identifying potential source and target actors and "assaulted" should be a relevant verb.

cc @philip-schrodt @ahalterman

@johnb30
Member Author

johnb30 commented Jun 14, 2016

It should also probably be noted that things like:

{(u'---COPLEG', u'---GOV', u'041'): [[u'CALLED'], [u'HAS']],
 u'actorroot': {(u'---COPLEG', u'---GOV', u'041'): [u'', u'']},
 u'actortext': {(u'---COPLEG', u'---GOV', u'041'): [u'deputy ... Congress',
   u'Governor']},
 u'eventtext': {(u'---COPLEG', u'---GOV', u'041'): u'has called'},
 u'nouns': [([u' CONGRESS'], [u'~LEG'], [[u'~']]),
  ([u' DEPUTY', u' CONGRESS'], [u'~COPLEG'], [[u'~'], [u'~']]),
  ([u' GOVERNOR'], [u'~GOV'], [[u'~']]),
  ([u' ADMINISTRATION'], [u'~GOV'], [[u'~']])]}

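cause hell if you're trying to dump that result out to JSON, since tuple keys such as (u'---COPLEG', u'---GOV', u'041') aren't valid JSON object keys. Tuples are hashable, so Python accepts them as dict keys, but JSON only allows string keys.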
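
As a stopgap on the consumer side, one possible workaround is to rewrite the tuple keys as strings before serializing. This is only a sketch under assumptions: `stringify_keys` and the `'|'` separator are hypothetical choices, not anything PETRARCH provides, and `result` is a trimmed copy of the structure above. The join is lossy if a key part ever contains `'|'`.

```python
import json


def stringify_keys(obj):
    """Recursively copy obj, turning tuple dict keys into '|'-joined strings."""
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            if isinstance(key, tuple):
                key = '|'.join(str(part) for part in key)
            out[key] = stringify_keys(value)
        return out
    if isinstance(obj, (list, tuple)):
        # JSON has no tuple type, so tuples become lists.
        return [stringify_keys(item) for item in obj]
    return obj


# Trimmed copy of the structure shown above (illustration only).
result = {
    ('---COPLEG', '---GOV', '041'): [['CALLED'], ['HAS']],
    'eventtext': {('---COPLEG', '---GOV', '041'): 'has called'},
    'nouns': [([' CONGRESS'], ['~LEG'], [['~']])],
}

# Dumping the raw structure fails because of the tuple keys.
try:
    json.dumps(result)
    raw_ok = True
except TypeError:
    raw_ok = False

# After rewriting the keys, the same data serializes cleanly.
dumped = json.dumps(stringify_keys(result), sort_keys=True)
print(dumped)
```

This only papers over the problem for downstream tools; the output format itself still needs fixing.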

@johnb30 johnb30 added this to the 2.0.0 milestone Jul 20, 2016
@johnb30
Member Author

johnb30 commented Nov 14, 2016

Re-upping this since I discovered it again. The tuples as keys thing needs to be fixed ASAP since it's breaking hypnos.
