Token ArrayList for Match Info #191

Open
margaretha opened this issue Dec 9, 2024 · 5 comments

Comments

@margaretha
Contributor

margaretha commented Dec 9, 2024

The Match Info service returns token annotations as a snippet in XML form. It would be nice to have the token annotations as an array as well.

For instance, for the following snippet:

"snippet": "<span class=\"context-left\"><\/span><span class=
       \"match\"><span title=\"opennlp/p:PPER\">es<\/span> <span 
        title=\"opennlp/p:VAFIN\">war<\/span> <span title=\"opennlp/p:ART\">
        ein<\/span> <span title=\"opennlp/p:ADJA\">ärgerlicher<\/span> 
       <span title=\"opennlp/p:NN\">Anblick<\/span>; <span title=
        \"opennlp/p:ART\">die<\/span> <span title=\"opennlp/p:NN\">Fallbrücke
        <\/span> <span title=\"opennlp/p:VVFIN\">reichte<\/span> 
        <span title=\"opennlp/p:PTKNEG\">nicht<\/span> <span title=
        \"opennlp/p:APPR\">bis<\/span> <span title=\"opennlp/p:APPRART\">
        ans<\/span> <span title=\"opennlp/p:ADJA\">trockene<\/span> 
        ...

Suggestion:

{
    "tokens": [
        {"opennlp/p:PPER": "es"},
        {"opennlp/p:VAFIN": "war"},
        {"opennlp/p:ART": "ein"},
        {"opennlp/p:ADJA": "ärgerlicher"},
        {"opennlp/p:NN": "Anblick"},
        {"" : ";"},
        {"opennlp/p:ART": "die"},
        {"opennlp/p:NN": "Fallbrücke"},
        {"opennlp/p:VVFIN": "reichte"},
        {"opennlp/p:PTKNEG": "nicht"},
        {"opennlp/p:APPR": "bis"},
        {"opennlp/p:APPRART": "ans"},
        {"opennlp/p:ADJA": "trockene"},
        ...
    ]
}
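To make Suggestion 1 concrete, here is a minimal Python sketch (not part of the Match Info service) that turns the snippet markup into that list. It assumes the snippet string has already been decoded from the JSON response (so `<\/span>` is `</span>`) and that each token is a `<span title="foundry/layer:value">` inside `<span class="match">`, as in the snippet above. The names `snippet_to_tokens` and `_Suggestion1Collector` are hypothetical, and real snippets may contain further markup this sketch does not handle.

```python
from html.parser import HTMLParser


class _Suggestion1Collector(HTMLParser):
    """Collects {annotation: surface} maps from the match part of a snippet."""

    def __init__(self):
        super().__init__()
        self.stack = []   # attribute dicts of the currently open <span>s
        self.tokens = []  # Suggestion 1 shape: list of single-key maps

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.stack.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Ignore the left/right context; only text inside the match counts.
        if not any(a.get("class") == "match" for a in self.stack):
            return
        text = data.strip()
        if text:
            # The innermost span's title is the annotation; text directly
            # inside the match span (e.g. punctuation) gets an empty key.
            self.tokens.append({self.stack[-1].get("title") or "": text})


def snippet_to_tokens(snippet):
    parser = _Suggestion1Collector()
    parser.feed(snippet)
    parser.close()
    return parser.tokens


snippet = (
    '<span class="context-left"></span><span class="match">'
    '<span title="opennlp/p:PPER">es</span> '
    '<span title="opennlp/p:VAFIN">war</span> '
    '<span title="opennlp/p:NN">Anblick</span>; '
    '<span title="opennlp/p:ART">die</span>'
    '</span><span class="context-right"></span>'
)
print(snippet_to_tokens(snippet))
```

Text that sits directly inside the match span, such as the semicolon, ends up under the empty key `""`.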
@Akron
Member

Akron commented Dec 10, 2024

We could return koral:tokens. Your format has the problem that it only works for the chosen foundry/layer annotation, and only when there are no gaps at all. Also, some tokens carry multiple annotations.

@Akron
Member

Akron commented Dec 10, 2024

What is the use case?

@margaretha
Contributor Author

@morckx Could you please explain the use case?

@margaretha
Contributor Author

margaretha commented Dec 10, 2024

I have updated the description. Thanks for the hints, @Akron (Nils). Is there a gap when, e.g., a comma appears?

The keys and values should be switched. Is the following suggestion better?

Suggestion 2:

{
    "tokens": [
        {"es": ["opennlp/p:PPER"]},
        {"war": ["opennlp/p:VAFIN"]},
        {"ein": ["opennlp/p:ART"]},
        {"ärgerlicher": ["opennlp/p:ADJA"]},
        {"Anblick": ["opennlp/p:NN"]},
        {";" : []},
        {"die": ["opennlp/p:ART"]},
        {"Fallbrücke": ["opennlp/p:NN"]},
        {"reichte": ["opennlp/p:VVFIN"]},
        {"nicht": ["opennlp/p:PTKNEG"]},
        {"bis": ["opennlp/p:APPR"]},
        {"ans": ["opennlp/p:APPRART"]},
        {"trockene": ["opennlp/p:ADJA"]},
        ...
    ]
}
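Along the same lines, a minimal Python sketch of producing the Suggestion 2 shape from a snippet. It assumes (unverified) that several annotations on one token are rendered as nested `<span title="...">` elements, so every titled ancestor of a piece of text is collected. The input below, including the second annotation `tt/p:APPRART`, is made up for illustration, and `snippet_to_token_list` is a hypothetical name.

```python
from html.parser import HTMLParser


class _Suggestion2Collector(HTMLParser):
    """Collects {surface: [annotations]} maps from the match of a snippet."""

    def __init__(self):
        super().__init__()
        self.stack = []   # attribute dicts of the currently open <span>s
        self.tokens = []  # Suggestion 2 shape: list of single-key maps

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.stack.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Skip text outside <span class="match"> (the left/right context).
        if not any(a.get("class") == "match" for a in self.stack):
            return
        text = data.strip()
        if text:
            # Assumption: multiple annotations on one token appear as nested
            # titled spans, so every titled ancestor contributes one entry.
            titles = [a["title"] for a in self.stack if a.get("title")]
            self.tokens.append({text: titles})


def snippet_to_token_list(snippet):
    parser = _Suggestion2Collector()
    parser.feed(snippet)
    parser.close()
    return parser.tokens


# Hypothetical input: "ans" carries a made-up second annotation (tt/p:APPRART).
snippet = (
    '<span class="match">'
    '<span title="opennlp/p:NN">Anblick</span>; '
    '<span title="opennlp/p:APPRART"><span title="tt/p:APPRART">ans</span></span>'
    '</span>'
)
print(snippet_to_token_list(snippet))
```

Punctuation that is not wrapped in a titled span comes out with an empty annotation list, as in the `";"` entry.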

@Akron
Member

Akron commented Dec 10, 2024

Yes, that is more flexible. Token lists only contain tokens, so it depends on whether the comma was indexed as a token or not.
The array wouldn't be very efficient, though, as single-key maps can be costly, depending on the parsing software. But let's wait for the use case.
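To illustrate the concern about single-key maps, a hypothetical Python sketch that converts Suggestion 2 entries into objects with fixed keys. The key names `surface` and `annotations` and the function `to_fixed_fields` are made up here; they were not proposed in this thread.

```python
import json


def to_fixed_fields(tokens):
    """Turn [{surface: [annotations]}, ...] into objects with fixed keys.

    The key names "surface" and "annotations" are hypothetical.
    """
    result = []
    for entry in tokens:
        # Each entry is a single-key map whose only key is the surface form,
        # so the key has to be discovered per object (raises ValueError if
        # an entry does not have exactly one key).
        (surface, annotations), = entry.items()
        result.append({"surface": surface, "annotations": list(annotations)})
    return result


suggestion2 = [
    {"es": ["opennlp/p:PPER"]},
    {";": []},
]
print(json.dumps(to_fixed_fields(suggestion2), ensure_ascii=False))
```

With fixed keys, a consumer can map every entry to one record type; with single-key maps, it has to look up the unknown key of each object at runtime.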
