Currently, it is not possible to pass tokens or other existing annotations to spaCy using the standard gatenlp methods. TBH I am not too familiar with all the details of how to use spaCy with pre-annotated text, but of course there is the complication that gatenlp supports arbitrary nested and overlapping annotations while spaCy does not.

I think it should be possible to implement something where gatenlp passes an existing token sequence to spaCy instead of the bare text. However, it would be necessary to make sure that the token annotations cover the WHOLE document text, that all whitespace gets passed to spaCy as well, and maybe that some gatenlp token features get converted to spaCy token attributes? If you could give a more detailed description of what you are thinking of, maybe it is possible to come up with something that is flexible and generic enough to justify adding to gatenlp?

Of course it is always possible to implement the missing pieces oneself in one's own Annotator and reuse existing methods, or code from existing methods, where necessary. For example, if you have a specific way you want to pass your tokens to spaCy, you could implement the conversion from gatenlp to spaCy yourself, then run the spaCy pipeline, and then maybe use `gatenlp.lib_spacy.spacy2gatenlp(spacydoc, ...)` to convert back to a gatenlp document?

I wonder, if you need to do a large part of your processing in spaCy, which part would you still need to do with gatenlp?
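To make the "tokens must cover the whole text, including whitespace" requirement concrete, here is a rough sketch of the gatenlp-to-spaCy direction in plain Python. It assumes the token annotations have already been reduced to `(start, end)` offset pairs; the function name `tokens_to_words_and_spaces` is made up for this illustration and is not part of gatenlp or spaCy. The output uses the `words`/`spaces` convention that spaCy's `Doc` constructor accepts, where each token can carry at most one trailing space, so any longer whitespace run becomes a token of its own.

```python
from typing import List, Tuple


def tokens_to_words_and_spaces(
    text: str, offsets: List[Tuple[int, int]]
) -> Tuple[List[str], List[bool]]:
    """Turn non-overlapping token offsets into parallel words/spaces lists.

    Hypothetical helper for illustration only. Raises ValueError if the
    tokens overlap or leave any non-whitespace text uncovered.
    """
    words: List[str] = []
    spaces: List[bool] = []

    def add_gap(gap: str) -> None:
        # Text between two tokens (or before/after them) must be whitespace.
        if gap.strip():
            raise ValueError(f"text not covered by any token: {gap!r}")
        if gap == " " and words:
            # A single space is stored as the previous token's trailing space.
            spaces[-1] = True
        elif gap:
            # Any other whitespace run becomes a separate whitespace token.
            words.append(gap)
            spaces.append(False)

    pos = 0
    for start, end in sorted(offsets):
        if start < pos:
            raise ValueError(f"overlapping token at offset {start}")
        if end <= start or end > len(text):
            raise ValueError(f"invalid token span ({start}, {end})")
        add_gap(text[pos:start])
        words.append(text[start:end])
        spaces.append(False)
        pos = end
    add_gap(text[pos:])
    return words, spaces


text = "Hello  world, ok "
offsets = [(0, 5), (7, 12), (12, 13), (14, 16)]
words, spaces = tokens_to_words_and_spaces(text, offsets)

# The two lists reproduce the original text exactly, so no offsets shift.
rebuilt = "".join(w + (" " if s else "") for w, s in zip(words, spaces))
assert rebuilt == text

# With spaCy installed, these lists could then be used roughly like this:
#   from spacy.tokens import Doc
#   doc = Doc(nlp.vocab, words=words, spaces=spaces)
```

Because the rebuilt text is identical to the original, character offsets in the resulting spaCy document should line up with the gatenlp document when converting back. Nested or overlapping token annotations are deliberately rejected here, since spaCy has no way to represent them.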
-
When running spaCy from gatenlp using the spacy_annotator, gatenlp incorporates all the annotations produced by spaCy into the gatenlp document. But: