Compatibility with non-Hugging Face libraries #10
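I just made a proposal for the new API structure. For now, it is structured as a
Note that the current setup only works for

```python
from disaggregators import Disaggregator

text = ["She, the woman, went to the park."]

# Load the "gender" module on top of a spaCy pipeline (en_core_web_md).
disaggregator = Disaggregator("gender", language="en_core_web_md")

# Process a single text; doc.spans and doc.cats are spaCy Doc attributes.
doc = disaggregator(text[0])
print(doc.spans["sc"])  # overlapping spans, stored in the span group under the "sc" key
print(doc.cats)         # document-level category scores

# Batch-process several texts (analogous to spaCy's nlp.pipe).
docs = disaggregator.pipe(text)
```

new features: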
things to consider:
Awesome, thank you so much for this! Jotting a couple thoughts down here:
I have some more thoughts, which I'll write up soon. Thanks again for doing this!!! 🤗
Hi, thanks for the input! I will rewrite the code a bit this week.
I assumed that we also wanted to optimize for speed and reduce dependency overhead, so I opted for spaCy as the default tokenizer and preprocessor (so that this step does not have to be repeated for each document in each module). For now, I will assume that flexibility and adaptability take priority over speed and efficiency.
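To make the trade-off concrete, here is a minimal plain-Python sketch (no spaCy) comparing the two designs: tokenizing each document once and sharing the result across modules, versus letting every module tokenize on its own. `tokenize`, `shared_pipeline`, `per_module_pipeline`, and the toy modules are hypothetical and not part of the disaggregators API.

```python
from typing import Callable, Dict, List

# Counts tokenizer calls so the two designs can be compared.
CALLS = {"tokenize": 0}


def tokenize(text: str) -> List[str]:
    """Toy tokenizer standing in for a real preprocessor such as spaCy."""
    CALLS["tokenize"] += 1
    return text.lower().replace(",", " ").replace(".", " ").split()


# Toy modules: each takes already-tokenized text and returns a flag.
def gender_module(tokens: List[str]) -> bool:
    return any(t in {"she", "he", "woman", "man"} for t in tokens)


def place_module(tokens: List[str]) -> bool:
    return "park" in tokens


MODULES: Dict[str, Callable[[List[str]], bool]] = {
    "gender": gender_module,
    "place": place_module,
}


def shared_pipeline(texts: List[str]) -> List[Dict[str, bool]]:
    """Tokenize each document once and hand the tokens to every module."""
    results = []
    for text in texts:
        tokens = tokenize(text)
        results.append({name: fn(tokens) for name, fn in MODULES.items()})
    return results


def per_module_pipeline(texts: List[str]) -> List[Dict[str, bool]]:
    """Let every module tokenize the document on its own."""
    return [
        {name: fn(tokenize(text)) for name, fn in MODULES.items()}
        for text in texts
    ]


texts = ["She, the woman, went to the park.", "He stayed home."]
CALLS["tokenize"] = 0
shared = shared_pipeline(texts)
print(CALLS["tokenize"])  # 2: one call per document
CALLS["tokenize"] = 0
separate = per_module_pipeline(texts)
print(CALLS["tokenize"])  # 4: one call per document per module
```

With a shared preprocessor, the preprocessing cost grows with the number of documents rather than documents × modules; the price is that every module must accept the same tokenizer's output, which is the flexibility trade-off described above.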
I wanted to reuse this `language` parameter whenever possible while still allowing custom language configs per module.
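One way to reuse a top-level `language` parameter while still allowing per-module overrides is to fall back to the shared value only when a module has no config of its own. A minimal sketch, assuming a hypothetical `resolve_languages` helper and a hypothetical `module_configs` dict shape (neither is part of the current code):

```python
from typing import Dict, List, Optional


def resolve_languages(
    modules: List[str],
    language: str,
    module_configs: Optional[Dict[str, Dict[str, str]]] = None,
) -> Dict[str, str]:
    """Pick the language model for each module.

    A module's own "language" entry wins; otherwise the shared default is reused.
    """
    module_configs = module_configs or {}
    return {
        name: module_configs.get(name, {}).get("language", language)
        for name in modules
    }


# The "age" module and its override are hypothetical, for illustration only.
print(resolve_languages(
    ["gender", "age"],
    language="en_core_web_md",
    module_configs={"age": {"language": "en_core_web_sm"}},
))
# {'gender': 'en_core_web_md', 'age': 'en_core_web_sm'}
```

Keeping the override lookup in one place means modules never need to know whether their language came from the shared parameter or from their own config.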
This is a spaCy-specific mechanism for handling overlapping spans, and it enables direct visualization with displaCy.
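To illustrate why overlapping spans need their own container, here is a plain-Python sketch (no spaCy) contrasting a span group, which is simply a collection of (start, end, label) spans, with flat one-tag-per-token annotation. `TokenSpan`, `to_token_tags`, and the labels are hypothetical. In spaCy, the span-group role is played by `Doc.spans`, and displaCy's `span` style reads the group stored under the `"sc"` key by default.

```python
from typing import List, NamedTuple


class TokenSpan(NamedTuple):
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)
    label: str


def to_token_tags(spans: List[TokenSpan], n_tokens: int) -> List[str]:
    """Flatten spans to one tag per token; overlapping spans cannot be encoded."""
    tags = ["O"] * n_tokens
    for span in spans:
        for i in range(span.start, span.end):
            if tags[i] != "O":
                raise ValueError(f"token {i} is covered by more than one span")
            tags[i] = span.label
    return tags


tokens = ["She", ",", "the", "woman", ",", "went", "to", "the", "park", "."]

# A span group is just a collection of spans, so "the woman" and "woman" can coexist.
span_group = [
    TokenSpan(0, 1, "GENDER_TERM"),  # "She"
    TokenSpan(2, 4, "MENTION"),      # "the woman"
    TokenSpan(3, 4, "GENDER_TERM"),  # "woman" (inside the previous span)
]
for span in span_group:
    print(span.label, tokens[span.start:span.end])

try:
    to_token_tags(span_group, len(tokens))
except ValueError as err:
    print("flat tags fail:", err)  # token 3 is covered by more than one span
```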
I agree with this. I limited generalizability because I wanted to wrap the modules within the spaCy ecosystem, but, as mentioned, I will refactor the code to allow for more flexibility.
(As suggested in #8)
This issue may be split into multiple issues if needed.
Ideally, the API should support these kinds of things out of the box, so this is mostly a matter of verifying that they work and then documenting them, or making small changes for compatibility where needed. If absolutely necessary, we can consider adding special methods to bridge the gap.
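One way to get such compatibility mostly for free is to keep the core callable framework-agnostic (plain string in, plain dict out) and add only a thin bridge where a library expects a different shape. A minimal sketch; `toy_disaggregate` and `as_row_fn` are hypothetical and not part of the disaggregators package:

```python
from typing import Any, Callable, Dict


def toy_disaggregate(text: str) -> Dict[str, bool]:
    """Hypothetical stand-in for a disaggregator: plain string in, plain dict out."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    return {
        "has_female_term": bool(words & {"she", "woman"}),
        "has_male_term": bool(words & {"he", "man"}),
    }


def as_row_fn(
    fn: Callable[[str], Dict[str, Any]], column: str
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Bridge a text-level function to one that takes a row (a dict of columns)."""
    def row_fn(row: Dict[str, Any]) -> Dict[str, Any]:
        return fn(row[column])
    return row_fn


texts = ["She, the woman, went to the park.", "He stayed home."]

# Works on plain strings, e.g. with map() or a list comprehension.
plain = [toy_disaggregate(t) for t in texts]

# Works on row dicts via the bridge; the new columns are merged into each row.
rows = [{"text": t} for t in texts]
row_fn = as_row_fn(toy_disaggregate, "text")
merged = [{**row, **row_fn(row)} for row in rows]
print(merged)
```

The text-level function could be passed to pandas' `Series.apply`, and `row_fn` has the per-example shape (a dict in, a dict of new columns out) that 🤗 Datasets' `Dataset.map` takes with `batched=False`; whether a given library needs anything more is what this issue proposes to verify.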