Standoff property value retrieval performance #1578
Have you read https://discuss.dasch.swiss/t/large-texts-and-xml-databases/134 ? You have two options: split the text into smaller pieces and store it as RDF, or store it in an XML database.
I suggested the same thing to you last April.
No, sorry, I no longer have enough motivation or time to follow your upcoming developments; I'm just trying to find solutions with the existing Knora :)
Yep, I remember. With @mrivoal we thought about that, but it's not so easy for us to automatically split our users' data during the migration from their MySQL database into Knora. And anyway, at the end of the day, they probably won't want to split their data :| Just have a look at their work: http://lumieres.unil.ch/fiches/trans/1088/ (in edit mode, for which you need an account). They use CKEditor, which produces a kind of pseudo-HTML; we provided a standoff mapping and it works very well. It's a shame that, probably just for a few transcriptions, we get this kind of poor performance :(
The test case, if you want to reproduce it: PerfTrans.zip
@mrivoal The only solution I see right now is to ask them to split their existing transcriptions in their database before our final migration. @benjamingeer The save process is also very slow; it is not a problem for our migration process, but it will probably be a problem in our web app client if the end user has to wait more than 30 seconds to save something... They haven't given us feedback about that yet, but they probably will in the near future!
If you can split the text into smaller pieces, both saving and loading will be faster.
Yes, the modeling solution, as usual. Then I guess, in the long run, Knora will have to store long texts in XML databases.
It's a trade-off. If you can store texts in small enough pieces (1000 words is a good size if you have a lot of markup), you can store them as RDF, and get functionality that you wouldn't get by storing the text in eXist-db, like "find me a text that mentions a person who was born after 1720 and who was a student of Euler". (Maybe you could do that in eXist-db if you were willing to store all your data as XML.) Otherwise, you can store the text in eXist-db: storage and retrieval will be faster, and some queries will be faster, but you will lose some search capabilities. I think the best we can do is offer both options, and let each project decide which is best for them.
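For illustration, here is roughly what such a query looks like when submitted to Knora's Gravsearch endpoint (POST /v2/searchextended). This is a minimal sketch: the host, the project ontology, and the properties mentionsPerson, hasBirthDate, studentOf, and hasName are hypothetical stand-ins, not anything from a real ontology in this thread.

```python
# Sketch: submit a Gravsearch query to Knora's extended search endpoint.
# The "ex:" ontology and its properties are hypothetical placeholders.
import requests

GRAVSEARCH = """
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX ex: <http://0.0.0.0:3333/ontology/0001/example/simple/v2#>

CONSTRUCT {
    ?text knora-api:isMainResource true .
    ?text ex:mentionsPerson ?person .
} WHERE {
    ?text a ex:Transcription .
    ?text ex:mentionsPerson ?person .
    ?person ex:hasBirthDate ?birthDate .
    ?person ex:studentOf ?teacher .
    ?teacher ex:hasName "Euler" .
    FILTER(?birthDate > "GREGORIAN:1720"^^knora-api:Date)
}
"""

# Gravsearch queries are POSTed as plain text in the request body.
response = requests.post(
    "http://0.0.0.0:3333/v2/searchextended",
    data=GRAVSEARCH.encode("utf-8"),
    headers={"Content-Type": "text/plain"},
)
response.raise_for_status()
print(response.json())
```

This kind of query joins text markup with metadata about other resources, which is exactly what a pure XML store would not give you out of the box.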
What do you consider to be "a lot of markup"?
In the test I did, nearly every word had a tag. The more markup you have, the more triples have to be retrieved, and the slower it's going to be. If you have a big text with very little markup, GraphDB can still retrieve it pretty quickly.
Ok, thanks.
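As a rough way to gauge where a given text falls, the markup density (tags per word) can be estimated before deciding whether to split. A sketch using only Python's standard library, assuming the transcription is available as well-formed XML:

```python
# Rough sketch: estimate the markup density of an XML transcription,
# i.e. how many tags (and hence standoff triples) exist per word of text.
import xml.etree.ElementTree as ET

def markup_density(path: str) -> float:
    root = ET.parse(path).getroot()
    tags = sum(1 for _ in root.iter())             # every element becomes standoff markup
    words = len("".join(root.itertext()).split())  # word count of the plain text
    return tags / words if words else 0.0

# A value near 1.0 means almost one tag per word, like the test described
# above; that is the regime where retrieval of long texts gets slow.
print(f"tags per word: {markup_density('transcription.xml'):.2f}")
```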
That text has chapters. Why not store one chapter per resource? That would also make navigation and editing a lot easier. Do you really want to scroll through that much text on one HTML page? |
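If a project does decide to split, the chunking itself is simple for plain text. A sketch that cuts a transcription into pieces of roughly 1000 words (the size suggested above), each of which would then be stored as its own resource (e.g. via POST /v2/resources):

```python
# Sketch: split a long transcription into ~1000-word chunks, one per resource.
def split_text(text: str, max_words: int = 1000) -> list[str]:
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

with open("transcription.txt", encoding="utf-8") as f:
    chunks = split_text(f.read())

# Each chunk would then be created as a separate Knora resource,
# linked back to a parent resource representing the whole transcription.
print(f"{len(chunks)} resources instead of 1")
```

Splitting marked-up XML is harder, since tags can span chunk boundaries; there, cutting on structural elements such as chapters, as suggested above, is the safer route.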
In our project, we have huge transcriptions based on our own standoff mapping, and performance when retrieving the value of this property is poor. I mainly use Gravsearch to retrieve data, but even with the v2/resources route the performance is poor. We are talking about 20 or 30 seconds to retrieve this resource.
If needed, @loicjaouen will provide our memory/CPU stack configuration.
I'm going to prepare a test case so you can try to reproduce this performance problem on your side.
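For anyone trying to reproduce the timings reported here, retrieval time via the v2/resources route can be measured along these lines; the host and resource IRI are placeholders to be replaced with values from the attached test case:

```python
# Sketch: time how long Knora takes to return one resource via /v2/resources.
import time
import urllib.parse
import requests

resource_iri = "http://rdfh.ch/0001/example-resource"  # placeholder IRI
# The resource IRI must be percent-encoded in the URL path.
url = "http://0.0.0.0:3333/v2/resources/" + urllib.parse.quote(resource_iri, safe="")

start = time.monotonic()
response = requests.get(url)
response.raise_for_status()
print(f"retrieved in {time.monotonic() - start:.1f}s")  # ~20-30s on our data
```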