Solr as "sidecar" rather than as primary store? #30
Comments
Hi Nicolas, you're not the only one to suggest this (cc @eefahy, who has similar concerns). I'm happy to consider changing this, but I would like to understand the issues with using Solr as backend storage. Are there any issues you are aware of, or experiences you've had with Solr, that would affect the current approach? The downside of having Solr as a sidecar is the added complexity of keeping everything in sync. I have considered using file-based storage as the primary storage mechanism for annotations and then using JMS messaging to stay generic about possible sidecars. But when should the annotation be considered 'accepted' by the annotation store: when it passes the file-level store, or when it successfully gets through all of the sidecars? If it's only the 'main/safe' annotation store, there is a risk that the annotation fails when it gets to Solr, and by then control has already returned to the user, so they won't be informed of the failure. If you wait until all interested parties have successfully processed the annotation, you're potentially adding a delay to the storage of the annotation. I would also have to add functionality for re-indexing in case the two storage mechanisms get out of sync. As mentioned above, I'm happy to look at moving to the Solr sidecar approach, but I want to ensure it brings enough functionality (and reassurance to users) to justify the added complexity. Thanks, Glen
Indeed, keeping both stores (backend and search) synchronized would add a lot of complexity. But I think an annotation should first be stored safely, reported to the user, and then sent to the sidecar(s) asynchronously. Search functionality can wait; safe storage comes first. An incremental version number, for example, would help keep track of which annotations still need to be indexed or deleted. The reason Solr seems less "safe" is that it looks less "clean" to me:
I know, it's a matter of taste ;-)
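To make the "store safely first, index later" idea concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: `Sidecar`, `SidecarAnnotationStore`, `pendingVersions` and `drain` are hypothetical names, not part of this project, and the in-memory map stands in for whatever safe primary storage is chosen. `save` returns as soon as the primary store has the annotation; a background thread pushes it to the sidecars, and the version number records what still needs indexing if a sidecar fails.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sidecar, for example a Solr indexer. */
interface Sidecar {
    void index(String id, long version, String json) throws Exception;
}

class SidecarAnnotationStore {
    // Stand-in for the safe primary store (file system, database, ...).
    private final Map<String, String> primary = new HashMap<>();
    // Annotation id -> newest version not yet confirmed by every sidecar.
    private final Map<String, Long> pending = new ConcurrentHashMap<>();
    // A single worker thread keeps sidecar updates in save order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<Sidecar> sidecars;
    private long nextVersion = 0;

    SidecarAnnotationStore(List<Sidecar> sidecars) {
        this.sidecars = sidecars;
    }

    /** Stores the annotation, then returns without waiting for the sidecars. */
    synchronized long save(String id, String json) {
        long version = ++nextVersion;
        primary.put(id, json);      // the annotation is "accepted" here
        pending.put(id, version);   // remember that it still needs indexing
        worker.execute(() -> push(id, version, json));
        return version;
    }

    private void push(String id, long version, String json) {
        try {
            for (Sidecar sidecar : sidecars) {
                sidecar.index(id, version, json);
            }
            // Clears the entry only if no newer version was saved meanwhile.
            pending.remove(id, version);
        } catch (Exception e) {
            // Leave the entry pending so a later retry or re-index can pick it up.
        }
    }

    /** Snapshot of annotations the sidecars have not caught up with. */
    Map<String, Long> pendingVersions() {
        return new HashMap<>(pending);
    }

    /** Stops accepting work and waits briefly for queued sidecar updates. */
    boolean drain() {
        worker.shutdown();
        try {
            return worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape, a sidecar failure never reaches the user who saved the annotation; it only shows up as an entry in `pendingVersions()` that a retry job could work through.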
I've been thinking about this more and, in particular, have come across issues with installing Solr on AWS and keeping the data safe between restarts. I think to solve both these issues I am going to:
The activity stream should also allow a re-sync of Elasticsearch if it gets behind or corrupted. The internal Java Notification System would run in a different thread from the one handling the user's save, so the user won't be held up. Comments welcome!
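As a rough illustration of how an activity stream could drive a re-sync, the Java sketch below keeps an append-only log with sequence numbers and lets a search index either catch up from the last sequence it applied or rebuild from scratch. All names here (`ActivityLog`, `SearchIndex`, `catchUp`, `rebuild`) are hypothetical, and the in-memory map is only a stand-in for Elasticsearch; it is not the project's actual design.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical append-only activity stream with sequence numbers. */
class ActivityLog {
    enum Type { CREATE, UPDATE, DELETE }

    static final class Activity {
        final long seq;
        final Type type;
        final String id;
        final String json; // null for DELETE

        Activity(long seq, Type type, String id, String json) {
            this.seq = seq;
            this.type = type;
            this.id = id;
            this.json = json;
        }
    }

    private final List<Activity> events = new ArrayList<>();

    /** Appends an activity and returns its sequence number (starting at 1). */
    synchronized long append(Type type, String id, String json) {
        long seq = events.size() + 1;
        events.add(new Activity(seq, type, id, json));
        return seq;
    }

    /** Returns all activities with a sequence number greater than seq. */
    synchronized List<Activity> since(long seq) {
        return new ArrayList<>(events.subList((int) seq, events.size()));
    }
}

/** In-memory stand-in for the search index. */
class SearchIndex {
    private final Map<String, String> docs = new HashMap<>();
    private long lastApplied = 0;

    /** Applies only the activities the index has not seen yet. */
    void catchUp(ActivityLog log) {
        for (ActivityLog.Activity a : log.since(lastApplied)) {
            if (a.type == ActivityLog.Type.DELETE) {
                docs.remove(a.id);
            } else {
                docs.put(a.id, a.json);
            }
            lastApplied = a.seq;
        }
    }

    /** Throws the index away and replays the whole stream, e.g. after corruption. */
    void rebuild(ActivityLog log) {
        docs.clear();
        lastApplied = 0;
        catchUp(log);
    }

    Map<String, String> documents() {
        return new HashMap<>(docs);
    }

    long lastApplied() {
        return lastApplied;
    }
}
```

The index only needs to remember one number, the last sequence it applied, to know how far behind it is; a corrupted index can fall back to `rebuild`.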
A few comments:
Wouldn't it be better to use a store like Solr as an extra store, rather than as a primary store?
Solr is good for search, not for backend storage.
Maybe add a "sidecar store" functionality: