Move away from Elasticsearch #16
Teams at edX/2U have done discovery on the work for this ticket, and we have decided to go forward with using OpenSearch for several key use cases. We will be removing usages of Elasticsearch or equivalent in favor of MySQL text search in all other use cases. Switched to using OpenSearch:
Removing usage of Elasticsearch:
Because of this, I propose setting the acceptance date on this ticket for April 18, 2022 in order to give the community time to discuss this.
To be clear about Blockstore: Our investigation found that Blockstore does not currently leverage Elasticsearch at all. Our recommendation is to have the BD-14 Content Lib v2 project team implement another solution during the project which would remove the need for OpenSearch.
PT-CscommentsserviceESusage-170322-1900.pdf These are the discoveries done by T&L and Infinity squads which led us to prefer an OpenSearch solution rather than a native MySQL solution.
@jristau1984 looking at the discoveries, it looks like courseware search of course content is not actually enabled in the new MFE. Is the reason for moving to OpenSearch that we're planning to port that feature to the MFE in the near future?
Yes, the plan is to re-implement this feature in the MFE versions of LMS and CMS when possible.
@dianakhuang, @feanil: Can this be moved to "Communicated" status, since there is a post about it?
@feanil since we know the exact index names, could we make them configurable? For example, an environment variable that defaults to the current index name (or a prefix/suffix added to the index name). I'm asking because we talked here about using one Elastic cluster for different organizations.
@CodeWithEmad I'm not sure exactly what you're asking. I believe it should be possible to update the code to make the index names configurable in a safe way. Are you asking how you should go about modifying the code to make this configurable? (Most Open edX services are Django and have an associated settings file, so I would push for pulling the name from a Django setting rather than an environment variable, for consistency with the rest of the system.)
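For illustration, here is a minimal sketch of the Django-setting approach suggested above. Everything named here is hypothetical rather than existing Open edX code: the settings `SEARCH_INDEX_PREFIX` and `SEARCH_INDEX_NAME_OVERRIDES`, the helper `get_index_name`, and the index name `example_index`. A `SimpleNamespace` stands in for `django.conf.settings` so the snippet runs without Django installed.

```python
from types import SimpleNamespace

# Hypothetical setting names (assumptions for this sketch, not real Open edX settings).
PREFIX_SETTING = "SEARCH_INDEX_PREFIX"
OVERRIDES_SETTING = "SEARCH_INDEX_NAME_OVERRIDES"


def get_index_name(settings, default_name):
    """Return the index name to use in place of ``default_name``.

    ``settings`` is any object with attribute-style access, such as
    ``django.conf.settings``. An explicit per-index override wins;
    otherwise an optional prefix is applied; otherwise the default
    name is returned unchanged.
    """
    overrides = getattr(settings, OVERRIDES_SETTING, {})
    if default_name in overrides:
        return overrides[default_name]
    prefix = getattr(settings, PREFIX_SETTING, "")
    return f"{prefix}{default_name}"


# Stand-ins for django.conf.settings in three deployments.
default_settings = SimpleNamespace()
org_a_settings = SimpleNamespace(SEARCH_INDEX_PREFIX="org_a_")
org_b_settings = SimpleNamespace(
    SEARCH_INDEX_NAME_OVERRIDES={"example_index": "org_b_custom_index"}
)

for deployment in (default_settings, org_a_settings, org_b_settings):
    print(get_index_name(deployment, "example_index"))
```

With nothing configured the existing name is kept, so current deployments would be unaffected, while organizations sharing one cluster could each set a distinct prefix or override.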
@dianakhuang I'm gonna assign this ticket to you as the point person for this work that 2U is taking on.
Arbi-BOM plans to take on some planning and coordination work for this very soon. For the benefit of them and any other 2U folk helping with this deprecation, here are some relevant internal resources:
I hope we can make most of the info in those docs public in the near future, but for now I just want to get the information linked so we can quickly unblock work on determining if there's anything useful we can do on this in time for the Olive release.
Created a draft discussion document to work out a plan of action for leading the effort on this task. Once the plan of action has been finalised, subsequent issues to track progress will be created and shared publicly with other community members.
@UsamaSadiq why make the discussion about the plan of action internal? I think these decisions will impact a lot of people in the community and would benefit from being had in the open. Is there a specific concern that led you to making the discussion internal?
Hi @feanil, there is no particular reason. I was just taking it incrementally. I shared the document with 2U team first so we could do a final iteration/review before sharing with community.
Thanks @UsamaSadiq I think for such a big decision, it's good to share not only the final decision with the community but all the intermediate steps that led to the decision. Thanks for opening up the working docs.
@UsamaSadiq what do you think about writing ADRs for the decision for each repo so that we can share it out with the community? Since the decisions are different for the different projects, it would be good to capture the reasoning for each in the relevant repo.
Following is the current plan of action suggested by arbi-bom team to progress on this issue:
Created issues on the Maintenance boards and notified the owning teams in their slack channels.
@UsamaSadiq my concern is that notifying the "owning" team at 2U does not inform the community of users or CCs for the repos, I'd like the communication plan to include those groups, what's the best way to include those here? I don't think it means that we have to block on feedback on those groups but I'd like them to be informed as we progress through the process. Most are not following projects in the |
@feanil I've shared the above-mentioned issues with the owning teams. Each team will create an ADR document after finalising their findings and share it with the community. Meanwhile, you can either let me know if I should share the issues linked above in a particular openedx channel to make them more visible to the community, or I could announce these issues to the community once we have initial ADR documents prepared by the owning teams. I hope this works as you expect. If you have any other ideas that could help us increase collaboration, I'm all ears.
Adding on to my point, we could probably create ADR documents in the Open edX Confluence and ask the 2U teams to add updates there, so they'll also be visible to the community and make collaboration easier.
I think creating the drafts in the Open edX Confluence or as PRs on the repos (even in draft form) would both be great. I think this ticket is a great place to provide future updates, but for major changes or milestones, I would also mention them on https://discuss.openedx.org/t/deprecation-removal-depr-170-move-from-elasticsearch-to-opensearch/5844/10
Adding a note I wrote in a Slack conversation regarding a point that complicates the migration for course-discovery and edx-notes-api (I think these are the only repos that currently use django-elasticsearch-dsl):
AXIM is going to take over maintainership of the edx-notes-api repo, and will try to do this migration.
Open Questions
Unfortunately, it looks like my comment from August still stands. There have been a couple of forks of Django's Elasticsearch packages to add/substitute OpenSearch, but they haven't seen any real activity since they were created last year. I suspect if we use them, we'll have to take over maintenance of them.
Note: There were performance issues in the past with MySQL full text search and performing any other queries. We would like to make sure this is no longer the case before we implement it in our services.
Hi folks, have there been any updates on OpenSearch/ElasticSearch/etc? Is there any current work happening? My current understanding is:
Note: I heavily updated this comment from the original version after further research ^
My info is about a month out of date now, but some historical context and opinions (Feanil has already heard most/all of this):
I'm unfortunately not likely to be able to help much with this for a while, so it's going to be up to other people to pick a path forward. I just wanted to articulate that while OpenSearch looks at first like the easiest/safest path forward to solve the licensing problem, it's actually harder than it looks and may not really set up Open edX for success in future search improvements. I tried repeatedly over 3 years to build momentum on solving the Elasticsearch licensing issue, but it was hard to get anybody excited about the switch to OpenSearch (especially with 2U not feeling the pain because Amazon still hosts the old pre-license-change Elasticsearch version with security patches).
Thanks a lot @jmbowman, that's very helpful.
@feanil, @dianakhuang: Has Meilisearch been discussed/evaluated at any point in the ES replacement talks? I don't see any conversations on it in the wiki or Discourse. It sounds really compelling, particularly the part where it uses vastly less memory (a 5-10X difference from what I've seen of various people's blog posts).
I know @jmbowman has advocated for it, but we haven't done any discovery on it.
Meilisearch sounds like an ideal option to me too. And I like that it supports multitenancy, which can really bring down costs for orgs that host lots of small Open edX instances, e.g. sandboxes.
It's mostly been brought up in Slack threads and verbal conversations (mostly in the 2U internal workspace, although there are passing mentions here and here). In early conversations a couple of years ago it was still new/unproven enough that I wasn't confident promoting it as a serious alternative (didn't want to be the "rewrite it in Rust" fanboy), and there hasn't been much real discovery work done on this since then. The migration off Elasticsearch kept coming up in conversations, but those conversations usually ended with "well, it isn't a priority for 2U because it has the AWS-supported old Elasticsearch option, and nobody else in the community seems willing yet to commit resources to it or even answer how high of a priority it is for them." I do think Meilisearch has proven itself enough now that it should be seriously considered as an option, especially given the proven demand for Algolia-like functionality that isn't really covered by either Elasticsearch or OpenSearch.
The internal discussions I remember around getting off of Elasticsearch also mostly landed on "get off of the need for ES entirely, not just migrate to OpenSearch". Most of those came to fruition, I believe, with Discussions as a key item remaining in ES.
I made a forum post on the topic of whether we should consider Meilisearch as a potential alternative to OpenSearch.
Update: we'll be trying out Meilisearch for the new content library search, and if we like it, we will choose it as the new target for all the existing search functionality. This determination will be made before Sumac is cut.
Because AWS is no longer supporting the latest versions of Elasticsearch, we are considering deprecating our usage of ES in favor of the AWS replacement, OpenSearch.
This deprecation is in the initial stages of discovery, and we wanted to solicit community feedback before moving too far along on it, so there is currently no acceptance date for this deprecation ticket.
Discussion Thread: [Discussion Thread on Discuss](https://discuss.openedx.org/t/deprecation-removal-depr-170-move-from-elasticsearch-to-opensearch/5844)
Comment from Diana: