Move away from Elasticsearch #16

Open
feanil opened this issue Feb 15, 2022 · 36 comments

Comments

@feanil

feanil commented Feb 15, 2022

Because AWS no longer supports the latest versions of Elasticsearch, we are considering deprecating our usage of ES in favor of the AWS replacement, OpenSearch.

This deprecation is in the initial stages of discovery, and we want to solicit community feedback before moving too far along, so there is currently no acceptance date for this deprecation ticket.

Discussion Thread: [Discussion Thread on Discuss](https://discuss.openedx.org/t/deprecation-removal-depr-170-move-from-elasticsearch-to-opensearch/5844)

Comment from Diana:

We here at edx.org are going to look into what it would take to remove all ES or OS dependencies and then evaluate from there.

@feanil feanil added the depr Proposal for deprecation & removal per OEP-21 label Feb 15, 2022
@dianakhuang

Teams at edX/2U have done discovery on the work for this ticket, and we have decided to go forward with using OpenSearch for several key use cases. We will be removing usages of Elasticsearch or equivalent in favor of MySQL text search in all other use cases.

Switched to using OpenSearch:

  • cs_comments_service
  • blockstore
  • edx-search (courseware search)

Removing usage of Elasticsearch:

Because of this, I propose setting the acceptance date on this ticket for April 18, 2022 in order to give the community time to discuss this.

@jristau1984

To be clear about Blockstore: Our investigation found that Blockstore does not currently leverage ElasticSearch at all. Our recommendation is to have the BD-14 Content Lib v2 project team implement another solution during the project which would remove the need for OpenSearch.

@jristau1984

PT-CscommentsserviceESusage-170322-1900.pdf
PT-TNL-9545-ElasticsearchUsageandReplacement-170322-1900.pdf

These are the discoveries done by T&L and Infinity squads which led us to prefer an OpenSearch solution rather than a native MySQL solution.

@feanil
Author

feanil commented Mar 18, 2022

@jristau1984 looking at the discoveries, it looks like courseware search of course content is not actually enabled in the new MFE. Is the reason for moving to OpenSearch that we're planning to port that feature to the MFE in the near future?

@jristau1984

Yes, the plan is to re-implement this feature in the MFE versions of LMS and CMS when possible.

@ormsbee

ormsbee commented Mar 22, 2022

@dianakhuang, @feanil: Can this be moved to "Communicated" status, since there is a post about it?

@dianakhuang dianakhuang moved this from Proposed to Communicated in DEPR: Deprecation & Removal Mar 22, 2022
@CodeWithEmad
Member

@feanil since we know the exact index names,

PT-CscommentsserviceESusage-170322-1900.pdf
PT-TNL-9545-ElasticsearchUsageandReplacement-170322-1900.pdf

could we make them configurable? For example, each could be read from an environment variable that defaults to the current index name (or we could add a prefix/suffix to the index name). I'm asking because we talked here about Using one Elastic Cluster for different organizations.

@feanil
Author

feanil commented Apr 10, 2022

@CodeWithEmad I'm not sure exactly what you're asking. I believe it should be possible to update the code to make the index names configurable in a safe way. Are you asking how you should go about modifying the code to make this configurable? (Most Open edX services are Django and have an associated settings file, so I would push for pulling the name from a Django setting rather than an environment variable, for consistency with the rest of the system.)
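
As a rough illustration of the settings-based approach, here is a minimal sketch. The setting names `SEARCH_INDEX_NAME` and `SEARCH_INDEX_PREFIX`, the default index name, and the helper `resolve_index_name` are all hypothetical and do not exist in any Open edX service; a `SimpleNamespace` stands in for `django.conf.settings` so the sketch runs without Django.

```python
from types import SimpleNamespace

# Illustrative default only; real services define their own index names.
DEFAULT_INDEX_NAME = "courseware_content"


def resolve_index_name(settings, default=DEFAULT_INDEX_NAME):
    """Return the index name, honoring optional overrides found on `settings`.

    SEARCH_INDEX_NAME and SEARCH_INDEX_PREFIX are hypothetical setting names.
    """
    name = getattr(settings, "SEARCH_INDEX_NAME", default)
    prefix = getattr(settings, "SEARCH_INDEX_PREFIX", "")
    return f"{prefix}{name}"


# Stand-in for django.conf.settings in a deployment that shares one cluster
# between organizations and distinguishes them by prefix.
shared_cluster_settings = SimpleNamespace(SEARCH_INDEX_PREFIX="org_a_")

print(resolve_index_name(shared_cluster_settings))  # org_a_courseware_content
print(resolve_index_name(SimpleNamespace()))        # courseware_content
```

With no overrides set, the helper falls back to the existing name, so current deployments would keep their indexes unchanged.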

@feanil
Author

feanil commented May 11, 2022

@dianakhuang I'm gonna assign this ticket to you as the point person for this work that 2U is taking on.

@jmbowman

Arbi-BOM plans to take on some planning and coordination work for this very soon. For the benefit of them and any other 2U folk helping with this deprecation, here are some relevant internal resources:

I hope we can make most of the info in those docs public in the near future, but for now I just want to get the information linked so we can quickly unblock work on determining if there's anything useful we can do on this in time for the Olive release.

@jmbowman jmbowman moved this from Backlog to 2023 Q1 in Platform-Core Roadmap Dec 19, 2022
@UsamaSadiq UsamaSadiq moved this to In Progress in Arbi-BOM Feb 3, 2023
@UsamaSadiq
Member

UsamaSadiq commented Feb 6, 2023

Created a draft discussion document on the plan of action for leading this effort. Once the plan has been finalised, follow-up issues to track progress will be created and shared publicly with other community members.

@feanil
Author

feanil commented Feb 6, 2023

@UsamaSadiq why make the discussion about the plan of action internal? I think these decisions will impact a lot of people in the community and would benefit from being had in the open. Is there a specific concern that led you to making the discussion internal?

@UsamaSadiq
Member

Hi @feanil, there is no particular reason. I was just taking it incrementally. I shared the document with the 2U team first so we could do a final iteration/review before sharing it with the community.
I'll go ahead and update the permissions of the document to make it accessible to everyone in the community.
Also, I'm soon going to create follow-up issues, which will make everything visible to the community as well.

@feanil
Author

feanil commented Feb 7, 2023

Thanks @UsamaSadiq. I think for such a big decision, it's good to share not only the final decision with the community but also all the intermediate steps that led to it. Thanks for opening up the working docs.

@feanil
Author

feanil commented Feb 10, 2023

@UsamaSadiq what do you think about writing ADRs for the decision for each repo so that we can share it out with the community? Since the decisions are different for the different projects, it would be good to capture the reasoning for each in the relevant repo.

@UsamaSadiq
Member

The following is the current plan of action suggested by the arbi-bom team to make progress on this issue:

  • Issues related to ElasticSearch (ES) migration have been identified/created and will be shared with the owning teams/squads for each service. [See above linked issues created in the edx/upgrades repo]
  • The owning teams will review the issues and provide updates and estimates regarding the timeline for ES migration in their respective services.
  • Each team will create its own Architecture Decision Record (ADR) to document the decision regarding the migration in their respective repository.
  • The arbi-bom team will be available for collaboration and assistance if the teams need it.
  • These upgrade issues will be moved to the Maintenance board to track progress and make them visible to the community as well.

CC: @jmbowman @feanil

@UsamaSadiq
Member

Created issues on the Maintenance boards and notified the owning teams in their Slack channels.

@UsamaSadiq UsamaSadiq moved this from In Progress to Owner Review in Arbi-BOM Feb 15, 2023
@feanil
Author

feanil commented Feb 15, 2023

@UsamaSadiq my concern is that notifying the "owning" team at 2U does not inform the community of users or CCs for the repos. I'd like the communication plan to include those groups; what's the best way to include them here? I don't think it means that we have to block on feedback from those groups, but I'd like them to be informed as we progress through the process. Most are not following projects in the edx org.

@UsamaSadiq
Member

@feanil I've shared the above-mentioned issues with the owning teams. Each team will create an ADR document after finalising its findings and share it with the community. Meanwhile, either let me know if I should share the issues linked above in a particular openedx channel to make them more visible to the community, or I could announce these issues to the community once the owning teams have prepared initial ADR documents.
I believe the community will have access to the above-linked issues, so they'll be able to add their input there. On the 2U side, I'll keep updating these issues with any news from the owning teams.

I hope this works out as you are expecting. If you have any other ideas that could help increase collaboration, I'm all ears.

@UsamaSadiq
Member

Adding on to my point, we could probably create ADR documents in the openedx Confluence and ask the 2U teams to add updates there, so they'll also be visible to the community and make collaboration easier.

@feanil
Author

feanil commented Feb 16, 2023

I think creating the drafts in the Open edX Confluence or as PRs on the repos (even in draft form) would both be great.

I think this ticket is a great place to provide future updates, but for major changes or milestones, I would also mention them on https://discuss.openedx.org/t/deprecation-removal-depr-170-move-from-elasticsearch-to-opensearch/5844/10

@dianakhuang dianakhuang removed their assignment May 4, 2023
@jmbowman

Adding a note I wrote in a Slack conversation regarding a point that complicates the migration for course-discovery and edx-notes-api (I think these are the only repos that currently use django-elasticsearch-dsl):

Regarding OpenSearch libraries: there's https://github.com/opensearch-project/opensearch-py for basic Python support, but there are still only experimental, not-production-ready forks of https://github.com/django-es/django-elasticsearch-dsl and https://github.com/barseghyanartur/django-elasticsearch-dsl-drf (the former refused to add support for OpenSearch out of concern for API drift over time). The latest update is in barseghyanartur/django-elasticsearch-dsl-drf#271 (comment) (the mentioned forks have had no commits since that comment was made in December).

@dianakhuang

AXIM is going to take over maintainership of the edx-notes-api repo, and will try to do this migration.

@feanil
Author

feanil commented Nov 16, 2023

Open Questions

  • What version of OpenSearch do we support?
  • What Python library should we use for this? The OpenSearch one, or is there one that will let us work with both Elasticsearch and OpenSearch?

@jmbowman

Unfortunately, it looks like my comment from August still stands. There have been a couple of forks of Django's Elasticsearch packages to add/substitute OpenSearch, but they haven't seen any real activity since they were created last year. I suspect if we use them, we'll have to take over maintenance of them.

@dianakhuang

Note: There were performance issues in the past when running MySQL full-text search alongside other queries. We would like to make sure this is no longer the case before we implement it in our services.

@bradenmacdonald

bradenmacdonald commented Feb 22, 2024

Hi folks, have there been any updates on OpenSearch/ElasticSearch/etc? Is there any current work happening?

My current understanding is:

  • edx-search is currently used for searching various things in the LMS (course content, library content, course about pages, teams)
    • It is designed as an abstract search provider but only has an ElasticSearch backend so far. Some work is still needed to make it more abstract.
    • If we implement an OpenSearch backend for edx-search, it would provide the lowest-effort way to allow the LMS search needs to use either ElasticSearch or OpenSearch.
    • If we also make its API more abstract, it could in theory support other search engine backends in the future (Manticore Search, TypeSense, Algolia, ...)
      • On the other hand, the ElasticSearch (and OpenSearch) API is not that difficult to use (based on personal experience using it directly for LabXchange), and if most Open edX users are going to be using a single solution (e.g. OpenSearch) I would almost prefer to just use the API directly instead of having this idiosyncratic abstraction layer (edx-search), which can be hard to understand.
    • It has some other weird technical debt, like it provides a search API that is hard-coded to search the courseware index only. That should be moved to the LMS, alongside the courseware index definition.
  • course-discovery and edx-notes-api do not use the edx-search abstraction layer, but are very tied to ElasticSearch via the use of django-elasticsearch-dsl and -drf, which don't officially support OpenSearch. (The OpenSearch-compatible forks have low activity and don't seem to be actively developed.)
    • per the discussion above, 2U would prefer to remove ElasticSearch from these altogether, rather than migrate them to OpenSearch or to use an abstraction layer like edx-search. It seems that using MySQL alone or other alternatives will provide sufficient performance for these use cases.
    • However, a later update says for edx-notes-api: "AXIM is going to take over maintainership of the edx-notes-api repo, and will try to do this migration."
    • Personally, I would lean toward either removing usage of ElasticSearch altogether, using the OpenSearch/ElasticSearch API directly, or migrating one or both of these use cases to edx-search as an abstraction layer (and then supporting multiple search engine backends that way). I think this is a better use of effort than migrating the various services to unmaintained OpenSearch-compatible libraries and taking on their maintenance. However, I believe most Open edX installations don't use either of these services, so the number of stakeholders outside of 2U that may care which search engine is used here could be small.
  • cs_comments_service uses ElasticSearch via the official Ruby gems. At some point you could use either OpenSearch or ElasticSearch with this service due to their API compatibility, but I don't know if that's still the case, and it may not be in the future. There is an OpenSearch fork of the main library. However, the elasticsearch-model gem (part of elasticsearch-rails) doesn't have a fork, and it's unclear whether the current version works with OpenSearch.
  • Regardless of whether ElasticSearch and OpenSearch are actually wire-compatible, recent versions of all the official ElasticSearch clients have been made to actively reject connections to OpenSearch, which is why you generally won't find client libraries that work with both ES+OS, and why there are OpenSearch forks of everything on the client side as well as the server side.
  • ES/OpenSearch is very resource intensive and hard to set up for multitenancy, so I know there is appetite among Open edX users in the community for other search engine options.
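
To make the abstraction-layer option in the edx-search bullets above more concrete, here is a minimal sketch of a pluggable search backend. Everything in it is hypothetical: `SearchBackend`, `InMemoryBackend`, `BACKENDS`, and `get_backend` are illustrative names, not edx-search's actual API, and the in-memory backend merely stands in for a real Elasticsearch, OpenSearch, or Meilisearch client.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical minimal interface; edx-search's real API differs."""

    @abstractmethod
    def index(self, index_name, doc_id, document):
        """Store `document` (a dict) under `doc_id` in `index_name`."""

    @abstractmethod
    def search(self, index_name, query):
        """Return the ids of documents in `index_name` matching `query`."""


class InMemoryBackend(SearchBackend):
    """Toy backend standing in for a real search engine client."""

    def __init__(self):
        self._indexes = {}

    def index(self, index_name, doc_id, document):
        self._indexes.setdefault(index_name, {})[doc_id] = document

    def search(self, index_name, query):
        # Naive case-insensitive substring match over all field values.
        needle = query.lower()
        docs = self._indexes.get(index_name, {})
        return sorted(
            doc_id
            for doc_id, doc in docs.items()
            if any(needle in str(value).lower() for value in doc.values())
        )


# Registry keyed by a name that a Django setting could select (assumption).
BACKENDS = {"memory": InMemoryBackend}


def get_backend(name):
    """Instantiate the backend registered under `name`."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"Unknown search backend: {name!r}") from None


backend = get_backend("memory")
backend.index("courseware", "block-1", {"title": "Intro to Python"})
backend.index("courseware", "block-2", {"title": "Advanced SQL"})
print(backend.search("courseware", "python"))  # ['block-1']
```

The trade-off raised above still applies: an interface this small is easy to implement for several engines, but it hides engine-specific features that calling the OpenSearch/ElasticSearch API directly would expose.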

Note: I heavily updated this comment from the original version after further research ^

@jmbowman

My info is about a month out of date now, but some historical context and opinions (Feanil has already heard most/all of this):

  • edx-search formerly used django-haystack as an abstraction layer across search engines, but that was ripped out after the package was abandoned upstream and it became an obstacle to upgrades and to efficiently utilizing Elasticsearch (the abstraction layer imposed significant limits). The package is a little more actively maintained again now, but is still not in a very good state of repair and has a pretty stale set of supported backends (even OpenSearch isn't officially supported, and it doesn't work for everyone).
  • Other than django-haystack, there really isn't a multi-backend search solution for Django that I've been able to find.
  • As you noted, support for OpenSearch in Django and DRF is pretty weak and poorly maintained. And it has pretty much the same operational constraints as Elasticsearch (high overhead, etc.)
  • 2U basically gave up on using Elasticsearch for new search functionality and opted for Algolia instead, because it was so much easier to use. But depending on a proprietary service for an open source project isn't great and may even border on violating the AGPL license.
  • From a future-looking perspective, I feel that Meilisearch would be a better search engine to integrate with. It's MIT-licensed, blazing fast (implemented in Rust), much less resource-intensive than Elasticsearch, already fairly competitive with Algolia in many respects, has solid commercial support, and has pretty good Python support. There isn't an authoritative Django package for it yet, but there are several packages and blog posts outlining how other people have used them together. It would be a gamble, but frankly it feels like it has more momentum than OpenSearch.

I'm unfortunately not likely to be able to help much with this for a while, so it's going to be up to other people to pick a path forward. I just wanted to articulate that while OpenSearch looks at first like the easiest/safest path forward to solve the licensing problem, it's actually harder than it looks and may not really set up Open edX for success in future search improvements. I tried repeatedly over 3 years to build momentum on solving the Elasticsearch licensing issue, but it was hard to get anybody excited about the switch to OpenSearch (especially with 2U not feeling the pain because Amazon still hosts the old pre-license-change Elasticsearch version with security patches).

@bradenmacdonald

Thanks a lot @jmbowman, that's very helpful.

@ormsbee

ormsbee commented Feb 27, 2024

@feanil, @dianakhuang: Has Meilisearch been discussed/evaluated at any point in the ES replacement talks? I don't see any conversations on it in the wiki or Discourse. It sounds really compelling, particularly the part where it uses vastly less memory (a 5-10X difference from what I've seen of various people's blog posts).

@dianakhuang

I know @jmbowman has advocated for it, but we haven't done any discovery on it.

@bradenmacdonald

Meilisearch sounds like an ideal option to me too. And I like that it supports multitenancy, which can really bring down costs for orgs that host lots of small Open edX instances, e.g. sandboxes.

@jmbowman

It's mostly been brought up in Slack threads and verbal conversations (mostly in the 2U internal workspace, although there are passing mentions here and here). In early conversations a couple of years ago it was still new/unproven enough that I wasn't confident promoting it as a serious alternative (didn't want to be the "rewrite it in Rust" fanboy), and there hasn't been much real discovery work done on this since then. The migration off Elasticsearch kept coming up in conversations, but those conversations usually ended with "well, it isn't a priority for 2U because it has the AWS-supported old Elasticsearch option, and nobody else in the community seems willing yet to commit resources to it or even answer how high of a priority it is for them." I do think Meilisearch has proven itself enough now that it should be seriously considered as an option, especially given the proven demand for Algolia-like functionality that isn't really covered by either Elasticsearch or OpenSearch.

@jristau1984

The internal discussions I remember around getting off of ElasticSearch also mostly landed on "get off of the need for ES entirely, not just migrate to OpenSearch". Most of those came to fruition, I believe, with Discussions as a key item remaining in ES.

@ormsbee

ormsbee commented Feb 27, 2024

I made a forum post on the topic of whether we should consider Meilisearch as a potential alternative to OpenSearch.

@feanil feanil changed the title from "Move from Elasticsearch to OpenSearch" to "Move away from Elasticsearch" Mar 13, 2024
@feanil
Author

feanil commented May 2, 2024

Update: we'll be trying out Meilisearch for the new content library search, and if we like it, we will choose it as the new target for all the existing search functionality. This determination will be made before Sumac is cut.

@feanil feanil moved this from Removing to Blocked in DEPR: Deprecation & Removal May 9, 2024
@iamsobanjaved iamsobanjaved removed this from Arbi-BOM Jun 11, 2024
8 participants