Prototype Typesense search support #245
Comments
FYI: @jmakowski1123, @kdmccormick
I'd also like to know more about the root concern here: is it concern that learners and authors won't have access to search features (or e.g. the content libraries UI) during occasional downtime of a non-replicated search engine? And/or is it concern that writes (updates to the search index) will be lost during such downtime? In other words, if we had a feature for queueing writes so they weren't ever lost, even if the search engine was occasionally unavailable, would that be "good enough"?
This isn't just an issue for MIT. For any system expected to handle a high number of concurrent users, it's essential to have more than one node available to service requests. This is crucial for horizontal scaling and fault tolerance. While vertical scaling can be an option, horizontal scaling is more manageable and flexible, especially with variable loads. There is no reason to complicate this with abstractions that allow for customizable search backends; we should choose a search backend that satisfies most requirements. The platform could tolerate using Meilisearch if it's used only for this particular use case -- content library search -- but if it eventually replaces search across Open edX, Meilisearch is not a solution we would go with.
My assumption has always been that search is neither an essential nor commonly used feature in Open edX. That is, as a learner, you can log in, access your course, learn, complete exams, use the forum, view your grades, etc. whether or not the search engine is up. Search downtime degrades the experience somewhat but isn't on the critical path. (Content libraries is actually the exception, as the entire UI depends on Meilisearch; but it can easily be put on its own Meilisearch instance, and author traffic is orders of magnitude less than learner traffic in most cases.) Of course, we'd like to change that and see search improved and made more reliable.

What I can tell you is that, as a developer, the feature side of things is much easier to improve using Meilisearch than any other open source product we've looked at. And from a reliability perspective, we know there is a lot of community interest in getting Meilisearch HA soon. Meilisearch is one of the most rapidly developed open source search engines in terms of regularly releasing new versions with new features and better performance. In fact, apparently Meilisearch can be run in a cluster, but it doesn't yet have a "nice" way to do this out of the box: you have to turn on an experimental flag. I believe their cloud offering has something like this in place.

Personally, my preferred actions would be: …

Would MIT be willing to help with (1), and if so, what would be required? (Ability to toggle back and forth between Elasticsearch and Meilisearch? Some clustering/HA functionality before you're even willing to test?)
I think that your first statement is rather telling and points to a symptom of the Open edX project ecosystem that I have seen over the years; namely, a tendency to build a point solution to a problem that would be more effectively addressed in a broader and more holistic manner. One of the contributing factors to that tendency is the fact that the Open edX suite of software has been developed over several years by a large and constantly changing set of developers with competing priorities. The fact that you are unsure of whether and how search is essential to the edX functionality is a problem that we, as a community, need to address before we decide on any solutions. Choosing a technology (whether Meilisearch, Typesense, or any other option) to "do search" without fully knowing what problems and capabilities we are trying to solve for is a recipe for long-term pain.

In terms of Meilisearch specifically, the fact of the matter is that the issue asking about HA functionality has been open for almost 2 years now with no meaningful progress. The fact that there is an experimental flag with no associated documentation or best practices doesn't change that. From what I can determine based on the scant information available, the operator is expected to build their own replication and consensus implementation. That is an extremely non-trivial undertaking and would be an unreasonable expectation of any Open edX operator.

This is effectively the same as suggesting that we use SQLite for the relational database. While it's possible to build a replication and fail-over mechanism, it's not part of the design of the engine. And while SQLite is a great technology with many useful applications, it is not suited to the use case of a project like Open edX.

TL;DR: we need to determine what the applications for search are across the Open edX suite of services, and how we can best address those needs with a single core technology that can be integrated across the various processes that comprise a running Open edX system.
I don't disagree with anything you're saying. As you know, our more or less official plan was to roll out Meilisearch in Redwood and get feedback:
However, we received very little feedback, other than from developers who loved working with it, several enthusiastic community members who wanted to start applying Meilisearch everywhere, one major hosting provider who said they're "not that concerned about the HA problem. We don't actually deploy that many ES clusters, even for large instances, mostly because search isn't really a critical path in the overall experience, so downtime there isn't as terrible as it could be for Redis, for example", and you, who said the lack of HA was at least a significant concern if not a showstopper. For Sumac we developed one additional new feature (the content libraries v2 UI) that depends on Meilisearch, and we again hoped to get feedback from operators about this.

Now for context, it's important to note at this juncture that for the overwhelming majority of Open edX installations, the high memory usage of Elasticsearch is a very big, persistent, and real pain point, and HA is not a major concern. In addition, Elasticsearch and OpenSearch are continuing to diverge, and their API differences and licensing issues are a growing problem. This is why the Tutor maintainers and many others became very excited about Meilisearch. In particular, shortly before the Sumac release, the Tutor maintainers decided to just pull the trigger on something they had said they wanted to do for a while: they implemented Meilisearch support as an alternative to ES everywhere it's used in the core distribution, then removed ES support from Tutor entirely. I think this is a good decision in terms of benefiting most Tutor users, but it was definitely "jumping the gun" relative to our plan for a more considered and incremental rollout after hearing feedback from production use.

We're still at a point where both ES and Meilisearch are officially supported by most of the search features in the platform (other than the two new ones I mentioned), so now is still a good time to evaluate TypeSense as an alternative and do a holistic evaluation of "search" requirements. I know, @blarghmatey, you've offered to help with this in the past, and I'd really appreciate any work you're able to invest in it, because as you've seen, most of the community seems either indifferent or happy with Meilisearch, so I don't expect a big line of other volunteers ready to work on such evaluations.

TL;DR: if we want to do a holistic evaluation of search use cases and/or test out Meilisearch/TypeSense/Algolia(?) on actual large-instance data sets, we're already starting against a bit of a headwind, so we'll need one or two big players like MIT, Axim, 2U, etc. to make it a priority and put resources into making it happen ASAP. I'm supportive of such an effort and willing to help.
Hi folks! Thank you for pushing this conversation forward. 😄 I'd like to level-set on a few things:

**The importance of search functionality.** We should consider search to be critical infrastructure. It's already an important part of the student experience and a critical part of the content library experience. Per @jmakowski1123, we're only going to see more critical usage of it going forward in the student experience, so we should position our technical choices accordingly.

**Point solutions vs. more holistic ones.** I agree that the project has historically had a lot of point solutions that have been generalized after the fact. That's one of the reasons I tried to engage folks in things like the Discourse thread on this topic, to try to get input from others. We had a tentative direction in March to use Meilisearch, with renewed discussions around HA concerns and Typesense as an alternative starting in August. I stated in that thread that it was likely too late to change things for Sumac, but that we would evaluate after Sumac was released. Sumac was released. We're now re-evaluating and working towards that more holistic solution. If that means we eventually land on Typesense, then so be it. We just need to start planning for this work.

**Funding.** Axim can fund the development work to evaluate Typesense and do necessary development work after that. What we can't easily do is test at scale, but it seems like MIT is in a good place to be able to provide that. I think we have the right people and resources to start planning what that will look like in the Teak timeframe.
In order to move this forward, I would like to request the following from you folks:

- @bradenmacdonald: Much of the description for this ticket was based on things that you said or wrote in the past. Please add any more details and line items that you think are relevant to this work.
- @blarghmatey: I know that you've previously mentioned that you planned to put aside some time to help with this evaluation. Could you please provide a description of what your evaluation will require? In particular, do you want to focus on the forums backend first, as it's the one that likely comes under the most load? Or both forums and course content indexing? Something else? Synthetic load testing, or trialing it with some live traffic?

We have two sets of work here that are both important to capture, but I don't want to conflate the two:
@blarghmatey: Your input here is going to be critical for item (1). We're going to want to implement enough to do the Typesense evaluation as quickly as possible, so that we can decide whether or not we need to queue up the rest of the work in the Teak timeframe. Thank you!
I believe that forum search and learner courseware search are by far the use cases that will place the most load on the search engine, take the longest to index, and also require the most uptime. Conveniently, they are also already abstracted somewhat (you can use Elasticsearch or Meilisearch), and they make only rudimentary use of each search engine's features (unlike, say, the complicated hierarchical tag search in Studio). Indeed, we saw how quickly the Meilisearch backend for them was developed just prior to the Sumac launch.

So my recommendation would be not to develop any new abstraction layer just yet, nor worry about the more complex use cases, but to implement a minimal TypeSense backend for these two use cases, alongside the existing Elasticsearch and Meilisearch backends, and then rigorously test all three with production data: indexing speed[^1], search performance, resource usage, and (if at all possible) uptime under real-world conditions.

Keep in mind that others have done some of these tests; for example, here you can see that for the HackerNews dataset with 1.1M documents, Meilisearch easily outperforms TypeSense despite TypeSense being an "in-memory" database. And the TypeSense test only runs on a machine with 100GB of RAM[^2], whereas Meilisearch can run the same test on a 1GB RAM machine. That test was using quite an old version of Meilisearch though; I suspect the new versions are even better. So I'm mostly interested in learning about the indexing performance (/errors) and uptime. It would also be great if someone could repeat the same tests with Meilisearch Cloud and TypeSense Cloud. After all, we are used to paying AWS for Elasticsearch, Atlas for MongoDB, and Algolia for Algolia for large instances that need HA clusters, so it makes sense to try the equivalent cloud offering for Meilisearch and TypeSense.

At the same time, and as a separate GitHub ticket from this one, I support @blarghmatey's proposal for doing an updated, comprehensive look at all the search use cases in the platform; something like this one from 7 years ago. We can also do a feature matrix of the different search engines that are under consideration, but I can already tell you there is no search engine that will meet all the requirements we've already identified. This page provides a pretty fair comparison of Meilisearch and TypeSense at a feature level, though it downplays both the importance of HA and the problems people have had trying to scale TypeSense to large datasets.
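To make the scope of that minimal backend concrete, here is a rough sketch of the core operations using the official `typesense` Python client. This is an illustration only, not the proposed edx-platform integration: the collection name and schema fields are hypothetical, and a real backend would index the actual forum/courseware documents.

```python
# Illustrative sketch with the official `typesense` Python client.
# Collection name and schema are hypothetical, not edx-platform's real layout.
import typesense

client = typesense.Client({
    "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
    "api_key": "xyz",  # server-side admin key; never expose to browsers
    "connection_timeout_seconds": 2,
})

# Unlike Meilisearch, Typesense requires an explicit schema up front.
client.collections.create({
    "name": "courseware_content",
    "fields": [
        {"name": "block_id", "type": "string"},
        {"name": "course_id", "type": "string", "facet": True},
        {"name": "content", "type": "string"},
    ],
})

# Bulk indexing: upsert a batch of documents in one call.
client.collections["courseware_content"].documents.import_(
    [{"id": "b1", "block_id": "b1",
      "course_id": "course-v1:edX+Demo+2024", "content": "…"}],
    {"action": "upsert"},
)

# Searching: query_by picks the fields to match against; filter_by scopes
# results to content the current user is allowed to see.
results = client.collections["courseware_content"].documents.search({
    "q": "gradient descent",
    "query_by": "content",
    "filter_by": "course_id:=course-v1:edX+Demo+2024",
})
```

The required-schema-up-front step is one of the practical differences from Meilisearch that a prototype like this would surface early.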
My main concern is that I don't want us to end up in a situation where we have multiple different search engines that are all used differently and that all have different levels of support/use in the Open edX ecosystem. We are already partially in that situation with the existing usage of Elasticsearch and the partial usage of Meilisearch. I know that there will never be a search engine that does everything everyone could possibly want, but that doesn't mean we should just keep adding new engines without a long-term plan or cohesive requirements. Elasticsearch/OpenSearch is certainly a heavyweight process for small-scale usage, but it is an extremely battle-tested and well-understood technology.

Starting from a feature matrix of search engines is approaching the problem from the wrong end. What is more effective is to write the list of use cases, identify which are must-have vs. nice-to-have, and then decide from there what technology/architecture can best address those needs. For example, is the ability to have the JS frontend directly interface with the search engine a "must have" requirement, or a "nice to have" feature? Most critically, the decision of "must have" vs. "nice to have" should be made from the perspective of the end users of the system (the learners), not from the perspective of developers (not that we should completely ignore developer experience either).

With that framing, is there really anything that prevents us from using Elasticsearch for all use cases? What are the concrete needs that necessitate using Meilisearch, Typesense, or Algolia?
Fair enough. Of course, moving everything to Meilisearch would also address that concern completely. So I think it's safe to say there are other major concerns of similar importance, or we wouldn't be having this discussion.
Perhaps you are unaware, but the situation is worse than that: Algolia is also used in 10+ different Open edX repos, and, for example, some important enterprise functionality requires it. As I understand it, the reason is that 2U found their developers could build search features far faster and better using Algolia compared to Elasticsearch, so they just rolled it out everywhere they needed it. I am optimistic that much of this Algolia usage could be converted to Meilisearch if we had buy-in for that idea, given that Meili and Algolia (and TypeSense) are all very similar (unlike Elasticsearch).

On the other hand, the situation is better than that for the average new Sumac deployment of Open edX, which now uses Meilisearch for everything out of the box (no Elasticsearch needed, and all the Algolia features disabled).
Despite the name of this ticket, I don't think the goal is to add a new engine randomly and maintain it alongside the others. The goal is to get insight on a new option for refining that long-term plan, which is still in place as discussed.
Sure. That's what I was thinking and should have said, but I shortened it to "feature matrix". But I think we need to cover more than use cases, because there are a lot more considerations from an operations and developer perspective. I disagree that we should include only the end-user perspective in the list of considerations.

If we consider only the end-user perspective, I think we will see Meilisearch is either a clear winner or tied for the win: it seems to support almost all the end-user use cases I'm aware of, and provides the fastest search experience. The only end-user feature we have ever discussed that it's missing, as far as I know, is boolean keyword operators, which sounds important but is really not significant in practice[^1]. Users want search to be consistently available, and that can be achieved if the operator is using Meilisearch Cloud. From what I've understood, your main concern with Meilisearch is the lack of high availability, but since Meilisearch does have HA via Meilisearch Cloud, it's really more of an operator concern about being forced to use one particular company's service to achieve HA, and the related concerns around relying on that company as the only option for HA (for now).

To be clear: I do still think it's a good idea to put together such a use case / requirement comparison, and would welcome you doing that.
That's not really an end-user requirement at all. In end-user terms, this question becomes "should search results be instant (as you type), or is some slight delay in search OK?" I don't think we care that much for most Open edX use cases.

But! This is a very important question from the operator perspective: as you know, edxapp uses an old-fashioned architecture, with Python's GIL and other factors limiting each process to handling one request at a time, and I believe that edx.org found that routing all the Elasticsearch requests through edxapp puts a noticeable load on the edxapp workers, both for calculating permissions and for tying up the connection while waiting for and forwarding results from Elasticsearch. By allowing browsers to send most or all search requests directly to Meilisearch, we can reduce load on the edxapp servers, improving scalability and reducing costs.
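For a sense of how that direct-to-engine model can still enforce permissions, here is a rough sketch of minting a Meilisearch tenant token server-side. A tenant token is a short-lived JWT, signed with a search-only API key, whose embedded search rules pin a filter the browser cannot override. The claim names follow Meilisearch's tenant token documentation, but the index name, filter attribute, and helper function are hypothetical:

```python
# Sketch: server-side minting of a Meilisearch tenant token that a browser
# can use to query Meilisearch directly, bypassing the edxapp workers.
import datetime
import jwt  # PyJWT

SEARCH_API_KEY = "..."      # a Meilisearch API key with search-only permissions
SEARCH_API_KEY_UID = "..."  # the uid of that key, as returned by GET /keys

def make_tenant_token(user_course_ids: list[str]) -> str:
    """Return a token the browser can send straight to Meilisearch."""
    search_rules = {
        # Hypothetical index name; the filter pins results to the courses
        # this user is enrolled in (course_id must be a filterable attribute),
        # and cannot be loosened client-side.
        "courseware_content": {
            "filter": " OR ".join(f"course_id = '{c}'" for c in user_course_ids),
        },
    }
    return jwt.encode(
        {
            "searchRules": search_rules,
            "apiKeyUid": SEARCH_API_KEY_UID,
            # Short expiry keeps a leaked token low-risk.
            "exp": datetime.datetime.now(datetime.timezone.utc)
            + datetime.timedelta(hours=1),
        },
        SEARCH_API_KEY,
        algorithm="HS256",
    )
```

The appeal for edxapp is that permissions are computed once, when the token is minted, instead of on every search request.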
First of all, Elasticsearch vs. OpenSearch has been a huge point of contention, and my understanding is that ES and OS continue to diverge, meaning that ultimately we'll have to pick one or the other: there are already separate drivers/libraries and significant API incompatibilities, and there's no guarantee that the base-level API will remain as compatible as it is now.

Second of all, I want to circle back to your de-prioritization of the developer experience. The fact is that search feature development stagnated when we had to use ES, whereas we're seeing lots of rich new search features being built with Algolia and Meilisearch. Quite a few people have expressed excitement about Meilisearch and stepped forward to do the work and build the features. That simply wasn't happening with ES. I think this is a significant factor not to be taken lightly, and analogous to how the super-complicated modulestore that nobody wants to touch has held back core feature development in edx-platform for the past decade.

Third of all, in case it wasn't clear: as I understand it, using Elasticsearch almost doubles the cost of hosting Open edX instances on k8s environments, because RAM is usually the limiting factor and it uses so much RAM. That alone makes using ES for developing further search features a very marginal proposition. To point to another parallel, there are hardly any concrete user needs in support of removing MongoDB, yet we are all hugely in favor of doing that.

I am personally very happy to work with MIT, 2U, and other large instances (such as some OpenCraft clients) to make sure we have a search solution that works for large instances as well as small ones. But I'm personally pretty against continuing with ES/OS unless there's really not even a moderately decent alternative option.
Case in point, reported today on the forum: "Elasticsearch [uses] almost 40% of all memory."
Acceptance Criteria
The scope of this abstraction layer would be to work on browser-oriented search engines like Meilisearch. This would not try to stretch to cover more traditional search engines like Elasticsearch, since doing so would be much more work and present performance concerns.
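As a rough illustration of how small that common surface is, here is a hypothetical Python interface covering the operations that Meilisearch-style engines share. The method names are invented for this sketch and are not an existing edx-platform API:

```python
# Hypothetical sketch of an abstraction layer scoped to browser-oriented
# engines (Meilisearch/Typesense/Algolia style). Not an existing API.
from abc import ABC, abstractmethod
from typing import Any

class BrowserOrientedSearchBackend(ABC):
    """Operations shared by Meilisearch/Typesense/Algolia-style engines."""

    @abstractmethod
    def index_documents(self, index: str, docs: list[dict[str, Any]]) -> None:
        """Upsert a batch of documents into the named index."""

    @abstractmethod
    def delete_documents(self, index: str, doc_ids: list[str]) -> None:
        """Remove documents by primary key."""

    @abstractmethod
    def search(self, index: str, query: str,
               filter_expr: str | None = None) -> list[dict[str, Any]]:
        """Server-side search, for callers that cannot query directly."""

    @abstractmethod
    def get_frontend_token(self, user_filters: dict[str, str]) -> str:
        """Mint a restricted key/token so the browser can query the engine."""
```

The last method is what makes the abstraction "browser-oriented": all three of these engines offer some form of restricted client-side key or token, which is the capability the direct-from-the-browser pattern discussed above depends on.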
The biggest challenge likely relates to tagging (quote from @bradenmacdonald):
Background
MIT has expressed concerns about Meilisearch's lack of failover/high-availability. While this feature is on the Meilisearch roadmap, it does not look like it will be prioritized in the near future.
At the same time, Algolia is an extremely popular commercial search engine, on which Meilisearch modeled its API. While nobody has expressed interest in using Algolia here yet, it is a strong long-term possibility.
Testing
@blarghmatey, @pdpinch: Before this work kicks off, could you please verify that Typesense will be an acceptable backend? If there's early validation you need to do at the prototype step, I'd like to get a sense for what that would look like on your side.