-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Index Studio content using Meilisearch [FC-0040] #34310
Merged
bradenmacdonald
merged 29 commits into
openedx:master
from
open-craft:braden/meilisearch-libraries
Mar 22, 2024
Merged
Changes from 27 commits
Commits
Show all changes
29 commits
Select commit
Hold shift + click to select a range
463549f
feat: management command to index content libraries using Meilisearch
bradenmacdonald 81daab2
feat: include tags in library content index
bradenmacdonald ea01c70
refactor: cleanup
bradenmacdonald 9f75b4c
feat: index courseware too
bradenmacdonald b9eddb8
refactor: combine indexes into one
bradenmacdonald adbf357
feat: index tags (in a way compatible with InstantSearch)
bradenmacdonald 47dc3c1
feat: REST API to retrieve a user-specific API key
bradenmacdonald 6cf9e04
fix: better handling index existence check, better comments
bradenmacdonald 8637e16
feat: disable studio search (w/ Meilisearch) by default
bradenmacdonald 6431948
feat: limit search endpoint usage to global staff for now
bradenmacdonald 9c05806
chore: fix quality issues
bradenmacdonald af1402f
refactor: put all "content" data into the "content" field
bradenmacdonald 7c5cd7f
chore: fix lint issue and comment
bradenmacdonald 4c66d06
feat: add breadcrumbs to index
bradenmacdonald 67cf0cc
feat: use fallback display_name if needed
bradenmacdonald c53963e
feat: state how long the reindex_studio command took
bradenmacdonald 3884fa2
test: add tests for the Studio search REST API
bradenmacdonald 6be622e
docs: added ADR
bradenmacdonald 9b77442
chore: fix quality issue
bradenmacdonald 91d0358
fix: handle case where there are no library/course blocks
bradenmacdonald 0af6939
fix: hash used for ID wasn't stable
bradenmacdonald 106cfd0
fix: breadcrumbs weren't using fallback display_names
bradenmacdonald 1393d41
test: expand tests
bradenmacdonald 5a1429e
fix: lint issues
bradenmacdonald ec27d5f
chore: update with latest master
bradenmacdonald 69879e0
test: expand tests
bradenmacdonald 1b5dfdf
docs: fix typo pointed out in review
bradenmacdonald ec0c4a8
docs: clarify code with comments, indicate command is experimental
bradenmacdonald 1ff4b75
perf: pre-fetch all of the blocks in the course
bradenmacdonald File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
""" | ||
Define the content search Django App. | ||
""" | ||
|
||
from django.apps import AppConfig | ||
|
||
|
||
class ContentSearchConfig(AppConfig): | ||
"""App config for the content search feature""" | ||
|
||
default_auto_field = "django.db.models.BigAutoField" | ||
name = "openedx.core.djangoapps.content.search" | ||
|
||
def ready(self): | ||
# Connect signal handlers | ||
from . import handlers # pylint: disable=unused-import |
151 changes: 151 additions & 0 deletions
151
openedx/core/djangoapps/content/search/docs/decisions/0001-meilisearch.rst
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,151 @@ | ||
Studio Content Search Powered by Meilisearch | ||
############################################ | ||
|
||
Status | ||
****** | ||
|
||
Draft | ||
|
||
|
||
Context | ||
******* | ||
|
||
Existing search functionality | ||
============================= | ||
|
||
The Open edX platform currently implements many different forms of search. For | ||
example, users can search for course content, library content, forum posts, and | ||
more. Most of the search functionality in the core platform is powered by the | ||
Elasticsearch search engine (though other functionality developed by 2U, such as | ||
in edx-enterprise, is powered by Algolia). | ||
|
||
Most uses of Elasticsearch in Open edX use | ||
`edx-search <https://github.com/openedx/edx-search>`_ which provides a partial | ||
abstraction over Elasticsearch. The edx-search library formerly used | ||
`django-haystack <https://django-haystack.readthedocs.io/>`_ as an abstraction | ||
layer across search engines, but "that was ripped out after the package was | ||
abandoned upstream and it became an obstacle to upgrades and efficiently | ||
utilizing Elasticsearch (the abstraction layer imposed significant limits)" | ||
(thanks to Jeremy Bowman for this context). Due to these changes, the current | ||
edx-search API is a mix of abstractions and direct usage of the Elasticsearch | ||
API, which makes it confusing and difficult to work with. In addition, each | ||
usage of edx-search has been implemented fairly differently. See | ||
`State of edx-search <https://openedx.atlassian.net/wiki/spaces/AC/pages/3884744738/State+of+edx-search+2023>`_ | ||
for details (thanks to Andy Shultz). | ||
|
||
Other platform components use Elasticsearch more directly: | ||
|
||
* ``course-discovery`` and ``edx-notes-api`` do not use ``edx-search``, but are | ||
very tied to Elasticsearch via the use of ``django-elasticsearch-dsl`` and | ||
``django-elasticsearch-drf``. | ||
* ``cs_comments_service`` uses Elasticsearch via the official ruby gems. | ||
|
||
Problems with Elasticsearch | ||
=========================== | ||
|
||
At the same time, there are many problems with the current reliance on | ||
Elasticsearch: | ||
|
||
1. In 2021, the license of Elasticsearch changed from Apache 2.0 to a more | ||
restrictive license that prohibits providing "the products to others as a | ||
managed service". Consequently, AWS forked the search engine to create | ||
OpenSearch and no longer offers Elasticsearch as a service. This is | ||
problematic for many Open edX operators that use AWS and prefer to avoid | ||
any third-party services. | ||
2. Elasticsearch is very resource-intensive and often uses more than a gigabyte | ||
of memory just for small search use cases. | ||
3. Elasticsearch has poor support for multi-tenancy, which multiplies the | ||
problem of resource usage for organizations with many small Open edX sites. | ||
4. The existing usage of edx-search/Elasticsearch routes all search requests and | ||
result processing through edxapp (the LMS) or other IDAs, increasing the | ||
load on those applications. | ||
|
||
Need for Studio Search | ||
====================== | ||
|
||
At the time of this ADR, we have a goal to implement new search functionality in | ||
Studio, to support various course authoring workflows. | ||
|
||
Meilisearch | ||
=========== | ||
|
||
Meilisearch ("MAY-lee search") is a new, promising search engine that offers a | ||
compelling alternative to Elasticsearch. It is open source, feature rich, and | ||
very fast and memory efficient (written in Rust, uses orders of magnitude less | ||
memory than Elasticsearch for small datasets). It has a simple API with an | ||
official python driver, and has official integrations with the popular | ||
Instantsearch frontend library from Algolia. It has strong support for | ||
multi-tenancy, and allows creating restricted API keys that incorporate a user's | ||
permissions, so that search requests can be made directly from the user to | ||
Meilisearch, rather than routing them through Django. Initial testing has shown | ||
it to be much more developer friendly than Elasticsearch/OpenSearch. | ||
|
||
At the time of writing, there are only two known concerns with Meilisearch: | ||
|
||
1. It doesn't (yet) support High Availability via replication, although this is | ||
planned and under development. It does have other features to support high | ||
availability, such as very low restart time (in ms). | ||
2. It doesn't support boolean operators in keyword search ("red AND panda"), | ||
though it does of course support boolean operators in filters. This is a | ||
product decision aimed at keeping the user experience simple, and is unlikely | ||
to change. | ||
|
||
|
||
Decision | ||
******** | ||
|
||
1. We will implement the new Studio search functionality using Meilisearch, | ||
as an experiment and to evaluate it more thoroughly. | ||
2. The Studio search functionality will be disabled by default in the next | ||
Open edX release (Redwood), so that Meilisearch will not be a requirement | ||
for any default nor existing features. This will also allow us to evaluate it | ||
before deciding to embrace it or replace it. | ||
3. We will keep the Meilisearch-specific code isolated to the | ||
new ``content/search`` Django app, so it's relatively easy to swap out later | ||
if this experiment doesn't pan out. | ||
4. We will not use ``edx-search`` for the new search functionality. | ||
|
||
|
||
Consequences | ||
************ | ||
|
||
1. Organizations that wish to try out the new Studio Search functionality in | ||
the Redwood release will have to install and configure Meilisearch. | ||
2. Building both the backend and frontend components of the Studio search | ||
project will be much faster and simpler than if we used ElasticSearch, | ||
edx-search, OpenSearch, django-haystack, etc. | ||
3. Keyword search with boolean operators will not be supported in any of the new | ||
search features. | ||
|
||
|
||
Alternatives Considered | ||
*********************** | ||
|
||
OpenSearch Only | ||
=============== | ||
|
||
Moving existing search functionality to OpenSearch is a possibility, but though | ||
it mostly addresses the licensing issue, it doesn't solve the problems of | ||
resource usage, API complexity, edx-search API complexity, lack of Instantsearch | ||
integration, and poor multi-tenancy. | ||
|
||
OpenSearch and Elasticsearch | ||
============================ | ||
|
||
When OpenSearch was originally forked from Elasticsearch, it was completely API | ||
compatible, but over time they have developed along divergent paths. Regardless | ||
of whether ElasticSearch and OpenSearch are actually wire-compatible, recent | ||
versions of all the official ElasticSearch clients have been made to actively | ||
reject connections to OpenSearch, which is why you generally won't find client | ||
libraries that work with both engines, and why there are OpenSearch forks of | ||
everything on the client side as well as the server side. | ||
|
||
As there is no ready-to-use abstraction layer that would allow us to comfortably | ||
support both, and no interest in maintaining one ourselves, this is not an | ||
appealing option. | ||
|
||
Algolia | ||
======= | ||
|
||
Algolia is a great search engine service, but as it is a proprietary product, it | ||
is not suitable as a requirement for an open source platform like Open edX. |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Someone (I think maybe @kdmccormick) pointed out to me that you don't need to explicitly put the app config if there's only one.