Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update search index when course content is updated [FC-0040] #34391

Conversation

rpenido
Copy link
Contributor

@rpenido rpenido commented Mar 19, 2024

Description

This PR implements updating the search index with content metadata whenever they change

More Infomation

Testing Instructions

NOTE: Meilisearch seems to cache queries along with their results in the frontend, so if you simply search it might
show you stale data (network tab shows no requests), especially if you're searching for the same query. Make sure you refresh the Meilisearch dashboard (http://meilisearch.local.edly.io:7700/) and then perform the search.


Private ref: FAL-3690

@openedx-webhooks
Copy link

openedx-webhooks commented Mar 19, 2024

Thanks for the pull request, @rpenido! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Mar 19, 2024
Copy link
Member

@yusuf-musleh yusuf-musleh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this @rpenido ! I realized after reviewing that a lot of the changes on this PR diff are from @bradenmacdonald 's PR since this PR wasn't updated with upstream yet. None the less I think the code overall is really well structured and easy to follow!

Just had a small question/nit.

However in terms of testing, I'm not sure if its ready or not, but after briefly tried I was faced with some errors (maybe because the PR needs to be updated with upstream), so I just reviewed the code for now. @rpenido Let me know when its ready for another review/testing it out.

openedx/core/djangoapps/content/search/api.py Outdated Show resolved Hide resolved
Comment on lines 55 to 65
upsert_xblock_index_doc.delay(
str(xblock_info.usage_key),
recursive=True, # Update all children because the breadcrumb may have changed
update_metadata=True,
update_tags=False,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is update_tags set to False here? Is it because we are going to implement updating tags in the index separately in openedx/modular-learning#197 ?

Copy link
Contributor Author

@rpenido rpenido Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! This handler is called when we receive an XBLOCK_UPDATED event. If this block already has tags, I think the index will still be up to date with its information.

The upsert_xblock_index_doc did not replace the document in the index; it only updates it with the new fields (https://www.meilisearch.com/docs/reference/api/documents#add-or-update-documents).

For the tagging data, I was planning to mirror the implementation we did here:

  • Create an OBJECT_TAGGED event that is dispatched when we call the tag_object api.
  • Create a new handler here that only updates the tags for the object and leaves the metadata as it is ({recursive=False, update_metadata: False, updat_tags=True});

But I'm open to new ideas if you think otherwise! 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as long as we are careful not to overwrite the tags field with an empty list, this is a good approach. Generally the tags haven't changed when this event is called, so there's no reason to update them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Thanks for the clarification, and I agree I think it's a good approach. I'll follow it when working on openedx/modular-learning#197

@rpenido rpenido force-pushed the rpenido/fal-3690-update-search-index-when-course-content-is-changed branch 5 times, most recently from 34f5722 to cab3af9 Compare March 20, 2024 21:52
@rpenido
Copy link
Contributor Author

rpenido commented Mar 21, 2024

I realized after reviewing that a lot of the changes on this PR diff are from @bradenmacdonald 's PR since this PR wasn't updated with upstream yet.

We have this temp PR (open-craft#645). It could be easier to review before we get #34310 merged.

However in terms of testing, I'm not sure if its ready or not, but after briefly tried I was faced with some errors

Sorry for that @yusuf-musleh! I probably messed up something when I merged the changes from @bradenmacdonald.

I will let you know when this is ready for testing again (probably for a final review).

Also, I started writing some tests for the new API, but I'm not happy with them. As meilisearch is an external service, I need to mock many functions and only really test if the external API is called with the expected parameters. Maybe we should fire up a Meileisearch instance during the tests to actually test the feature (as we do with external services, like the database), but I'm not sure if this is the right time to do this (and I don't know how to do this in our CI suit either).

I plan to continue with the current approach (mocking and checking calls) for the API and the handlers to get some coverage, but I think we should revisit this if the prototype becomes the default search engine. Let me know if you have a better idea for this!

@yusuf-musleh
Copy link
Member

As meilisearch is an external service, I need to mock many functions and only really test if the external API is called with the expected parameters.

@rpenido I think this approach is fine, since we can make the assumption that Meilisearch will behave as expected after we manually tested it. However I can see the value of having integration tests to actually test that it behaves as expected in case there are breaking/unexpected changes on the Meilisearch side. Although that could be avoided by using the same working version vs updating to the latest version every time.

I think we should revisit this if the prototype becomes the default search engine

I agree, I think this can come at a later stage. As you suggested, we can potentially follow the footsteps of the mysqldb when the time comes.

@rpenido rpenido force-pushed the rpenido/fal-3690-update-search-index-when-course-content-is-changed branch 2 times, most recently from 7c82a0b to bd28dec Compare March 22, 2024 15:50
@bradenmacdonald
Copy link
Contributor

@rpenido I agree that we should eventually run Meilisearch during tests. However, it's probably not necessary to implement that for this first studio search project which is considered experimental. Maybe you could add that to the ADR that I wrote? That for the experiment, we won't use Meilisearch during tests, but we expect to add that in the future if we move forward with replacing Elasticsearch completely.

@rpenido rpenido force-pushed the rpenido/fal-3690-update-search-index-when-course-content-is-changed branch 3 times, most recently from a2e9d2a to 4a55c89 Compare March 26, 2024 20:57
@rpenido rpenido marked this pull request as ready for review March 26, 2024 22:13
@rpenido rpenido requested a review from yusuf-musleh March 26, 2024 22:58
@rpenido
Copy link
Contributor Author

rpenido commented Mar 26, 2024

Hi @yusuf-musleh ! There is a test failing here, but I can't find the logs. I will look into it further tomorrow.

Edit: fixed 5a6beafc86276aafa9a02ca895f49b9d691a2905

Copy link
Member

@yusuf-musleh yusuf-musleh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpenido Great work, the code is coming along quite well! The code looks good to me, however while testing the library aspects, I noticed a few issues:

  • When I create a new empty library, nothing gets added to the index, is that expected?
  • When I add a component/block to the library that gets added to the index, as expected, when I edit the blocks, eg: the name or the problem type (for problem blocks) the index does not get updated. However If I change the name of the library, thats when the blocks under it get updated in the index with the latest information. It's probably just a missing condition on how/when the docs should be updated in the index.

@rpenido
Copy link
Contributor Author

rpenido commented Mar 28, 2024

Thank you for the review @yusuf-musleh!

* When I create a new empty library, nothing gets added to the index, is that expected?

I think that it is expected. I commented on the first PR but I think it got lost in the review comments.
@bradenmacdonald, could you confirm that we are not indexing the Library and the Course entities?

* When I add a component/block to the library that gets added to the index, as expected, when I edit the blocks, eg: the name or the problem type (for problem blocks) the index does not get updated. However If I change the name of the library, thats when the blocks under it get updated in the index with the latest information. It's probably just a missing condition on how/when the docs should be updated in the index.

This is because the LIBRARY_BLOCK_UPDATED event is not dispatched when we edit an library XBlock. Not sure if we will fix it now or later.

@rpenido rpenido changed the title feat: update search index when course content is updated [FC-0040] feat: update search index when course content is updated Mar 28, 2024
@rpenido rpenido changed the title [FC-0040] feat: update search index when course content is updated feat: update search index when course content is updated [FC-0040] Mar 28, 2024
@rpenido
Copy link
Contributor Author

rpenido commented Mar 28, 2024

@yusuf-musleh I want to rebase and squash this PR, but you will need to rebase it and merge conflicts again after that too (it will happen anyway when this goes upstream). Let me know when you think it is a good time to do it!

@rpenido rpenido force-pushed the rpenido/fal-3690-update-search-index-when-course-content-is-changed branch 2 times, most recently from 128e717 to b7aef89 Compare March 28, 2024 13:53
@bradenmacdonald
Copy link
Contributor

@rpenido @yusuf-musleh We aren't indexing the courses/libraries themselves. So it's expected that a newly created library won't appear in the results.

@rpenido rpenido force-pushed the rpenido/fal-3690-update-search-index-when-course-content-is-changed branch from b7aef89 to 45b6bf9 Compare March 28, 2024 19:34
@bradenmacdonald
Copy link
Contributor

This is ready to merge but we're waiting on some unrelated issues with the NodeJS 18 upgrade to be solved first.

@rpenido
Copy link
Contributor Author

rpenido commented Apr 16, 2024

Hi @bradenmacdonald ! I added this commit (612f32f) here.
There are some old libraries (before the recent module store refactor) that crash our reindex.

I deleted these libraries locally, and the code was working. In the sandbox, the error appeared again, so I think it would be better to skip the library if we found an error instead of crashing the reindex.

```File"/openedx/edx-platform/openedx/core/djangoapps/content_libraries/api.py",line
617, in get_library_componentslearning_package.id,AttributeError:
'NoneType' object has no attribute'id'```
@rpenido rpenido force-pushed the rpenido/fal-3690-update-search-index-when-course-content-is-changed branch from 83d6823 to 612f32f Compare April 16, 2024 22:27
@bradenmacdonald
Copy link
Contributor

Thanks @rpenido - that's a great fix.

@bradenmacdonald
Copy link
Contributor

@rpenido I was just about to merge this, but there are some conflicts with the "permissions" PR that I just merged. Could you please fix them?

@rpenido rpenido marked this pull request as draft April 17, 2024 20:21
@rpenido
Copy link
Contributor Author

rpenido commented Apr 17, 2024

@rpenido I was just about to merge this, but there are some conflicts with the "permissions" PR that I just merged. Could you please fix them?

Working on it. The code changed a lot, so I will need to manually test it to make sure I did the merge right.

@bradenmacdonald
Copy link
Contributor

@rpenido Take your time - it's important to get it right :)
Thanks.

@rpenido rpenido force-pushed the rpenido/fal-3690-update-search-index-when-course-content-is-changed branch from 36c48a4 to 58f8ff4 Compare April 17, 2024 20:58
@rpenido
Copy link
Contributor Author

rpenido commented Apr 17, 2024

This is ready for review and merge @bradenmacdonald. It also includes the changes from your PR:

It may have impacted your PR @yusuf-musleh.

@bradenmacdonald bradenmacdonald merged commit 90b253a into openedx:master Apr 18, 2024
66 checks passed
@openedx-webhooks
Copy link

@rpenido 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

@xitij2000 xitij2000 deleted the rpenido/fal-3690-update-search-index-when-course-content-is-changed branch April 23, 2024 08:06
KyryloKireiev pushed a commit to raccoongang/edx-platform that referenced this pull request Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[Course Search] Update search index when course content is changed
5 participants