Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict search results based on user permissions [FC-0040] #34471

Merged
merged 17 commits into from
Apr 17, 2024

Conversation

pomegranited
Copy link
Contributor

@pomegranited pomegranited commented Apr 4, 2024

Description

Meilisearch allows us to create JWTs with embedded "filters", which guarantee that users searching with these tokens will only be permitted to see the documents that match these filters.

In Open edX, we grant permissions to block content based on its learning context, i.e. the course or content library that contains the block. Access to courses can be done in two ways, either individually on the course, or using org level rights. Access to content libraries is managed individually on the library (only library creation uses org level rights).

Because of these individual access rights, the list of courses and libraries a user has access to can be very long, >1000 in some instances (see comment). But since we need to encode that list into a JWT, we're limited to a total token size of 8Kib, but course keys can be very long (255 char). So to facilitate packing as many course/library identifiers into this token, this PR:

  • Adds a SearchAccess model with a numeric ID, mapped to all CourseOverview and ContentLibrary context_keys.
  • Adds an access_id field to the search-indexed Document, storing the SearchAccess.id relevant to that block's learning context.
  • Adds an access_id filter to the JWT token generation, containing the SearchAccess.id's for the top 1,000 learning contexts that the user has access to. (We omit access_ids covered by the org filter described below.)
  • Adds an org filter to the JWT token generation, containing max 1,000 org names that the user has been granted access to.

Supporting information

Content search is not enabled by default, so this change should have no impact on edX/2U or other sites running from master.

closes openedx/modular-learning#198

Private-ref: FAL-3692

Testing instructions

Setup:

  1. Ensure that you are running the latest https://github.com/open-craft/tutor-contrib-meilisearch -- this will ensure that MEILISEARCH_ENABLED = True is set in the CMS and the MFE.
  2. Enable course-authoring MFE, running from the branch in feat: add content search modal [FC-0040] frontend-app-authoring#928
  3. Load https://github.com/open-craft/taxonomy-sample-data/
  4. Using this branch, run migrations: python manage.py cms migrate
  5. Also using this branch, reindex studio content: python manage.py cms reindex_studio --experimental

Testing:

  1. Login to Studio and the Course Authoring MFE as a global staff user.
  2. Click the 'search' button in the header bar to start searching studio content.
    Ensure that you can see content from all your courses.
  3. Repeat these steps with a non-global staff user who only has access to a single course or library.
    Ensure that user can only see that course/library when searching Studio content.

Deadline

None

Other information

  1. Anyone using content.search will need to run a full reindex to add the new document access_id field:
    python manage.py cms reindex_studio --experimental
    

Stores a numeric ID for each course + library, which will generally be
shorter than the full context_key, so we can pack more of them into the
the Meilisearch search filter.

Also:

* Adds data migration pre-populates the SearchAccess model from the existing
  CourseOverview and ContentLibrary records
* Adds signal handlers to add/remove SearchAccess entries when content
  is created or deleted.
* Adds get_access_ids_for_request() helper method for use in views.
* Adds tests.
@openedx-webhooks
Copy link

openedx-webhooks commented Apr 4, 2024

Thanks for the pull request, @pomegranited! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Apr 4, 2024
* Adds an access_id field to the document, which stores the
  SearchAccess.id for the block's context.
* Use the requesting user's allowed access_ids to filter search results
  to documents with those access_ids.
* Since some users have a lot of individual access granted, limit the
  number of access_ids in the filter to a large number (1_000)
* Updates tests to demonstrate.
Copy link
Contributor

@rpenido rpenido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @pomegranited! LGTM 👍

I tested this, and it worked after a minor fix (e1e910a). Could you make this change in your PR?

Also, I added a comment about filtering orgs.

Note that #34391 refactors part of the base code used here. If it gets merged before, you will have to move some things around here. I don't think it will be a lot, though.

openedx/core/djangoapps/content/search/views.py Outdated Show resolved Hide resolved
rpenido and others added 3 commits April 5, 2024 11:51
In theory, we shouldn't have to do this, because the CREATE and DELETE
events should keep the SearchAccess table up-to-date.

But in practice, signals can be missed (or in tests, they may be
disabled). So we assume that it's ok to re-use a SearchAccess.id created
for a given course or library context_key.
Uses helper methods and decorators to wrap the settings and patches used
by multiple view tests.
for lib in ContentLibrary.objects.select_related("org").only("org", "slug"):
# ContentLibrary.library_key property is not available here, so we have to assemble it.
lib_key = LibraryLocatorV2(org=lib.org.short_name, slug=lib.slug)
SearchAccess.objects.create(context_key=lib_key)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're concerned about the performance of this data migration, I can remove it and add a management command instead.

Copy link
Contributor Author

@pomegranited pomegranited Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @rpenido -- I decided to remove the data migration (ef89591): running reindex_studio handles this for us.

* Uses content_tagging.rules.get_user_orgs to fetch the user's
  content-related orgs for use in the meilisearch filter.
* Limits the number of orgs used to 1_000 to keep token size down
@pomegranited pomegranited marked this pull request as ready for review April 5, 2024 04:40
Copy link
Contributor

@rpenido rpenido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pomegranited!

I think we went the wrong path here regarding the org + access_id filter.

If I understand correctly, we should use the access_id filter only for course-level role assignments.

I put some tests (that are falling) here (open-craft#650) to show what I'm expecting. Could you check if my understanding is right? 😄

PS: It adds a lot of boilerplate because I'm creating a course to test. You don't need to add these tests to this PR.


# Everyone else is limited to their org roles...
user_orgs = [
org.short_name for org in get_user_orgs(request.user)[:MAX_ORGS_IN_FILTER]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user can access only one course in orgA, orgA will be returned here. I think we want to add only orgs that the user is an admin here. Maybe use _get_content_creator_orgs ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 1181eea

"""
Returns the SearchAccess.id values that the user has read access to.
"""
courses, _ = get_courses_accessible_to_user(request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to get the IDs only for course-level assignments here. If the user is staff in orgA and doesn't have any course-level role, this list should be empty. I'm not sure if we have a function for that.

Maybe use what we did in _get_course_user_orgs in the tagging rules, but return the course ids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with 6a3c30a.

Users should use the reindex_studio management command to populate SearchAccess.
to allow general access to the user's RoleCache without having to access
private attributes of User or RoleCache.

Related changes:

* Moves some functionality from openedx.core.djangoapps.enrollments.data.get_user_roles
  to this new helper method.
* Use these new helper method in content_tagging.rules
@pomegranited pomegranited force-pushed the jill/content-search-perms branch from 6a490cd to a917c64 Compare April 9, 2024 05:13
instead of all course keys that the user can read.

Org- and GlobalStaff access checks will handle the rest.
Also refactors tests to demonstrate this change for OrgStaff and
OrgInstructor users.
@pomegranited pomegranited force-pushed the jill/content-search-perms branch from 1181eea to c7215ac Compare April 9, 2024 07:23
@pomegranited
Copy link
Contributor Author

I put some tests (that are falling) here (open-craft#650) to show what I'm expecting. Could you check if my understanding is right? 😄

Thank you for those added tests! I didn't use them directly, but please see 6a3c30a and c7215ac for the relevant changes and tests.

Copy link
Contributor

@rpenido rpenido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @pomegranited! Good work here!
LGTM 👍

  • I tested this using the instructions in my tutor stack
  • I read through the code
  • I checked for accessibility issues
  • Includes documentation

Copy link
Contributor

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks. I will try to merge when I can but there are some unrelated issues affecting platform today so may need to wait.

@bradenmacdonald bradenmacdonald merged commit d672110 into openedx:master Apr 17, 2024
66 checks passed
@bradenmacdonald bradenmacdonald deleted the jill/content-search-perms branch April 17, 2024 18:21
@openedx-webhooks
Copy link

@pomegranited 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

1 similar comment
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

KyryloKireiev pushed a commit to raccoongang/edx-platform that referenced this pull request Apr 24, 2024
…dx#34471)

* feat: adds SearchAccess model

Stores a numeric ID for each course + library, which will generally be
shorter than the full context_key, so we can pack more of them into the
the Meilisearch search filter.

Also:

* Adds data migration pre-populates the SearchAccess model from the existing
  CourseOverview and ContentLibrary records
* Adds signal handlers to add/remove SearchAccess entries when content
  is created or deleted.
* Adds get_access_ids_for_request() helper method for use in views.
* Adds tests.

* test: can't import content.search in lms tests

* feat: use SearchAccess in documents and views

* Adds an access_id field to the document, which stores the
  SearchAccess.id for the block's context.
* Use the requesting user's allowed access_ids to filter search results
  to documents with those access_ids.
* Since some users have a lot of individual access granted, limit the
  number of access_ids in the filter to a large number (1_000)
* Updates tests to demonstrate.

* test: can't import content.search or content_staging in lms tests

* fix: make access_id field filterable

* fix: use SearchAccess.get_or_create in signal handlers

In theory, we shouldn't have to do this, because the CREATE and DELETE
events should keep the SearchAccess table up-to-date.

But in practice, signals can be missed (or in tests, they may be
disabled). So we assume that it's ok to re-use a SearchAccess.id created
for a given course or library context_key.

* refactor: refactors the view tests to make them clearer

Uses helper methods and decorators to wrap the settings and patches used
by multiple view tests.

* feat: adds org filters to meilisearch filter

* Uses content_tagging.rules.get_user_orgs to fetch the user's
  content-related orgs for use in the meilisearch filter.
* Limits the number of orgs used to 1_000 to keep token size down

* refactor: removes data migration

Users should use the reindex_studio management command to populate SearchAccess.

* refactor: adds functions to common.djangoapps.student.role_helpers

to allow general access to the user's RoleCache without having to access
private attributes of User or RoleCache.

Related changes:

* Moves some functionality from openedx.core.djangoapps.enrollments.data.get_user_roles
  to this new helper method.
* Use these new helper method in content_tagging.rules

* fix: get_access_ids_for_request only returns individual access

instead of all course keys that the user can read.

Org- and GlobalStaff access checks will handle the rest.

* fix: use org-level permissions when generating search filter

Also refactors tests to demonstrate this change for OrgStaff and
OrgInstructor users.

* refactor: remove SearchAccess creation signal handlers

Lets SearchAccess entries be created on demand during search indexing.

* feat: omit access_ids from the search filter that are covered by the user's org roles

---------

Co-authored-by: Rômulo Penido <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[Course Search] Restrict search results based on user permissions
6 participants