Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: [FC-0056] Implement Sidebar Navigation #34457

Conversation

NiedielnitsevIvan
Copy link
Contributor

@NiedielnitsevIvan NiedielnitsevIvan commented Apr 2, 2024

Settings

COURSEWARE_MICROFRONTEND_NAVIGATION_SIDEBAR_BLOCKS_DISABLE_CACHING:
  - name: courseware.disable_navigation_sidebar_blocks_caching
    everyone: true

Description

This PR addresses to the need to add an API for Sidebar Navigation that returns the course structure with sections, subsections, and units, according to user rights.

To improve the performance of the API, was added caching of the course structure for the user, which makes it much easier to calculate the block structure for the user at each request. However, there may be cases when this caching can lead to an overflow of the cache storage in high-loaded LMS with large courses, so the corresponding flag "courseware. disable_navigation_sidebar_blocks_caching" was added so that this caching can be disabled.

Testing instructions

  • Run master devstack.
  • Start platform make dev.up and make checkout on this branch.
  • Create a course with different access rights to course blocks (sections/subsections/subsections) for different users.
  • Make GET requests to the API (/api/course_home/sidebar/{course_id}) from different users and check if the API returns the correct blocks depending on the user's permissions.
  • Update the course and check if there are any new elements in the API response.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Apr 2, 2024
@openedx-webhooks
Copy link

Thanks for the pull request, @NiedielnitsevIvan! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@NiedielnitsevIvan NiedielnitsevIvan force-pushed the Ivan_Niedielnitsev/feat/Implement-Sidebar-Navigation branch from 6632124 to bb1ba59 Compare April 2, 2024 09:12
@arbrandes arbrandes requested review from arbrandes and ormsbee April 4, 2024 14:44
Copy link
Contributor

@ormsbee ormsbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote some questions and comments. Overall, I think this is a pretty pragmatic PR that balances correctness with performance without requiring too much new code. My main concerns are related to the cache cleanup performance and what happens when a person's user partition group changes for the purposes of Unit-level access (e.g. enrollment track).

I would like to know what kinds of response times you're seeing locally on large courses, and where the bottlenecks are presently. You don't have to give detailed profiling traces or anything–I'm just trying to understand if you know generally how this behaves for a larger course in terms of response time and database queries made. Traditionally completion code was one of the slowest things because of n+1 queries, but it looks like you're doing a batch fetch for that, which is great.

Thank you!

@@ -44,6 +48,11 @@
OutlineTabView.as_view(),
name='outline-tab'
),
re_path(
fr'sidebar/{settings.COURSE_KEY_PATTERN}',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a name we could pick that better reflects what it is from the server's point of view (e.g. course navigation), rather than where it's being placed in the UI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +556 to +567
completions = BlockCompletion.objects.filter(user=self.request.user, context_key=course_key).values_list(
'block_key',
'completion',
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fuzzy on how BlockCompletion works, but do we need everything in the course for this user? Can we just grab completion data for Units and up? Or do we need all the low level stuff because we're recalculating completion on the fly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, BlockCompletion doesn't store data for Units, only for lower-level blocks, so we have to grab all the data for the user in the course and calculate Completions for the level of Units and up on our own.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bleh. Yeah, that makes sense. Unfortunate though.

Comment on lines 541 to 544
return list(filter(
lambda seq_data: seq_data['id'] in available_sequence_ids or seq_data['type'] != 'sequential',
course_sequences
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (optional): In general, please prefer list comprehensions to list + filter, since they're more commonly used and familiar to Python devs.

children = block.get('children', [])
child_classes = {child.get('type') for child in children}
new_class = 'other'
icon_call_priority = ['video', 'problem']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment (not required): If "problem" is a proxy for "is this something that you as a student need to submit", we could scan through the classes to see if have the has_score class attribute set to True. That would catch things like ORA, drag and drop, etc. Another oddball case is library_content which is just a container, but is used for problems the vast majority of the time.

Comment on lines 530 to 532
return list(filter(
lambda section_data: section_data['id'] in available_section_ids, course_sections
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (optional): In general, please prefer list comprehensions to list + filter, since they're more commonly used and familiar to Python devs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I agree with you that the list comprehensions syntax is more familiar and readable, but in this case I deliberately used list + filter because it has better performance, which can be useful for large courses.

However, if you are in favor of the list comprehensions approach, I can change this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I didn't realize it was done for performance reasons. How big of a difference does it make on large courses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took measurements and the difference is actually 200-300 ms on average, which can be explained by the error. Therefore, it makes sense to return to list comprehensions for the sake of readability.

"""
if 'children' in block:
block['children'] = [self.mark_complete_recursive(child) for child in block['children']]
block['complete'] = all(child['complete'] for child in block['children'] if child['type'] != 'discussion')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be looking for the class attribute completion_mode on the relevant XBlock classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because in our case, block is not an XBlock object, but a dict with block data, and therefore we cannot get the value of the completion_mode field here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, let me ask this another way: Why is the "discussion" block explicitly excluded here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "discussion" block is explicitly excluded here because it cannot be marked as completed, in which case units that have a "discussion" block will never be marked as completed either.

I also considered the option of checking the complete status for blocks that have has_score=True, but in this case it would only apply to problems, which is also not entirely correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this get the right tags then?

from xblock.core import XBlock
from xblock.completable import XBlockCompletionMode

completable_tags = {
    tag for (tag, cls) in XBlock.load_classes()
    if XBlockCompletionMode.get_mode(cls) == XBlockCompletionMode.COMPLETABLE
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thank you very much, I didn't know that we could get completable block types in this way.
Added this change.

for section_data in course_sections:
section_data['children'] = self.get_available_sequences(
user_course_outline,
section_data.get('children', [])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are definitely necessary? I'm surprised the API building course_blocks would return no children attribute, rather than returning an empty list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not fully understand what you mean, can you explain in more detail please?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected section_data to always have a children key, even if the value is []. But the fact that you're doing section_data.get('children', []) implies that sometimes that key is missing entirely. That surprised me.

)
accessible_sequence_ids = {str(usage_key) for usage_key in user_course_outline.accessible_sequences}
for sequence_data in section_data['children']:
sequence_data['accessible'] = sequence_data['id'] in accessible_sequence_ids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining the difference between accessible and available in the above code. I've stared at outline code long enough to get it, but it's probably going to confuse developers looking at this for the first time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think it can be fixed by renaming get_available_sections to get_accessible_sections and so on.

@@ -141,6 +142,7 @@ def listen_for_course_publish(sender, course_key, **kwargs): # pylint: disable=
# register special exams asynchronously after the data is ready
course_key_str = str(course_key)
transaction.on_commit(lambda: update_special_exams_and_publish.delay(course_key_str))
drop_course_sidebar_blocks_cache(course_key_str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be problematic at scale, and that it's better to rely on the version changing in the key for invalidation rather than trying to iterate through the whole list of cache keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, after adding the course version to the cache key.

@GlugovGrGlib
Copy link
Member

@ormsbee Please check this the performance analysis on tutor dev - https://raccoongang.atlassian.net/wiki/external/MDhhMjdlMDhjODVkNDBjYjkzZjAzMDdhOGZmMGM2MTk

@ormsbee
Copy link
Contributor

ormsbee commented Apr 10, 2024

@GlugovGrGlib: Thank you! Do you have a broad sense of what the bottlenecks are for those really large courses?

@NiedielnitsevIvan NiedielnitsevIvan force-pushed the Ivan_Niedielnitsev/feat/Implement-Sidebar-Navigation branch from eab71a1 to 9a6afaa Compare April 10, 2024 16:58
@ormsbee
Copy link
Contributor

ormsbee commented Apr 11, 2024

@GlugovGrGlib: Just as a heads up, when I run this locally on my laptop with the large test course and try loading just this REST endpoint in isolation, I get speeds of about 4.7 seconds for staff and around 6 seconds for a student. That's about 3X faster than when I try to hit those endpoints on the sandbox (I was loading the REST endpoint directly there as well, instead of using the MFE, since I didn't want it to get slowed down by other requests). And that difference is with the debug toolbar left on. Is there a possibility that course blocks caching isn't properly configured on the sandbox?

@ormsbee
Copy link
Contributor

ormsbee commented Apr 11, 2024

Or maybe profiling was left on?

Copy link
Contributor

@ormsbee ormsbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple small requests (one optional), but I'd still like to know the answers to the questions I left in my last review before approving.

for higher_class in icon_call_priority:
if higher_class in child_classes:
new_class = higher_class
return new_class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: When I first read icon_call_priority, I assumed that the things that came first had higher priority, rather than the things that came last. I think this might read more clearly if you use returns instead of a new_class var, so something like:

    @staticmethod
    def get_vertical_icon_class(block):
        """
        Get the icon class for a vertical block based on its children.
        """
        children = block.get('children', [])
        child_classes = {child.get('type') for child in children}
        icon_call_priority = ['problem', 'video']
        for item_type in icon_call_priority:
            if item_type in child_classes:
                 return item_type
        return 'other'  # default

You could also make it more explicit/obvious like this, since there are only a few classes that matter at the moment:

    @staticmethod
    def get_vertical_icon_class(block):
        """
        Get the icon class for a vertical block based on its children.
        """
        children = block.get('children', [])
        child_classes = {child.get('type') for child in children}
        if 'problem' in child_classes:
            return 'problem'
        if 'video' in child_classes:
            return 'video'
        return 'other'

Either way, please describe the intended ordering in the docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +556 to +567
completions = BlockCompletion.objects.filter(user=self.request.user, context_key=course_key).values_list(
'block_key',
'completion',
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bleh. Yeah, that makes sense. Unfortunate though.

Comment on lines 480 to 486
if cached:
# If the data was cached, we need to mark the blocks as complete or not complete.
course_blocks = self.mark_complete_recursive(course_blocks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain in a comment why we need to mark the blocks as complete only when we get a cache hit, and not when there's a cache miss.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment updated.

@NiedielnitsevIvan
Copy link
Contributor Author

@ormsbee Hello!
During testing, we found a bug that after changing the name in the course blocks (sections, subsections, and units), their name does not immediately change in the sidebar. A similar problem with the names of sections and subsections is currently present on the Course Outline page. The problem lies in the caching of the course structure at the BlockStore level.

One of the solutions to make the cache update instantly or much more often is to change the BlockStructureConfiguration, where you can specify the cache lifetime, but this can affect performance globally.

@ormsbee
Copy link
Contributor

ormsbee commented Apr 19, 2024

@NiedielnitsevIvan: Is that because the collect phase of block transformers takes time to run, so the version you get immediately after making a change is still the course blocks of the old course, and then that gets cached by this new code?

@NiedielnitsevIvan
Copy link
Contributor Author

@NiedielnitsevIvan: Is that because the collect phase of block transformers takes time to run, so the version you get immediately after making a change is still the course blocks of the old course, and then that gets cached by this new code?

That's right, because the course version changes, and the blocks returned from get_course_outline_block_tree are still old, so cache invalidating by course version doesn't work in this case.

@GlugovGrGlib
Copy link
Member

GlugovGrGlib commented Apr 19, 2024

Do you have a broad sense of what the bottlenecks are for those really large courses?

Sorry, I really didn't go into the depth of the specific code that is the slowest to execute for some time already. Last time we needed to troubleshoot these issues for a client, it was around 2020-2021, when there weren't Learning MFE, and in parallel with your inputs to this discourse post, we got similar results at the time.

Additionally to this, recently we have tested fetching and processing the course structure to find the optimal course setup for a client, you might find those results insightful https://raccoongang.atlassian.net/wiki/external/NWJlYzYxYzRlYzRlNGE5YmFkYjkxYmE0ZTdlNTZjOWE.

Is there a possibility that course blocks caching isn't properly configured on the sandbox?

I believe you were right, initially I was confused about waffle flag, as It should be used to turn off caching, but was implemented the other way. Latter Ivan inverted the behavior for waffle flag, and the performance for loading from cache on the sandbox was enhanced.

Copy link
Contributor

@ormsbee ormsbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, because the course version changes, and the blocks returned from get_course_outline_block_tree are still old, so cache invalidating by course version doesn't work in this case.

In that case, can you grab the cache value from the BlockStructureModel (I think you might have to use the UsageKey of the root Course block instead of the CourseKey)?

data_usage_key = UsageKeyWithRunField(
'Identifier of the data being collected.',
blank=False,
max_length=255,
unique=True,
)
data_version = models.CharField(
'Version of the data at the time of collection.',
blank=True,
null=True,
max_length=255,
)
data_edit_timestamp = models.DateTimeField(
'Edit timestamp of the data at the time of collection.',
blank=True,
null=True,
)

And then fall back to the modulestore's root CourseBlock version if that's not available for some reason? Does that resolve the cache invalidation issue?

Comment on lines 66 to 69
COMPLETABLE_BLOCK_TYPES = {
block_type for (block_type, block_cls) in XBlock.load_classes()
if XBlockCompletionMode.get_mode(block_cls) == XBlockCompletionMode.COMPLETABLE
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have specified when I gave this example: Please don't put this at the module level. You can put it in a function and wrap it in a @functools.cache decorator (or a method and wrap in @functools.cached_property. But we've had problems in the past where XBlock machinery was being initialized earlier than expected as side-effect of module-level statements like this, and it was hard to track down. Putting it in a function or method will help to make sure we don't actually invoke it until the view itself is executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about this problem.
Fixed it.

@NiedielnitsevIvan NiedielnitsevIvan force-pushed the Ivan_Niedielnitsev/feat/Implement-Sidebar-Navigation branch from ef36dca to 9edc20f Compare April 26, 2024 07:08
Copy link
Contributor

@ormsbee ormsbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash your commits and consider looking into the BlockStructureModel thing I mentioned w.r.t. cache invalidation as a followup. Thank you.

@NiedielnitsevIvan NiedielnitsevIvan force-pushed the Ivan_Niedielnitsev/feat/Implement-Sidebar-Navigation branch from 9edc20f to 8f23703 Compare April 26, 2024 14:53
@ormsbee ormsbee merged commit 3083672 into openedx:master Apr 26, 2024
67 checks passed
@openedx-webhooks
Copy link

@NiedielnitsevIvan 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

@GlugovGrGlib
Copy link
Member

This PR is the part of the following product feature - openedx/platform-roadmap#329

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants