Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Django Settings to roll out extracted XBlocks #35308

Open
Tracked by #34827
kdmccormick opened this issue Aug 13, 2024 · 16 comments · May be fixed by #35549
Open
Tracked by #34827

Add Django Settings to roll out extracted XBlocks #35308

kdmccormick opened this issue Aug 13, 2024 · 16 comments · May be fixed by #35549
Assignees

Comments

@kdmccormick
Copy link
Member

kdmccormick commented Aug 13, 2024

Once a block in https://github.com/openedx/xblocks-contrib is ready, we will need to safely roll it out in edx-platform. To do that safely, we need to be able to toggle between the old and new block quickly without putting course content or user state at risk. Unfortunately, I do not think we could easily toggle between blocks in setup.py.

However, what we can do (without touching setup.py at all) is define a Django setting for each block, which toggles between the built-in class and the extracted class. Let's use the WordCloud block as an example:

  • Define a stub WordCloudBlock class in xblocks-contrib.

    • Include a class constant attribute: is_extracted = True. More on this later.
  • In lms/envs/common.py:

    • Define a Django setting for the extractable block, for example:
    # .. toggle_name: USE_EXTRACTED_WORD_CLOUD_BLOCK
    # .. toggle_description: ...
    # .. toggle_warning: This ls not production ready until <link to extraction GitHub issue> is closed
    # .. toggle_use_cases: temporary
    # .. etc.
    USE_EXTRACTED_WORD_CLOUD_BLOCK =  False
  • In the builtin XBlock definition module, for example xmodule/word_cloud_block.py:

    • At the top of the module, import the extracted block (even if it still a work-in-progress):
      from xblocks_contrib import WordCloudBlock as _ExtractedWordCloudBlock
      from django.conf import settings
    • Change class WordCloudBlock to class _BuiltInWordCloudBlock
    • In _BuiltInWordCloudBlock, add a new constant class attribute: is_extracted = False. More on this later.
    • At the end of the file, conditionally set the class to either the extracted or the builtin block class:
      WordCloudBlock = (
          _ExractedWordCloudBLock if settings.USE_EXTRACTED_WORD_CLOUD_BLOCK
          else _BuiltInWordCloudBlock
      )

Do this for all component XBlocks:

  • PollBlock
  • WordCloudBlock
  • AnnotatableBlock
  • LtiBlock
  • HtmlBlock
  • DiscussionBlock
  • ProblemBLock
  • VideoBlock

Lastly, make use of the is_extracted class attribute you added earlier by adding a custom monitoring attribute to the render_xblock view, which will show up in site operator's monitoring systems and allow them to isolate any errors and warnings which come from extracted blocks:

if (is_extracted := getattr(XBlock.load_class(usage_key.block_type), 'is_extracted') is not None:
    set_custom_attribute('block_extracted', is_extracted)

Once all of that is merged, we can toggle between each builtin block and its extracted counterpart in development. Also, edX.org can toggle it in production when they are ready, giving them the ability to test this ahead of time.

Only once an extracted block is proven to be stable, we can truly replace its built-in block:

  • in a new xblocks-contrib major version: add the entry-point to setup.py
  • in a single edx-platform PR: upgrade xblocks-contrib, remove the flag, remove the built-in block definition.
@kdmccormick
Copy link
Member Author

FYI @farhan and @feanil . Once an extracted block is ready to go, I think this is how we should roll it out. Let me know if you disagree.

@farhan
Copy link
Contributor

farhan commented Aug 16, 2024

@kdmccormick thanks for brainstorming the solution, it seems interesting to me, some points:

  1. What entry point name of Extracted WordCloud XBlock are we going to install into edx-platform? Currently I have kept the entry point same.
    a. If both entry points will be same xblock load class will give AmbigousPluginError.
    b. I think we can keep different entry point of extracted xblock at start and change it to original while removing _BuiltInWordCloudBlock. It will require pushing a new version of xblocks-contrib on pypi.
  2. In persuit of minimal edx-platform I think we shouldn't ship the non-basic xblocks out of the box. Clients could install it later. Or we can think on these lines further.

@kdmccormick
Copy link
Member Author

@farhan

  1. My idea is that there is still just one entrypoint:
       "word_cloud = xmodule.word_cloud_block:WordCloudBlock",
    The point of the waffle flag is to swap between two different values of xmodule.word_cloud_block:WordCloudBlock. When the flag is off, it gets assigned to _BuiltInWordCloudBlock. When the flag is on, it gets assigned to _ExtractedWordCloudBlock. The benefit is that site operators can switch back and forth between the old and new implementations for their existing content without redeploying their site, helping them minimize their MTTR (mean time to recovery) when they identify an issue with the new blocks.
  2. We should absolutely pursue a minimal edx-platform but I do not think that means we have to ship all builtin blocks at once. ProblemBlock and VideoBlock will each take a much longer time than the other builtin blocks. Under release early release often philosophy, we should get the simpler blocks out to production as soon as possible, for several reasons:
    • A tighter feedback loop: If there is a flaw in how we are extracting xblocks that can only be uncovered in a prod environment, we want to learn that as soon as possible.
    • Isolated releases: When there is a bug, it will be easier to identify if we are incorporating a single new block rather than a whole suite of new blocks.
    • Demonstrating value: If you can provide some new features or architectural benefit, it better to provide that as soon as possible, rather than waiting to deliver the value all at once when everything is done.
    • Maintenance: Putting code in production forces it to be in working order. It is easy to let dead or half-baked code languish when it is never being used on anyone's site.
    • Team morale: Shipping to production feels good! We should do it whenever we can.

@timmc-edx
Copy link
Contributor

Is the idea here that there is safety from data races because the waffle flag is loaded in the same transaction as the xblock DB accesses? (Are there any caches we need to worry about?)

@kdmccormick
Copy link
Member Author

@timmc-edx

Is the idea here that there is safety from data races because the waffle flag is loaded in the same transaction as the xblock DB accesses?

Nope, I hadn't thought about data races until you mentioned them. I don't think we need to worry about them, since the data model and data access patterns should be unchanged between the built-in and extracted block implementations. Same with the caches-- if we have done our work right, it will just be a different class accessing the same caches in the same exact way.

@kdmccormick kdmccormick changed the title Experimentally enable an XBlock from xblocks-contrib Add Waffle flags to roll out extracted XBlocks Aug 22, 2024
@kdmccormick
Copy link
Member Author

kdmccormick commented Aug 22, 2024

FYI @farhan @robrap -- I had originally said that we need CourseWaffleFlags so that individual course runs could switch back and forth between the blocks. I was wrong about that: the set-up I describe would globally toggle between the extracted block and the builtin block, so we should just use regular WaffleFlags.

Toggling between block classes per course run would be significantly more complex to implement and, in my opinion, would introduce operational risks in itself. Standard WaffleFlags would still permit operators to use user-by-user rollout or percentage-based rollout, so I think they are sufficient.

@robrap
Copy link
Contributor

robrap commented Aug 22, 2024

@kdmccormick: That makes sense. Agreed that percent roll-out is what is most important. Could you also add one (or more) custom attributes that would be useful for a dashboard to show which is being used? See openedx/edx-django-utils#328.

@kdmccormick
Copy link
Member Author

@robrap as we discussed, I've added a note for the team to set a custom attribute. The key will be block_extracted and the values will be True, False, or None (if N/A). Let me know if any of that seems off.

@robrap
Copy link
Contributor

robrap commented Aug 29, 2024

Instead of a special value like None, I think the attribute name (or key) should not be added if not needed. Does that makes sense?

@robrap
Copy link
Contributor

robrap commented Aug 30, 2024

@kdmccormick:

  1. Repeating my last comment with different words: the custom attribute block_extracted should only have values of True/False, and should not be added at all where not applicable.
  2. We noted in the meeting the existing 2 xblock-related custom attributes that would also be useful for observability. What were these again?
  3. We also discussed a process for overridden XBlocks that may need to reuse your waffle flag in order to re-implement the same conditional (and possibly the new custom attribute).
  4. Will you eventually turn all of this into a document for rollout? Could this apply to other orgs that might have overridden XBlocks?

FYI: @nsprenkle: Regarding our overridden XBlocks.

@kdmccormick
Copy link
Member Author

@robrap

  1. Yep, understood, I had True-False-None because it was one less line of code, but since it sounds like True-False-N/A is actually better for monitoring, I've updated the ticket to reflect that.
  2. block_type and usage_key
  3. and 4. Yep, when first flag is ready to be used, I'll put together a doc. The XBlock overrides feature is brand new (landed last month) but it's possible that orgs will start using it as soon as Sumac is released.

@robrap
Copy link
Contributor

robrap commented Sep 4, 2024

Thanks @kdmccormick.

  1. Yep, understood, I had True-False-None because it was one less line of code, but since it sounds like True-False-N/A is actually better for monitoring, I've updated the ticket to reflect that.

I hadn't realized you had an implementation and that the former was simpler. For monitoring, really anything that works is fine. Feel free to do whichever, and I can verify that it works as planned (e.g. we can easily tell the difference between the 3 cases, or by default, at least 2 of the cases).

@farhan farhan self-assigned this Sep 12, 2024
@farhan farhan linked a pull request Oct 2, 2024 that will close this issue
8 tasks
@farhan farhan linked a pull request Oct 14, 2024 that will close this issue
8 tasks
@kdmccormick
Copy link
Member Author

kdmccormick commented Nov 22, 2024

@robrap We've had to adjust the acceptance criteria here a bit. Rather than using Waffle flags, we'll be using Django settings. The reason for the change is that Waffle flags require Model access, which Django does not make available until after all INSTALLED_APPS are loaded; but, loading INSTALLED_APPS causes the xmodule/ modules to be imported, which invoked the Waffle flags, thus creating a circular dependency. Django settings, on the other hand, do not have this limitation; they are available to use very early in the Django initialization process.

@kdmccormick kdmccormick changed the title Add Waffle flags to roll out extracted XBlocks Add Django Settings to roll out extracted XBlocks Nov 22, 2024
@robrap
Copy link
Contributor

robrap commented Nov 22, 2024

From earlier:

Standard WaffleFlags would still permit operators to use user-by-user rollout or percentage-based rollout, so I think they are sufficient.

Last message:

Rather than using Waffle flags, we'll be using Django settings.

Although this may be for good reasons, I'm simply clarifying the consequence that user-by-user or percentage-based rollout will no longer be available, and this will now be an all-or-nothing configuration (presumably per block?).

That said - I stepped into this conversation: 1) to provide some monitoring tips, and 2) because I'm a busybody that gets myself involved in other people's business. :) I'm going to drop from this ticket, in favor of someone at 2U who owns this. I will make sure someone is aware of this. Thank you @kdmccormick. (Of course if there are questions that require my specific expertise, please feel free to ping me.)

@kdmccormick
Copy link
Member Author

Although this may be for good reasons, I'm simply clarifying the consequence that user-by-user or percentage-based rollout will no longer be available, and this will now be an all-or-nothing configuration (presumably per block?).

Yes, exactly right.

@robrap
Copy link
Contributor

robrap commented Nov 22, 2024

Final inform: TNL is generally tracking this and not seeing any issues. Thanks again @kdmccormick, and good luck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏗 In progress
Development

Successfully merging a pull request may close this issue.

4 participants