Skip to content

Commit

Permalink
Merge pull request #870 from edx/robrap/code-owner-monitoring-phase-2
Browse files Browse the repository at this point in the history
feat: phase 2 of code owner monitoring rollout
  • Loading branch information
robrap authored Dec 11, 2024
2 parents b02eb4c + a96686d commit 138ed0e
Show file tree
Hide file tree
Showing 15 changed files with 289 additions and 369 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,19 @@ Change Log
Unreleased
~~~~~~~~~~

[6.1.0] - 2024-12-10
~~~~~~~~~~~~~~~~~~~~
Changed
_______
* Completes code owner monitoring updates, which drops owner theme and finalizes the code owner span tags. See doc and ADR updates for more details.

* The code_owner_theme_2 tag was dropped altogether.
* The temporary suffix (_2) was removed from other span tags.
* The code_owner (formerly code_owner_2) tag no longer includes the theme name.
* The new name for the django setting is CODE_OWNER_TO_PATH_MAPPINGS (formerly CODE_OWNER_MAPPINGS).
* The django setting CODE_OWNER_THEMES was dropped.
* Updates the generate_code_owner_mappings.py script accordingly.

[6.0.0] - 2024-12-05
~~~~~~~~~~~~~~~~~~~~
Removed
Expand Down
2 changes: 1 addition & 1 deletion edx_arch_experiments/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,4 @@
A plugin to include applications under development by the architecture team at 2U.
"""

__version__ = '6.0.0'
__version__ = '6.1.0'
4 changes: 2 additions & 2 deletions edx_arch_experiments/datadog_monitoring/README.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Datadog Monitoring
###################

When installed in the LMS as a plugin app, the ``datadog_monitoring`` app adds additional monitoring.
When installed in the LMS as a plugin app, the ``datadog_monitoring`` app adds additional 2U-specific monitoring.

This is where our code_owner_2 monitoring code lives, for example.
This is where our code owner monitoring code lives, for example.
4 changes: 2 additions & 2 deletions edx_arch_experiments/datadog_monitoring/code_owner/datadog.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
"""
Datadog span processor for celery span code owners.
"""
from .utils import set_code_owner_attribute_from_module
from .utils import set_code_owner_span_tags_from_module


class CeleryCodeOwnerSpanProcessor:
Expand All @@ -17,7 +17,7 @@ def on_span_start(self, span):
# We can use this for celery spans, because the resource name is more predictable
# and available from the start. For django requests, we'll instead continue to use
# django middleware for setting code owner.
set_code_owner_attribute_from_module(span.resource)
set_code_owner_span_tags_from_module(span.resource)

def on_span_finish(self, span):
pass
Expand Down
183 changes: 42 additions & 141 deletions edx_arch_experiments/datadog_monitoring/code_owner/utils.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
"""
Utilities for monitoring code_owner_2
Utilities for monitoring code_owner
"""
import logging

Expand All @@ -19,9 +19,6 @@ def get_code_owner_from_module(module):
this lookup would match on 'openedx.features.discounts' before
'openedx.features', because the former is more specific.
See how to:
https://github.com/openedx/edx-django-utils/blob/master/edx_django_utils/monitoring/docs/how_tos/add_code_owner_custom_attribute_to_an_ida.rst
"""
if not module:
return None
Expand Down Expand Up @@ -54,7 +51,7 @@ def is_code_owner_mappings_configured():

def get_code_owner_mappings():
"""
Returns the contents of the CODE_OWNER_MAPPINGS Django Setting, processed
Returns the contents of the CODE_OWNER_TO_PATH_MAPPINGS Django Setting, processed
for efficient lookup by path.
Returns:
Expand All @@ -81,12 +78,12 @@ def get_code_owner_mappings():
# processed map. Worst case, it is processed more than once at start-up.
path_to_code_owner_mapping = {}

# .. setting_name: CODE_OWNER_MAPPINGS
# .. setting_name: CODE_OWNER_TO_PATH_MAPPINGS
# .. setting_default: None
# .. setting_description: Used for monitoring and reporting of ownership. Use a
# dict with keys of code owner name and value as a list of dotted path
# module names owned by the code owner.
code_owner_mappings = getattr(settings, 'CODE_OWNER_MAPPINGS', None)
code_owner_mappings = getattr(settings, 'CODE_OWNER_TO_PATH_MAPPINGS', None)
if code_owner_mappings is None:
return None

Expand All @@ -97,72 +94,73 @@ def get_code_owner_mappings():
path_to_code_owner_mapping[path] = code_owner
except TypeError as e:
log.exception(
'Error processing CODE_OWNER_MAPPINGS. {}'.format(e) # pylint: disable=logging-format-interpolation
'Error processing CODE_OWNER_TO_PATH_MAPPINGS. {}'.format(e) # pylint: disable=logging-format-interpolation
)
raise e

_PATH_TO_CODE_OWNER_MAPPINGS = path_to_code_owner_mapping
return _PATH_TO_CODE_OWNER_MAPPINGS


def set_code_owner_attribute_from_module(module):
def set_code_owner_span_tags_from_module(module):
"""
Updates the code_owner_2 and code_owner_2_module custom attributes.
Celery tasks or other non-web functions do not use middleware, so we need
an alternative way to set the code_owner_2 custom attribute.
Note: These settings will be overridden by the CodeOwnerMonitoringMiddleware.
This method can't be used to override web functions at this time.
Updates the code_owner and code_owner_module custom span tags.
Usage::
set_code_owner_2_attribute_from_module(__name__)
set_code_owner_span_tags_from_module(__name__)
"""
set_custom_attribute('code_owner_2_module', module)
# .. custom_attribute_name: code_owner_module
# .. custom_attribute_description: The module used to determine the code_owner. This can
# be useful for debugging issues for missing code owner span tags.
set_custom_attribute('code_owner_module', module)
code_owner = get_code_owner_from_module(module)

if code_owner:
set_code_owner_custom_attributes(code_owner)
set_code_owner_custom_span_tags(code_owner)


def set_code_owner_custom_attributes(code_owner):
def set_code_owner_custom_span_tags(code_owner):
"""
Sets custom metrics for code_owner_2, code_owner_2_theme, and code_owner_2_squad
Sets custom span tags for code_owner, and code_owner_squad
"""
if not code_owner: # pragma: no cover
return
set_custom_attribute('code_owner_2', code_owner)
theme = _get_theme_from_code_owner(code_owner)
if theme:
set_custom_attribute('code_owner_2_theme', theme)
squad = _get_squad_from_code_owner(code_owner)
if squad:
set_custom_attribute('code_owner_2_squad', squad)

# .. custom_attribute_name: code_owner
# .. custom_attribute_description: The squad owner name for the tagged span.
set_custom_attribute('code_owner', code_owner)
# .. custom_attribute_name: code_owner_squad
# .. custom_attribute_description: Deprecated code_owner_squad is now redundant
# to the code_owner span tag.
set_custom_attribute('code_owner_squad', code_owner)

# .. custom_attribute_name: code_owner_plugin
# .. custom_attribute_description: This is a temporary span tag to roll out the
# the switch from edx-django-utils to this plugin. If this span tag is True,
# the plugin has added the above custom span tags (possibly in addition to
# edx-django-utils). If the code_owner_theme span tag is also seen, then
# edx-django-utils is also adding these span tags. If not, only the plugin is
# creating these tags.
set_custom_attribute('code_owner_plugin', True)

def set_code_owner_attribute(request):

def set_code_owner_span_tags_from_request(request):
"""
Sets the code_owner_2 custom attribute for the request.
Sets the code_owner custom span tag for the request.
"""
code_owner = None
module = _get_module_from_request(request)
if module:
code_owner = get_code_owner_from_module(module)

if code_owner:
set_code_owner_custom_attributes(code_owner)
set_code_owner_span_tags_from_module(module)


def _get_module_from_request(request):
"""
Get the module from the request path or the current transaction.
Side-effects:
Sets code_owner_2_module custom attribute, used to determine code_owner_2.
If module was not found, may set code_owner_2_path_error custom attribute
if applicable.
Sets code_owner_path_error custom span tag if applicable.
Returns:
str: module name or None if not found
Expand All @@ -172,14 +170,14 @@ def _get_module_from_request(request):
return None

module, path_error = _get_module_from_request_path(request)
if module:
set_custom_attribute('code_owner_2_module', module)
return module

# monitor errors if module was not found
if path_error:
set_custom_attribute('code_owner_2_path_error', path_error)
return None
# .. custom_attribute_name: code_owner_path_error
# .. custom_attribute_description: Error details if the module can't be found. This can
# be useful for debugging issues for missing code owner span tags.
set_custom_attribute('code_owner_path_error', path_error)

return module


def _get_module_from_request_path(request):
Expand All @@ -204,100 +202,3 @@ def clear_cached_mappings():
"""
global _PATH_TO_CODE_OWNER_MAPPINGS
_PATH_TO_CODE_OWNER_MAPPINGS = None
global _CODE_OWNER_TO_THEME_AND_SQUAD_MAPPINGS
_CODE_OWNER_TO_THEME_AND_SQUAD_MAPPINGS = None


# Cached lookup table for code owner theme and squad given a code owner.
# - Although code owner is "theme-squad", a hyphen may also be in the theme or squad name, so this ensures we get both
# correctly from config.
# Do not access this directly, but instead use get_code_owner_theme_squad_mappings.
_CODE_OWNER_TO_THEME_AND_SQUAD_MAPPINGS = None


def get_code_owner_theme_squad_mappings():
"""
Returns the contents of the CODE_OWNER_THEMES Django Setting, processed
for efficient lookup by path.
Returns:
(dict): dict mapping code owners to a dict containing the squad and theme, or
an empty dict if there are no configured mappings.
Example return value::
{
'theme-x-team-red': {
'theme': 'theme-x',
'squad': 'team-red',
},
'theme-x-team-blue': {
'theme': 'theme-x',
'squad': 'team-blue',
},
}
"""
global _CODE_OWNER_TO_THEME_AND_SQUAD_MAPPINGS

# Return cached processed mappings if already processed
if _CODE_OWNER_TO_THEME_AND_SQUAD_MAPPINGS is not None:
return _CODE_OWNER_TO_THEME_AND_SQUAD_MAPPINGS

# Uses temporary variable to build mappings to avoid multi-threading issue with a partially
# processed map. Worst case, it is processed more than once at start-up.
code_owner_to_theme_and_squad_mapping = {}

# .. setting_name: CODE_OWNER_THEMES
# .. setting_default: None
# .. setting_description: Used for monitoring and reporting of ownership. Use a
# dict with keys of code owner themes and values as a list of code owner names
# including theme and squad, separated with a hyphen.
code_owner_themes = getattr(settings, 'CODE_OWNER_THEMES', {})

try:
for theme in code_owner_themes:
code_owner_list = code_owner_themes[theme]
for code_owner in code_owner_list:
squad = code_owner.split(theme + '-', 1)[1]
code_owner_details = {
'theme': theme,
'squad': squad,
}
code_owner_to_theme_and_squad_mapping[code_owner] = code_owner_details
except TypeError as e:
log.exception(
'Error processing CODE_OWNER_THEMES setting. {}'.format(e) # pylint: disable=logging-format-interpolation
)
raise e

_CODE_OWNER_TO_THEME_AND_SQUAD_MAPPINGS = code_owner_to_theme_and_squad_mapping
return _CODE_OWNER_TO_THEME_AND_SQUAD_MAPPINGS


def _get_theme_from_code_owner(code_owner):
"""
Returns theme for a code_owner (e.g. 'theme-my-squad' => 'theme')
"""
mappings = get_code_owner_theme_squad_mappings()
if mappings is None: # pragma: no cover
return None

if code_owner in mappings:
return mappings[code_owner]['theme']

return None


def _get_squad_from_code_owner(code_owner):
"""
Returns squad for a code_owner (e.g. 'theme-my-squad' => 'my-squad')
"""
mappings = get_code_owner_theme_squad_mappings()
if mappings is None: # pragma: no cover
return None

if code_owner in mappings:
return mappings[code_owner]['squad']

return None
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
Monitoring by Code Owner
************************

Status
======

Accepted

Context
=======

It is difficult for different teams to have team-based on-calls rotations, alerting and monitoring for various parts of the edx-platform (specifically LMS).

* The original decision to add code_owner custom span tags (custom attributes) was documented in `edx-platform in 0001-monitoring-by-code-owner`_.
* The decision to move the code for reuse across IDAs was documented in `edx-django-utils in 0001-monitoring-by-code-owner.rst`_.
* The decision for how to implement code owner details for celery tasks was documented in `0003-code-owner-for-celery-tasks_, and was limited by New Relic's instrumentation.
* The decision to break up the ``code_owner`` custom span tag (custom attribute) into ``code_owner_squad`` and ``code_owner_theme`` tags was documented in `0004-code-owner-theme-and-squad`_.

Some changes or clarifications since this time:

* It turns out that this functionality is only really useful for the edx-platform LMS. Our other services (IDAs) are small enough to keep to a single owner, or to solve monitoring issues in other ways.
* It is likely that the ``code_owner`` code is only really needed by 2U.
* 2U wants to drop owner themes from its code_owner custom span tag.
* 2U has switched to Datadog, which has slightly different capabilities from New Relic.

* Note that Datadog has custom span tags, where New Relic has custom attributes to refer to its tagging capabilities.

.. _edx-platform in 0001-monitoring-by-code-owner: https://github.com/openedx/edx-platform/blob/f29e418264f374099930a5b1f5b8345c569892e9/lms/djangoapps/monitoring/docs/decisions/0001-monitoring-by-code-owner.rst
.. _edx-django-utils in 0001-monitoring-by-code-owner.rst: https://github.com/openedx/edx-django-utils/blob/a1a1ec95d7c1d4767deb578748153c99c9562a04/edx_django_utils/monitoring/docs/decisions/0001-monitoring-by-code-owner.rst
.. _0003-code-owner-for-celery-tasks: https://github.com/openedx/edx-django-utils/blob/a1a1ec95d7c1d4767deb578748153c99c9562a04/edx_django_utils/monitoring/docs/decisions/0003-code-owner-for-celery-tasks.rst
.. _0004-code-owner-theme-and-squad: https://github.com/openedx/edx-django-utils/blob/a1a1ec95d7c1d4767deb578748153c99c9562a04/edx_django_utils/monitoring/docs/decisions/0004-code-owner-theme-and-squad.rst

Decision
========

2U has moved its code owner monitoring implementation to the datadog_monitoring plugin.

* The owner theme name has been dropped from the ``code_owner`` custom span tag value in this new implementation.
* The ``code_owner_theme`` span tag has been dropped altogether from this new implementation.
* The now deprecated ``code_owner_squad`` span tag, which is redundant with the updated ``code_owner`` tag, will continue to be supported for backward compatibility.
* A Datadog span processor was used to add the code owner span tags for celery tasks, so there is no longer a need for a special decorator on each celery task.
* A new capability added to edx-django-utils to add `monitoring signals for plugins`_ is used to monitor Django requests.

Also, a new search script was implemented in this repository: `search_datadog.rst`_.

.. _monitoring signals for plugins: https://github.com/openedx/edx-django-utils/pull/467
.. _search_datadog.rst: https://github.com/edx/edx-arch-experiments/blob/main/edx_arch_experiments/datadog_monitoring/scripts/datadog_search.py

Consequences
============

- In addition to having greater flexibility to update these custom tags as-needed for 2U, we can also DEPR the code owner functionality in the Open edX codebase, where it is not likely to be needed.
- Spreadsheet changes will no longer be required when a squad moves from one part of the organization to another (e.g. changes themes).
- However, without including themes, it may take additional time to learn about a squad's place in the organization when seen in the code_owner span tag. For example, it will not be as immediately clear when dealing with an enterprise squad, unless you are familiar with all of the squad names.
Loading

0 comments on commit 138ed0e

Please sign in to comment.