Experiment with hqcases removal #35766
base: master
Conversation
DebugMetrics was updated to call `bucket_value()` for histogram metrics. This should have no effect on production metrics.
Force-pushed from `83de59a` to `3f3b9d2` (Compare)
@CodeRabbit please review
🧩 Analysis chain

@millerdev I'll review the changes. I'll pay special attention to:

🏁 Scripts executed (output elided).

@millerdev I've reviewed the changes. The implementation looks solid, with good safety measures in place. It looks good to proceed; the safety measures and gradual rollout capability make it a low-risk change.

✏️ Learnings added

✅ Actions performed: comments resolved; review triggered.
Walkthrough

The changes introduce a new experiments framework within the application. A new app is added, providing a framework for comparing old and new code paths, along with documentation and examples in the corresponding README files. Core components include an …

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant ExperimentDecorator
    participant OldCodePath
    participant NewCodePath
    participant MetricsRecorder
    participant ExperimentEnabler
    Caller ->> ExperimentDecorator: Call decorated function
    ExperimentDecorator ->> ExperimentEnabler: Check experiment configuration
    alt New Code Path Enabled
        ExperimentDecorator ->> NewCodePath: Execute new code segment
        NewCodePath -->> ExperimentDecorator: Return result & timing
    else Old Code Path
        ExperimentDecorator ->> OldCodePath: Execute old code segment
        OldCodePath -->> ExperimentDecorator: Return result & timing
    end
    ExperimentDecorator ->> MetricsRecorder: Record metrics and errors
    ExperimentDecorator -->> Caller: Return final result
```
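The flow in the diagram can be sketched as a minimal decorator. This is a hypothetical simplification for illustration only: the real framework consults `ExperimentEnabler` records, records timings to Datadog, and logs differences to Sentry, none of which are reproduced here. The argument names mirror the README example.

```python
import time
from functools import wraps


def experiment(old_args, new_args, is_equal=lambda a, b: a == b, enabled=True):
    """Minimal sketch of the old/new code-path comparison flow."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            old_result = func(*args, **kwargs, **old_args)
            old_time = time.perf_counter() - start  # real code records this metric
            if enabled:  # real code checks ExperimentEnabler configuration
                start = time.perf_counter()
                new_result = func(*args, **kwargs, **new_args)
                new_time = time.perf_counter() - start  # real code records this too
                if not is_equal(old_result, new_result):
                    pass  # real code logs the result difference to Sentry
            return old_result  # the old path's result is always returned
        return wrapper
    return decorator


@experiment(old_args={"feature": "valley"}, new_args={"feature": "crater"})
def lookup(x, *, feature):
    return (x, feature)
```

Calling `lookup(1)` runs both paths but returns the old path's result, `(1, "valley")`.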
Actionable comments posted: 0
🧹 Nitpick comments (11)
corehq/apps/experiments/models.py (1)

`9-32`: Validate or document the meaning of special values in `enabled_percent`.

Your help text explains certain "magic" values (-1, 101, 102), which might surprise new contributors. Consider clarifying or validating them more strictly, for instance through an `Enum` or a custom validator that ensures valid usage.

corehq/apps/experiments/experiment.py (1)

`124-130`: Include `stacklevel` for `warnings.warn`.

Consider adding `stacklevel=2` or higher so that the warning references the calling site rather than the experiment code itself. This helps locate duplicate experiment definitions quickly.

```diff
-    warnings.warn(f"Duplicate experiment {key} will result in conflated ...")
+    warnings.warn(
+        f"Duplicate experiment {key} will result in conflated ...",
+        stacklevel=2,
+    )
```

🧰 Tools
🪛 Ruff (0.8.2)

127-127: No explicit `stacklevel` keyword argument found (B028)
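The earlier suggestion to formalize `enabled_percent`'s special values might look roughly like this. The names and exact semantics below are guesses derived from the help-text values mentioned in the review (-1, 101, 102), not the project's actual definitions:

```python
from enum import IntEnum


class EnabledPercent(IntEnum):
    """Hypothetical names for the special enabled_percent values."""
    DISABLED_NO_METRICS = -1   # disabled, and skip metrics recording
    DISABLED = 0               # disabled (metrics still recorded)
    FULLY_ENABLED = 100        # always run both code paths
    NEW_ONLY = 101             # use only the new code path
    NEW_ONLY_NO_METRICS = 102  # new path only, skip metrics


def validate_enabled_percent(value):
    """Reject values outside the documented -1..102 range."""
    if not (-1 <= value <= 102):
        raise ValueError(f"enabled_percent out of range: {value}")
    return value
```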
corehq/apps/experiments/tests/test_experiment.py (2)

`52-72`: Consider making timing tests more robust.

The current timing test uses fixed sleep durations, which could be flaky in CI environments. Consider:

- Using relative comparisons instead of exact durations
- Adding tolerance for timing variations

```diff
-    sleep2()
+    # Run multiple times to account for timing variations
+    results = [sleep2() for _ in range(5)]
+    # Assert that the average duration is within expected range
+    avg_duration = sum(results) / len(results)
+    assert 0.001 <= avg_duration <= 0.003
```
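One way to damp this kind of flakiness, as the comment suggests, is to average wall-clock time over several runs and assert a generous range rather than an exact duration. This helper is illustrative, not code from the PR:

```python
import time


def average_duration(func, runs=5):
    """Average wall-clock duration of func over several runs to damp jitter."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / runs
```

A test would then assert a loose lower bound (e.g. the sleep duration) and a very generous upper bound, rather than a tight window.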
`179-188`: Add a docstring to the `gc_disabled` context manager.

The context manager's purpose and usage should be documented.

```diff
 @contextmanager
 def gc_disabled():
+    """Temporarily disable garbage collection to prevent interference
+    with timing measurements.
+
+    This context manager is particularly useful in timing-sensitive tests
+    where GC could introduce unpredictable delays.
+    """
     enabled = gc.isenabled()
     if enabled:
         gc.disable()
```

corehq/util/metrics/metrics.py (1)
`125-128`: Consider adding type hints and a docstring to the `histogram` method.

The method would benefit from explicit type hints and documentation of its parameters and behavior.

```diff
-    def histogram(self, name, value, bucket_tag, buckets, bucket_unit, tags, **kw):
-        bucket_tags = {bucket_tag: bucket_value(value, buckets, bucket_unit)}
-        self._histogram(name, 1, tags=(tags or {}) | bucket_tags, **kw)
+    def histogram(
+        self,
+        name: str,
+        value: float,
+        bucket_tag: str,
+        buckets: List[float] = DEFAULT_BUCKETS,
+        bucket_unit: str = '',
+        tags: Dict[str, str] = None,
+        **kw
+    ) -> None:
+        """Record a histogram metric with computed bucket tags.
+
+        Args:
+            name: The name of the metric
+            value: The value to record
+            bucket_tag: The tag name for the bucket
+            buckets: List of bucket boundaries
+            bucket_unit: Optional unit for bucket values
+            tags: Additional tags for the metric
+            **kw: Additional keyword arguments
+        """
+        bucket_tags = {bucket_tag: bucket_value(value, buckets, bucket_unit)}
+        self._histogram(name, 1, tags=(tags or {}) | bucket_tags, **kw)
```

corehq/apps/data_pipeline_audit/dbacessors.py (2)
`33-41`: Consider adding error handling for edge cases.

While the function handles basic validation through `is_equal`, consider adding error handling for:

- Empty or invalid date ranges
- Invalid doc_type values

```diff
 @experiment(is_equal=lambda a, b: abs(a - b) < 5)
 def get_es_case_counts(domain, doc_type, gte, lt, escls=es.CaseES):
+    if not doc_type:
+        raise ValueError("doc_type is required")
+    if gte and lt and gte >= lt:
+        raise ValueError("'gte' must be less than 'lt'")
     ex = es.cases.server_modified_range(gte=gte, lt=lt)
     return escls().domain(domain).filter(ex).filter(
         es.filters.OR(
             es.filters.doc_type(doc_type),
             es.filters.doc_type(doc_type.lower()),
         )
     ).count()
```
`44-56`: Consider caching the results for performance.

The function makes two separate ES queries. Consider caching the results to improve performance for repeated calls.

```diff
+from django.core.cache import cache
+from datetime import timedelta
+
 @experiment
 def get_es_case_range(domain, escls=es.CaseES):
+    cache_key = f"es_case_range_{domain}"
+    cached_result = cache.get(cache_key)
+    if cached_result:
+        return cached_result
+
     def descending_query(order):
         result = escls().domain(domain).sort(
             'server_modified_on', desc=order).size(1).run().raw_hits
         if len(result) == 0:
             return None
         else:
             return dateutil.parser.parse(
                 result[0]['_source']['server_modified_on']).date()
-    return (
+    result = (
         descending_query(order=False),
         descending_query(order=True)
     )
+    cache.set(cache_key, result, timeout=timedelta(hours=1).seconds)
+    return result
```

corehq/apps/experiments/tests/test_enabler.py (1)
`228-228`: Consider using an underscore for the unused loop variable.

The loop variable `x` is not used within the loop body.

```diff
-    for x in range(300):
+    for _ in range(300):
```

🧰 Tools
🪛 Ruff (0.8.2)

228-228: Loop control variable `x` not used within loop body. Rename unused `x` to `_x` (B007)
corehq/apps/experiments/README.md (2)

`11-25`: Consider adding caching duration details.

The features section comprehensively covers the key aspects of the framework. However, it would be helpful to specify the exact caching duration for experiment states (mentioned as "may not apply immediately" on line 25).

```diff
-  Django Admin may not apply immediately.
+  Django Admin may not apply immediately (changes can take up to 30 minutes
+  to take effect due to caching).
```
`32-44`: Enhance the usage example with result comparison.

The usage example demonstrates the basic setup but could be more comprehensive. Consider extending the example to show:

- How to customize result comparison using `is_equal`
- How to handle the case when results differ

```python
from corehq.apps.experiments import Experiment, MOON_SHOT

def compare_results(old_result, new_result):
    return old_result.get('key') == new_result.get('key')

experiment = Experiment(
    campaign=MOON_SHOT,
    old_args={"feature": "valley"},
    new_args={"feature": "crater"},
    is_equal=compare_results,  # Custom comparison function
)

@experiment
def function(other, args, *, feature):
    ...  # implementation
```

corehq/apps/README.rst (1)
`39-40`: Consider expanding the description and moving it to the "Engineering Apps" section.

The experiments framework appears to be more of a developer-facing tool than a primary application feature. Consider:

- Moving it to the "Engineering Apps" section (around line 154)
- Expanding the description to highlight its key features

```diff
-experiments - Framework for comparing old and new code paths.
+experiments
+    Developer tooling for comparing old and new code paths, featuring performance metrics
+    collection (Datadog), result comparison (Sentry), and configurable rollout through
+    Django Admin. Used for validating implementation changes before full adoption.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `migrations.lock` is excluded by `!**/*.lock` and included by none
📒 Files selected for processing (14)

- `corehq/apps/README.rst` (1 hunks)
- `corehq/apps/data_pipeline_audit/dbacessors.py` (3 hunks)
- `corehq/apps/domain/tests/test_deletion_models.py` (1 hunks)
- `corehq/apps/dump_reload/tests/test_dump_models.py` (1 hunks)
- `corehq/apps/experiments/README.md` (1 hunks)
- `corehq/apps/experiments/__init__.py` (1 hunks)
- `corehq/apps/experiments/admin.py` (1 hunks)
- `corehq/apps/experiments/experiment.py` (1 hunks)
- `corehq/apps/experiments/migrations/0001_initial.py` (1 hunks)
- `corehq/apps/experiments/models.py` (1 hunks)
- `corehq/apps/experiments/tests/test_enabler.py` (1 hunks)
- `corehq/apps/experiments/tests/test_experiment.py` (1 hunks)
- `corehq/util/metrics/metrics.py` (3 hunks)
- `settings.py` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- `corehq/apps/experiments/__init__.py`
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Review the code following best practices and standards
corehq/apps/experiments/admin.py
corehq/apps/dump_reload/tests/test_dump_models.py
corehq/apps/domain/tests/test_deletion_models.py
corehq/apps/experiments/experiment.py
corehq/util/metrics/metrics.py
corehq/apps/experiments/migrations/0001_initial.py
corehq/apps/experiments/tests/test_enabler.py
settings.py
corehq/apps/experiments/models.py
corehq/apps/data_pipeline_audit/dbacessors.py
corehq/apps/experiments/tests/test_experiment.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: millerdev
PR: dimagi/commcare-hq#0
File: :0-0
Timestamp: 2025-02-12T22:11:12.510Z
Learning: The experiments framework in CommCare HQ allows comparison of old and new code paths with the following features:
1. Timing metrics sent to Datadog
2. Result differences logged to Sentry
3. Configurable through ExperimentEnabler records in Django Admin
4. Disabled by default with gradual rollout capability (1-100%)
corehq/apps/experiments/README.md (1)
Learnt from: millerdev
PR: dimagi/commcare-hq#0
File: :0-0
Timestamp: 2025-02-12T22:11:12.510Z
Learning: The experiments framework in CommCare HQ allows comparison of old and new code paths with the following features:
1. Timing metrics sent to Datadog
2. Result differences logged to Sentry
3. Configurable through ExperimentEnabler records in Django Admin
4. Disabled by default with gradual rollout capability (1-100%)
🪛 Ruff (0.8.2)
corehq/apps/experiments/experiment.py
127-127: No explicit `stacklevel` keyword argument found (B028)
corehq/apps/experiments/tests/test_enabler.py
228-228: Loop control variable `x` not used within loop body. Rename unused `x` to `_x` (B007)
🔇 Additional comments (22)
corehq/apps/experiments/models.py (4)
`34-44`: Correctness check for random-based logic.

`is_enabled` uses a random comparison for 0 < value < 100, returning True occasionally and False otherwise. This is correct for a partial rollout. Just ensure that other code paths handle the `None` case properly (for fully new code usage).
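The random partial-rollout check described here can be pictured as follows. This is a simplified illustrative stand-in, not the project's actual `is_enabled`:

```python
import random


def is_enabled(enabled_percent):
    """Simplified percent-based rollout check (illustrative only)."""
    if enabled_percent <= 0:
        return False
    if enabled_percent >= 100:
        return True
    # Enable roughly enabled_percent% of calls
    return random.random() * 100 < enabled_percent
```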
`46-52`: Check boundary conditions for metrics recording.

`should_record_metrics` disallows recording when `enabled_percent` is -1 or 102, aligning with your help text. This looks intentional. Just confirm that you want to fully skip metrics in both the negative and the "fully new" disable-metrics states.
`54-59`: Nice fallback mechanism.

The fallback logic correctly narrows the path by chopping off segments. This design is flexible and matches partial or full path-based enablers. No issues found.
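The "chopping off segments" fallback can be pictured as walking from the most specific dotted path up through its parents. This is an illustrative sketch, not the project's implementation:

```python
def path_candidates(path):
    """Yield a dotted path followed by each of its parent paths."""
    parts = path.split(".")
    while parts:
        yield ".".join(parts)
        parts.pop()  # drop the last segment to broaden the match
```

With this shape, an enabler configured for a parent package (say, `corehq.apps`) would match any more specific path beneath it once the more specific candidates find no record.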
`64-75`: Caching approach is efficient.

Leveraging `quickcache` with a 30-minute memoize timeout is appropriate here, ensuring minimal database load. No issues observed.

corehq/apps/experiments/experiment.py (5)
`13-38`: Fields are well structured with `attrs`.

Your usage of `attrs` for the `Experiment` class and default fields (like `time_buckets`, `percent_buckets`) is clean. This helps avoid boilerplate code and is easy to read.

`39-45`: Use of positional-only arguments.

`def __call__(self, func=None, /, **kw): ...` is valid modern Python syntax. Just confirm all environments support this. If there are Python 3.7 or older environments, consider adjusting.

`50-123`: Comprehensive logic in the wrapper.

Your code effectively captures and logs timing data for both old and new paths, handles partial rollouts, and deals with exceptions. The usage of `notify_exception` and `notify_error` is appropriate. The fallback to `over_the_top` prevents division by zero.

`132-138`: Helpful utility for debugging.

The `describe` function is a neat tool for summarizing arguments in logs. The truncation of long representations is a nice touch for preventing log noise.

`140-145`: Clarity in the `ENABLED_TAG` mapping.

The distinct tags for old/both/new states in `ENABLED_TAG` cleanly reflect the experiment path states.

corehq/apps/experiments/admin.py (1)
`6-9`: Well-organized admin configuration.

Displaying `campaign`, `path`, and `enabled_percent` is sufficient for quick management. Ordering is also a good user experience detail.

corehq/apps/experiments/migrations/0001_initial.py (1)
`1-28`: LGTM! Well-structured migration with clear field definitions and constraints.

The migration follows Django best practices with:
- Descriptive help text for each field
- Appropriate field types and lengths
- Proper validation constraints
- Unique constraint to prevent duplicates
corehq/apps/domain/tests/test_deletion_models.py (1)
`59-59`: LGTM! Appropriate addition to IGNORE_MODELS.

The ExperimentEnabler model is correctly excluded from domain deletion as it's a configuration model.
corehq/apps/data_pipeline_audit/dbacessors.py (3)
14-18
: LGTM! Well-structured experiment configuration.

The experiment setup is clear and follows best practices:
- Campaign name is descriptive
- Default args are empty for old path
- New args specify the new ES class
- Ignore duplicate flag is set for safety
`170-175`: LGTM! Clean implementation of `get_es_case_ids`.

The function maintains compatibility with both ES classes while adding experiment support.
`21-30`: Verify the impact of replacing CaseES with CaseSearchES.

The function correctly handles the ES class replacement in the indices list, but we should verify that CaseSearchES supports all the query features used by this function.
✅ Verification successful

🏁 Script executed:

```shell
#!/bin/bash
# Description: Check if CaseSearchES supports all query methods used with CaseES
ast-grep --pattern 'es.CaseES().$_($$$)'
```

Length of output: 42
Manual Verification Required: Verify Query Method Compatibility Between CaseES and CaseSearchES

The initial ast-grep query produced no output, so its result is inconclusive. To better establish that all query methods expected by the function (e.g., those used within `_get_index_counts_by_domain_doc_type`) work with CaseSearchES, please run some additional searches. For example, check for usage of both `es.CaseES(` and `CaseSearchES` in the codebase. These results will help confirm that the replacement is safe: if the same query methods are used with both classes, and CaseSearchES implements them compatibly with CaseES, the change should have no adverse effects. Please verify these findings manually and confirm that CaseSearchES fully supports the query features expected by the function.

🏁 Script executed:

```shell
#!/bin/bash
echo "***** Searching for usages of es.CaseES( *****"
rg "es\.CaseES\(" -n .
echo "***** Searching for usages of CaseSearchES *****"
rg "CaseSearchES" -n .
```

Length of output: 17168
Query Method Compatibility Verified: CaseSearchES Supports All Required Features

Based on the codebase search results:

- Inheritance confirmation: In `corehq/apps/es/case_search.py`, the class `CaseSearchES` is defined as a subclass of `CaseES`, inheriting its query methods.
- Extensive usage: Multiple tests and modules (e.g., in `corehq/apps/es/tests/test_case_search_es.py` and others) use `CaseSearchES` with typical query operations (like `.domain()`, `.filter()`, `.run()`, etc.) identical to those used with `CaseES`.
- Integrated replacement: The replacement in `corehq/apps/data_pipeline_audit/dbacessors.py` correctly handles the ES class list, and the widespread usage of `CaseSearchES` indicates that it supports all the necessary query features used by the function.

No discrepancies were identified that would impact query functionality when replacing `CaseES` with `CaseSearchES`.

corehq/apps/dump_reload/tests/test_dump_models.py (1)
`69-69`: LGTM! Appropriate addition to IGNORE_MODELS.

The ExperimentEnabler model is correctly added to the ignore list since it's a configuration model that shouldn't be included in domain dumps.
corehq/apps/experiments/tests/test_enabler.py (4)
`20-31`: LGTM! Well-structured test fixture.

The `enabled` fixture provides a clean way to manage experiment state during tests:
- Creates ExperimentEnabler with specified parameters
- Cleans up after tests using yield
- Clears cache to prevent test interference
`34-181`: LGTM! Comprehensive test coverage for experiment execution.

The TestExperimentEnabled class thoroughly tests:
- Disabled experiments with metrics disabled/enabled
- Error handling in both paths
- Metric collection and logging
- Different enabling percentages
`192-288`: LGTM! Thorough testing of experiment enabling logic.

The TestIsEnabled class effectively tests:
- Various enabling percentages (0%, 1%, 49%, 75%, 100%)
- Caching behavior
- Parent package configuration
- Path-specific enabling
🧰 Tools
🪛 Ruff (0.8.2)
228-228: Loop control variable `x` not used within loop body. Rename unused `x` to `_x` (B007)
`290-307`: LGTM! Well-designed test helper function.

The `make_func` helper:
- Creates a test function with configurable error injection
- Tracks function calls for verification
- Adds controlled delays for timing tests
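Based on that description, such a helper might look roughly like the following. The names, defaults, and return value here are assumptions for illustration, not the actual test code:

```python
import time


def make_func(error=None, calls=None, delay=0.0):
    """Create a test function with call tracking, an optional controlled
    delay, and configurable error injection."""
    def func(*args, **kwargs):
        if calls is not None:
            calls.append((args, kwargs))  # track calls for later verification
        if delay:
            time.sleep(delay)  # controlled delay for timing tests
        if error is not None:
            raise error  # inject an error into this code path
        return "result"
    return func
```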
settings.py (1)
275-275
: LGTM! Appropriate addition to HQ_APPS.

The experiments app is correctly added to the HQ_APPS tuple, making it available throughout the application.
corehq/apps/experiments/README.md (1)
1-8
: LGTM! Clear introduction and attribution.

The introduction effectively explains the purpose of the framework and properly attributes inspiration to GitHub Scientist.
Make it easy to see in Datadog if errors should be investigated in Sentry. The "notify" tag will have a value of "error" or "diff" if an error or result difference was logged to Sentry, otherwise "none".
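That tag logic can be sketched as a small helper. This is illustrative only; in the real code the tag is set inside the experiment wrapper:

```python
def notify_tag(error_logged, diff_logged):
    """Return the "notify" metric tag value for a completed experiment run."""
    if error_logged:
        return "error"  # an error was logged to Sentry
    if diff_logged:
        return "diff"   # a result difference was logged to Sentry
    return "none"       # nothing to investigate in Sentry
```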
Force-pushed from `adaa784` to `73dcecd` (Compare)
This PR adds a new `Experiment` decorator to compare timing and result differences between an old code path and a new one. Timing metrics are sent to the metrics backend (Datadog), and result differences are logged to Sentry.

Experiments can be dynamically enabled and disabled by adding `ExperimentEnabler` records in Django Admin. Changes in Django Admin may take up to 30 minutes to become effective due to caching.

See the README for more details.
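The up-to-30-minute propagation delay comes from caching of enabler lookups. A hedged sketch of that pattern (the real code uses `quickcache`; this simple TTL cache and its function names are illustrative stand-ins):

```python
import time

_cache = {}
TTL_SECONDS = 30 * 60  # matches the ~30-minute propagation window noted above


def cached_enabler_lookup(campaign, path, loader, now=time.monotonic):
    """Return loader(campaign, path), reusing a cached value for TTL_SECONDS."""
    key = (campaign, path)
    entry = _cache.get(key)
    if entry is None or now() - entry[1] > TTL_SECONDS:
        entry = (loader(campaign, path), now())  # refresh from the database
        _cache[key] = entry
    return entry[0]
```

Until the cached entry expires, edits made in Django Admin are not visible to callers, which is the behavior described above.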
The final commit in this PR adds experiments to evaluate switching from `CaseES` to `CaseSearchES` in the `data_pipeline_audit` app. These should be low-traffic experiments, and will be used to test integrations with Datadog and Sentry.

🐟 Review by commit.
Safety Assurance
Safety story
All experiments are disabled by default, but timing metrics will be recorded for the old code path immediately after the code is deployed.
Experiments have been applied to a few dbaccessor functions in the `data_pipeline_audit` app, which are only used by management commands:

- `get_es_counts_by_doc_type`: `print_domain_stats` and `compare_doc_counts`
- `get_es_case_counts`: `compare_doc_ids`
- `get_es_case_range`: `compare_doc_ids`
- `get_es_case_ids`: `compare_doc_ids`
Automated test coverage
Yes.
Migrations
Rollback instructions
If this PR is rolled back, the migrations should be unapplied on all environments where it was deployed.