Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a function to compare the contents of two packets. #147

Merged
merged 11 commits into from
Oct 24, 2024
Merged

Conversation

plietar
Copy link
Member

@plietar plietar commented Jun 25, 2024

The new orderly_compare_packets function takes in two packet IDs and produces a diff of the two packets' contents. This includes a comparison of both the packets' metadata and of their files. The actual diff of the files is computed using the diffobj package.

A parameter of the function can be used to control which components of the packets are considered in the comparison, and take be one of "everything", "metadata", "files" and "artefacts".

This feature was requested as part of the epireview project, where one report's implementation got very messy and needs refactoring, but in the absence of any way of comparing two runs of the same report it is difficult to determine whether the refactor has any unexpected effect on the output.

@plietar plietar requested review from weshinsley and richfitz June 25, 2024 13:19
@plietar plietar force-pushed the mrc-4425-diff branch 4 times, most recently from f6b6cbe to 9be0906 Compare June 25, 2024 13:27
@plietar plietar force-pushed the mrc-4425-diff branch 8 times, most recently from 7275d4a to 5fd1b5f Compare September 18, 2024 16:11
@plietar plietar removed the request for review from weshinsley September 18, 2024 16:14
@plietar plietar force-pushed the mrc-4425-diff branch 2 times, most recently from 22dd986 to 72ef3cb Compare September 18, 2024 16:32
Copy link
Member

@richfitz richfitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - this is looking great. Some broad comments about the comparison here, which I'm happy to litigate further in a conversation before you do any more implementation if you prefer!

ids <- c(...)
replacements <- sprintf("19700101-000000-%08x", seq_along(ids))
names(replacements) <- ids
function(x) stringr::str_replace_all(x, replacements)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to use stringr for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sub/gsub only perform a single substitution, where as we need to do many. I suppose I could compose calls and make one per replacement, but that's annoying to do in a general way. This was just easier.

R/compare.R Outdated Show resolved Hide resolved
R/compare.R Outdated Show resolved Hide resolved
R/compare.R Outdated Show resolved Hide resolved
R/compare.R Outdated Show resolved Hide resolved
@plietar
Copy link
Member Author

plietar commented Oct 17, 2024

diff <- orderly_compare_packets(p1, p2, root = root)
diff = list(root = root, metap1, metap2, ...)

class(diff) <- orderly_comparison

print(diff) --> summary view, no I/O
only list different field names, greyed out identical fields, ignore trivial fields (time, id, ...)

orderly_comparison_explain(diff, what # field name from metadata, verbose = FALSE)
orderly_comparison_explain()
# General purpose diff for unknown keys, we may have custom print outs for some of the keys, eg. files, dependencies, sessionInfo, ...
# verbose = TRUE might do file I/O using the saved root

S3 is.logical.orderly_comparison

The new `orderly_compare_packets` function takes in two packet IDs and
produces a diff of the two packets' contents. This includes a comparison
of both the packets' metadata and of their files. The actual diff of the
files is computed using the diffobj package.

A parameter of the function can be used to control which components of
the packets are considered in the comparison, and take be one of
"everything", "metadata", "files" and "artefacts".

This feature was requested as part of the epireview project, where one
report's implementation got very messy and needs refactoring, but in the
absence of any way of comparing two runs of the same report it is
difficult to determine whether the refactor has any unexpected effect on
the output.
- Replace the hack of returning TRUE with a class and instead return an
  R6 class with a `is_equal` method.
- Don't print file diff by default, use a `verbose` flag to `print`.
@plietar plietar force-pushed the mrc-4425-diff branch 2 times, most recently from 04935e3 to a3b6352 Compare October 24, 2024 03:51
Copy link

codecov bot commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 95.73171% with 7 lines in your changes missing coverage. Please review.

Project coverage is 99.26%. Comparing base (53cc972) to head (a3b6352).
Report is 42 commits behind head on main.

Files with missing lines Patch % Lines
R/compare.R 95.59% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #147      +/-   ##
==========================================
- Coverage   99.51%   99.26%   -0.25%     
==========================================
  Files          41       42       +1     
  Lines        3716     3957     +241     
==========================================
+ Hits         3698     3928     +230     
- Misses         18       29      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@plietar plietar requested a review from richfitz October 24, 2024 06:19
@plietar
Copy link
Member Author

plietar commented Oct 24, 2024

I think I’m mostly happy with the API now, except for the verbose argument that is a bit gross having 4 possible values.

The rendered output is a bit rough, especially when it comes to viewing the orderly specific metadata, but we can refine that over time.

Copy link
Member

@richfitz richfitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given this a quick whirl and I think this will be fun to try out today. I'll let people know it's subject to change but the overall idea is now pretty solid I think. Agree that handling of verbose is a bit gross, and will have a think about what we can do with that

@richfitz richfitz merged commit 00dd499 into main Oct 24, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants