Add a function to compare the contents of two packets. #147

plietar · 2024-06-25T13:19:50Z

The new orderly_compare_packets function takes in two packet IDs and produces a diff of the two packets' contents. This includes a comparison of both the packets' metadata and of their files. The actual diff of the files is computed using the diffobj package.

A parameter of the function can be used to control which components of the packets are considered in the comparison, and can be one of "everything", "metadata", "files" and "artefacts".

This feature was requested as part of the epireview project, where one report's implementation has become very messy and needs refactoring. Without any way of comparing two runs of the same report, it is difficult to determine whether the refactor has any unexpected effect on the output.

richfitz

Thanks - this is looking great. Some broad comments about the comparison here, which I'm happy to litigate further in a conversation before you do any more implementation if you prefer!

richfitz · 2024-07-15T11:04:25Z

tests/testthat/helper-outpack.R

+  ids <- c(...)
+  replacements <- sprintf("19700101-000000-%08x", seq_along(ids))
+  names(replacements) <- ids
+  function(x) stringr::str_replace_all(x, replacements)


Why do we need to use stringr for this?

plietar · 2024-10-01

sub/gsub only handle a single pattern/replacement pair per call, whereas we need to apply many. I suppose I could compose calls and make one per replacement, but that's annoying to do in a general way. This was just easier.
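For reference, the base R alternative mentioned above (one gsub call per replacement, composed generically) could look roughly like the sketch below. `replace_all_fixed` is a hypothetical name that does not appear in this PR, and the packet IDs are made up. It matches fixed strings rather than regular expressions, which should make no difference for packet IDs.

```r
# Hypothetical helper, not part of the PR: apply many replacements with base R
# by folding one gsub() call per name/value pair over the input.
replace_all_fixed <- function(x, replacements) {
  Reduce(
    function(acc, from) gsub(from, replacements[[from]], acc, fixed = TRUE),
    names(replacements),
    x  # initial value of the accumulator
  )
}

# Made-up packet IDs, mapped to stable placeholders as in the test helper.
ids <- c("20240625-131950-aaaaaaaa", "20241017-141625-bbbbbbbb")
replacements <- sprintf("19700101-000000-%08x", seq_along(ids))
names(replacements) <- ids

text <- "comparing 20240625-131950-aaaaaaaa with 20241017-141625-bbbbbbbb"
out <- replace_all_fixed(text, replacements)
print(out)
# "comparing 19700101-000000-00000001 with 19700101-000000-00000002"
```

This avoids the stringr dependency at the cost of a small helper, which is the trade-off discussed in this thread.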

R/compare.R

plietar · 2024-10-17T14:16:25Z

diff <- orderly_compare_packets(p1, p2, root = root)
diff = list(root = root, metap1, metap2, ...)

class(diff) <- "orderly_comparison"

print(diff)  # summary view, no I/O
# Only list the names of fields that differ, grey out identical fields, and ignore trivial fields (time, id, ...)

orderly_comparison_explain(diff, what, verbose = FALSE)  # what: field name from the metadata
orderly_comparison_explain()
# General-purpose diff for unknown keys; we may have custom printouts for some of the keys, e.g. files, dependencies, sessionInfo, ...
# verbose = TRUE might do file I/O using the saved root

S3 is.logical.orderly_comparison
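
To make the shape of this proposal concrete, here is a minimal base R sketch of an S3 comparison object with a summary print method and a per-field explain step. Everything in it (`toy_compare`, `toy_explain`, the `toy_comparison` class, the fake metadata lists) is hypothetical and operates on plain lists; it is not the implementation in R/compare.R.

```r
# Hypothetical constructor: the real function takes packet IDs and a root,
# whereas this toy version takes two metadata lists directly.
toy_compare <- function(meta1, meta2, ignore = c("id", "time")) {
  # Drop trivial fields that are expected to differ between any two packets.
  fields <- setdiff(union(names(meta1), names(meta2)), ignore)
  same <- vapply(
    fields,
    function(f) identical(meta1[[f]], meta2[[f]]),
    logical(1)
  )
  structure(
    list(meta1 = meta1, meta2 = meta2, fields = fields, same = same),
    class = "toy_comparison"
  )
}

# Summary view: only names the fields that differ, and does no I/O.
print.toy_comparison <- function(x, ...) {
  if (all(x$same)) {
    cat("The two packets are identical (ignoring trivial fields).\n")
  } else {
    cat("Fields that differ:", paste(x$fields[!x$same], collapse = ", "), "\n")
  }
  invisible(x)
}

# Detailed view of a single field: returns both values for inspection.
toy_explain <- function(x, what) {
  list(first = x$meta1[[what]], second = x$meta2[[what]])
}

m1 <- list(id = "a", time = 1, name = "report", parameters = list(n = 1))
m2 <- list(id = "b", time = 2, name = "report", parameters = list(n = 2))
cmp <- toy_compare(m1, m2)
print(cmp)
str(toy_explain(cmp, "parameters"))
```

Keeping the summary cheap and moving any expensive work into the explain step mirrors the split between `print` and `orderly_comparison_explain` described above.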

- Replace the hack of returning TRUE with a class, and instead return an R6 class with an `is_equal` method.
- Don't print the file diff by default; use a `verbose` flag to `print`.

codecov · 2024-10-24T03:54:51Z

Codecov Report

Attention: Patch coverage is 95.73171% with 7 lines in your changes missing coverage. Please review.

Project coverage is 99.26%. Comparing base (53cc972) to head (a3b6352).
Report is 42 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| R/compare.R | 95.59% | 7 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #147      +/-   ##
==========================================
- Coverage   99.51%   99.26%   -0.25%     
==========================================
  Files          41       42       +1     
  Lines        3716     3957     +241     
==========================================
+ Hits         3698     3928     +230     
- Misses         18       29      +11

plietar · 2024-10-24T06:50:59Z

I think I'm mostly happy with the API now, except for the verbose argument, which is a bit gross with its 4 possible values.

The rendered output is a bit rough, especially when it comes to viewing the orderly specific metadata, but we can refine that over time.

richfitz

I've given this a quick whirl and I think this will be fun to try out today. I'll let people know it's subject to change but the overall idea is now pretty solid I think. Agree that handling of verbose is a bit gross, and will have a think about what we can do with that

plietar requested review from weshinsley and richfitz June 25, 2024 13:19

plietar force-pushed the mrc-4425-diff branch 4 times, most recently from f6b6cbe to 9be0906 Compare June 25, 2024 13:27

plietar mentioned this pull request Jun 25, 2024

Automated checks for data consistency mrc-ide/priority-pathogens#36

Open

plietar force-pushed the mrc-4425-diff branch 8 times, most recently from 7275d4a to 5fd1b5f Compare September 18, 2024 16:11

plietar removed the request for review from weshinsley September 18, 2024 16:14

plietar force-pushed the mrc-4425-diff branch 2 times, most recently from 22dd986 to 72ef3cb Compare September 18, 2024 16:32

richfitz requested changes Sep 19, 2024

plietar force-pushed the mrc-4425-diff branch from 72ef3cb to 4593902 Compare October 23, 2024 14:06

plietar added 9 commits October 24, 2024 04:47

Indent the file diffs

cee8b34

Improve testing, fix warnings..

0d388df

Update docs

e867acf

Update index

1987e0f

Be more robust to badly encoded files.

da5a7df

Fix signature

ffe1b82

Fix codefactor

f302f1a

New version.

0a28891

plietar force-pushed the mrc-4425-diff branch 2 times, most recently from 04935e3 to a3b6352 Compare October 24, 2024 03:51

New version, again.

7fb4a31

plietar force-pushed the mrc-4425-diff branch from a3b6352 to 7fb4a31 Compare October 24, 2024 05:58

fix docs

0db5cbd

plietar requested a review from richfitz October 24, 2024 06:19

richfitz approved these changes Oct 24, 2024

richfitz merged commit 00dd499 into main Oct 24, 2024
10 checks passed
