Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deep diff by editscript #90

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Conversation

guruma
Copy link

@guruma guruma commented Jan 26, 2023

I add a feature of deep diffing by using editscript library.
I think It would be very useful especially for finding diffs between two big maps.

(require '[eftest.runner :refer [e=]])
(deftest deep-map-test
  (is (e= {:a 1 :b [1 2 3] :c {:a 1}}
          {:a 2 :b [2 3 4] :c {:a 2}})))

e= stands for 'editscript equal'

> lein test
expected: (e= {:a 1, :b [1 2 3], :c {:a 1}} {:a 2, :b [2 3 4], :c {:a 2}})
  actual: (not (e= {:a 1, :b [1 2 3], :c {:a 1}} {:a 2, :b [2 3 4], :c {:a 2}}))
    diff: ({:path [:a], :exp 1, :act 2}
           {:path [:b 0], :exp 1, :act 2}
           {:path [:b 2], :exp 3, :act 4}
           {:path [:c :a], :exp 1, :act 2})

please review and let me know what you think!

@guruma
Copy link
Author

guruma commented Jan 26, 2023

Build failed because the original branch depends on clojure 1.7.0 in project.clj
Changing clojure 1.7.0 to clojure 1.9.0, on which editscript library depends for nat-int?, will make the build succeded

@weavejester
Copy link
Owner

Rather than changing how the 'pretty' reporter works, instead an 'editscript' reporter could be created instead.

Can you also show the difference between the editscript diff, and how its reported currently? Is editscript the best Clojure diffing library to use? Are there alternatives, and what do they look like?

@guruma
Copy link
Author

guruma commented Jan 28, 2023

Rather than changing how the 'pretty' reporter works, instead an 'editscript' reporter could be created instead.

I have no idea how much good an 'editscript' reporter is, because the 'pretty' reporter has lots of private functions which the report functions have to use.

Maybe It might be that I don't understand what you mean. Could you explain a little more about that?

Can you also show the difference between the editscript diff, and how its reported currently? Is editscript the best Clojure diffing library to use? Are there alternatives, and what do they look like?

Comparing Clojure Diff Libraries will help you understand the difference and you can check the benchmark of different diff libraries in the blog. There are two diffing algorithms that is used in editscript library. Editscript A* is slow, but Editscript quick is fast. I use Editscript quick for comparing expected and actaul, and Editscript A* for making diffs only when expected is not equal to actaul

Edit script is a way to quantify diffs between two strings, aka Levenshtein distance, which is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. But this is ultimately reduced to the problem of the shortest path between two vertices in a graph which can be solved optimally by A* algorithm.

If you want more informations, you can read ['Wu, S. et al., 1990, An O(NP) Sequence Comparison Algorithm, Information Processing Letters] on which editscript library is based.

@weavejester
Copy link
Owner

e= stands for 'editscript equal'

Why use e=? Why not diff automatically on failure?

I have no idea how much good an 'editscript' reporter is, because the 'pretty' reporter has lots of private functions which the report functions have to use.

What private functions would you need?

@guruma
Copy link
Author

guruma commented Jan 28, 2023

Why use e=? Why not diff automatically on failure?

I think that the original = need to be leaved for someone who need it.

What private functions would you need?

The problem is not what private function I need but the private functions themselves.
The private functions prevent others from adding new multimethod for their own dispatch.
For example, the following codes will show this.

(ns other-project.namespace
  (:require [clojure.test :refer :all]))

; this is possible
(defmethod clojure.test/assert-expr 'e= [msg form]
  (let [args (rest form)
        pred (first form)]
    `(let [values# (list ~@args)
           result# (apply ~pred values#)]
       (if result#
         (clojure.test/do-report {
                  :type :pass
                  :message ~msg
                  :expected '~form
                  :actual (cons ~pred values#)})
         (let [diffs# (apply make-diffs values#)]
           (clojure.test/do-report {
                  :type :fail-with-diffs
                  :message ~msg
                  :expected '~form
                  :actual (list '~'not (cons '~pred values#))
                  :diffs diffs#})))
       result#)))

When using only clojure.test, it is possible to add new multimethod. Because all function in clojure.test is not private as mentioned there as below.

;; Nothing is marked "private" here, so you can rebind things to plug
;; in your own testing or reporting frameworks.

(ns other-project.namespace
  (:require [clojure.test :as test]
            [eftest.runner :as ef :refer [e=]]))

; this is not possible
(defmethod report :fail-with-diffs [{:keys [message expected] :as m}]
  (test/with-test-out
    (test/inc-report-counter :fail)
    (print *divider*)
    (println (str (:fail *fonts*) "FAIL" (:reset *fonts*) " in") (testing-scope-str m))
    (when (seq test/*testing-contexts*) (println (test/testing-contexts-str)))
    (when message (println message))
    (when (= (first expected) 'e=)
      (editscript-equals-fail-report m))
    (print-output (capture/read-test-buffer))))

But it isn't possible to add new multimethod in eftest because the functions like testing-scope-str, print-output, are private.

@weavejester
Copy link
Owner

weavejester commented Jan 28, 2023

I think that the original = need to be leaved for someone who need it.

I think this design complects what you're testing (equality) with how it's being displayed (diffs). The current design of Eftest is that the reporters determine how the test output is presented to the user, not the tests themselves.

So if it showed a diff for failing data structures over a certain size, that would work better I think. Or if that behaviour could be forced via metadata on the test itself.

The problem is not what private function I need but the private functions themselves.
The private functions prevent others from adding new multimethod for their own dispatch.

Sure, but you can wrap report method itself, like the progress reporter does to the pretty reporter. As far as I can see, you add one extra reporting method, so what private functions do we absolutely need, and what can be delegated to the pretty reporter?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants