-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Very slow equality checks for certain sentences #1
Comments
Would this work as a third option?
The added benefit of this is that you could feed it a formula, and then it would be simplified and compressed to the DAG representation rather than the full branching tree. |
This isn't the case, unfortunately, and you don't even have to get very pathological to find a collision: I first thought to reuse existing internal node hashing which uses I think it could work, but not as a default.
Converting to a (V, E) DAG representation would be a safe solution if we can do it without losing information. That might also help with converting to NetworkX. |
Ah, well I was assuming you always hash the string of an object. And variables wrapping an integer should then be fine (collisions notwithstanding): i.e., Collisions still possible, but perhaps a binning scheme could be used to ensure uniqueness. At the very least, as you point out, we could use it as an optional optimization and then throw an error when we've detected a collision (reverse map of the hash to string to keep track). |
That would work well if you have simple labels that are all of the same type, but worse if you mix types, or if your label type has a very generic string representation. But as long as it's self-contained we don't need to bother with real hashes. We could just assign each variable a value arbitrarily, à la |
Ah, ya, I see your point. Given a generic Cake and eat it too? Pick and document a reasonable default, but allow for the alternative to be specified as part of a configuration. Already feels like some configuration would be helpful (do you want to try and do (collision risk) hashing? want to change the default SAT solver? etc). |
Btw, how does this entire discussion relate to the deduplication you've done? https://python-nnf.readthedocs.io/en/stable/_modules/nnf.html#NNF.deduplicate |
The kind of sentences where you get these slow equality checks (compact as DAGs but not as trees) is also the kind where you have to take care not to duplicate your nodes. But even if you don't duplicate nodes within the individual sentences you still get this problem with equality checks. The
|
Well if we have the hashing in place (configured or otherwise), then the hashing can be computed during either equality or deduplication -- since it's recursive, it can save over several operations. I guess I always assumed that equality makes sense to check on deduped theories, as otherwise you could just define a canonical string printout of the theory and match that way (issues w/ |
Sentences where many nodes have multiple parents can be slow to compare to each other.
This happens when two sentences are identical or almost identical, but exist as separate objects. Each equality check naively compares the children of nodes, but that means nodes can be checked a lot of times.
For example, using
examples/socialchoice.py
:There's no straightforward way to solve this, because the equality checks use
frozenset
equality. The problem is solved in other methods with a temporary cache around a function defined inside the method, but comparingfrozenset
s just invokes the__eq__
methods of the children again.Two potential solutions:
__eq__
is called, and clear the cache when the call to__eq__
that enabled the cache is finished. This probably interacts badly with threads, but only by potentially keeping objects in memory longer than necessary or discarding the cache too early, without outright incorrect behavior. It also adds at least a little overhead to typical__eq__
calls.frozenset
equality, and implement all of the equality checks from scratch. This could make performance much worse for typical cases.The problem only occurs when the objects are constructed separately. If an object is compared to itself Python is smart enough to notice and skip the lengthier check.
Solving it might not be worth it, unless a concrete case where it occurs pops up.
The text was updated successfully, but these errors were encountered: