EQ oracles should not use hypothesis state iterators! #148
Replies: 1 comment 1 reply
-
To me, this sounds like they are not the same algorithm, then. IMO, as you correctly pointed out, the core issue is the fact that "equivalence" means "equivalence modulo isomorphism". The two algorithms return different state permutations of the original model. They are equivalent (isomorphic) but not equal. However, I think this is a necessary degree of freedom. Otherwise, you couldn't use different counterexample analysis strategies (e.g., forward-oriented and backward-oriented) which may discover new hypothesis states at different times.
Mainly in order to allow for mapping from/to indices. You have the same problem here, since you are free to provide a […]. (To be fair, this point is often neglected in benchmarks as well, i.e., you can easily choose the order in which your algorithm performs best compared to others.)
-
I just discovered a very bizarre phenomenon. I tested two different implementations of the same Mealy learning algorithm (I cannot publish the code yet). When I ran some benchmarks to check whether they behave the same, I noticed that they differed in query and symbol performance. Naturally, I thought I had a bug that causes them to pose different queries. I then logged all of their interactions with the membership oracle and the counterexamples they received. It turned out that they posed exactly the same queries up until some point at which they received different counterexamples. I thought: how could that be? If they pose the same queries, they must also construct the same hypotheses, since the algorithm is fully deterministic. So I exported the two hypotheses that caused the first divergence in counterexamples, re-imported them into LearnLib, and verified that they were equivalent. How could they possibly lead to different counterexamples? I am using RandomWp, which, despite its name, behaves completely deterministically given a fixed seed.
Turns out: My two learners constructed equivalent hypotheses, but created their respective states in different orders, because their internal data structures used different iterators. This then led to different iteration orders in hypothesis.getStates(). I believe that this line of code in the RandomWp implementation then leads to different test words being generated.
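To illustrate the mechanism with a deliberately trivial, hypothetical sketch (this is not the actual RandomWp code; the state names and the seed are made up): any oracle that seeds a `Random` and then indexes into the order produced by `hypothesis.getStates()` will, for two hypotheses that are isomorphic but enumerate their states differently, pick semantically different states and therefore build different test words.

```java
import java.util.List;
import java.util.Random;

// Hypothetical, heavily simplified illustration -- not the actual RandomWp code.
// Both "hypotheses" contain the same four states; only the iteration order of
// getStates() differs. With an identical seed, the oracle draws the same index
// but (in general) lands on a different state, so the test words built from
// that state, and hence the counterexamples, diverge.
public final class StateOrderDemo {

    public static void main(String[] args) {
        List<String> statesA = List.of("q0", "q1", "q2", "q3"); // iteration order of learner A
        List<String> statesB = List.of("q0", "q2", "q3", "q1"); // same states, learner B's order

        Random randomA = new Random(42);
        Random randomB = new Random(42);

        int indexA = randomA.nextInt(statesA.size()); // same seed ...
        int indexB = randomB.nextInt(statesB.size()); // ... same index ...

        // ... but not necessarily the same state.
        System.out.println(statesA.get(indexA) + " vs. " + statesB.get(indexB));
    }
}
```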
This may sound like a minor issue, but I do believe it is significant. In my benchmarks, I have noticed that learning performance for some SULs is highly sensitive to the counterexamples passed to the learner (across many algorithms, not just mine). Thus, different implementations of the very same algorithm may show drastically different performance for the exact same configuration. And since this interference is not controlled by the oracle seed, it is not clear whether it will balance out over many runs, which poses a threat to the reliability of benchmark results.
I found an easy fix, exploiting a semantic gap between automata learning theory and LearnLib's implementation: From my understanding, many AAL algorithms are not really fully deterministic in theory. They are technically only deterministic if there is a well-defined order over the input alphabet, which is not an assumption the MAT framework makes. Without such an order, there is no deterministic way of iterating over transitions and identifying target states, which can change the order in which new states are discovered. In practice, however, at least in LearnLib, the alphabet always has a deterministic iteration order. This way, learners are able to explore transitions in a deterministic order, and learning performance is exactly reproducible.
Instead of passing the learner's hypothesis as-is into the EQ oracle, I now always construct a canonical copy of it first. This copy is guaranteed to store its states in a deterministic order, independent of the order in which they are stored in the hypothesis instance received from the learner. I do this by visiting the states of the original hypothesis with a BFS starting from its initial state, expanding transitions in the iteration order of the alphabet.
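For reference, here is a minimal sketch of such a canonization step, written against AutomataLib's `MealyMachine`/`CompactMealy` API. The import paths follow recent AutomataLib versions and differ for older releases, and the helper name `Canonizer` is mine; treat this as an illustration of the BFS idea rather than a drop-in utility:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

import net.automatalib.alphabet.Alphabet;
import net.automatalib.automaton.transducer.MealyMachine;
import net.automatalib.automaton.transducer.impl.CompactMealy;

// Sketch of a canonizing copy: visit the hypothesis states with a BFS from the
// initial state, expanding transitions strictly in alphabet iteration order.
// Two isomorphic hypotheses therefore map to identical copies, independent of
// how their own state iterators happen to enumerate states.
public final class Canonizer {

    public static <S, I, O> CompactMealy<I, O> canonize(MealyMachine<S, I, ?, O> hyp,
                                                        Alphabet<I> alphabet) {
        CompactMealy<I, O> copy = new CompactMealy<>(alphabet);
        Map<S, Integer> stateMap = new HashMap<>();
        Queue<S> queue = new ArrayDeque<>();

        S init = hyp.getInitialState();
        stateMap.put(init, copy.addInitialState());
        queue.add(init);

        while (!queue.isEmpty()) {
            S src = queue.poll();
            int copySrc = stateMap.get(src);
            for (I input : alphabet) { // deterministic order given by the alphabet
                S succ = hyp.getSuccessor(src, input);
                if (succ == null) {
                    continue; // partial hypothesis: skip undefined transitions
                }
                Integer copySucc = stateMap.get(succ);
                if (copySucc == null) {
                    copySucc = copy.addState();
                    stateMap.put(succ, copySucc);
                    queue.add(succ);
                }
                copy.addTransition(copySrc, input, copySucc, hyp.getOutput(src, input));
            }
        }
        return copy;
    }
}
```

Since the BFS numbers states in the order they are first reached via the alphabet, the resulting `CompactMealy` depends only on the structure of the hypothesis, which is exactly the property a seeded EQ oracle needs.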
I am not sure how relevant this phenomenon is when comparing different learning algorithms. But at least for some models (like this), I have observed for multiple different learners that average performance across 100 runs can vary by double-digit percentages when switching this canonization step on and off! I think this is a noise factor that should be eliminated when testing algorithms.
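If you want this to be transparent to the rest of the setup, the canonization can also be packaged as a decorating EQ oracle. Again, this is only a sketch: `CanonizingEQOracle` is not part of LearnLib, it reuses the hypothetical `Canonizer` from above, and the exact import paths of `EquivalenceOracle` and `DefaultQuery` depend on the LearnLib version:

```java
import java.util.Collection;

import de.learnlib.oracle.EquivalenceOracle;
import de.learnlib.query.DefaultQuery;
import net.automatalib.alphabet.Alphabet;
import net.automatalib.automaton.transducer.MealyMachine;
import net.automatalib.word.Word;

// Hypothetical decorator (not part of LearnLib): canonizes every hypothesis
// before delegating to the wrapped EQ oracle, so the oracle's seeded
// randomness only ever sees the canonical state order.
public final class CanonizingEQOracle<I, O>
        implements EquivalenceOracle.MealyEquivalenceOracle<I, O> {

    private final EquivalenceOracle.MealyEquivalenceOracle<I, O> delegate;
    private final Alphabet<I> alphabet;

    public CanonizingEQOracle(EquivalenceOracle.MealyEquivalenceOracle<I, O> delegate,
                              Alphabet<I> alphabet) {
        this.delegate = delegate;
        this.alphabet = alphabet;
    }

    @Override
    public DefaultQuery<I, Word<O>> findCounterExample(MealyMachine<?, I, ?, O> hypothesis,
                                                       Collection<? extends I> inputs) {
        // Canonizer.canonize is the BFS copy from the sketch above.
        return delegate.findCounterExample(Canonizer.canonize(hypothesis, alphabet), inputs);
    }
}
```

Wrapping, e.g., the random Wp oracle in such a decorator makes its seeded behavior independent of the learner's internal state order without touching the learner or the oracle itself.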